A Microsoft Excel® 2010 Based Tool for Calculating Interobserver Agreement
Reed, Derek D; Azulay, Richard L
2011-01-01
This technical report provides detailed information on the rationale for using a common computer spreadsheet program (Microsoft Excel®) to calculate various forms of interobserver agreement for both continuous and discontinuous data sets. In addition, we provide a brief tutorial on how to use an Excel spreadsheet to automatically compute traditional total count, partial agreement-within-intervals, exact agreement, trial-by-trial, interval-by-interval, scored-interval, unscored-interval, total duration, and mean duration-per-interval interobserver agreement algorithms. We conclude with a discussion of how practitioners may integrate this tool into their clinical work. PMID:22649578
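Each of the agreement algorithms listed above reduces to a simple ratio. The Python sketch below is not the authors' Excel workbook. It assumes common textbook definitions: smaller/larger ratios for count and duration measures, and agreements divided by intervals for interval measures. All function names are hypothetical, as is the convention that two zero counts in an interval count as 100% agreement.

```python
from typing import Sequence


def _ratio(a: float, b: float) -> float:
    """Smaller value / larger value; two zeros count as full agreement (assumed convention)."""
    lo, hi = sorted((a, b))
    return 1.0 if hi == 0 else lo / hi


def _check(x: Sequence, y: Sequence) -> None:
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("both observers need the same, non-zero number of intervals")


def total_count_ioa(c1: Sequence[int], c2: Sequence[int]) -> float:
    """Smaller session total / larger session total x 100."""
    return 100.0 * _ratio(sum(c1), sum(c2))


def partial_agreement_within_intervals(c1: Sequence[int], c2: Sequence[int]) -> float:
    """Mean of the per-interval smaller/larger count ratios x 100."""
    _check(c1, c2)
    return 100.0 * sum(_ratio(a, b) for a, b in zip(c1, c2)) / len(c1)


def exact_agreement(c1: Sequence[int], c2: Sequence[int]) -> float:
    """Percentage of intervals in which both observers recorded the identical count."""
    _check(c1, c2)
    return 100.0 * sum(a == b for a, b in zip(c1, c2)) / len(c1)


def interval_by_interval(s1: Sequence[bool], s2: Sequence[bool]) -> float:
    """Intervals scored the same way (both occurrence or both nonoccurrence) / all intervals x 100.
    The same arithmetic applies to trial-by-trial data (one entry per trial)."""
    _check(s1, s2)
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)


def scored_interval(s1: Sequence[bool], s2: Sequence[bool]) -> float:
    """Interval-by-interval agreement restricted to intervals that either observer scored."""
    _check(s1, s2)
    pairs = [(a, b) for a, b in zip(s1, s2) if a or b]
    if not pairs:
        raise ValueError("no interval was scored by either observer")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def unscored_interval(s1: Sequence[bool], s2: Sequence[bool]) -> float:
    """Interval-by-interval agreement restricted to intervals that either observer left unscored."""
    _check(s1, s2)
    pairs = [(a, b) for a, b in zip(s1, s2) if not (a and b)]
    if not pairs:
        raise ValueError("every interval was scored by both observers")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def total_duration_ioa(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Shorter total duration / longer total duration x 100."""
    return 100.0 * _ratio(sum(d1), sum(d2))


def mean_duration_per_interval(d1: Sequence[float], d2: Sequence[float]) -> float:
    """Mean of the per-interval shorter/longer duration ratios x 100."""
    _check(d1, d2)
    return 100.0 * sum(_ratio(a, b) for a, b in zip(d1, d2)) / len(d1)


if __name__ == "__main__":
    # Hypothetical event counts per interval for two observers.
    c1, c2 = [2, 0, 3, 1], [2, 1, 2, 1]
    print(total_count_ioa(c1, c2))                               # 100.0
    print(round(partial_agreement_within_intervals(c1, c2), 1))  # 66.7
    print(exact_agreement(c1, c2))                               # 50.0
    # Hypothetical occurrence (True) / nonoccurrence (False) records.
    s1 = [True, False, True, True, False]
    s2 = [True, True, True, False, False]
    print(interval_by_interval(s1, s2), scored_interval(s1, s2))  # 60.0 50.0
```

The example shows why the algorithms are reported separately. Identical session totals (100% total count) can hide interval-level disagreement (50% exact agreement).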
An overview of clinical tools used to assess neonatal abstinence syndrome.
Orlando, Susan
2014-01-01
Several clinical tools have been developed to quantify the severity of withdrawal signs and symptoms exhibited by infants born to substance-using mothers. Scores from the systematic assessments are used to guide treatment of infants with moderate to severe clinical signs. This article provides an overview of published assessment tools developed for infants with neonatal abstinence syndrome. Nurses caring for infants at risk for neonatal abstinence syndrome should be knowledgeable about the tools used to evaluate these infants and guide their treatment. The ideal assessment tool should be published and include item definitions and a protocol for administering the tool. Nurses need education and training to achieve competency and interobserver reliability in the use of a selected tool. Tool-specific materials should be used to standardize training and improve accuracy in assessments. Competent and knowledgeable nurses play a critical role in improving outcomes for infants with neonatal abstinence syndrome.
Chang, Shang-Jen; Yang, Stephen S D
2008-12-01
To evaluate the inter-observer and intra-observer agreement on the interpretation of uroflowmetry curves of children. Healthy kindergarten children were enrolled for evaluation of uroflowmetry. Uroflowmetry curves were classified as bell-shaped, tower, plateau, staccato and interrupted. Only the bell-shaped curves were regarded as normal. Two urodynamists evaluated the curves independently after reviewing the definitions of the different types of uroflowmetry curve. The senior urodynamist evaluated the curves twice, 3 months apart. A final classification was assigned once consensus was reached. Agreement among observers was analyzed using kappa statistics. Of 190 uroflowmetry curves eligible for analysis, the intra-observer agreement in interpreting each type of curve and in interpreting normalcy vs abnormality was good (kappa=0.71 and 0.68, respectively). Very good inter-observer agreement (kappa=0.81) on normalcy and good inter-observer agreement (kappa=0.73) on types of uroflowmetry were observed. Poor inter-observer agreement existed on the classification of specific types of abnormal uroflowmetry curves (kappa=0.07). Uroflowmetry is a good screening tool for determining normalcy in kindergarten children, but it is not a good tool for defining the specific type of an abnormal uroflowmetry curve.
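This study and several of the following ones use Cohen's kappa. It corrects the observed agreement p_o for the agreement expected by chance, p_e: kappa = (p_o - p_e) / (1 - p_e). Below is a minimal Python sketch for two raters and nominal categories. The function name and the ratings are hypothetical; they are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    n = len(r1)
    # Observed agreement: share of items both raters put in the same category.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    m1, m2 = Counter(r1), Counter(r2)
    p_e = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical normal (N) / abnormal (A) calls on 10 curves by two raters.
    rater1 = list("NNNNNNAAAA")
    rater2 = list("NNNNNAAAAN")
    print(round(cohens_kappa(rater1, rater2), 3))  # 0.583
```

Here the raters agree on 80% of curves, but 52% agreement is expected by chance from their marginals alone, so kappa is only about 0.58.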
Hobbelen, Johannes S M; Koopmans, Raymond T C M; Verhey, Frans R J; Habraken, Kitty M; de Bie, Rob A
2008-08-01
Paratonia is one of the movement disorders characteristic of dementia. The aim of this study was to develop an assessment tool (the Paratonia Assessment Instrument, PAI) based on the new consensus definition of paratonia. An additional aim was to investigate the reliability and validity of the PAI. A three-phase cross-sectional survey was conducted. In the first two phases, the PAI was developed and validated. In the third phase, the inter-observer reliability and feasibility of the instrument were tested. The original PAI consisted of five criteria, all of which needed to be met in order to make the diagnosis. On the basis of a qualitative analysis, one criterion was reformulated and another was removed. Following these changes, inter-observer reliability between the two assessors improved, with Cohen's kappa rising from 0.532 in the initial phase to 0.677 in the second phase. This improvement was substantiated in the third phase by two independent assessors, with Cohen's kappa ranging from 0.625 to 1. The PAI is a reliable and valid assessment tool for diagnosing paratonia in elderly people with dementia that can easily be applied in daily practice.
Validity and inter-observer reliability of subjective hand-arm vibration assessments.
Coenen, Pieter; Formanoy, Margriet; Douwes, Marjolein; Bosch, Tim; de Kraker, Heleen
2014-07-01
Exposure to mechanical vibrations at work (e.g., due to handling powered tools) is a potential occupational risk as it may cause upper extremity complaints. However, reliable and valid methods for assessing vibration exposure at work are lacking. Measuring hand-arm vibration objectively is often difficult and expensive, while the commonly used information provided by manufacturers lacks detail. Therefore, a subjective hand-arm vibration assessment method was tested for validity and inter-observer reliability. In an experimental protocol, sixteen tasks involving powered tools were performed by two workers. Hand-arm vibration was assessed subjectively by 16 observers according to the proposed subjective assessment method. As a gold-standard reference, hand-arm vibration was measured objectively using a vibration measurement device. Weighted κ's were calculated to assess validity, and intra-class correlation coefficients (ICCs) were calculated to assess inter-observer reliability. Inter-observer reliability of the subjective assessments, reflecting the agreement among observers, can be expressed by an ICC of 0.708 (0.511-0.873). The validity of the subjective assessments compared with the gold-standard reference can be expressed by a weighted κ of 0.535 (0.285-0.785). In addition, the percentage of exact agreement between the subjective assessment and the objective measurement was relatively low (i.e., 52% of all tasks). This study shows that subjectively assessed hand-arm vibrations are fairly reliable among observers and moderately valid. This assessment method is a first attempt to use subjective risk assessments of hand-arm vibration. Although this assessment method could benefit from further improvement, it can be of use in future studies and in field-based ergonomic assessments. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
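For ordered categories such as low, medium and high exposure, a weighted κ gives partial credit to near-misses. The abstract does not say which weights were used. The sketch below therefore assumes linear disagreement weights |i - j| / (k - 1) by default, with quadratic weights as an option. The function name and the ratings are hypothetical.

```python
from typing import Hashable, Sequence


def weighted_kappa(
    r1: Sequence[Hashable],
    r2: Sequence[Hashable],
    categories: Sequence[Hashable],
    scheme: str = "linear",
) -> float:
    """Weighted kappa for two raters on an ordinal scale.

    `categories` lists the scale points in order; disagreement weights grow
    with the distance between the two chosen categories.
    """
    k = len(categories)
    if k < 2 or scheme not in ("linear", "quadratic"):
        raise ValueError("need >= 2 ordered categories and a 'linear' or 'quadratic' scheme")
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed k x k contingency table of paired ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[pos[a]][pos[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the most distant pair
        return d if scheme == "linear" else d * d

    # Weighted disagreement: observed counts vs. counts expected from the marginals.
    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("weighted kappa is undefined: no disagreement expected by chance")
    return 1 - obs_dis / exp_dis


if __name__ == "__main__":
    # Hypothetical exposure ratings: 1 = low, 2 = medium, 3 = high.
    observer = [1, 1, 2, 2, 3, 3]
    reference = [1, 2, 2, 3, 3, 3]
    print(round(weighted_kappa(observer, reference, [1, 2, 3]), 3))  # 0.625
```

Quadratic weights penalize distant disagreements more heavily. On the same data they give 0.75, which is why a study should report which scheme it used.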
A Probabilistic Method for Estimation of Bowel Wall Thickness in MR Colonography
Menys, Alex; Jaffer, Asif; Bhatnagar, Gauraang; Punwani, Shonit; Atkinson, David; Halligan, Steve; Hawkes, David J.; Taylor, Stuart A.
2017-01-01
MRI has recently been applied as a tool to quantitatively evaluate the response to therapy in patients with Crohn’s disease, and is the preferred choice for repeated imaging. Bowel wall thickness on MRI is an important biomarker of underlying inflammatory activity, being abnormally increased in the acute phase and decreasing in response to successful therapy; however, interobserver agreement on measured thickness has been reported to be poor, so a system for accurate, robust and reproducible measurement is desirable. We propose a novel method for estimating bowel wall thickness that aims to overcome the poor interobserver agreement of the manual procedure. We show that the differences between algorithm and observer measurements of wall thickness (0.25 mm ± 0.81 mm) are similar to the differences between observers (0.16 mm ± 0.64 mm). PMID:28072831
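Figures such as 0.25 mm ± 0.81 mm are typically the mean (bias) and standard deviation of paired differences, as in a Bland-Altman analysis. The abstract does not state this explicitly, so treat that reading as an assumption. The sketch uses only Python's standard `statistics` module. The function name and the measurements are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def agreement_summary(
    a: Sequence[float], b: Sequence[float]
) -> Tuple[float, float, Tuple[float, float]]:
    """Bias (mean of a - b), SD of the differences, and 95% limits of agreement."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


if __name__ == "__main__":
    # Hypothetical wall-thickness readings (mm) for four bowel segments.
    algorithm = [3.0, 4.2, 5.1, 6.0]
    observer = [2.8, 4.0, 5.3, 5.5]
    bias, sd, limits = agreement_summary(algorithm, observer)
    print(f"{bias:.3f} mm ± {sd:.3f} mm")  # 0.175 mm ± 0.287 mm
```

Computing the same summary for algorithm-vs-observer and observer-vs-observer pairs yields the kind of side-by-side comparison the abstract reports.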
Training improves interobserver reliability for the diagnosis of scaphoid fracture displacement.
Buijze, Geert A; Guitton, Thierry G; van Dijk, C Niek; Ring, David
2012-07-01
The diagnosis of displacement in scaphoid fractures is notorious for poor interobserver reliability. We tested whether training can improve interobserver reliability and sensitivity, specificity, and accuracy for the diagnosis of scaphoid fracture displacement on radiographs and CT scans. Sixty-four orthopaedic surgeons rated a set of radiographs and CT scans of 10 displaced and 10 nondisplaced scaphoid fractures for the presence of displacement, using a web-based rating application. Before rating, observers were randomized to a training group (34 observers) and a nontraining group (30 observers). The training group received an online training module before the rating session, and the nontraining group did not. Interobserver reliability for training and nontraining was assessed by Siegel's multirater kappa and the Z-test was used to test for significance. There was a small but significant difference in the interobserver reliability for displacement ratings in favor of the training group compared with the nontraining group. Ratings of radiographs and CT scans combined resulted in moderate agreement for both groups. The average sensitivity, specificity, and accuracy of diagnosing displacement of scaphoid fractures were, respectively, 83%, 85%, and 84% for the nontraining group and 87%, 86%, and 87% for the training group. Assuming a 5% prevalence of fracture displacement, the positive predictive value was 0.23 in the nontraining group and 0.25 in the training group. The negative predictive value was 0.99 in both groups. Our results suggest training can improve interobserver reliability and sensitivity, specificity and accuracy for the diagnosis of scaphoid fracture displacement, but the improvements are slight. These findings are encouraging for future research regarding interobserver variation and how to reduce it further.
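The predictive values above follow from Bayes' theorem, given the reported average sensitivity and specificity and the assumed 5% prevalence. The sketch below reproduces the reported figures from those rounded inputs. The authors may have computed them differently, for example from unrounded rates. The function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem; all inputs are proportions in [0, 1]."""
    tp = sensitivity * prevalence               # true positives, as a fraction of all cases
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    tn = specificity * (1 - prevalence)         # true negatives
    fn = (1 - sensitivity) * prevalence         # false negatives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    for group, sens, spec in [("nontraining", 0.83, 0.85), ("training", 0.87, 0.86)]:
        ppv, npv = predictive_values(sens, spec, prevalence=0.05)
        print(group, round(ppv, 2), round(npv, 2))
    # nontraining 0.23 0.99
    # training 0.25 0.99
```

At a 5% prevalence, false positives from the 95% nondisplaced fractures outnumber true positives. This is why the PPV stays low even with sensitivity and specificity in the mid-80s.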
Carroll, Kristen L; Murray, Kathleen A; MacLeod, Lynne M; Hennessey, Theresa A; Woiczik, Marcella R; Roach, James W
2011-06-01
Numerous studies underscore the poor intraobserver and interobserver reliability of both the center edge angle (CEA) and the Severin classification using plain film measurements. In this study, experienced observers applied a computer-assisted measurement program to determine the CEA in digital pelvic radiographs of adults who had been previously treated for developmental dysplasia of the hip (DDH). Using a teaching aid/algorithm of the Severin classification, the observers then assigned a Severin rating to these hips. Intraobserver and interobserver errors were then calculated for both the CEA measurements and the Severin classifications. Four pediatric orthopaedic surgeons and 1 pediatric radiologist calculated the CEAs using the OrthoView™ planning system and then determined the Severin classification on 41 blinded digital pelvic radiographs. The radiographs were evaluated by each examiner twice, with evaluations separated by 2 months. All examiners reviewed a Severin classification algorithm before making their Severin assignments. The intraobserver and interobserver reliability for the CEA and the Severin classification were calculated using intraclass correlation coefficients and Cohen and Fleiss κ scores, respectively. The intraobserver and interobserver reliability for CEA measurement was moderate to almost perfect. When we separated the Severin classification into 3 clinically relevant groups of good (Severin I and II), dysplastic (Severin III), and poor (Severin IV and above), interobserver reliability approached the almost-perfect range. The Severin classification is an extremely useful and often-used radiographic measure of the success of DDH treatment. Our research found that digital radiography, computer-aided measurement tools, the use of a Severin algorithm, and separating the Severin classification into 3 clinically relevant groups significantly increased the intraobserver and interobserver reliability of both the CEA and Severin classification. This finding will assist future studies using the CEA and Severin classification in the radiographic assessment of DDH treatment outcomes.
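The abstract does not specify which form of intraclass correlation was used for the CEA. The sketch below computes one common choice, ICC(2,1) (two-way random effects, absolute agreement, single rater). It works from a complete subjects × raters matrix using the classical ANOVA mean squares. The ratings are illustrative, not the study's, and the function name is hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's measurement of subject i; the matrix must be complete.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete subjects x raters matrix of at least 2 x 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual error.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_subjects - ms_error) / denom


if __name__ == "__main__":
    # Illustrative ratings: 6 subjects (rows) x 4 raters (columns).
    data = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(data), 2))  # 0.29
```

The raters here rank subjects fairly consistently but differ in their overall level. An absolute-agreement ICC penalizes that systematic offset, whereas a consistency form would not.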
Shah, Rajal B; Leandro, Gioacchino; Romerocaces, Gloria; Bentley, James; Yoon, Jiyoon; Mendrinos, Savvas; Tadros, Yousef; Tian, Wei; Lash, Richard
2016-10-01
One of the major goals of an anatomic pathology laboratory quality program is to minimize unwarranted diagnostic variability and equivocal reporting. This study evaluated the utility of Miraca Life Sciences' "Disease-Focused Diagnostic Review" (DFDR) quality program in improving interobserver diagnostic reproducibility associated with classification of "atypical glands suspicious for adenocarcinoma" (ATYP) in prostate biopsies. Seventy-one selected prostate biopsies with a focus of ATYP were reviewed by 8 pathologists. Participants were blinded to the original diagnosis and were first asked to classify the ATYP as benign, atypical, or limited adenocarcinoma. DFDR comprised a "theoretical consensus" (in which pathologists first reached consensus on the morphological features they considered relevant for the diagnosis of limited prostatic adenocarcinoma), a didactic review including relevant literature, and "practical consensus" (pathologists performed joint microscopic sessions, reconciling each other's observations and positions evaluating a separate unique slide set). Participants were finally asked to reclassify the original 71 ATYP cases based on knowledge gleaned from DFDR. Pre- and post-DFDR interobserver reproducibility of overall diagnostic agreement was assessed. Interobserver reproducibility measured by Fleiss κ values of pre- and post-DFDR was 0.36 and 0.59, respectively (P=.006). Post-DFDR, there was a significant increase in the "100% concordance" category (P=.011) and a significant reduction in the "no consensus" category (P=.0004). Despite a lower pre-DFDR reproducibility for non-uropathology fellowship-trained (n=3, κ=0.38) versus uropathology fellowship-trained (n=5, κ=0.43) pathologists, both groups achieved similarly high post-DFDR κ levels (κ=0.58 and 0.56, respectively). DFDR represents an effective tool to formally achieve diagnostic consensus and reduce variability associated with critical diagnoses in an anatomic pathology practice. Copyright © 2016 Elsevier Inc. All rights reserved.
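Fleiss' kappa extends chance-corrected agreement to more than two raters, such as the 8 pathologists here. The sketch takes, for each case, the number of raters who chose each category. The counts are illustrative, not the study's, and the function name is hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table.

    counts[i][j] is how many raters put subject i in category j; every subject
    must be rated by the same number of raters m >= 2.
    """
    if not counts:
        raise ValueError("no subjects")
    n = len(counts)
    k = len(counts[0])
    m = sum(counts[0])
    if m < 2 or any(len(row) != k or sum(row) != m for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Per-subject agreement: proportion of rater pairs that agree on that subject.
    p_subject = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p_subject) / n

    # Chance agreement from the pooled category proportions.
    p_cat = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Illustrative counts: 10 cases, 14 raters, 5 categories.
    table = [
        [0, 0, 0, 0, 14],
        [0, 2, 6, 4, 2],
        [0, 0, 3, 5, 6],
        [0, 3, 9, 2, 0],
        [2, 2, 8, 1, 1],
        [7, 7, 0, 0, 0],
        [3, 2, 6, 3, 0],
        [2, 5, 3, 2, 2],
        [6, 5, 2, 1, 0],
        [0, 2, 2, 3, 7],
    ]
    print(round(fleiss_kappa(table), 2))  # 0.21
```

For a benign/atypical/adenocarcinoma design like the one above, each row would instead have three columns summing to 8.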
Engelberg, Jesse A; Retallack, Hanna; Balassanian, Ronald; Dowsett, Mitchell; Zabaglo, Lila; Ram, Arishneel A; Apple, Sophia K; Bishop, John W; Borowsky, Alexander D; Carpenter, Philip M; Chen, Yunn-Yi; Datnow, Brian; Elson, Sarah; Hasteh, Farnaz; Lin, Fritz; Moatamed, Neda A; Zhang, Yanhong; Cardiff, Robert D
2015-11-01
Hormone receptor status is an integral component of decision-making in breast cancer management. IHC4 score is an algorithm that combines hormone receptor, HER2, and Ki-67 status to provide a semiquantitative prognostic score for breast cancer. High accuracy and low interobserver variance are important to ensure the score is accurately calculated; however, few previous efforts have been made to measure or decrease interobserver variance. We developed a Web-based training tool, called "Score the Core" (STC) using tissue microarrays to train pathologists to visually score estrogen receptor (using the 300-point H score), progesterone receptor (percent positive), and Ki-67 (percent positive). STC used a reference score calculated from a reproducible manual counting method. Pathologists in the Athena Breast Health Network and pathology residents at associated institutions completed the exercise. By using STC, pathologists improved their estrogen receptor H score and progesterone receptor and Ki-67 proportion assessment and demonstrated a good correlation between pathologist and reference scores. In addition, we collected information about pathologist performance that allowed us to compare individual pathologists and measures of agreement. Pathologists' assessment of the proportion of positive cells was closer to the reference than their assessment of the relative intensity of positive cells. Careful training and assessment should be used to ensure the accuracy of breast biomarkers. This is particularly important as breast cancer diagnostics become increasingly quantitative and reproducible. Our training tool is a novel approach for pathologist training that can serve as an important component of ongoing quality assessment and can improve the accuracy of breast cancer prognostic biomarkers. Copyright © 2015 Elsevier Inc. All rights reserved.
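The 300-point H score mentioned above is commonly computed as 1 × (% weakly stained cells) + 2 × (% moderately stained cells) + 3 × (% strongly stained cells). The sketch assumes that standard formula. The abstract does not describe STC's own reference counting procedure, so the cell tallies and function names are hypothetical.

```python
from typing import Sequence


def _checked_total(cell_counts: Sequence[int]) -> int:
    if len(cell_counts) != 4 or any(c < 0 for c in cell_counts) or sum(cell_counts) == 0:
        raise ValueError("expected four non-negative tallies: [0, 1+, 2+, 3+]")
    return sum(cell_counts)


def h_score(cell_counts: Sequence[int]) -> float:
    """H score (0-300) from tallies of cells at intensity 0, 1+, 2+ and 3+."""
    total = _checked_total(cell_counts)
    pct = [100.0 * c / total for c in cell_counts]
    return 1 * pct[1] + 2 * pct[2] + 3 * pct[3]


def percent_positive(cell_counts: Sequence[int]) -> float:
    """Share of cells with any staining (1+ or stronger), in percent."""
    total = _checked_total(cell_counts)
    return 100.0 * (total - cell_counts[0]) / total


if __name__ == "__main__":
    tallies = [40, 20, 30, 10]  # hypothetical count of 100 tumour cells
    print(h_score(tallies), percent_positive(tallies))  # 110.0 60.0
```

The H score combines proportion and intensity, while percent positive uses proportion only. That separation mirrors the abstract's finding that pathologists estimated proportion more accurately than intensity.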
2011-01-01
Background The aim of this study was to develop a child-specific classification system for long bone fractures and to examine its reliability and validity on the basis of a prospective multicentre study. Methods Using the sequentially developed classification system, three samples of between 30 and 185 paediatric limb fractures from a pool of 2308 fractures documented in two multicentre studies were analysed in a blinded fashion by eight orthopaedic surgeons, on a total of 5 occasions. Intra- and interobserver reliability and accuracy were calculated. Results The reliability improved with successive simplification of the classification. The final version resulted in an overall interobserver agreement of κ = 0.71 with no significant difference between experienced and less experienced raters. Conclusions The newly proposed classification system proved reliable and routinely applicable, and training in its proper use may further improve its reliability. It can be recommended as a useful tool for clinical practice and offers the option for developing treatment recommendations and outcome predictions in the future. PMID:21548939
RELIABILITY AND VALIDITY OF A BIOMECHANICALLY BASED ANALYSIS METHOD FOR THE TENNIS SERVE
Kibler, W. Ben; Lamborn, Leah; Smith, Belinda J.; English, Tony; Jacobs, Cale; Uhl, Tim L.
2017-01-01
Background An observational tennis serve analysis (OTSA) tool was developed using previously established body positions from three-dimensional kinematic motion analysis studies. These positions, defined as nodes, have been associated with efficient force production and minimal joint loading. However, the tool has yet to be examined scientifically. Purpose The primary purpose of this investigation was to determine the inter-observer reliability for each node between the two health care professionals (HCPs) who developed the OTSA, and secondarily to investigate the validity of the OTSA. Methods Two separate studies were performed to meet these objectives. An inter-observer reliability study, which examined 28 videos of players serving, preceded the validity study. Two HCPs graded each video and scored whether each node was achieved. Discriminant validity was determined in 33 tennis players using videotaped records of three first serves. Serve mechanics were graded using the OTSA, and players were categorized into those with good (≥ 5) and poor (≤ 4) mechanics. Participants performed a series of field tests to evaluate trunk flexibility, lower extremity and trunk power, and dynamic balance. Results The group with good mechanics demonstrated greater backward trunk flexibility (p=0.02), greater rotational power (p=0.02), and higher single leg countermovement jump (p=0.05). Reliability of the OTSA ranged from κ = 0.36 to 1.0, with most nodes displaying substantial reliability (κ > 0.61). Conclusion This study provides HCPs with a valid and reliable field tool to assess serve mechanics. Physical characteristics of trunk mobility and power appear to discriminate serve mechanics between players. Future intervention studies are needed to determine whether improvement in physical function contributes to improved serve mechanics. Level of Evidence 3 PMID:28593098
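Several abstracts here use the verbal labels slight, fair, moderate, substantial and almost perfect. These appear to follow the Landis and Koch bands for κ. Assuming that convention, a small helper can map a coefficient to its label. Boundary handling varies between sources. Some studies in this collection, such as the uroflowmetry study with its "good" and "very good" agreement, use a different scale. The function name is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch (1977) bands."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    for k in (0.36, 0.61, 1.0):
        print(k, landis_koch_label(k))
    # 0.36 fair
    # 0.61 substantial
    # 1.0 almost perfect
```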
Roberson, David W; Kentala, Erna; Forbes, Peter
2005-12-01
The goals of this project were 1) to develop and validate an objective instrument to measure surgical performance at tonsillectomy, 2) to assess its interobserver and interobservation reliability and construct validity, and 3) to select those items with the best reliability and most independent information to design a simplified form suitable for routine use in otolaryngology surgical evaluation. Prospective, observational data collection for an educational quality improvement project. The evaluation instrument was based on previous instruments developed in general surgery with input from attending otolaryngologic surgeons and experts in medical education. It was pilot tested and subjected to iterative improvements. After the instrument was finalized, a total of 55 tonsillectomies were observed and scored during academic year 2002 to 2003: 45 cases by residents at different points during their rotation, 5 by fellows, and 5 by faculty. Results were assessed for interobserver reliability, interobservation reliability, and construct validity. Factor analysis was used to identify items with independent information. Interobserver and interobservation reliability were high. On technical items, faculty substantially outperformed fellows, who in turn outperformed residents (P < .0001 for both comparisons). On the "global" scale (overall assessment), residents improved an average of 1 full point (on a 5-point scale) during a 3-month rotation (P = .01). In the subscale of "patient care," results were less clear-cut: fellows outperformed residents, who in turn outperformed faculty, but only the fellows-to-faculty comparison was statistically significant (P = .04), and residents did not clearly improve over time (P = .36). Factor analysis demonstrated that technical items and patient care items factor separately and thus represent separate skill domains in surgery. 
It is possible to objectively measure surgical skill at tonsillectomy with high reliability and good construct validity. Factor analysis demonstrated that patient care is a distinct domain in surgical skill. Although the interobserver reliability for some patient care items reached statistical significance, it was not high enough for "high stakes testing" purposes. Using reliability and factor analysis results, we propose a simplified instrument for use in evaluating trainees in otolaryngologic surgery.
Krause, Fabian G; Di Silvestro, Matthew; Penner, Murray J; Wing, Kevin J; Glazebrook, Mark A; Daniels, Timothy R; Lau, Johnny T C; Younger, Alastair S E
2012-02-01
End-stage ankle arthritis is operatively treated with numerous designs of total ankle replacement and different techniques for ankle fusion. To compare these procedures meaningfully, outcome research requires a classification system to stratify patients appropriately. A postoperative 4-type classification system was designed by 6 fellowship-trained foot and ankle surgeons. Four surgeons reviewed blinded patient profiles and radiographs on 2 occasions to determine the interobserver and intraobserver reliability of the classification. Excellent interobserver reliability (κ = .89) and intraobserver reproducibility (κ = .87) were demonstrated for the postoperative classification system. In conclusion, the postoperative Canadian Orthopaedic Foot and Ankle Society (COFAS) end-stage ankle arthritis classification system appears to be a valid tool to evaluate the outcome of patients operated on for end-stage ankle arthritis.
Niglis, L; Collin, P; Dosch, J-C; Meyer, N; Kempf, J-F
2017-10-01
The long-term outcomes of rotator cuff repair are unclear. Recurrent tears are common, although their reported frequency varies depending on the type of imaging method used and the challenges of interpreting it. The primary objective of this study was to assess the intra- and inter-observer reproducibility of the MRI assessment of rotator cuff repair using the Sugaya classification 10 years after surgery. The secondary objective was to determine whether poor reproducibility, if found, could be improved by using a simplified yet clinically relevant classification. Our hypothesis was that reproducibility was limited but could be improved by simplifying the classification. In a retrospective study, we assessed intra- and inter-observer agreement in interpreting 49 magnetic resonance imaging (MRI) scans performed 10 years after rotator cuff repair. These 49 scans were selected at random from the 609 cases that underwent re-evaluation, with imaging, for the 2015 SoFCOT symposium on 10-year and 20-year clinical and anatomical outcomes of rotator cuff repair for full-thickness tears. Each of three observers read each of the 49 scans on two separate occasions. At each reading, they classified the supraspinatus tendon into one of the five Sugaya types. Intra-observer agreement for the Sugaya type was substantial (κ=0.64) but inter-observer agreement was only fair (κ=0.39). Agreement improved when the five Sugaya types were collapsed into two categories (1-2-3 and 4-5) (intra-observer κ=0.74 and inter-observer κ=0.68). Using the Sugaya classification to assess post-operative rotator cuff healing was associated with substantial intra-observer and fair inter-observer agreement. A simpler classification into two categories improved agreement while remaining clinically relevant. Level of evidence: II, prospective randomised low-power study. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
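Collapsing categories often raises κ, because near-miss disagreements (such as Sugaya type 4 vs 5) become agreements. The sketch below recomputes Cohen's kappa before and after mapping types 1-3 and 4-5 into two groups. The readings are hypothetical and do not reproduce the study's values. `collapse_sugaya` is an illustrative helper, not part of any published tool.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters and nominal categories."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    m1, m2 = Counter(r1), Counter(r2)
    p_e = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / n ** 2  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


def collapse_sugaya(sugaya_type: int) -> str:
    """Hypothetical helper: map Sugaya types 1-5 onto the two groups 1-2-3 and 4-5."""
    if sugaya_type not in (1, 2, 3, 4, 5):
        raise ValueError(f"unknown Sugaya type: {sugaya_type}")
    return "1-2-3" if sugaya_type <= 3 else "4-5"


if __name__ == "__main__":
    # Hypothetical readings of 10 scans by two observers.
    obs_a = [1, 2, 3, 4, 5, 2, 3, 4, 1, 5]
    obs_b = [2, 2, 3, 5, 5, 3, 2, 4, 1, 3]
    print(round(cohens_kappa(obs_a, obs_b), 3))  # 0.375
    two_a = [collapse_sugaya(t) for t in obs_a]
    two_b = [collapse_sugaya(t) for t in obs_b]
    print(round(cohens_kappa(two_a, two_b), 3))  # 0.783
```

Only the type 5 vs type 3 reading survives as a disagreement after collapsing. The trade-off is that the two-group scale no longer distinguishes, for example, type 1 from type 3.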
Schellhaas, Barbara; Hammon, Matthias; Strobel, Deike; Pfeifer, Lukas; Kielisch, Christian; Goertz, Ruediger S; Cavallaro, Alexander; Janka, Rolf; Neurath, Markus F; Uder, Michael; Seuss, Hannes
2018-04-19
We compared the interobserver agreement for the recently introduced contrast-enhanced ultrasound (CEUS)-based algorithm CEUS-LI-RADS (Liver Imaging Reporting and Data System) versus the well-established magnetic resonance imaging (MRI)-LI-RADS for non-invasive diagnosis of hepatocellular carcinoma (HCC) in high-risk patients. Focal liver lesions in 50 high-risk patients (mean age 66.2 ± 11.8 years; 39 male) were assessed retrospectively with CEUS and MRI. Two independent observers reviewed CEUS and MRI examinations separately, classifying observations according to CEUS-LI-RADSv.2016 and MRI-LI-RADSv.2014. Interobserver agreement was assessed with Cohen's kappa. Forty-three lesions were HCCs; two were intrahepatic cholangiocarcinomas; five were benign lesions. Arterial phase hyperenhancement was perceived less frequently with CEUS than with MRI (37/50 / 38/50 lesions = 74%/78% [CEUS; observer 1/observer 2] versus 46/50 / 44/50 lesions = 92%/88% [MRI; observer 1/observer 2]). Washout appearance was observed in 34/50 / 20/50 lesions = 68%/40% with CEUS and 31/50 / 31/50 lesions = 62%/62% with MRI. Interobserver agreement was moderate for arterial hyperenhancement (κ = 0.511/0.565 [CEUS/MRI]) and "washout" (κ = 0.490/0.582 [CEUS/MRI]), fair for CEUS-LI-RADS category (κ = 0.309) and substantial for MRI-LI-RADS category (κ = 0.609). Intermodality agreement was fair for arterial hyperenhancement (κ = 0.329) and slight to fair for "washout" (κ = 0.202) and LI-RADS category (κ = 0.218). CONCLUSION: Interobserver agreement is substantial for MRI-LI-RADS and only fair for CEUS-LI-RADS. This is mostly because interobserver agreement in the perception of washout appearance is better in MRI than in CEUS. Further refinement of the LI-RADS algorithms and increasing education and practice may be necessary to improve the concordance between CEUS and MRI for the final LI-RADS categorization.
• CEUS-LI-RADS and MRI-LI-RADS enable standardized non-invasive diagnosis of HCC in high-risk patients.
• With CEUS, interobserver agreement is better for arterial hyperenhancement than for "washout".
• Interobserver agreement for major features is moderate for both CEUS and MRI.
• Interobserver agreement for LI-RADS category is substantial for MRI and fair for CEUS.
• Interobserver agreement for CEUS-LI-RADS will presumably improve with ongoing use of the algorithm.
Assessment of colon polyp morphology: Is education effective?
Kim, Jae Hyun; Nam, Kyoung Sik; Kwon, Hye Jung; Choi, Youn Jung; Jung, Kyoungwon; Kim, Sung Eun; Moon, Won; Park, Moo In; Park, Seun Ja
2017-09-14
AIM To determine the inter-observer variability for colon polyp morphology and to identify whether education can improve agreement among observers. METHODS For purposes of the tests, we recorded colonoscopy video clips that included scenes visualizing the polyps. A total of 15 endoscopists and 15 nurses participated in the study. Participants watched 60 video clips of the polyp morphology scenes and then estimated polyp morphology (pre-test). After education for 20 min, participants performed a second test in which the order of 60 video clips was changed (post-test). To determine if the effectiveness of education was sustained, four months later, a third, follow-up test was performed with the same participants. RESULTS The overall Fleiss’ kappa value of the inter-observer agreement was 0.510 in the pre-test, 0.618 in the post-test, and 0.580 in the follow-up test. The overall diagnostic accuracy of the estimation for polyp morphology in the pre-, post-, and follow-up tests was 0.662, 0.797, and 0.761, respectively. After education, the inter-observer agreement and diagnostic accuracy of all participants improved. However, after four months, the inter-observer agreement and diagnostic accuracy of expert groups were markedly decreased, and those of beginner and nurse groups remained similar to pre-test levels. CONCLUSION The education program used in this study can improve inter-observer agreement and diagnostic accuracy in assessing the morphology of colon polyps; it is especially effective when first learning endoscopy. PMID:28974894
Kim, Sung Sun; Kook, Myeong-Cherl; Shin, Ok-Ran; Kim, Hee Sung; Bae, Han-Ik; Seo, An Na; Park, Do Youn; Choi, Il Ju; Kim, Young-Il; Nam, Byung Ho; Kim, Sohee
2018-04-01
Intestinal metaplasia and atrophy of the gastric mucosa are associated with Helicobacter pylori infection and are considered premalignant lesions. The updated Sydney system is used for these parameters, but experienced pathologists and consensus processes are required for interobserver agreement. We sought to determine the influence of the consensus process on the assessment of intestinal metaplasia and atrophy. Two study sets were used: consensus and validation. The consensus set was circulated and five gastrointestinal pathologists evaluated it independently using the updated Sydney system. Consensus definitions were then agreed at the first consensus meeting. The same set was recirculated to determine the effect of the consensus. A second consensus meeting was held to standardise the grading criteria, and the validation set was then circulated to determine its influence. Two additional circulations were performed to assess the maintenance of consensus and intraobserver variability. Interobserver agreement of intestinal metaplasia and atrophy was improved through the consensus process (intestinal metaplasia: baseline κ = 0.52 versus final κ = 0.68, P = 0.006; atrophy: baseline κ = 0.19 versus final κ = 0.43, P < 0.001). Higher interobserver agreement in atrophy was observed after consensus regarding the definition (pre-consensus: κ = 0.19 versus post-consensus: κ = 0.34, P = 0.001). There was improved interobserver agreement in intestinal metaplasia after standardisation of the grading criteria (pre-standardisation: κ = 0.56 versus post-standardisation: κ = 0.71, P = 0.010). This study suggests that interobserver variability regarding intestinal metaplasia and atrophy may result from lack of a precise definition and fine criteria, and can be reduced by consensus of definition and standardisation of grading criteria. © 2017 John Wiley & Sons Ltd.
Systematic review of methods for quantifying teamwork in the operating theatre
Marshall, D.; Sykes, M.; McCulloch, P.; Shalhoub, J.; Maruthappu, M.
2018-01-01
Background Teamwork in the operating theatre is becoming increasingly recognized as a major factor in clinical outcomes. Many tools have been developed to measure teamwork. Most fall into two categories: self‐assessment by theatre staff and assessment by observers. A critical and comparative analysis of the validity and reliability of these tools is lacking. Methods MEDLINE and Embase databases were searched following PRISMA guidelines. Content validity was assessed using measurements of inter‐rater agreement, predictive validity and multisite reliability, and interobserver reliability using statistical measures of inter‐rater agreement and reliability. Quantitative meta‐analysis was deemed unsuitable. Results Forty‐eight articles were selected for final inclusion; self‐assessment tools were used in 18 and observational tools in 28, and there were two qualitative studies. Self‐assessment of teamwork by profession varied with the profession of the assessor. The most robust self‐assessment tool was the Safety Attitudes Questionnaire (SAQ), although this failed to demonstrate multisite reliability. The most robust observational tool was the Non‐Technical Skills (NOTECHS) system, which demonstrated both test–retest reliability (P > 0·09) and interobserver reliability (Rwg = 0·96). Conclusion Self‐assessment of teamwork by the theatre team was influenced by professional differences. Observational tools, when used by trained observers, circumvented this.
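The NOTECHS result is reported as Rwg, a within-group agreement index. The sketch below shows the common single-item form, r_wg = 1 - s²/σ²_EU, where s² is the observed variance of the raters' scores. It assumes the uniform ("no agreement") null distribution, whose variance for an A-point scale is (A² - 1)/12. The review does not say which null was used for NOTECHS. The function name and example ratings are hypothetical.

```python
from statistics import variance


def r_wg(ratings: list[float], n_options: int) -> float:
    """Single-item within-group agreement index (sketch, uniform null assumed).

    Compares the observed variance of raters' scores with the variance expected
    if raters answered uniformly at random across `n_options` scale points.
    """
    observed_var = variance(ratings)           # sample variance (n - 1 denominator)
    expected_var = (n_options ** 2 - 1) / 12   # variance of a discrete uniform null
    return 1 - observed_var / expected_var
```

As an example, five hypothetical raters scoring 4, 4, 5, 4, 5 on a 5-point scale give r_wg = 1 - 0.3/2 = 0.85. Some authors truncate negative values to 0.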
Alyusuf, Raja H; Prasad, Kameshwar; Abdel Satir, Ali M; Abalkhail, Ali A; Arora, Roopa K
2013-01-01
The exponential use of the internet as a learning resource, coupled with the varied quality of many websites, leads to a need to identify suitable websites for teaching purposes. The aim of this study is to develop and to validate a tool, which evaluates the quality of undergraduate medical educational websites; and apply it to the field of pathology. A tool was devised through several steps of item generation, reduction, weighting, pilot testing, post-pilot modification of the tool and validating the tool. Tool validation included measurement of inter-observer reliability and assessment of criterion-related, construct-related and content-related validity. The validated tool was subsequently tested by applying it to a population of pathology websites. Reliability testing showed a high internal consistency reliability (Cronbach's alpha = 0.92), high inter-observer reliability (Pearson's correlation r = 0.88), intraclass correlation coefficient = 0.85 and κ = 0.75. It showed high criterion-related, construct-related and content-related validity. The tool showed moderately high concordance with the gold standard (κ = 0.61); 92.2% sensitivity, 67.8% specificity, 75.6% positive predictive value and 88.9% negative predictive value. The validated tool was applied to 278 websites; 29.9% were rated as recommended, 41.0% as recommended with caution and 29.1% as not recommended. A systematic tool was devised to evaluate the quality of websites for medical educational purposes. The tool was shown to yield reliable and valid inferences through its application to pathology websites.
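Cronbach's alpha, reported above for internal consistency, is k/(k - 1) × (1 - Σ item variances / variance of total scores) for k items. The sketch below is a minimal Python version. It assumes complete data, with one list of scores per item and the same respondents in the same order in every list. The function name and scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][r] is respondent r's score on item i."""
    k = len(items)
    item_var_sum = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - item_var_sum / variance(totals))
```

With two identical items, as in `cronbach_alpha([[1, 2, 3], [1, 2, 3]])`, alpha is 1.0. The statistic is undefined when respondents' total scores do not vary.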
Jones, Nia W; Raine-Fenning, Nick J; Mousa, Hatem A; Bradley, Eileen; Bugg, George J
2011-03-01
Three-dimensional (3-D) power Doppler angiography (3-D-PDA) allows visualisation of Doppler signals within the placenta, and these signals can be quantified through the vascular indices generated by the 4-D View software programme. This study aimed to investigate intra- and interobserver reproducibility of 3-D-PDA analysis of stored datasets at varying gestations, with the ultimate goal of developing a tool for predicting placental dysfunction. Women with an uncomplicated, viable singleton pregnancy were scanned in the 12-, 16- or 20-week gestational age groups. 3-D-PDA datasets acquired of the whole placenta were analysed using the VOCAL software processing tool. Each volume was analysed twice by each of three observers in the A plane. Intra- and interobserver reliability was assessed by intraclass correlation coefficients (ICCs) and Bland-Altman plots. At each gestational age group, 20 low risk women were scanned, resulting in 60 datasets in total. The ICC demonstrated a high level of measurement reliability at each gestation, with intraobserver values >0.90 and interobserver values of >0.6 for the vascular indices. Bland-Altman plots also showed high levels of agreement. Systematic bias was seen at 20 weeks in the vascular indices obtained by different observers. This study demonstrates that 3-D-PDA data can be measured reliably by different observers from stored datasets up to 18 weeks gestation. Measurements become less reliable as gestation advances, with bias between observers evident at 20 weeks. Copyright © 2011 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
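The Bland-Altman analysis mentioned above summarises paired measurements with three numbers. The first is the mean difference between observers, called the bias. The other two are the 95% limits of agreement, bias ± 1.96 × SD of the differences. The sketch below computes those three numbers in Python and omits the plot. The function name and inputs are illustrative and are not taken from the VOCAL or 4-D View software.

```python
from statistics import mean, stdev


def bland_altman(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b):
        raise ValueError("paired measurements must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)            # systematic difference between observers
    spread = 1.96 * stdev(diffs)  # about 95% of differences expected within bias ± spread
    return bias, bias - spread, bias + spread
```

A bias clearly away from zero, as reported for the 20-week vascular indices, indicates that one observer measures systematically higher than another.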
Wong, Lih-Ming; Chum, Jia-Min; Maddy, Peter; Chan, Steven T F; Travis, Douglas; Lawrentschuk, Nathan
2010-07-01
Macroscopic hematuria is a common symptom and sign that is challenging to quantify and describe. The degree of hematuria communicated varies with health worker experience, a problem compounded by the lack of a reliable grading tool. We produced a reliable, standardized visual scale to describe hematuria severity. Our secondary aim was to validate a new laboratory test to quantify hemoglobin in hematuria specimens. Nurses were surveyed to ascertain current hematuria descriptions. Blood and urine were titrated at varying concentrations and digitally photographed in catheter bag tubing. Photos were processed and printed on transparency paper to create a prototype swatch or card showing light, medium, heavy and old hematuria. Using the swatch, 60 samples were rated by nurses and laymen. Interobserver variability was reported using the generalized kappa coefficient of agreement. Specimens were analyzed for hemolysis by measuring optical density at oxyhemoglobin absorption peaks. Interobserver agreement between nurses and laymen was good (kappa = 0.51, p <0.001). Subgroup analysis showed substantial agreement for light hematuria (kappa = 0.71). Overall agreement improved when the moderate (kappa = 0.28) and heavy (kappa = 0.53) hematuria categories were combined (kappa = 0.70). Compared to known blood concentrations, the assay of optical density at oxyhemoglobin absorption peaks showed a linear trend. A simple visual scale to grade and communicate hematuria with adequate interobserver agreement is feasible. The test for optical density at oxyhemoglobin absorption peaks is a new method, validated in our study, to quantify hemoglobin in a hematuria specimen. Copyright (c) 2010 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
Acar, Nihat; Karakasli, Ahmet; Karaarslan, Ahmet; Mas, Nermin Ng; Hapa, Onur
2017-01-01
Volumetric measurements of benign tumors enable surgeons to trace volume changes during follow-up periods. For a volumetric measurement technique to be applicable, it should be easy, rapid, and inexpensive and should have high interobserver reliability. We aimed to assess the interobserver reliability of a volumetric measurement technique using the Cavalieri principle of stereological methods. The computerized tomography (CT) scans of 15 patients with a histopathologically confirmed diagnosis of enchondroma, with varying tumor sizes and localizations, were retrospectively reviewed to evaluate the interobserver reliability of the volumetric stereological measurement with the Cavalieri principle, V = t × [(SU × d)/SL]² × ΣP. The volumes of the 15 tumors measured by the observers are given in Table 1. The observers' measurements did not differ significantly, and agreement was high for each pair of observers (first vs. second: intraclass correlation coefficient = 0.970; first vs. third: intraclass correlation coefficient = 0.981; second vs. third: intraclass correlation coefficient = 0.976; p < 0.001 for each). The Cavalieri principle with the stereological technique using CT scans is an easy, rapid, and inexpensive technique for the volumetric evaluation of enchondromas, with reliable interobserver agreement.
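The Cavalieri estimate above multiplies three quantities: the section interval, the real-world area represented by each test point, and the total point count. The abstract does not define the symbols, so the sketch below assumes the usual reading:

- t is the distance between sampled CT sections.
- d is the spacing of the point grid in image units.
- SL is the length of the image scale bar in the same image units.
- SU is the real length that the scale bar represents.
- ΣP is the number of grid points falling on the tumor across all sections.

The function name and example values are hypothetical.

```python
def cavalieri_volume(t: float, su: float, d: float, sl: float, points: list[int]) -> float:
    """Point-counting volume estimate, V = t * ((SU * d) / SL)**2 * sum(P).

    t      -- distance between sampled sections (assumed meaning)
    su     -- real-world length represented by the image scale bar (assumed meaning)
    d      -- spacing between grid test points, in image units (assumed meaning)
    sl     -- length of the scale bar measured in image units (assumed meaning)
    points -- grid points hitting the tumor on each sampled section
    """
    area_per_point = (su * d / sl) ** 2  # real-world area represented by one test point
    return t * area_per_point * sum(points)
```

For example, `cavalieri_volume(t=5, su=10, d=2, sl=20, points=[3, 4, 5])` returns 60.0. If t and SU are both in millimetres, the result is in cubic millimetres.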
Alyusuf, Raja H.; Prasad, Kameshwar; Abdel Satir, Ali M.; Abalkhail, Ali A.; Arora, Roopa K.
2013-01-01
Background: The exponential use of the internet as a learning resource, coupled with the varied quality of many websites, leads to a need to identify suitable websites for teaching purposes. Aim: The aim of this study is to develop and to validate a tool, which evaluates the quality of undergraduate medical educational websites; and apply it to the field of pathology. Methods: A tool was devised through several steps of item generation, reduction, weighting, pilot testing, post-pilot modification of the tool and validating the tool. Tool validation included measurement of inter-observer reliability and assessment of criterion-related, construct-related and content-related validity. The validated tool was subsequently tested by applying it to a population of pathology websites. Results and Discussion: Reliability testing showed a high internal consistency reliability (Cronbach's alpha = 0.92), high inter-observer reliability (Pearson's correlation r = 0.88), intraclass correlation coefficient = 0.85 and κ = 0.75. It showed high criterion-related, construct-related and content-related validity. The tool showed moderately high concordance with the gold standard (κ = 0.61); 92.2% sensitivity, 67.8% specificity, 75.6% positive predictive value and 88.9% negative predictive value. The validated tool was applied to 278 websites; 29.9% were rated as recommended, 41.0% as recommended with caution and 29.1% as not recommended. Conclusion: A systematic tool was devised to evaluate the quality of websites for medical educational purposes. The tool was shown to yield reliable and valid inferences through its application to pathology websites. PMID:24392243
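The sensitivity, specificity and predictive values reported above all come from a 2 × 2 table of the tool's verdict against the gold standard. The sketch below assumes the three-level ratings have already been reduced to two categories, positive and negative. The abstract does not state how that was done. The function name and the counts in the example are hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy of a rating tool against a gold standard, from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # gold-standard positives the tool flags
        "specificity": tn / (tn + fp),  # gold-standard negatives the tool clears
        "ppv": tp / (tp + fp),          # tool positives that are truly positive
        "npv": tn / (tn + fn),          # tool negatives that are truly negative
    }
```

For example, `diagnostic_metrics(tp=45, fp=15, fn=5, tn=35)` gives sensitivity 0.9, specificity 0.7, PPV 0.75 and NPV 0.875.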
Roma, Andres A; Liu, Xiuli; Patil, Deepa T; Xie, Hao; Allende, Daniela
2017-07-01
To analyze the interobserver reproducibility of the Lower Anogenital Squamous Terminology (LAST) and compare its use between academic and community practice settings. In total, 132 anal biopsy slides were reviewed, along with p16 immunostains. LAST was used in 49% of cases (academic center, 68%; satellite hospitals [community practice setting], 32%). After pathology review and consensus interpretation, 23 (17%) case diagnoses were reclassified: eight (34.8%) cases (benign or low-grade squamous intraepithelial lesion [LSIL]) were upgraded to high-grade squamous intraepithelial lesion (HSIL) (confirmed with p16 ordered during review); four (17.4%) cases originally classified as HSIL were downgraded to LSIL (p16 originally ordered in one case). There was no significant difference in discrepancies between original and consensus diagnosis in the community vs academic setting or by subspecialty (gynecological vs gastrointestinal). Overall interobserver agreement among reviewers was substantial (κ = 0.63) and improved with the use of p16 immunostain in challenging cases (κ = 0.71; P < .001). This new terminology is not yet uniformly used by pathologists in anal/perianal biopsy specimens; this two-tier system has good interobserver agreement, which is further improved with p16 use in appropriate cases. © American Society for Clinical Pathology, 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Reliability of joint count assessment in rheumatoid arthritis: a systematic literature review.
Cheung, Peter P; Gossec, Laure; Mak, Anselm; March, Lyn
2014-06-01
Joint counts are central to the assessment of rheumatoid arthritis (RA) but reliability is an issue. To evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability through a systematic literature review. Articles reporting joint count reliability or agreement in RA in PubMed, EMBase, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability such as intraclass correlation coefficients (ICCs) were extracted. Data analysis was primarily descriptive due to high heterogeneity. Twenty-eight studies on health care professionals (HCP) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49-0.98). Inter-observer reliability between HCPs for TJCs was higher than for SJCs (range of ICC: 0.64-0.88 vs. 0.29-0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31-0.91) compared to SJCs (0.16-0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs. Intra- and inter-observer reliability was high for TJCs for HCPs and patients: among all groups, reliability was better for TJCs than SJCs. Inter-observer reliability of SJCs was poorer for patients than HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation for patient-reported joint counts as an outcome measure. 
© 2013 Published by Elsevier Inc.
Does the Modified Gartland Classification Clarify Decision Making?
Leung, Sophia; Paryavi, Ebrahim; Herman, Martin J; Sponseller, Paul D; Abzug, Joshua M
2018-01-01
The modified Gartland classification system for pediatric supracondylar fractures is often utilized as a communication tool to aid in determining whether or not a fracture warrants operative intervention. This study sought to determine the interobserver and intraobserver reliability of the Gartland classification system, as well as to determine whether there was agreement that a fracture warranted operative intervention regardless of the classification system. A total of 200 anteroposterior and lateral radiographs of pediatric supracondylar humerus fractures were retrospectively reviewed by 3 fellowship-trained pediatric orthopaedic surgeons and 2 orthopaedic residents and then classified as type I, IIa, IIb, or III. The surgeons then recorded whether they would treat the fracture nonoperatively or operatively. The κ coefficients were calculated to determine interobserver and intraobserver reliability. Overall, the Wilkins-modified Gartland classification has low-moderate interobserver reliability (κ=0.475) and high intraobserver reliability (κ=0.777). Among attendings, interobserver reliability was low when differentiating between type IIa and IIb (κ=0.240). There was moderate-high interobserver reliability for the decision to operate (κ=0.691) and high intraobserver reliability (κ=0.760). Interobserver reliability for the decision to operate was lower among residents. The decision to operate was made for 3% of type I, 27% of type IIa, 99% of type IIb, and 100% of type III fractures. There is almost full agreement for the nonoperative treatment of type I fractures and operative treatment for type III fractures. There is agreement that type IIb fractures should be treated operatively and that the majority of type IIa fractures should be treated nonoperatively. However, the interobserver reliability for differentiating between type IIa and IIb fractures is low. 
Our results validate the Gartland classification system as a method to help direct treatment of pediatric supracondylar humerus fractures, although the modification of the system that distinguishes type IIa from type IIb seems to have limited reliability and utility. Terminology based on decision to treat may lead to a more clinically useful classification system in the evaluation and treatment of pediatric supracondylar humerus fractures. Level III-diagnostic studies.
Schimek-Jasch, Tanja; Troost, Esther G C; Rücker, Gerta; Prokic, Vesna; Avlar, Melanie; Duncker-Rohr, Viola; Mix, Michael; Doll, Christian; Grosu, Anca-Ligia; Nestle, Ursula
2015-06-01
Interobserver variability in the definition of target volumes (TVs) is a well-known confounding factor in (multicentre) clinical studies employing radiotherapy. Therefore, detailed contouring guidelines are provided in the prospective randomised multicentre PET-Plan (NCT00697333) clinical trial protocol. This trial compares strictly FDG-PET-based TV delineation with conventional TV delineation in patients with locally advanced non-small cell lung cancer (NSCLC). Even with detailed contouring guidelines, their interpretation by different radiation oncologists can vary considerably, leading to undesirable discrepancies in TV delineation. Considering this, as part of the PET-Plan study quality assurance (QA), a contouring dummy run (DR) consisting of two phases was performed to analyse the interobserver variability before and after teaching. In the first phase of the DR (DR1), radiation oncologists from 14 study centres were asked to delineate TVs as defined by the study protocol (gross TV, GTV; and two clinical TVs, CTV-A and CTV-B) in a test patient. A teaching session was held at a study group meeting, including a discussion of the results focussing on discordances in comparison to the per-protocol solution. Subsequently, the second phase of the DR (DR2) was performed in order to evaluate the impact of teaching. Teaching after DR1 resulted in a reduction of absolute TVs in DR2, as well as in better concordance of TVs. The overall kappa (κ) indices increased from 0.63 to 0.71 (GTV), from 0.60 to 0.65 (CTV-A) and from 0.59 to 0.63 (CTV-B), demonstrating improvements in overall interobserver agreement. Contouring DRs and study group meetings as part of QA in multicentre clinical trials help to identify misinterpretations of per-protocol TV delineation. 
Teaching the correct interpretation of protocol contouring guidelines leads to a reduction in interobserver variability and to more consistent contouring, which should consequently improve the validity of the overall study results.
Wong, Kevin; Levi, Jessica R
2017-01-01
Objective Previous studies have shown that patient education materials published by the American Academy of Otolaryngology-Head and Neck Surgery Foundation may be too difficult for the average reader to understand. The purpose of this study was to determine if current educational materials show improvements in readability. Study Design Cross-sectional analysis. Setting The Patient Health Information section of the American Academy of Otolaryngology-Head and Neck Surgery Foundation website. Subjects and Methods All patient education articles were extracted in plain text. Webpage navigation, references, author information, appointment information, acknowledgments, and disclaimers were removed. Follow-up editing was also performed to remove paragraph breaks, colons, semicolons, numbers, percentages, and bullets. Readability grade was calculated with the Flesch-Kincaid Grade Level, Flesch Reading Ease, Gunning-Fog Index, Coleman-Liau Index, Automated Readability Index, and Simple Measure of Gobbledygook. Intra- and interobserver reliability were assessed. Results A total of 126 articles from 7 topics were analyzed. Readability levels across all 6 tools showed that the difficulty of patient education materials exceeded the abilities of an average American. As compared with previous studies, current educational materials by the American Academy of Otolaryngology-Head and Neck Surgery Foundation have shown a decrease in difficulty. Intra- and interobserver reliability were both excellent, with intraclass correlation coefficients of 0.99 and 0.96, respectively. Conclusion Improvements in readability are an encouraging finding and one that is consistent with recent trends toward improved health literacy. Nevertheless, online patient educational material is still too difficult for the average reader. Revisions may be necessary for current materials to benefit a larger readership.
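Two of the six readability measures have simple closed forms based on average sentence length and average syllables per word. The sketch below implements the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas from raw counts. Counting words, sentences and syllables is the hard part, and it is left to the caller. The clean-up of colons, numbers and bullets described above matters because it changes those counts. The function name and example counts are hypothetical.

```python
def flesch_scores(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) from raw counts."""
    wps = words / sentences  # average sentence length in words
    spw = syllables / words  # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level
```

For example, a 100-word passage with 5 sentences and 150 syllables scores about 59.6 on Reading Ease and about grade 9.9 on the Grade Level scale.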
Shade selection performed by novice dental professionals and colorimeter.
Klemetti, E; Matela, A-M; Haag, P; Kononen, M
2006-01-01
The objective of this study was to test inter-observer variability in shade selection for porcelain restorations, using three different shade guides: Vita Lumin Vacuum, Vita 3D-Master and Procera. Nineteen young dental professionals acted as observers. The results were also compared with those of a digital colorimeter (Shade Eye Ex; Shofu, Japan). Regarding repeatability, no significant differences were found between the three shade guides, although repeatability was relatively low (33-43%). Agreement with the colorimetric results was also low (8-34%). In conclusion, shade selection shows moderate to large inter-observer variation. In teaching and standardizing the shade selection procedure, a digital colorimeter may be a useful educational tool.
Reliability of cervical vertebral maturation staging.
Rainey, Billie-Jean; Burnside, Girvan; Harrison, Jayne E
2016-07-01
Growth and its prediction are important for the success of many orthodontic treatments. The aim of this study was to determine the reliability of the cervical vertebral maturation (CVM) method for the assessment of mandibular growth. A group of 20 orthodontic clinicians, inexperienced in CVM staging, was trained to use the improved version of the CVM method for the assessment of mandibular growth with a teaching program. They independently assessed 72 consecutive lateral cephalograms, taken at Liverpool University Dental Hospital, on 2 occasions. The cephalograms were presented in 2 different random orders and interspersed with 11 additional images for standardization. The intraobserver and interobserver agreement values were evaluated using the weighted kappa statistic. The intraobserver and interobserver agreement values were substantial (weighted kappa, 0.6-0.8). The overall intraobserver agreement was 0.70 (SE, 0.01), with average agreement of 89%. The interobserver agreement values were 0.68 (SE, 0.03) for phase 1 and 0.66 (SE, 0.03) for phase 2, with average interobserver agreement of 88%. The intraobserver and interobserver agreement values of classifying the vertebral stages with the CVM method were substantial. These findings demonstrate that this method of CVM classification is reproducible and reliable. Copyright © 2016 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
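Weighted kappa gives partial credit for near-misses between ordinal categories. This suits CVM stages, where confusing adjacent stages is less serious than confusing distant ones. The abstract does not say whether linear or quadratic weights were used, so the sketch below supports both. It assumes the stages are coded as integers 0 to k - 1. The function name is hypothetical.

```python
def weighted_kappa(r1: list[int], r2: list[int], k: int, quadratic: bool = False) -> float:
    """Cohen's weighted kappa for two raters scoring ordinal categories 0..k-1."""
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1 / n  # joint proportion of rating pairs
    row = [sum(observed[i]) for i in range(k)]                      # rater 1 marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, growing with the distance between stages.
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - obs_dis / exp_dis
```

For example, `weighted_kappa([0, 1, 2], [0, 2, 2], k=3)` gives about 0.667 with linear weights.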
Wu, Ziqiang; Lin, Jialiu; Huang, Jingjing
2015-01-01
Purpose To describe a novel method for quantitative measurement of area parameters in ocular anterior segment ultrasound biomicroscopy (UBM) images using Photoshop software and to assess its intraobserver and interobserver reproducibility. Methods Twenty healthy volunteers with wide angles and twenty patients with narrow or closed angles were consecutively recruited. UBM images were obtained and analyzed using Photoshop software by two physicians with different levels of training on two occasions. Borders of anterior segment structures including cornea, iris, lens, and zonules in the UBM image were semi-automatically defined by the Magnetic Lasso Tool in the Photoshop software according to the pixel contrast and modified by the observers. Anterior chamber area (ACA), posterior chamber area (PCA), iris cross-section area (ICA) and angle recess area (ARA) were drawn and measured. The intraobserver and interobserver reproducibilities of the anterior segment area parameters and scleral spur location were assessed by limits of agreement, coefficient of variation (CV), and intraclass correlation coefficient (ICC). Results All of the parameters were successfully measured by Photoshop. The intraobserver and interobserver reproducibilities of ACA, PCA, and ICA were good, with CVs of no more than 5% and ICCs above 0.95, while the CVs of ARA were within 20%. The intraobserver and interobserver reproducibilities for defining the spur location had ICCs above 0.97. Although the operating times for both observers were less than 3 minutes per image, there was a significant difference in the measuring time between the two observers with different levels of training (p<0.001). Conclusion Measurements of ocular anterior segment areas on UBM images by Photoshop showed good intraobserver and interobserver reproducibilities. The methodology was easy to adopt and effective in measuring. PMID:25803857
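The abstract reports reproducibility partly as a coefficient of variation but does not define it exactly. The sketch below assumes one common choice for repeated measurements. It computes SD/mean for each subject's repeat measurements, then takes the root mean square across subjects, expressed in percent. The function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def within_subject_cv(repeats: list[list[float]]) -> float:
    """Root-mean-square coefficient of variation across subjects, in percent.

    repeats[s] holds two or more repeated measurements of one parameter on subject s.
    """
    cvs = [stdev(r) / mean(r) for r in repeats]  # per-subject CV
    return 100 * sqrt(mean(cv * cv for cv in cvs))
```

For example, two subjects measured as [9, 11] and [18, 22] each have a CV of about 14.1%, so the overall value is also about 14.1%.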
PI-RADS v2: Current standing and future outlook.
Smith, Clayton P; Türkbey, Barış
2018-05-01
The Prostate Imaging-Reporting and Data System (PI-RADS) was created in 2012 to establish standardization in prostate multiparametric magnetic resonance imaging (mpMRI) acquisition, interpretation, and reporting. In hopes of improving upon some of the PI-RADS v1 shortcomings, the PI-RADS Steering Committee released PI-RADS v2 in 2015. This paper reviews the accuracy, interobserver agreement, and clinical outcomes of PI-RADS v2 and comments on the limitations of the current literature. Overall, PI-RADS v2 shows improved sensitivity and similar specificity compared to PI-RADS v1. However, concerns exist regarding interobserver agreement and the heterogeneity of the study methodology.
PI-RADS v2: Current standing and future outlook
Smith, Clayton P.
2018-01-01
The Prostate Imaging-Reporting and Data System (PI-RADS) was created in 2012 to establish standardization in prostate multiparametric magnetic resonance imaging (mpMRI) acquisition, interpretation, and reporting. In hopes of improving upon some of the PI-RADS v1 shortcomings, the PI-RADS Steering Committee released PI-RADS v2 in 2015. This paper reviews the accuracy, interobserver agreement, and clinical outcomes of PI-RADS v2 and comments on the limitations of the current literature. Overall, PI-RADS v2 shows improved sensitivity and similar specificity compared to PI-RADS v1. However, concerns exist regarding interobserver agreement and the heterogeneity of the study methodology. PMID:29733790
Interobserver Variation in Response Evaluation Criteria in Solid Tumors 1.1.
Karmakar, Arunabha; Kumtakar, Apeksha; Sehgal, Himanshu; Kumar, Savith; Kalyanpur, Arjun
2018-06-19
Response Evaluation Criteria in Solid Tumors (RECIST 1.1) is the gold standard for imaging response evaluation in cancer trials. We sought to evaluate consistency of applying RECIST 1.1 between 2 conventionally trained radiologists, designated as A and B; identify reasons for variation; and reconcile these differences for future studies. The study was approved as an institutional quality check exercise. Since no identifiable patient data was collected or used, a waiver of informed consent was granted. Imaging case report forms of a concluded multicentric breast cancer trial were retrospectively reviewed. Cohen's kappa was used to rate interobserver agreement in Response Evaluation Data (target response, nontarget response, new lesions, overall response). Significant variations were reassessed by a senior radiologist to identify reasons for disagreement. Methods to improve agreement were similarly ascertained. Sixty-one cases with a total of 82 data pairs were evaluated (35 data pairs in visit 5, 47 in visit 9). Both radiologists showed moderate agreement in target response (n = 82; κ = 0.477; 95% confidence interval [CI]: 0.314-0.640), nontarget response (n = 82; κ = 0.578; 95% CI: 0.213-0.944) and overall response evaluation in both visits (n = 82; κ = 0.510; 95% CI: 0.344-0.676). Further assessment demonstrated a "prevalence effect" of kappa in some cases, which led to underestimation of agreement. Percent agreement for overall response was 74.39%, while percent variation was 25.6%. Differences in interpreting RECIST 1.1 and in radiological image interpretation were the primary sources of variation. The commonest overall response was "Partial Response" (Rad A: 45/82; Rad B: 63/82). In spite of moderate interobserver agreement, qualitative interpretation differences in some cases increased interobserver variability. 
Protocols such as adjudication, which reduce easily avoidable inconsistencies, are or should be part of the standard operating procedure in imaging institutions. Based on our findings, a standard checklist has been developed to help reduce the interpretation error margin for future studies. Such checklists may improve interobserver agreement in the pre-adjudication phase, thereby improving the quality of results and reducing the adjudication-per-case ratio. Improving data reliability when using RECIST 1.1 will be reflected in better cancer clinical trial outcomes. A checklist can be of use to imaging centers to assess and improve their own processes. Copyright © 2018. Published by Elsevier Inc.
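The "prevalence effect" noted above comes from how Cohen's kappa is built. Kappa subtracts chance agreement, which is computed from each rater's marginal frequencies. When one response category dominates, chance agreement is high, so kappa can be low even though raw percent agreement is high. The sketch below computes both statistics. The function name, labels and counts are hypothetical and are not the trial's data.

```python
from collections import Counter


def agreement_stats(r1: list[str], r2: list[str]) -> tuple[float, float]:
    """Return (percent agreement, Cohen's kappa) for two raters' labels."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(c1[label] * c2[label] for label in c1) / n ** 2
    return 100 * p_o, (p_o - p_e) / (1 - p_e)
```

For example, suppose two raters each label 19 of 20 cases "PR" and disagree on the remaining two. They agree on 90% of cases, yet kappa is about -0.05, because chance agreement alone is 90.5%.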
Reliability of the MDi Psoriasis® Application to Aid Therapeutic Decision-Making in Psoriasis.
Moreno-Ramírez, D; Herrerías-Esteban, J M; Ojeda-Vila, T; Carrascosa, J M; Carretero, G; de la Cueva, P; Ferrándiz, C; Galán, M; Rivera, R; Rodríguez-Fernández, L; Ruiz-Villaverde, R; Ferrándiz, L
2017-09-01
Therapeutic decisions in psoriasis are influenced by disease factors (e.g., severity or location), comorbidity, and demographic and clinical features. We aimed to assess the reliability of a mobile telephone application (MDi-Psoriasis) designed to help the dermatologist make decisions on how to treat patients with moderate to severe psoriasis. We analyzed interobserver agreement between the advice given by an expert panel and the recommendations of the MDi-Psoriasis application in 10 complex cases of moderate to severe psoriasis. The experts were asked their opinion on which treatments were most appropriate, possible, or inappropriate. Data from the same 10 cases were entered into the MDi-Psoriasis application. Agreement was analyzed in 3 ways: paired interobserver concordance (Cohen's κ), multiple interobserver concordance (Fleiss's κ), and percent agreement between recommendations. The mean percent agreement across the total of 1210 observations was 51.3% (95% CI, 48.5-54.1%). Cohen's κ statistic was 0.29 and Fleiss's κ was 0.28. Mean agreement between pairs of human observers only, excluding the MDi-Psoriasis recommendations, was 50.5% (95% CI, 47.6-53.5%). Paired agreement between the recommendations of the MDi-Psoriasis tool and the majority opinion of the expert panel (Cohen's κ) was 0.44 (68.2% agreement). The MDi-Psoriasis tool can generate recommendations that are comparable to those of experts in psoriasis. Copyright © 2017 AEDV. Published by Elsevier España, S.L.U. All rights reserved.
Pearson, Adam M; Spratt, Kevin F; Genuario, James; McGough, William; Kosman, Katherine; Lurie, Jon; Sengupta, Dilip K
2011-04-01
Comparison of intra- and interobserver reliability of digitized manual and computer-assisted intervertebral motion measurements and classification of "instability." To determine whether computer-assisted measurement of lumbar intervertebral motion on flexion-extension radiographs improves reliability compared with digitized manual measurements. Many studies have questioned the reliability of manual intervertebral measurements, although few have compared the reliability of computer-assisted and manual measurements on lumbar flexion-extension radiographs. Intervertebral rotation, anterior-posterior (AP) translation, and change in anterior and posterior disc height were measured with a digitized manual technique by three physicians and by three other observers using computer-assisted quantitative motion analysis (QMA) software. Each observer measured 30 sets of digital flexion-extension radiographs (L1-S1) twice. Shrout-Fleiss intraclass correlation coefficients for intra- and interobserver reliabilities were computed. The stability of each level was also classified (instability defined as >4 mm AP translation or 10° rotation), and the intra- and interobserver reliabilities of the two methods were compared using adjusted percent agreement (APA). Intraobserver reliability intraclass correlation coefficients were substantially higher for the QMA technique than for the digitized manual technique across all measurements: rotation 0.997 versus 0.870, AP translation 0.959 versus 0.557, change in anterior disc height 0.962 versus 0.770, and change in posterior disc height 0.951 versus 0.283. The same pattern was observed for interobserver reliability (rotation 0.962 vs. 0.693, AP translation 0.862 vs. 0.151, change in anterior disc height 0.862 vs. 0.373, and change in posterior disc height 0.730 vs. 0.300). The QMA technique was also more reliable for the classification of "instability."
Intraobserver APAs ranged from 87% to 97% for QMA versus 60% to 73% for digitized manual measurements, while interobserver APAs ranged from 91% to 96% for QMA versus 57% to 63% for digitized manual measurements. The use of QMA software substantially improved the reliability of lumbar intervertebral measurements and the classification of instability based on flexion-extension radiographs.
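The instability criterion in this study is a simple threshold rule. The hypothetical sketch below assumes both thresholds are strict (>4 mm AP translation or >10° rotation) and that magnitudes are compared. The names `is_unstable` and `percent_agreement` are illustrative. Note that `percent_agreement` computes raw percent agreement, not the adjusted percent agreement (APA) reported in the study, whose adjustment is not described in the abstract.

```python
from typing import Sequence


def is_unstable(ap_translation_mm: float, rotation_deg: float) -> bool:
    """Hypothetical rule: unstable if |AP translation| > 4 mm or |rotation| > 10 degrees."""
    return abs(ap_translation_mm) > 4.0 or abs(rotation_deg) > 10.0


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Raw percent agreement between two observers' classifications (not APA)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two equally long, non-empty sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)
```

Classifying the same levels from each observer's measurements with `is_unstable` and passing the two lists to `percent_agreement` gives the share of levels on which the observers agree.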
Aihara, Hiroyuki; Kumar, Nitin; Thompson, Christopher C
2018-04-19
An education system for narrow band imaging (NBI) interpretation requires sufficient exposure to key features. However, access to didactic lectures by experienced teachers is limited in the United States. To develop and assess the effectiveness of a colorectal lesion identification tutorial. In the image analysis pretest, subjects (9 experts and 8 trainees) interpreted 50 white light (WL) and 50 NBI images of colorectal lesions. Results were not reviewed with subjects. Trainees then participated in an online tutorial emphasizing NBI interpretation in colorectal lesion analysis. A post-test was administered and diagnostic yields were compared to pre-education diagnostic yields. Under the NBI mode, experts showed higher diagnostic yields (sensitivity 91.5% [87.3-94.4], specificity 90.6% [85.1-94.2], and accuracy 91.1% [88.5-93.7] with substantial interobserver agreement [κ value 0.71]) compared to trainees (sensitivity 89.6% [84.8-93.0], specificity 80.6% [73.5-86.3], and accuracy 86.0% [82.6-89.2], with substantial interobserver agreement [κ value 0.69]). The online tutorial improved the diagnostic yields of trainees to a level equivalent to that of experts (sensitivity 94.1% [90.0-96.6], specificity 89.0% [83.0-93.2], and accuracy 92.0% [89.3-94.7], p < 0.001 with substantial interobserver agreement [κ value 0.78]). This short, online tutorial improved diagnostic performance and interobserver agreement. © 2018 S. Karger AG, Basel.
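Sensitivity, specificity and accuracy are computed from a 2 × 2 table of reader calls against the reference diagnosis. The sketch below assumes the bracketed ranges in the abstract are 95% confidence intervals and uses the Wilson score interval as one common choice; the abstract does not state which interval method was used. The function names are illustrative, and any counts passed to them are placeholders.

```python
import math


def diagnostic_yield(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and accuracy from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def wilson_ci(successes: int, total: int, z: float = 1.959963984540054) -> tuple:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return centre - half, centre + half
```

The interval for sensitivity would use `wilson_ci(tp, tp + fn)` and the one for specificity `wilson_ci(tn, tn + fp)`.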
Elsebaie, H B; Dannawi, Z; Altaf, F; Zaidan, A; Al Mukhtar, M; Shaw, M J; Gibson, A; Noordeen, H
2016-02-01
The achievement of shoulder balance is an important measure of successful scoliosis surgery. No previously described classification system has taken shoulder balance into account. We propose a simple classification system for adolescent idiopathic scoliosis (AIS) based on two components: the curve type and the shoulder level. Altogether, three curve types have been defined according to the size and location of the curves; each curve pattern is subdivided into type A or B depending on the shoulder level. This classification was tested for interobserver reproducibility and intraobserver reliability. A retrospective analysis of the radiographs of 232 consecutive cases of AIS patients treated surgically between 2005 and 2009 was also performed. Three major types and six subtypes were identified. Type I accounted for 30%, type II for 28% and type III for 42%. The retrospective analysis showed that three patients developed a decompensation that required extension of the fusion. One case developed worsening of shoulder balance requiring further surgery. The mean kappa coefficients for interobserver reproducibility ranged from 0.89 to 0.952, while the mean kappa value for intraobserver reliability was 0.964, indicating good-to-excellent reliability. The treatment algorithm guides the spinal surgeon to achieve optimal curve correction and postoperative shoulder balance whilst fusing the smallest number of spinal segments. The high interobserver reproducibility and intraobserver reliability make it an invaluable tool to describe scoliosis curves in everyday clinical practice.
Nakajima, Erica C; Frankland, Michael P; Johnson, Tucker F; Antic, Sanja L; Chen, Heidi; Chen, Sheau-Chiann; Karwoski, Ronald A; Walker, Ronald; Landman, Bennett A; Clay, Ryan D; Bartholmai, Brian J; Rajagopalan, Srinivasan; Peikert, Tobias; Massion, Pierre P; Maldonado, Fabien
2018-01-01
Lung adenocarcinoma (ADC), the most common lung cancer type, is recognized increasingly as a disease spectrum. To guide individualized patient care, a non-invasive means of distinguishing indolent from aggressive ADC subtypes is needed urgently. Computer-Aided Nodule Assessment and Risk Yield (CANARY) is a novel computed tomography (CT) tool that characterizes early ADCs by detecting nine distinct CT voxel classes, representing a spectrum of lepidic to invasive growth, within an ADC. CANARY characterization has been shown to correlate with ADC histology and patient outcomes. This study evaluated the inter-observer variability of CANARY analysis. Three novice observers independently segmented and analyzed 95 biopsy-confirmed lung ADCs from Vanderbilt University Medical Center/Nashville Veterans Administration Tennessee Valley Healthcare system (VUMC/TVHS) and the Mayo Clinic (Mayo). Inter-observer variability was measured using the intra-class correlation coefficient (ICC). The average ICC for all CANARY classes was 0.828 (95% CI 0.76, 0.895) for the VUMC/TVHS cohort, and 0.852 (95% CI 0.804, 0.901) for the Mayo cohort. The most invasive voxel classes had the highest ICC values. To determine whether nodule size influenced inter-observer variability, an additional cohort of 49 sub-centimeter nodules from Mayo was also segmented by three observers, with similar ICC results. Our study demonstrates that CANARY ADC classification by novice CANARY users has an acceptably low degree of variability, and supports the further development of CANARY for clinical application.
Measuring the Cobb angle with the iPhone in kyphoses: a reliability study.
Jacquot, Frederic; Charpentier, Axelle; Khelifi, Sofiane; Gastambide, Daniel; Rigal, Regis; Sautet, Alain
2012-08-01
Smartphones have gained widespread use in the healthcare field to fulfill a variety of tasks. We developed a small iPhone application that takes advantage of the built-in position sensor to measure angles in a variety of spinal deformities. We present a reliability study of this tool in measuring kyphotic angles. Radiographs taken from 20 different patients' charts were presented to a panel of six operators at two different times. Radiographs were measured with the protractor and the iPhone application, and statistical analysis was applied to measure intraclass correlation coefficients between both measurement methods and to measure intra- and interobserver reliability. The intraclass correlation coefficient calculated between methods (i.e. CobbMeter application on the iPhone versus standard method with the protractor) was 0.963 for all measures, indicating excellent correlation between the CobbMeter application and the standard method. The interobserver correlation coefficient was 0.965. The intraobserver ICC was 0.977, indicating excellent reproducibility of measurements at different times for all operators. The interobserver ICC between fellowship-trained senior surgeons and general orthopaedic residents was 0.989. Consistently, the ICC for intraobserver and interobserver correlations was higher with the CobbMeter application than with the regular protractor method, although this difference was not statistically significant. Measuring kyphotic angles with the iPhone application appears to be a valid procedure and is in no way inferior to the standard way of measuring the Cobb angle in kyphotic deformities.
Pedersen, Ken Steen; Toft, Nils
2011-03-01
The objective of the current study was to evaluate intra- and inter-observer agreement using a descriptive classification scale with four categories, descriptive text and pictures for assessment of consistency in faecal samples from pigs post weaning. The four consistency categories were score one=firm and shaped, score two=soft and shaped, score three=loose and score four=watery. Five observers from the same veterinary practice examined 100 faecal samples using the scale with four categories. Four of the observers examined the 100 faecal samples twice within the same day. Within observers, the difference in proportions for the individual consistency categories between two examinations was on average 0.04 (range: 0-0.10). The mean intra-observer agreement was 0.82 (range: 0.72-0.91) with a mean kappa value of 0.76 (range: 0.61-0.88). For inter-observer agreement, overall kappa was 0.64. For the 10 pair-wise comparisons, the mean inter-observer agreement was 0.73 (range: 0.61-0.90) with a mean kappa value of 0.64 (range: 0.48-0.87). The difference in proportions for the individual consistency categories was on average 0.08 (range: 0-0.17). In conclusion, the agreement observed for the descriptive classification scale with four categories, descriptive text and pictures may be categorized as substantial to almost perfect intra-observer agreement and moderate to almost perfect inter-observer agreement. However, more objective measures than clinical scales may still be needed to improve intra- and inter-observer agreement in research studies. Copyright © 2010 Elsevier B.V. All rights reserved.
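The gap between the mean intra-observer agreement (0.82) and the mean kappa (0.76) reflects kappa's correction for agreement expected by chance. A minimal sketch of unweighted Cohen's κ for two observers scoring nominal categories (such as the four consistency scores) follows; whether the study used a weighted variant is not stated. The function name and the ratings in the usage example are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (observed proportion agreement, unweighted Cohen's kappa)."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("need two equally long, non-empty rating sequences")
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of each observer's marginal proportions.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both observers use one category")
    return p_o, (p_o - p_e) / (1 - p_e)
```

For example, `cohen_kappa([1, 1, 2, 2], [1, 2, 2, 2])` returns `(0.75, 0.5)`: the observers agree on 3 of 4 samples, but half of that agreement would be expected by chance from their marginal score distributions.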
Computerized nailfold video capillaroscopy--a new tool for assessment of Raynaud's phenomenon.
Anderson, Marina E; Allen, P Danny; Moore, Tonia; Hillier, Val; Taylor, Christopher J; Herrick, Ariane L
2005-05-01
To develop a computer-based nailfold video capillaroscopy system with enhanced image quality and to assess its disease-subgroup resolving power in patients with primary and secondary Raynaud's phenomenon (RP). Using frame registration software, digitized video images from the microscope were combined to form a panoramic mosaic of the nailfold. Capillary dimensions (apex, arterial, venous, and total width) and density were measured onscreen. Importantly, the new system could guarantee analysis of the same set of capillaries by 2 observers. Forty-eight healthy control subjects, 21 patients with primary RP, 40 patients with limited cutaneous systemic sclerosis (lcSSc), and 11 patients with diffuse cutaneous SSc (dcSSc) were studied. Intra- and interobserver variability were calculated in a subset of 30 subjects. The number of loops/mm was significantly lower, and all 4 capillary dimensions significantly greater, in SSc patients versus controls plus primary RP patients (p < 0.001 for all measures). When comparing control (+ primary RP) patients with SSc patients (lcSSc + dcSSc), the most powerful discriminator was found to be the number of loops/mm. Results for intra- and interobserver reproducibility showed that the limits of agreement were narrower when both observers measured the same capillaries. The key feature of the newly developed system is that it improves reproducibility of nailfold capillary measurements by allowing reidentification of the same capillaries by different observers. By allowing access to previous measurements, the new system should improve reliability in longitudinal studies, and therefore has the potential of being a valuable outcome measure of microvessel disease/involvement in clinical trials of scleroderma spectrum disorders.
Groth, M; Forkert, N D; Buhk, J H; Schoenfeld, M; Goebell, E; Fiehler, J
2013-02-01
To compare intra- and inter-observer reliability of aneurysm measurements obtained by a 3D computer-aided technique with standard manual aneurysm measurements in different imaging modalities. A total of 21 patients with 29 cerebral aneurysms were studied. All patients underwent digital subtraction angiography (DSA), contrast-enhanced (CE-MRA) and time-of-flight magnetic resonance angiography (TOF-MRA). Aneurysm neck and depth diameters were manually measured by two observers in each modality. Additionally, semi-automatic computer-aided diameter measurements were performed using 3D vessel surface models derived from CE- (CE-com) and TOF-MRA (TOF-com) datasets. Bland-Altman analysis (BA) and intra-class correlation coefficient (ICC) were used to evaluate intra- and inter-observer agreement. BA revealed the narrowest relative limits of intra- and inter-observer agreement for aneurysm neck and depth diameters obtained by TOF-com (ranging between ±5.3 % and ±28.3 %) and CE-com (ranging between ±23.3 % and ±38.1 %). Direct measurements in DSA, TOF-MRA and CE-MRA showed considerably wider limits of agreement. The highest ICCs were observed for TOF-com and CE-com (ICC values, 0.92 or higher for intra- as well as inter-observer reliability). Computer-aided aneurysm measurement in 3D offers improved intra- and inter-observer reliability and a reproducible parameter extraction, which may be used in clinical routine and as objective surrogate end-points in clinical trials.
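Bland-Altman analysis summarizes agreement between two sets of measurements of the same aneurysms as a mean difference (bias) and 95% limits of agreement, bias ± 1.96 × SD of the differences. The sketch below is a minimal illustration; the `relative=True` option, which expresses each difference as a percentage of the pair mean, is an assumption about how "relative limits" might be computed, since the abstract does not specify the normalisation. The function name is illustrative.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(
    a: Sequence[float], b: Sequence[float], relative: bool = False
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement.

    With relative=True each difference is expressed as a percentage of the
    pair mean (an assumption; other normalisations exist). Measurements are
    assumed positive, so the pair mean is never zero.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    if relative:
        diffs = [100.0 * (x - y) / ((x + y) / 2.0) for x, y in zip(a, b)]
    else:
        diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Narrower limits, as reported for TOF-com and CE-com, mean that two readings of the same aneurysm are expected to differ by less.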
Varga, Zsuzsanna; Cassoly, Estelle; Li, Qiyu; Oehlschlegel, Christian; Tapia, Coya; Lehr, Hans Anton; Klingbiel, Dirk; Thürlimann, Beat; Ruhstaller, Thomas
2015-01-01
Background Proliferative activity (Ki-67 Labelling Index) in breast cancer increasingly serves as an additional tool in the decision for or against adjuvant chemotherapy in midrange hormone receptor positive breast cancer. The Ki-67 Index has previously been shown to suffer from high inter-observer variability, especially in midrange (G2) breast carcinomas. In this study we took a systematic approach, using different Ki-67 assessments on large tissue sections, to identify the method with the highest reliability and the lowest variability. Materials and Methods Five breast pathologists retrospectively analyzed the proliferative activity of 50 G2 invasive breast carcinomas using large tissue sections by assessing Ki-67 immunohistochemistry. Ki-67 assessments were done on light microscopy and on digital images using the following methods: 1) assessing five regions, 2) assessing only darkly stained nuclei and 3) considering only condensed proliferative areas (‘hotspots’). An individual review (the first described assessment from 2008) was also performed. The assessments on light microscopy were done by visual estimation. All measurements were performed three times. Inter-observer and intra-observer reliabilities were calculated using the approach proposed by Eliasziw et al. Clinical cutoffs (14% and 20%) were tested using Fleiss’ Kappa. Results There was good intra-observer reliability in 5 of 7 methods (ICC: 0.76–0.89). The two highest inter-observer reliabilities, fair to moderate (ICC: 0.71 and 0.74), were obtained with 2 methods (region analysis and individual review) on light microscopy. Fleiss’ kappa values (14% cut-off) were highest (moderate) using the original recommendation on light microscopy (Kappa 0.58). Fleiss’ kappa values (20% cut-off) were highest (Kappa 0.48 each) when analyzing hotspots on light microscopy and on digital images. None of the methods using digital analysis were superior to the methods on light microscopy.
Conclusion Our results show that all methods on light microscopy for Ki-67 assessment in large tissue sections resulted in good intra-observer reliability. Region analysis and individual review (the original recommendation) on light microscopy yielded the highest inter-observer reliability. These results show a slight improvement over previously published data on poor reproducibility and thus might offer a practical, pragmatic approach to routine assessment of the Ki-67 Index in G2 breast carcinomas. PMID:25885288
Bonasia, Davide Edoardo; Marmotti, Antongiulio; Massa, Alessandro Domenico Felice; Ferro, Andrea; Blonna, Davide; Castoldi, Filippo; Rossi, Roberto
2015-09-01
In the last two decades, many surgical techniques have been described for articular cartilage repair. Reliable histological scoring systems are fundamental tools to evaluate new procedures. Several histological scoring systems have been described, and these can be divided into elementary and comprehensive scores, according to the number of sub-items. The aim of this study was to test the inter- and intra-observer reliability of ten main scores used for the histological evaluation of in vivo cartilage repair. The authors tested the starting hypothesis that elementary scores would show superior intra- and inter-observer reliability compared with comprehensive scores. Fifty histological sections obtained from the trochlea of New Zealand rabbits and stained with Safranin-O fast green were used. The histological sections were analysed by 4 observers: 2 experienced in cartilage histology and 2 inexperienced. Histological evaluations were performed at time 1 and time 2, separated by a 30-day interval. The following scores were used: Mankin, O'Driscoll, Pineda, Wakitani, Fortier, Sellers, ICRS, ICRSII, Oswestry (OsScore) and modified O'Driscoll. Intra- and inter-observer reliability were evaluated for each score. In addition, the floor-ceiling effect and the Bland-Altman Coefficient of Repeatability were evaluated for each sub-item of every score. Intra-observer reliability was high for all observers in every score, even though the reliability was significantly lower for non-expert observers compared with their expert counterparts. In terms of Coefficient of Repeatability, some scores performed better (O'Driscoll, Modified O'Driscoll and ICRSII) than others (Fortier, Sellers). Inter-observer reliability was high for all observers in every score, but significantly lower for non-expert compared with expert observers. In expert hands, all the scores showed high intra- and inter-observer reliability, independently of the complexity.
Although every score has advantages and disadvantages, ICRSII, O'Driscoll and Modified O'Driscoll scores should be preferred for the evaluation of in vivo cartilage repair in animal models.
Qiao, Jun; Xu, Leilei; Zhu, Zezhang; Zhu, Feng; Liu, Zhen; Qian, Bangping; Qiu, Yong
2014-10-11
Scoliogauge, a smartphone application, has been developed for measuring the angle of trunk rotation (ATR) on iPhone smartphones. This study aimed to evaluate the reliability of the smartphone-aided ATR measurement method and to compare it with that of the manual method. Sixty-four adolescent idiopathic scoliosis (AIS) patients with a single thoracic or lumbar curve participated in this study. Of these patients, thirty-two had main thoracic scoliosis while the other thirty-two had main thoracolumbar/lumbar scoliosis. Two spine surgeons performed the measurements with the Scoliometer and Scoliogauge. The Scoliogauge measurements were conducted on an iPhone 4 smartphone. The intraclass correlation coefficient (ICC), using a 2-way mixed model on absolute agreement, was used to analyze reliability, categorized according to region (thoracic or lumbar) and Cobb angle (<20 degrees and >40 degrees). An ICC < 0.40 was considered poor, 0.40-0.59 fair, 0.60-0.74 good, and 0.75-1.00 excellent. The overall intraobserver ICC was 0.954 and the overall interobserver ICC was 0.943 for the Scoliometer set, whereas the intraobserver ICC was 0.965 and the interobserver ICC was 0.964 for the Scoliogauge set. Both the intraobserver and interobserver ICCs reached the excellent range in the 2 sets for both observers. The mean Cobb angle of thoracic curves in patients with main thoracic scoliosis was similar to that of lumbar curves in those with main thoracolumbar/lumbar scoliosis (35.7 degrees vs. 36.1 degrees). The intraobserver and interobserver reliability was similar between the two groups (thoracic vs. lumbar) in the 2 sets. There were 21 patients with Cobb angles < 20 degrees and 20 patients with Cobb angles > 40 degrees. The intraobserver and interobserver reliability was better in the severe curve (>40 degrees) group.
Smartphone-aided measurement of ATR showed excellent reliability, and the reliability of measurement with either the Scoliometer or Scoliogauge could be influenced by the Cobb angle, with better reliability for curves with larger Cobb angles.
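This study reports ICCs from a two-way model with absolute agreement and grades them with fixed bands. The sketch below computes the single-measure, absolute-agreement ICC from two-way ANOVA mean squares (the ICC(A,1) form of McGraw and Wong, numerically the same as Shrout-Fleiss ICC(2,1)). It assumes a complete subjects × raters table and single rather than average measures, which the abstract does not specify. The interpretation bands are the ones quoted in the abstract; the function names are illustrative.

```python
from typing import Sequence


def icc_absolute_agreement(data: Sequence[Sequence[float]]) -> float:
    """Single-measure, absolute-agreement ICC from a subjects x raters table.

    Rows are subjects, columns are raters (or repeated sessions).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Systematic rater differences (ms_cols) lower absolute agreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc: float) -> str:
    """Bands quoted in the abstract: <0.40 poor, 0.40-0.59 fair, 0.60-0.74 good, >=0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"
```

Because the absolute-agreement form penalizes a constant offset between raters, a table such as `[[1, 2], [2, 3], [3, 4]]`, where one rater always reads 1 unit higher, yields an ICC of 2/3 ("good") even though the two raters rank the subjects identically.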
Berger, Aaron J; Momeni, Arash; Ladd, Amy L
2014-04-01
Trapeziometacarpal, or thumb carpometacarpal (CMC), arthritis is a common problem with a variety of treatment options. Although widely used, the Eaton radiographic staging system for CMC arthritis is of questionable clinical utility, as disease severity does not predictably correlate with symptoms or treatment recommendations. A possible reason for this is that the classification itself may not be reliable, but the literature on this has not, to our knowledge, been systematically reviewed. We therefore performed a systematic review to determine the intra- and interobserver reliability of the Eaton staging system. We systematically reviewed English-language studies published between 1973 and 2013 to assess the degree of intra- and interobserver reliability of the Eaton classification for determining the stage of trapeziometacarpal joint arthritis and pantrapezial arthritis based on plain radiographic imaging. The databases searched were PubMed, Scopus®, and CINAHL. Four studies, which included a total of 163 patients, met our inclusion criteria and were evaluated. The level of evidence of the studies included in this analysis was determined using the Oxford Centre for Evidence Based Medicine Levels of Evidence Classification by two independent observers. A limited number of studies have been performed to assess intra- and interobserver reliability of the Eaton classification system. The four studies included were determined to be Level 3b. These studies collectively indicate that the Eaton classification demonstrates poor to fair interobserver reliability (kappa values: 0.11-0.56) and fair to moderate intraobserver reliability (kappa values: 0.54-0.657). Review of the literature demonstrates that radiographs assist in the assessment of CMC joint disease, but there is not a reliable system for classification of disease severity.
Currently, diagnosis and treatment of thumb CMC arthritis are based on the surgeon's qualitative assessment combining history, physical examination, and radiographic evaluation. Inconsistent agreement using the current common radiographic classification system suggests a need for better radiographic tools to quantify disease severity.
Interobserver variability for the WHO classification of pulmonary carcinoids.
Swarts, Dorian R A; van Suylen, Robert-Jan; den Bakker, Michael A; van Oosterhout, Matthijs F M; Thunnissen, Frederik B J M; Volante, Marco; Dingemans, Anne-Marie C; Scheltinga, Marc R M; Bootsma, Gerben P; Pouwels, Harry M M; van den Borne, Ben E E M; Ramaekers, Frans C S; Speel, Ernst-Jan M
2014-10-01
Pulmonary carcinoids are neuroendocrine tumors histopathologically subclassified into typical (TC; no necrosis, <2 mitoses per 2 mm²) and atypical (AC; necrosis or 2 to 10 mitoses per 2 mm²) carcinoids. The reproducibility of lung carcinoid classification, however, has not been extensively studied and may be hampered by the presence of pyknotic apoptosis mimicking mitotic figures. Furthermore, prediction of prognosis based on histopathology varies, especially for ACs. We examined interobserver variation among 5 experienced pulmonary pathologists who reviewed 123 originally diagnosed pulmonary carcinoid cases. The tumors were subsequently distributed over 3 groups: unanimously classified cases, consensus cases (4/5 pathologists rendered an identical diagnosis), and disagreement cases (divergent diagnosis by ≥2 assessors). κ-values were calculated, and results were correlated with clinical follow-up and molecular data. When focusing on the 114/123 cases unanimously classified as pulmonary carcinoids, the interobserver agreement was only fair (κ=0.32). Of these 114 cases, 55% were unanimously classified, 25% reached consensus classification, and for 19% there was no consensus. ACs were significantly more frequent in the latter category (P=0.00038). The designation of TCs and ACs by ≥3 assessors was not associated with prognosis (P=0.11). However, when disagreement cases were allocated on the basis of Ki-67 proliferative index (<5%; ≥5%) or nuclear orthopedia homeobox immunostaining (+; -), correlation with prognosis improved significantly (P=0.00040 and 0.0024, respectively). In conclusion, there is considerable interobserver variation in the histopathologic classification of lung carcinoids, in particular concerning ACs. Additional immunomarkers such as Ki-67 or orthopedia homeobox may improve classification and prediction of prognosis.
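Both the TC/AC criteria and the three agreement groups used in this study are simple decision rules. The hypothetical sketch below assumes mitotic counts are given per 2 mm², treats counts above 10 as outside the carcinoid range (the abstract does not address such cases), and assumes exactly five diagnoses per case, as in this study. The function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def classify_carcinoid(necrosis: bool, mitoses_per_2mm2: float) -> str:
    """TC: no necrosis and <2 mitoses; AC: necrosis or 2-10 mitoses (per 2 mm^2)."""
    if mitoses_per_2mm2 > 10:
        return "above carcinoid range"  # not covered by the TC/AC criteria
    if necrosis or mitoses_per_2mm2 >= 2:
        return "AC"
    return "TC"


def agreement_group(diagnoses: Sequence[str]) -> str:
    """Group a case by how many of the five pathologists share the modal diagnosis."""
    if len(diagnoses) != 5:
        raise ValueError("this sketch assumes exactly five diagnoses per case")
    top = Counter(diagnoses).most_common(1)[0][1]
    if top == 5:
        return "unanimous"
    if top == 4:
        return "consensus"
    return "disagreement"  # at least 2 assessors diverge from the majority
```

Applying `classify_carcinoid` to each pathologist's reading of a case and passing the five labels to `agreement_group` reproduces the unanimous, consensus and disagreement grouping described above.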
Klauser, Andrea S; Franz, Magdalena; Arora, Rohit; Feuchtner, Gudrun M; Gruber, Johann; Schirmer, Michael; Jaschke, Werner R; Gabl, Markus F
2010-01-01
We sought to assess vascularity in wrist tenosynovitis by using power Doppler ultrasound (PDUS) and to compare detection of intra- and peritendinous vascularity with that of contrast-enhanced grey-scale ultrasound (CEUS). Twenty-six tendons of 24 patients (nine men, 15 women; mean age ± SD, 54.4 ± 11.8 years) with a clinical diagnosis of tenosynovitis were examined with B-mode ultrasonography, PDUS, and CEUS by using a second-generation contrast agent, SonoVue (Bracco Diagnostics, Milan, Italy), and a low-mechanical-index ultrasound technique. Thickness of synovitis, extent of vascularized pannus, intensity of peritendinous vascularisation, and detection of intratendinous vessels were incorporated into a 3-point grading system (grade 0 to 2). Interobserver variability was calculated. With CEUS, a significantly greater extent of vascularity could be detected than with PDUS (P < 0.001). In terms of peri- and intratendinous vessels, CEUS was significantly more sensitive in the detection of vascularization compared with PDUS (P < 0.001). No significant correlation between synovial thickening and extent of vascularity could be found (P = 0.089 to 0.097). Interobserver reliability was excellent when evaluating the grading score (κ = 0.811 to 1.00). CEUS is a promising tool to detect tendon vascularity with higher sensitivity than PDUS through improved detection of intra- and peritendinous vascularity.
Validation of a novel smartphone accelerometer-based knee goniometer.
Ockendon, Matthew; Gilbert, Robin E
2012-09-01
Loss of full knee extension following anterior cruciate ligament surgery has been shown to impair knee function. However, there can be significant difficulties in accurately and reproducibly measuring a fixed flexion of the knee. We studied the interobserver and intraobserver reliabilities of a novel, smartphone accelerometer-based knee goniometer and compared it with a long-armed conventional goniometer for the assessment of fixed flexion knee deformity. Five healthy male volunteers (age range 30 to 40 years) were studied. Measurements of knee flexion angle were made with a telescopic-armed goniometer (Lafayette Instrument, Lafayette, IN) and compared with measurements using the smartphone (iPhone 3GS, Apple Inc., Cupertino, CA) knee goniometer using a novel trigonometric technique based on tibial inclination. Bland-Altman analysis of validity and reliability, including statistical analysis of correlation by Pearson's method, was undertaken. The iPhone goniometer had an interobserver correlation (r) of 0.994 compared with 0.952 for the Lafayette. The intraobserver correlation was r = 0.982 for the iPhone (compared with 0.927 for the Lafayette). The datasets from the two instruments correlate closely (r = 0.947), are proportional, and have a mean difference of only -0.4 degrees (SD 3.86 degrees). The Lafayette goniometer had an intraobserver reliability of ±9.6 degrees and an interobserver reliability of ±8.4 degrees. By comparison, the iPhone had an interobserver reliability of ±2.7 degrees and an intraobserver reliability of ±4.6 degrees. We found the iPhone goniometer to be a reliable tool for the measurement of subtle knee flexion in the clinic setting.
Hopp, Sascha; Ojodu, Ishaq; Jain, Atul; Fritz, Tobias; Pohlemann, Tim; Kelm, Jens
2018-05-01
Radiographic abnormalities of the symphysis, as well as the formation of accessory clefts indicating injury at the rectus-adductor aponeurosis, reportedly relate to longstanding groin pain in athletes. However, no systematic classification for clinical and scientific purposes exists yet. We aimed to (1) create a radiographic classification based on symphysography; (2) test intra- and interobserver reliability; and (3) characterise the clinical significance of the morphologic patterns by evaluating the success of injection therapy. We retrospectively reviewed symphysography, AP radiographs, and MRI of the pelvis from 70 consecutive competitive athletes with chronic groin pain. Symphysographs were evaluated for intra- and interobserver variance using Cohen's kappa statistics. Morphologic studies of the different contrast distribution patterns and their clinical and radiological correlation with symptom relief were investigated. All patients were followed up to evaluate immediate and long-term response to the initial therapeutic steroid injection. Four reproducible symphysographic patterns were identified: type 0, no changes; type 1, symphyseal disk degeneration; type 2, with unilateral (2a), bilateral (2b) or suprapubic (2c) clefts; and type 3, with expanded or multidirectional clefts. Analysis revealed excellent intraobserver (0.94) and interobserver (0.90) reliability. Overall, 78.6% of our patients had significant short-term improvement enabling early resumption of physiotherapy; this response occurred only in types 1 and 2 (p = 0.001), whereas types 0 and 3 did not respond. At follow-up, only 21.8% had permanent pain relief. For the detection of pathologic clefts, the sensitivity (88%) and specificity (77%) of symphysography were superior to those of MRI. A reproducible symphysography-based classification of distinct morphologic patterns is proposed. It serves as a predictive tool for response to injection therapy in a select group of pathologic lesions.
Complete recovery after injection can be expected in only a small proportion of patients, which may indicate surgical treatment for long-term non-responders.
Cheung, Yun-Chung; Lin, Yu-Ching; Wan, Yung-Liang; Yeow, Kee-Min; Huang, Pei-Chin; Lo, Yung-Feng; Tsai, Hsiu-Pei; Ueng, Shir-Hwa; Chang, Chee-Jen
2014-10-01
To analyse the accuracy of dual-energy contrast-enhanced spectral mammography in dense breasts in comparison with contrast-enhanced subtracted mammography (CESM) and conventional mammography (Mx). CESM cases of dense breasts with histological proof were evaluated in the present study. Four radiologists with varying experience in mammography interpretation blindly read Mx first, followed by CESM. The diagnostic profiles, consistency and learning curve were analysed statistically. One hundred lesions (28 benign and 72 breast malignancies) in 89 females were analysed. Use of CESM improved the cancer diagnosis by 21.2 percentage points in sensitivity (71.5 % to 92.7 %), by 16.1 percentage points in specificity (51.8 % to 67.9 %) and by 19.8 percentage points in accuracy (65.9 % to 85.8 %) compared with Mx. The interobserver diagnostic consistency was markedly higher using CESM than using Mx alone (kappa 0.6235 vs. 0.3869). The probability of a correct prediction increased from 80 % to 90 % after 75 consecutive case readings. CESM provided additional information with consistent improvement of the cancer diagnosis in dense breasts compared to Mx alone. The prediction of the diagnosis could be improved by the interpretation of a significant number of cases in the presence of 6 % benign contrast enhancement in this study. • DE-CESM improves the cancer diagnosis in dense breasts compared with mammography. • DE-CESM shows greater consistency than mammography alone by interobserver blind reading. • Diagnostic improvement of DE-CESM is independent of the mammographic reading experience.
Nestle, Ursula; Rischke, Hans Christian; Eschmann, Susanne Martina; Holl, Gabriele; Tosch, Marco; Miederer, Matthias; Plotkin, Michail; Essler, Markus; Puskas, Cornelia; Schimek-Jasch, Tanja; Duncker-Rohr, Viola; Rühl, Friederike; Leifert, Anja; Mix, Michael; Grosu, Anca-Ligia; König, Jochem; Vach, Werner
2015-11-01
Oncologic imaging is key to successful cancer treatment. While quality assurance (QA) of image acquisition protocols has already received attention, QA of reading and reporting still offers room for improvement. The latter was addressed in the context of a prospective multicentre trial on fluoro-deoxyglucose (FDG)-positron-emission tomography (PET)/CT-based chemoradiotherapy for locally advanced non-small cell lung cancer (NSCLC). An expert panel was prospectively established to perform blinded reviews of mediastinal NSCLC involvement in FDG-PET/CT. Due to high initial inter-observer disagreement in reporting, the independent data monitoring committee (IDMC) triggered an interventional harmonisation process, which overall involved 11 experts making 6855 blinded diagnostic statements. After the baseline inter-observer agreement (IOA) of a blinded re-review was assessed (phase 1), a discussion process led to improved reading criteria (phase 2). These underwent a validation study (phase 3) and were then implemented into the study routine. After 2 months (phase 4) and 1 year (phase 5), the IOA was reassessed. The initial overall IOA was moderate (kappa 0.52 CT; 0.53 PET). After improvement of the reading criteria, the kappa values improved substantially (kappa 0.61 CT; 0.66 PET), and this improvement was retained at the late reassessment (kappa 0.71 CT; 0.67 PET). Subjective uncertainty was highly predictive of low IOA. The IOA of an expert panel was significantly improved by a structured interventional harmonisation process, which could be a model for future clinical trials. Furthermore, the low IOA in reporting nodal involvement in NSCLC may have consequences for individual patient care. Copyright © 2015 Elsevier Ltd. All rights reserved.
Del Pilar Duque Orozco, Maria; Abousamra, Oussama; Church, Chris; Lennon, Nancy; Henley, John; Rogers, Kenneth J; Sees, Julieanne P; Connor, Justin; Miller, Freeman
2016-09-01
Assessment of gait abnormalities in cerebral palsy (CP) is challenging, and access to instrumented gait analysis is not always feasible. Therefore, many observational gait analysis scales have been devised. This study aimed to evaluate the interobserver reliability, intraobserver reliability, and validity of the Edinburgh visual gait score (EVGS). Videos of 30 children with spastic CP (10 children each in GMFCS levels I, II, and III; age 6-12 years) were reviewed by 7 raters. Three observers had a high level of experience in gait analysis (10+ years), two had a medium level (2-5 years) and two had no previous experience (orthopedic fellows). Interobserver reliability was evaluated using percentage of complete agreement and kappa values. Criterion validity was evaluated by comparing EVGS scores with three-dimensional gait analysis (3DGA) data collected at the same visit as the video. Interobserver agreement was 60-90% and kappa values were 0.18-0.85 for the 17 items in EVGS. Reliability was higher for distal segments (foot/ankle/knee 63-90%; trunk/pelvis/hip 60-76%), with greater experience (high 66-91%, medium 62-90%, no experience 41-87%), with more EVGS practice (first 10 videos 52-88%, last 10 videos 64-97%) and when used with higher-functioning children (GMFCS I 65-96%, II 58-90%, III 35-65%). Intraobserver agreement was 64-92%. Agreement between EVGS and 3DGA was 52-73%. We believe that having EVGS as part of the standardized gait evaluation is helpful in optimizing the visual scoring. EVGS can be a supportive tool that adds quantitative data, rather than only a qualitative assessment, to a video-only gait evaluation. Copyright © 2016 Elsevier B.V. All rights reserved.
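The EVGS abstract reports "percentage of complete agreement" among 7 raters without defining how multi-rater agreement was pooled. The sketch below shows two plausible definitions, both assumptions rather than the study's method: mean agreement over all rater pairs, and the share of items on which every rater agreed. The scores use a hypothetical ordinal 0-2 scale.

```python
from itertools import combinations

def mean_pairwise_agreement(scores):
    """scores[r][i] is rater r's score for item i; mean share of agreeing items over all rater pairs."""
    n_items = len(scores[0])
    pair_rates = [
        sum(scores[a][i] == scores[b][i] for i in range(n_items)) / n_items
        for a, b in combinations(range(len(scores)), 2)
    ]
    return sum(pair_rates) / len(pair_rates)

def unanimous_agreement(scores):
    """Share of items on which every rater gave the same score."""
    n_items = len(scores[0])
    return sum(len({row[i] for row in scores}) == 1 for i in range(n_items)) / n_items

# Three hypothetical raters scoring four videos on one item.
s = [[0, 1, 2, 2], [0, 1, 2, 1], [0, 1, 1, 1]]
print(round(mean_pairwise_agreement(s), 3), unanimous_agreement(s))  # → 0.667 0.5
```

The two definitions give different numbers for the same ratings, which is one reason percentage agreement figures are hard to compare across studies.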
Can emergency physicians accurately and reliably assess acute vertigo in the emergency department?
Vanni, Simone; Nazerian, Peiman; Casati, Carlotta; Moroni, Federico; Risso, Michele; Ottaviani, Maddalena; Pecci, Rudi; Pepe, Giuseppe; Vannucchi, Paolo; Grifoni, Stefano
2015-04-01
To validate a clinical diagnostic tool, used by emergency physicians (EPs), to diagnose the central cause of patients presenting with vertigo, and to determine interrater reliability of this tool. A convenience sample of adult patients presenting to a single academic ED with isolated vertigo (i.e. vertigo without other neurological deficits) was prospectively evaluated with STANDING (SponTAneousNystagmus, Direction, head Impulse test, standiNG) by five trained EPs. The first step focused on the presence of spontaneous nystagmus, the second on the direction of nystagmus, the third on head impulse test and the fourth on gait. The local standard practice, senior audiologist evaluation corroborated by neuroimaging when deemed appropriate, was considered the reference standard. Sensitivity and specificity of STANDING were calculated. On the first 30 patients, inter-observer agreement among EPs was also assessed. Five EPs with limited experience in nystagmus assessment volunteered to participate in the present study enrolling 98 patients. Their average evaluation time was 9.9 ± 2.8 min (range 6-17). Central acute vertigo was suspected in 16 (16.3%) patients. There were 13 true positives, three false positives, 81 true negatives and one false negative, with a high sensitivity (92.9%, 95% CI 70-100%) and specificity (96.4%, 95% CI 93-38%) for central acute vertigo according to senior audiologist evaluation. The Cohen's kappas of the first, second, third and fourth steps of the STANDING were 0.86, 0.93, 0.73 and 0.78, respectively. The whole test showed a good inter-observer agreement (k = 0.76, 95% CI 0.45-1). In the hands of EPs, STANDING showed a good inter-observer agreement and accuracy validated against the local standard of care. © 2015 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
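The STANDING abstract gives a full 2x2 table (13 true positives, 3 false positives, 81 true negatives, 1 false negative). A minimal sketch recomputing sensitivity and specificity from those counts is shown below. The 95% intervals use the Wilson score method, which is an assumption: the abstract does not state how its intervals were computed, so these bounds need not match the reported ones.

```python
from math import sqrt

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def sens_spec(tp, fp, tn, fn):
    """Sensitivity and specificity, each with a Wilson interval."""
    sens = tp / (tp + fn)  # detected central cases / all central cases
    spec = tn / (tn + fp)  # correctly cleared peripheral cases / all peripheral cases
    return sens, wilson_interval(tp, tp + fn), spec, wilson_interval(tn, tn + fp)

sens, sens_ci, spec, spec_ci = sens_spec(tp=13, fp=3, tn=81, fn=1)
print(round(100 * sens, 1), round(100 * spec, 1))  # → 92.9 96.4
```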
Huynh, Thien J; Flaherty, Matthew L; Gladstone, David J; Broderick, Joseph P; Demchuk, Andrew M; Dowlatshahi, Dar; Meretoja, Atte; Davis, Stephen M; Mitchell, Peter J; Tomlinson, George A; Chenkin, Jordan; Chia, Tze L; Symons, Sean P; Aviv, Richard I
2014-01-01
Rapid, accurate, and reliable identification of the computed tomography angiography spot sign is required to identify patients with intracerebral hemorrhage for trials of acute hemostatic therapy. We sought to assess the accuracy and interobserver agreement for spot sign identification. A total of 131 neurology, emergency medicine, and neuroradiology staff and fellows underwent imaging certification for spot sign identification before enrolling patients in 3 trials targeting spot-positive intracerebral hemorrhage for hemostatic intervention (STOP-IT, SPOTLIGHT, STOP-AUST). Ten intracerebral hemorrhage cases (spot-positive/negative ratio, 1:1) were presented for evaluation of spot sign presence, number, and mimics. True spot positivity was determined by consensus of 2 experienced neuroradiologists. Diagnostic performance, agreement, and differences by training level were analyzed. Mean accuracy, sensitivity, and specificity for spot sign identification were 87%, 78%, and 96%, respectively. Overall sensitivity was lower than specificity (P<0.001) because of true spot signs incorrectly perceived as spot mimics. Interobserver agreement for spot sign presence was moderate (k=0.60). When true spots were correctly identified, 81% correctly identified the presence of single or multiple spots. Median time needed to evaluate the presence of a spot sign was 1.9 minutes (interquartile range, 1.2-3.1 minutes). Diagnostic performance, interobserver agreement, and time needed for spot sign evaluation were similar among staff physicians and fellows. Accuracy for spot identification is high with opportunity for improvement in spot interpretation sensitivity and interobserver agreement particularly through greater reliance on computed tomography angiography source data and awareness of limitations of multiplanar images. Further prospective study is needed.
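Because the spot sign test set was balanced (spot-positive/negative ratio 1:1), overall accuracy is simply the mean of sensitivity and specificity, which is why 78% and 96% give 87%. The general relation, accuracy = prevalence × sensitivity + (1 − prevalence) × specificity, can be sketched as follows; the function name is illustrative.

```python
def accuracy_from(sens, spec, prevalence):
    """Overall accuracy as the prevalence-weighted mix of sensitivity and specificity."""
    return prevalence * sens + (1 - prevalence) * spec

# Balanced case set, as in the certification exercise.
print(round(accuracy_from(0.78, 0.96, 0.5), 2))  # → 0.87
```

With a lower spot-sign prevalence (for example 0.2), the same readers would score higher overall accuracy, because the stronger specificity carries more weight; accuracy alone therefore hides the weaker sensitivity the abstract highlights.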
Interobserver agreement on histopathological lesions in class III or IV lupus nephritis.
Wilhelmus, Suzanne; Cook, H Terence; Noël, Laure-Hélène; Ferrario, Franco; Wolterbeek, Ron; Bruijn, Jan A; Bajema, Ingeborg M
2015-01-07
To treat lupus nephritis effectively, proper identification of the histologic class is essential. Although the classification system for lupus nephritis is nearly 40 years old, remarkably few studies have investigated interobserver agreement. Interobserver agreement among nephropathologists was studied, particularly with respect to the recognition of class III/IV lupus nephritis lesions, and possible causes of disagreement were determined. A link to a survey containing pictures of 30 glomeruli was provided to all 360 members of the Renal Pathology Society; 34 responses were received from 12 countries (a response rate of 9.4%). The nephropathologist was asked whether glomerular lesions were present that would categorize the biopsy as class III/IV. If so, additional parameters were scored. To determine the interobserver agreement among the participants, κ or intraclass correlation values were calculated. The intraclass correlation or κ-value was also calculated for two separate levels of experience (specifically, nephropathologists who were new to the field or moderately experienced [less experienced] and nephropathologists who were highly experienced). Intraclass correlation for the presence of a class III/IV lesion was 0.39 (poor). The κ/intraclass correlation values for the additional parameters were as follows: active, chronic, or both: 0.36; segmental versus global: 0.39; endocapillary proliferation: 0.46; influx of inflammatory cells: 0.32; swelling of endothelial cells: 0.46; extracapillary proliferation: 0.57; type of crescent: 0.46; and wire loops: 0.35. The highly experienced nephropathologists had significantly less interobserver variability compared with the less experienced nephropathologists (P=0.004). There is generally poor agreement in terms of recognizing class III/IV lesions. Because experience clearly increases interobserver agreement, this agreement may be improved by training nephropathologists. 
These results also underscore the importance of a central review by experienced nephropathologists in clinical trials. Copyright © 2015 by the American Society of Nephrology.
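The lupus nephritis survey had 34 raters judging the same 30 glomeruli. The abstract reports κ or intraclass correlation without naming the kappa variant; one standard choice for many raters and nominal categories is Fleiss' kappa, sketched below as an illustration rather than as the authors' calculation. Each row holds, for one glomerulus, the number of raters choosing each category (for example "class III/IV lesion present" vs "absent").

```python
def fleiss_kappa(counts):
    """counts[i][j]: number of raters placing subject i in category j (same rater total per subject)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    # Per-subject agreement: proportion of agreeing rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Overall category proportions give the agreement expected by chance.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 34 raters, 3 glomeruli, categories [present, absent].
print(fleiss_kappa([[20, 14], [30, 4], [10, 24]]))
```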
Reliability of a four-column classification for tibial plateau fractures.
Martínez-Rondanelli, Alfredo; Escobar-González, Sara Sofía; Henao-Alzate, Alejandro; Martínez-Cano, Juan Pablo
2017-09-01
A four-column classification system offers a different way of evaluating tibial plateau fractures. The aim of this study was to compare the intra-observer and inter-observer reliability of the four-column and classic classifications. This is a reliability study, which included patients presenting with tibial plateau fractures between January 2013 and September 2015 in a level-1 trauma centre. Four orthopaedic surgeons blindly classified each fracture according to four different classifications: AO, Schatzker, Duparc and four-column. Kappa, intra-observer and inter-observer concordance were calculated for the reliability analysis. Forty-nine patients were included. The mean age was 39 ± 14.2 years, with no gender predominance (men: 51%; women: 49%), and 67% of the fractures included at least one of the posterior columns. The intra-observer and inter-observer concordance were calculated for each classification: four-column (84%/79%), Schatzker (60%/71%), AO (50%/59%) and Duparc (48%/58%), with a statistically significant difference among them (p = 0.001/p = 0.003). Kappa coefficients for intra-observer and inter-observer evaluations were: Schatzker 0.48/0.39, four-column 0.61/0.34, Duparc 0.37/0.23, and AO 0.34/0.11. The proposed four-column classification showed the highest intra- and inter-observer agreement. When taking into account the agreement that occurs by chance, the Schatzker classification showed the highest inter-observer kappa, but the four-column classification again had the highest intra-observer kappa value. The proposed classification is more inclusive of posteromedial and posterolateral fractures. We therefore suggest that it be used in addition to one of the classic classifications to better understand the fracture pattern, as it draws more attention to the posterior columns, improves surgical planning and allows the surgical approach to be chosen more accurately.
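The tibial plateau results show higher raw concordance but lower inter-observer kappa for the four-column system than for Schatzker. Rearranging the kappa definition gives the chance agreement implied by each pair of numbers. This is a worked sketch under the assumption that the reported concordance and kappa describe the same comparison (p_o and κ); if they were averaged differently across observer pairs, the implied values are only approximate.

```latex
\begin{aligned}
\kappa &= \frac{p_o - p_e}{1 - p_e}
  \;\Longrightarrow\; p_e = \frac{p_o - \kappa}{1 - \kappa} \\
p_e^{\text{four-column}} &= \frac{0.79 - 0.34}{1 - 0.34} = \frac{0.45}{0.66} \approx 0.68 \\
p_e^{\text{Schatzker}} &= \frac{0.71 - 0.39}{1 - 0.39} = \frac{0.32}{0.61} \approx 0.52
\end{aligned}
```

Under that assumption, far more of the four-column agreement would be expected by chance, which typically happens when ratings concentrate in a few categories; kappa discounts that share, so the ranking of the two systems flips.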
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pogson, Elise M.; Liverpool and Macarthur Cancer Therapy Centres, Liverpool; Ingham Institute for Applied Medical Research, Liverpool
2016-11-15
Purpose: To determine whether T2-weighted MRI improves seroma cavity (SC) and whole breast (WB) interobserver conformity for radiation therapy purposes, compared with the gold standard of CT, both in the prone and supine positions. Methods and Materials: Eleven observers (2 radiologists and 9 radiation oncologists) delineated SC and WB clinical target volumes (CTVs) on T2-weighted MRI and CT supine and prone scans (4 scans per patient) for 33 patient datasets. Individual observers' volumes were compared using the Dice similarity coefficient, volume overlap index, center of mass shift, and Hausdorff distances. An average cavity visualization score was also determined. Results: Imaging modality did not affect interobserver variation for WB CTVs. Prone WB CTVs were larger in volume and more conformal than supine CTVs (on both MRI and CT). Seroma cavity volumes were larger on CT than on MRI. Seroma cavity volumes proved to be comparable in interobserver conformity in both modalities (volume overlap index of 0.57 (95% Confidence Interval (CI) 0.54-0.60) for CT supine and 0.52 (95% CI 0.48-0.56) for MRI supine, 0.56 (95% CI 0.53-0.59) for CT prone and 0.55 (95% CI 0.51-0.59) for MRI prone); however, after registering modalities together the intermodality variation (Dice similarity coefficient of 0.41 (95% CI 0.36-0.46) for supine and 0.38 (0.34-0.42) for prone) was larger than the interobserver variability for SC, despite the location typically remaining constant. Conclusions: Magnetic resonance imaging interobserver variation was comparable to CT for the WB CTV and SC delineation, in both prone and supine positions. Although the cavity visualization score and interobserver concordance were not significantly higher for MRI than for CT, the SCs were smaller on MRI, potentially owing to clearer SC definition, especially on T2-weighted MR images.
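The breast contouring study compares observers' volumes with the Dice similarity coefficient, 2|A∩B| / (|A| + |B|). A minimal sketch on contours represented as sets of voxel coordinates is below. The Jaccard index is included only because some authors define a volume overlap index that way; whether this study used that definition is an assumption.

```python
def dice(a, b):
    """Dice similarity coefficient between two voxel sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index |A∩B| / |A∪B| (one possible reading of a 'volume overlap index')."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two hypothetical observers' contours, as (x, y, z) voxel indices.
obs1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
obs2 = {(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)}
print(dice(obs1, obs2), round(jaccard(obs1, obs2), 3))  # → 0.5 0.333
```

The two measures are monotonically related (Dice = 2J / (1 + J)), so they rank contour pairs identically but are not numerically interchangeable.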
Quinn, S F; Neubauer, N M; Sheley, R C; Demlow, T A; Szumowski, J
1996-01-01
MR imaging was used to evaluate the integrity of silicone breast implants in 54 women with 108 implants. MR images were interpreted by relatively inexperienced readers who tried to reproduce the experiences reported in the literature. The study examines the interobserver agreement using different diagnostic signs and the influence of experience on interpretation errors. Prospective and retrospective interpretations were compared with surgical findings at the time of explantation. Diagnostic indicators, including the linguine sign, the inverted tear drop sign, the C sign, water droplets mixed with silicone, and extracapsular globules of silicone, were evaluated for diagnostic efficacy and interobserver agreement. The prospective sensitivity and specificity were 87% and 78%, respectively. With the retrospective interpretations, the sensitivity and specificity increased to 93% and 92%, respectively. Most of the prospective false-positive interpretations were due to misinterpreting radial folds as signs of implant rupture. Six implants interpreted retrospectively as false positives had gross amounts of silicone around the implants at surgery, but there were no obvious rents in the implant shells. There was fair to excellent interobserver agreement with the individual diagnostic signs except for extracapsular globules of silicone. All of the signs had specificities of greater than 90%. The sensitivities of the individual signs were less than the overall retrospective sensitivity. With experience, the sensitivity improved from 87% to 93% and the specificity improved from 78% to 92%. This study helps substantiate the use of diagnostic signs used by other authors to detect silicone loss from breast implants by MR imaging; however, questions remain as to the clinical role of MR imaging in evaluating implants for silicone loss.
An International Ki67 Reproducibility Study in Adrenal Cortical Carcinoma.
Papathomas, Thomas G; Pucci, Eugenio; Giordano, Thomas J; Lu, Hao; Duregon, Eleonora; Volante, Marco; Papotti, Mauro; Lloyd, Ricardo V; Tischler, Arthur S; van Nederveen, Francien H; Nose, Vania; Erickson, Lori; Mete, Ozgur; Asa, Sylvia L; Turchini, John; Gill, Anthony J; Matias-Guiu, Xavier; Skordilis, Kassiani; Stephenson, Timothy J; Tissier, Frédérique; Feelders, Richard A; Smid, Marcel; Nigg, Alex; Korpershoek, Esther; van der Spek, Peter J; Dinjens, Winand N M; Stubbs, Andrew P; de Krijger, Ronald R
2016-04-01
Despite the established role of Ki67 labeling index in prognostic stratification of adrenocortical carcinomas and its recent integration into treatment flow charts, the reproducibility of the assessment method has not been determined. The aim of this study was to investigate interobserver variability among endocrine pathologists using a web-based virtual microscopy approach. Ki67-stained slides of 76 adrenocortical carcinomas were analyzed independently by 14 observers, each according to their method of preference including eyeballing, formal manual counting, and digital image analysis. The interobserver variation was statistically significant (P<0.001) in the absence of any correlation between the various methods. Subsequently, 61 static images were distributed among 15 observers who were instructed to follow a category-based scoring approach. Low levels of interobserver (F=6.99; Fcrit=1.70; P<0.001) as well as intraobserver concordance (n=11; Cohen κ ranging from -0.057 to 0.361) were detected. To improve harmonization of Ki67 analysis, we tested the utility of an open-source Galaxy virtual machine application, namely Automated Selection of Hotspots, in 61 virtual slides. The software-provided Ki67 values were validated by digital image analysis in identical images, displaying a strong correlation of 0.96 (P<0.0001) and dividing the cases into 3 classes (cutoffs of 0%-15%-30% and/or 0%-10%-20%) with significantly different overall survivals (P<0.05). We conclude that current practices in Ki67 scoring assessment vary greatly, and interobserver variation sets particular limitations to its clinical utility, especially around clinically relevant cutoff values. Novel digital microscopy-enabled methods could provide critical aid in reducing variation, increasing reproducibility, and improving reliability in the clinical setting.
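The Ki67 study stratified cases into 3 classes using cutoffs of 0%-15%-30% or 0%-10%-20%, and notes that interobserver variation matters most near those cutoffs. A minimal sketch of the class assignment is below; treating a value exactly at a cutoff as belonging to the higher class is an assumption, since the abstract does not say how boundaries were handled.

```python
from bisect import bisect_right

def ki67_class(labeling_index, cutoffs=(15.0, 30.0)):
    """Return class 0, 1 or 2 for a Ki67 labeling index in percent.

    bisect_right places a value equal to a cutoff in the higher class (assumed convention).
    """
    return bisect_right(cutoffs, labeling_index)

# Two hypothetical observers scoring the same tumour only 2 points apart.
print(ki67_class(14.0), ki67_class(16.0))  # → 0 1
```

A small disagreement straddling 15% moves the case into a different prognostic class, which is the practical consequence of interobserver variation around clinically relevant cutoffs.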
Values of a Patient and Observer Scar Assessment Scale to Evaluate the Facial Skin Graft Scar.
Chae, Jin Kyung; Kim, Jeong Hee; Kim, Eun Jung; Park, Kun
2016-10-01
The patient and observer scar assessment scale (POSAS) recently emerged as a promising method, reflecting both the observer's and the patient's opinions in evaluating scars. This tool was shown to be consistent and reliable in burn scar assessment, but it has not been tested in the setting of skin graft scars in skin cancer patients. To evaluate facial skin graft scars using the POSAS and to compare it with objective scar assessment tools. Twenty-three patients who had been diagnosed with facial cutaneous malignancy and had received skin grafts after Mohs micrographic surgery were recruited. Observer assessment was performed by three independent raters using the observer component of the POSAS and the Vancouver scar scale (VSS). Patient self-assessment was performed using the patient component of the POSAS. To quantify scar color and scar thickness more objectively, a spectrophotometer and ultrasonography were applied. Inter-observer reliability was substantial with both the VSS and the observer component of the POSAS (average-measures intraclass correlation coefficient, 0.76 and 0.80, respectively). The observer component consistently showed significant correlations with patients' ratings for the parameters of the POSAS (all p-values < 0.05). The correlation between subjective assessment using the POSAS and objective assessment using the spectrophotometer and ultrasonography was weak. In facial skin graft scar assessment in skin cancer patients, the POSAS showed acceptable inter-observer reliability. This tool was more comprehensive and had a higher correlation with the patient's opinion.
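The POSAS abstract reports "average measure" intraclass correlation coefficients for three raters but does not say which ICC model was used. As a sketch, the two common average-measures forms from two-way ANOVA mean squares (Shrout and Fleiss) are computed below: ICC(2,k), absolute agreement with raters treated as random, and ICC(3,k), consistency with raters fixed. Which of these the authors used is an assumption left open.

```python
def icc_average_measures(ratings):
    """Return (ICC(2,k), ICC(3,k)) for ratings[subject][rater] from two-way ANOVA mean squares."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    icc2k = (bms - ems) / (bms + (jms - ems) / n)  # absolute agreement, random raters
    icc3k = (bms - ems) / bms                       # consistency, fixed raters
    return icc2k, icc3k

# Hypothetical scores: rater 2 is always 1 point and rater 3 always 2 points above rater 1.
scores = [[1, 2, 3], [2, 3, 4], [4, 5, 6], [3, 4, 5]]
print(icc_average_measures(scores))
```

With constant rater offsets, consistency is perfect (ICC(3,k) = 1) while absolute agreement is penalised (ICC(2,k) = 5/6), so the choice of model can change the reported value.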
Bouwense, Stefan A; van Brunschot, Sandra; van Santvoort, Hjalmar C; Besselink, Marc G; Bollen, Thomas L; Bakker, Olaf J; Banks, Peter A; Boermeester, Marja A; Cappendijk, Vincent C; Carter, Ross; Charnley, Richard; van Eijck, Casper H; Freeny, Patrick C; Hermans, John J; Hough, David M; Johnson, Colin D; Laméris, Johan S; Lerch, Markus M; Mayerle, Julia; Mortele, Koenraad J; Sarr, Michael G; Stedman, Brian; Vege, Santhi Swaroop; Werner, Jens; Dijkgraaf, Marcel G; Gooszen, Hein G; Horvath, Karen D
2017-08-01
Severe acute pancreatitis is associated with peripancreatic morphologic changes as seen on imaging. Uniform communication regarding these morphologic findings is crucial for accurate diagnosis and treatment. For the original 1992 Atlanta classification, interobserver agreement is poor. We hypothesized that interobserver agreement would be better for the revised Atlanta classification. An international interobserver agreement study was performed among expert and nonexpert radiologists (n = 14), surgeons (n = 15), and gastroenterologists (n = 8). Representative computed tomography scans of all stages of acute pancreatitis were selected from 55 patients and were assessed according to the revised Atlanta classification. The interobserver agreement was calculated among all reviewers and subgroups, that is, expert and nonexpert reviewers; interobserver agreement was defined as poor (≤0.20), fair (0.21-0.40), moderate (0.41-0.60), good (0.61-0.80), or very good (0.81-1.00). Interobserver agreement among all reviewers was good (0.75 [standard deviation, 0.21]) for describing the type of acute pancreatitis and good (0.62 [standard deviation, 0.19]) for the type of peripancreatic collection. Expert radiologists showed the best and nonexpert clinicians the lowest interobserver agreement. Interobserver agreement was good for the revised Atlanta classification, supporting the importance of widespread adoption of this revised classification for clinical and research communications.
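The pancreatitis study states its agreement bands explicitly. A small sketch encoding exactly those bands follows; values falling between the stated ranges (for example 0.205) are assigned by upper bound here, which is an assumption about how such gaps would be handled.

```python
def agreement_band(kappa):
    """Label an agreement coefficient using the bands defined in the revised Atlanta study."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

# The two overall values reported in the abstract.
print(agreement_band(0.75), agreement_band(0.62))  # → good good
```

Other scales in this listing use different labels for the same ranges (for example "substantial" rather than "good" for 0.61-0.80), so labels should be read alongside the numeric value.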
Liu, Ying-Buh; Yang, Stephen S; Hsieh, Cheng-Hsing; Lin, Chia-Da; Chang, Shang-Jen
2014-05-01
To evaluate the inter-observer, intra-observer and intra-individual reliability of uroflowmetry and post-void residual urine (PVR) tests in adult men. Healthy volunteers aged over 40 years were enrolled. Every participant underwent two sets of uroflowmetry and PVR tests with a 2-week interval between the tests. The uroflowmetry tests were interpreted by four urologists independently. Uroflowmetry curves were classified as bell-shaped, bell-shaped with tail, obstructive, restrictive, staccato, interrupted and tower-shaped and scored from 1 (highly abnormal) to 5 (absolutely normal). The agreements between the observers, interpretations and tests within individuals were analyzed using kappa statistics and intraclass correlation coefficients. Generalizability theory with decision analysis was used to determine how many observers, tests, and interpretations were needed to obtain an acceptable reliability (> 0.80). Of 108 volunteers, we randomly selected the uroflowmetry results from 25 participants for the evaluation of reliability. The mean age of the studied adults was 55.3 years. The intra-individual and intra-observer reliability of uroflowmetry tests ranged from good to very good. However, the inter-observer reliability for normalcy and for the specific type of flow pattern was lower. In generalizability theory, three observers were needed to obtain an acceptable reliability for normalcy of the uroflow pattern if the patient underwent uroflowmetry tests twice with one observation. The intra-individual and intra-observer reliability of uroflowmetry tests were good, while the inter-observer reliability was lower. To improve inter-observer reliability, the definitions of uroflowmetry patterns should be clarified by the International Continence Society. © 2013 Wiley Publishing Asia Pty Ltd.
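The uroflowmetry study used generalizability theory to find how many observers are needed to reach a reliability above 0.80. A simpler classical-test-theory analogue is the Spearman-Brown prophecy formula, sketched below; it is a swapped-in technique, not the authors' G-theory decision study, and it models only one facet (observers). The single-observer reliability of 0.6 in the usage line is hypothetical.

```python
import math

def spearman_brown(r_single, k):
    """Reliability of the mean of k parallel observations, given single-observation reliability."""
    return k * r_single / (1 + (k - 1) * r_single)

def observers_needed(r_single, target=0.80):
    """Smallest number of observers whose averaged rating reaches the target reliability."""
    if not 0 < r_single < 1:
        raise ValueError("r_single must lie strictly between 0 and 1")
    k = target * (1 - r_single) / (r_single * (1 - target))
    return max(1, math.ceil(k - 1e-9))  # tolerance absorbs float noise at exact integers

print(observers_needed(0.6))  # → 3
```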
Bertal, Mileva; Vezzoni, Aldo; Houdellier, Blandine; Bogaerts, Evelien; Stock, Emmelie; Polis, Ingeborgh; Deforce, Dieter; Saunders, Jimmy H; Broeckx, Bart J G
2018-06-02
To describe and evaluate the accuracy and the intra- and inter-observer variability of the laxity index (LI), used to quantify hip laxity on stress radiographs obtained with the Vezzoni-modified Badertscher distension device (VMBDD). Stress radiographs of 10 dogs obtained with the VMBDD were measured three times by an experienced observer. Six participants with different backgrounds (two ECVDI residents, two PhD students, two veterinary assistants) attended a short presentation and subsequently performed the measurements four times in two separate sessions. The effect of self-learning, feedback and specialization on the accuracy of the measurements was assessed. While the intra- and inter-observer variability were in agreement with other studies, the results of the experienced observer indicated that the variability can be very low. Neither feedback nor self-learning improved the results. A high degree of experience in radiographic assessment was not necessary to perform the measurements correctly. As the LI measurements were acceptable after a short presentation, the results support the use of the VMBDD for a complete and correct in-house evaluation of the hip joint by trained clinicians. However, we propose that, in the context of screening, measurements should be performed by a limited number of experienced examiners to limit the impact of inter-observer variability. Schattauer GmbH Stuttgart.
Sannmann, I; Burfeind, O; Suthar, V; Bos, A; Bruins, M; Heuwieser, W
2013-09-01
The objective of this study was to determine test characteristics (i.e., intra- and interobserver variability, intraassay variability, sensitivity, and specificity) of evaluating the odor of vaginal discharge (VD) from cows in the first 10 d postpartum by olfactory cognition and by an electronic device, respectively. In experiment 1, 16 investigators (9 veterinary students and 7 licensed veterinarians) evaluated 5 VD samples each on 10 different days. The kappa test revealed an agreement between investigators (interobserver) of κ=0.43 with a Fleiss adjusted standard error of 0.0061. The overall agreement was the same for students (κ=0.28) and veterinarians (κ=0.28). Mean agreement within observers (intraobserver) was κ=0.52 for all observers, and 0.49 and 0.62 for students and veterinarians, respectively. In experiment 2, the repeatability of an electronic device (DiagNose; C-it, Zutphen, the Netherlands) was tested. For this purpose, 5 samples of VD from 5 cows were evaluated 10 times each. The repeatability was 0.97, determined by Cronbach's α. In experiment 3, 20 samples collected from healthy cows and 20 from cows with acute puerperal metritis were evaluated by the 16 investigators and the DiagNose using a dichotomous scale (1=cow with acute puerperal metritis; 0=healthy cow). Sensitivity and specificity of olfactory evaluation were 75.0% and 60.1%, compared with 92.0% and 100%, respectively, for the electronic nose device. The study revealed considerable subjectivity of the human nose in classifying animals as healthy or sick based on the assessment of vaginal discharge. The repeatability of the electronic nose was higher. In conclusion, the DiagNose system, although imperfect, is a reasonable tool to improve odor assessment of VD. The current system, however, is not suitable as a screening tool in the field. Further research is warranted to adapt such electronic devices to practical on-farm screening tools.
Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
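The electronic nose's repeatability was expressed as Cronbach's α, α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ). A minimal sketch follows, treating each of the k repeated runs as an "item" and each VD sample as a row; that framing of repeated device runs as items is an assumption about how the study applied α.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores[s][r]: reading for sample s on repeated run r; returns Cronbach's alpha."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 samples and 2 runs")
    # Variance of each run across samples (sample variance, n - 1 denominator).
    run_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each sample's summed readings.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(run_vars) / total_var)

# Hypothetical device readings: 4 samples, 3 runs each.
readings = [[2.0, 2.1, 1.9], [5.0, 4.8, 5.1], [3.0, 3.2, 3.1], [7.0, 6.9, 7.2]]
print(round(cronbach_alpha(readings), 3))
```

When the runs differ only by small noise relative to the spread between samples, α approaches 1, matching the intuition behind the reported value of 0.97.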
Plain film measurement error in acute displaced midshaft clavicle fractures
Archer, Lori Anne; Hunt, Stephen; Squire, Daniel; Moores, Carl; Stone, Craig; O’Dea, Frank; Furey, Andrew
2016-01-01
Background Clavicle fractures are common and optimal treatment remains controversial. Recent literature suggests that operative fixation of acute displaced mid-shaft clavicle fractures (DMCFs) shortened by more than 2 cm improves outcomes. We aimed to assess the correlation between plain film and computed tomography (CT) measurements of displacement and to determine the inter- and intraobserver reliability of repeated radiographic measurements. Methods We obtained radiographs and CT scans of patients with acute DMCFs. Three orthopedic staff and 3 residents measured radiographic displacement at time zero and 2 weeks later. The CT measurements identified absolute shortening in 3 dimensions (by subtracting the length of the fractured clavicle from that of the intact clavicle). We then compared shortening measured on radiographs and shortening measured in 3 dimensions on CT. Interobserver and intraobserver reliability were calculated. Results We reviewed the fractures of 22 patients. Bland–Altman repeatability coefficient calculations indicated that radiograph and CT measurements of shortening could not be correlated owing to an unacceptable amount of measurement error (6 cm). Interobserver reliability for plain radiograph measurements was excellent (Cronbach α = 0.90). Likewise, intraobserver reliabilities for plain radiograph measurements as calculated with paired t tests indicated excellent correlation (p > 0.05 in all but 1 observer [p = 0.04]). Conclusion To establish shortening as an indication for DMCF fixation, reliable measurement tools are required. The low correlation between plain film and CT measurements we observed suggests further research is necessary to establish what imaging modality reliably predicts shortening. Our results indicate weak correlation between radiograph and CT measurement of acute DMCF shortening. PMID:27438054
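The clavicle study used Bland-Altman calculations to compare radiograph and CT shortening. A minimal sketch of the usual Bland-Altman quantities is below: the bias (mean paired difference) and 95% limits of agreement (bias ± 1.96 SD of the differences), plus one common form of the repeatability coefficient (1.96 × SD of differences between repeated measurements). The exact variant the authors computed is not stated, and the values in the usage lines are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def repeatability_coefficient(first, second):
    """1.96 x SD of differences between repeated measurements of the same subjects (one common form)."""
    return 1.96 * stdev([a - b for a, b in zip(first, second)])

# Hypothetical shortening (cm) for five fractures, radiograph vs CT.
radiograph = [1.8, 2.5, 0.9, 3.1, 2.2]
ct = [1.2, 2.9, 1.5, 2.4, 2.6]
print(bland_altman(radiograph, ct))
```

If the limits of agreement are wider than a clinically relevant threshold (here, the 2 cm shortening cutoff), the two methods cannot be used interchangeably for that decision.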
NASA Astrophysics Data System (ADS)
Garcia de Leon Valenzuela, Maria Julia
This project explores the reliability of building a biological profile for an unknown individual based on three-dimensional (3D) images of the individual's skeleton. 3D imaging technology has been widely researched for medical and engineering applications, and it is increasingly being used as a tool for anthropological inquiry. While the question of whether a biological profile can be derived from 3D images of a skeleton with the same accuracy as achieved when using dry bones has been explored, bigger sample sizes, a standardized scanning protocol and more interobserver error data are needed before 3D methods can become widely and confidently used in forensic anthropology. 3D images of Computed Tomography (CT) scans were obtained from 130 innominate bones from Boston University's skeletal collection (School of Medicine). For each bone, both 3D images and original bones were assessed using the Phenice and Suchey-Brooks methods. Statistical analysis was used to determine the agreement between 3D image assessment and traditional assessment. A pool of six individuals with varying experience in the field of forensic anthropology scored a subsample (n = 20) to explore interobserver error. While high agreement was found for age and sex estimation for specimens scored by the author, the interobserver study shows that observers found it difficult to apply standard methods to 3D images. Contrary to expectation, higher levels of experience did not result in higher agreement between observers. Thus, a need for training in 3D visualization before applying anthropological methods to 3D bones is suggested. Future research should explore interobserver error using a larger sample size in order to test the hypothesis that training in 3D visualization will result in a higher agreement between scores. The need for the development of a standard scanning protocol focusing on the optimization of 3D image resolution is highlighted.
Applications for this research include digitizing skeletal collections to expand their use, deriving skeletal collections from living populations, and creating population-specific standards. Further research toward a standard scanning and processing protocol is needed before 3D methods in forensic anthropology can be considered reliable tools for generating biological profiles.
Ilahi, Omer A; Mansfield, David J; Urrea, Luis H; Qadeer, Ali A
2014-10-01
To assess interobserver and intraobserver agreement of estimating anterior cruciate ligament (ACL) femoral tunnel positioning arthroscopically using circular and linear (noncircular) estimation methods and to determine whether overlay template visual aids improve agreement. Standardized intraoperative pictures of femoral tunnel pilot holes (taken with a 30° arthroscope through an anterolateral portal at 90° of knee flexion with horizontal being parallel to the tibial surface) in 27 patients undergoing single-bundle ACL reconstruction were presented to 3 fellowship-trained arthroscopists on 2 separate occasions. On both viewings, each surgeon estimated the femoral tunnel pilot hole location to the nearest half-hour mark using a whole clock face and half clock face, to the nearest 15° using a whole compass and half compass, in the top or bottom half of a linear quadrant, and in the top or bottom half of a linear trisector. Evaluations were performed first without and then with an overlay template of each estimation method. The average difference among reviewers was quite similar for all 4 circular methods with the use of visual aids. Without overlay template visual aids, pair-wise κ statistic values for interobserver agreement ranged from -0.14 to 0.56 for the whole clock face and from 0.16 to 0.42 for the half clock face. With overlay visual guides, interobserver agreement ranged from 0.29 to 0.63 for the whole clock face and from 0.17 to 0.66 for the half clock face. The quadrant method's interobserver agreement ranged from 0.22 to 0.60, and that of the trisection method ranged from 0.17 to 0.57. Neither linear estimation method's reliability uniformly improved with the use of overlay templates. Intraobserver agreement without overlay templates ranged from 0.17 to 0.49 for the whole clock face, 0.11 to 0.47 for the half clock face, 0.01 to 0.66 for the quadrant method, and 0.20 to 0.57 for the trisection method. 
Use of overlay templates did not uniformly improve intraobserver agreement for any estimation method. There does not appear to be any advantage of using a half clock face or compass for estimating femoral tunnel position compared with a whole clock-face analogy. Visual reference aids appear to improve interobserver agreement (reliability) of circular analogies. The linear quadrant appears to be the most reliable method (fair to moderate agreement) for estimating femoral tunnel position without a visual aid for reference, but even better reliability, ranging from fair to good agreement, may be obtained by using the whole clock-face analogy with a visual aid. Increasing femoral tunnel position reliability may improve outcomes of ACL reconstruction surgery. Copyright © 2014 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.
Sjoding, Michael W; Hofer, Timothy P; Co, Ivan; Courey, Anthony; Cooke, Colin R; Iwashyna, Theodore J
2018-02-01
Failure to reliably diagnose ARDS may be a major driver of negative clinical trials and of underrecognition and undertreatment in clinical practice. We sought to examine the interobserver reliability of the Berlin ARDS definition and examine strategies for improving the reliability of ARDS diagnosis. Two hundred five patients with hypoxic respiratory failure from four ICUs were reviewed independently by three clinicians, who evaluated whether patients had ARDS, the diagnostic confidence of the reviewers, whether patients met individual ARDS criteria, and the time when criteria were met. Interobserver reliability of an ARDS diagnosis was "moderate" (kappa = 0.50; 95% CI, 0.40-0.59). Sixty-seven percent of diagnostic disagreements between clinicians reviewing the same patient were explained by differences in how chest imaging studies were interpreted, with other ARDS criteria contributing less (identification of ARDS risk factor, 15%; cardiac edema/volume overload exclusion, 7%). Combining the independent reviews of three clinicians can increase reliability to "substantial" (kappa = 0.75; 95% CI, 0.68-0.80). When a clinician diagnosed ARDS with "high confidence," all other clinicians agreed with the diagnosis in 72% of reviews. There was close agreement between clinicians about the time when a patient met all ARDS criteria if ARDS developed within the first 48 hours of hospitalization (median difference, 5 hours). The reliability of the Berlin ARDS definition is moderate, driven primarily by differences in chest imaging interpretation. Combining independent reviews by multiple clinicians or improving methods to identify bilateral infiltrates on chest imaging are important strategies for improving the reliability of ARDS diagnosis. Copyright © 2017 American College of Chest Physicians. All rights reserved.
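The quoted labels ("moderate", "substantial") match the widely used Landis and Koch (1977) bands for kappa; that this is the convention the authors followed is an assumption. The sketch below maps a kappa value to those bands and, purely as a rough heuristic, applies the Spearman-Brown prophecy formula to ask what pooling three reviewers whose single-reviewer reliability is 0.50 would predict. The abstract does not say how the combined reliability was computed, so the fact that the formula lands on 0.75 should not be read as the authors' method. Both function names are hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Descriptive band for a kappa value (Landis & Koch, 1977)."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability when the judgments of n_raters are pooled.

    Heuristic only: the formula was derived for correlation-type reliability,
    not specifically for kappa.
    """
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


if __name__ == "__main__":
    print(landis_koch_label(0.50))                # moderate
    pooled = spearman_brown(0.50, n_raters=3)
    print(pooled, landis_koch_label(pooled))      # 0.75 substantial
```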
Redley, Bernice; Waugh, Rachael
2018-04-01
Nurse bedside handover quality is influenced by complex interactions related to the content, the processes used, and the work environment. Audit tools are seldom tested in 'real' settings. This study examined the reliability, validity, and usability of a quality improvement tool for auditing nurse bedside handover, using a naturalistic, descriptive, mixed-methods design in six inpatient wards at a single large not-for-profit private health service in Victoria, Australia. Participants were five nurse experts and 104 nurses involved in 199 change-of-shift bedside handovers. A focus group with experts and a pilot test were used to examine the content validity, face validity, and usability of the handover audit tool. The tool was examined for inter-rater reliability and usability using observation audits of handovers across six wards. Data were collected in 2013-2014. Two independent observers conducting 72 audits demonstrated acceptable inter-observer agreement for 27 (77%) items. Reliability was weak for items examining the handover environment. Seventeen items were not observed, reflecting gaps in practice. Across 199 observation audits, gaps in nurse bedside handover practice most often related to process and environment rather than content items. Usability was affected by high observer burden, familiarity, and non-specific illustrative behaviours. The reliability and validity of most items for auditing handover content were acceptable. Gaps in practice for process and environment items were identified. Context-specific exemplars and reducing the number of items used at each handover audit can enhance usability. Further research is needed to develop context-specific exemplars and to undertake additional reliability testing across a wide range of handover settings. Copyright © 2017 Elsevier Inc. All rights reserved.
Multimedia systems in ultrasound image boundary detection and measurements
NASA Astrophysics Data System (ADS)
Pathak, Sayan D.; Chalana, Vikram; Kim, Yongmin
1997-05-01
Ultrasound as a medical imaging modality offers the clinician a real-time, noninvasive view of the anatomy of internal organs and tissues, their movement, and flow. One application of ultrasound is monitoring fetal growth by measuring biparietal diameter (BPD) and head circumference (HC). We have been working on automatic detection of fetal head boundaries in ultrasound images; the detected boundaries are used to measure BPD and HC. The boundary detection algorithm is based on active contour models and takes 32 seconds on an external high-end workstation (SUN SparcStation 20/71). Our goal has been to make this tool available within an ultrasound machine while significantly improving its performance using multimedia technology. With the advent of high-performance programmable digital signal processors (DSPs), a software solution running within the ultrasound machine, rather than the traditional hardwired approach or an external computer, is now possible. We have integrated our boundary detection algorithm into a programmable ultrasound image processor (PUIP) that fits into a commercial ultrasound machine. The PUIP provides both the high computing power and the flexibility needed to support computationally intensive image processing algorithms within an ultrasound machine. According to our data analysis, BPD/HC measurements made on the PUIP lie within the interobserver variability; that is, the errors in the automated BPD/HC measurements are of the same order as the average interobserver differences. On the PUIP, measuring BPD/HC on one head image takes 360 ms. When multiple head images are processed in sequence, it takes 185 ms per image, enabling 5.4 BPD/HC measurements per second.
Reducing the overall execution time from 32 seconds to a fraction of a second, and making this multimedia system available within an ultrasound machine, will help this image processing algorithm and other computationally intensive imaging applications become practical tools for sonographers in the future.
Özkan, Sezai; Mellema, Jos J.; Ring, David; Chen, Neal C.
2017-01-01
Background: To examine whether interobserver reliability, decision-making, and confidence in decision-making in the treatment of distal radius fractures change if radiographs are viewed on a messenger application on a mobile phone compared with a standard DICOM viewer. Methods: Radiographs of distal radius fractures were presented to surgeons on either a smartphone using a mobile messenger application or a laptop using a DICOM viewer application. Twenty observers participated: 10 (50%) were randomly assigned to the DICOM viewer group and 10 (50%) to the mobile messenger group. Each observer was asked to evaluate the cases and (1) classify the fracture type according to the AO classification, (2) recommend operative or conservative treatment, and (3) rate their confidence in this decision. Results: There was no significant difference between the two groups in interobserver reliability for AO classification or for the recommendation of surgery. The percentage of recommendations for surgery was significantly higher in the messenger application group than in the DICOM viewer group (89% versus 78%, P=0.019), and confidence in the treatment decision was significantly higher in the mobile messenger group than in the DICOM viewer group (8.9 versus 7.9, P=0.026). Conclusion: Messenger applications on mobile phones could facilitate remote decision-making for patients with distal radius fractures, but should be used with caution. PMID:29226202
Values of a Patient and Observer Scar Assessment Scale to Evaluate the Facial Skin Graft Scar
Chae, Jin Kyung; Kim, Eun Jung; Park, Kun
2016-01-01
Background The patient and observer scar assessment scale (POSAS) has recently emerged as a promising method that reflects both the observer's and the patient's opinions in scar evaluation. This tool has been shown to be consistent and reliable in burn scar assessment, but it has not been tested on skin graft scars in skin cancer patients. Objective To evaluate facial skin graft scars using the POSAS and to compare it with objective scar assessment tools. Methods Twenty-three patients who had been diagnosed with facial cutaneous malignancy and had undergone skin grafting after Mohs micrographic surgery were recruited. Observer assessment was performed by three independent raters using the observer component of the POSAS and the Vancouver scar scale (VSS). Patient self-assessment was performed using the patient component of the POSAS. To quantify scar color and scar thickness more objectively, spectrophotometry and ultrasonography were applied. Results Inter-observer reliability was substantial with both the VSS and the observer component of the POSAS (average-measures intraclass correlation coefficient, 0.76 and 0.80, respectively). The observer component consistently showed significant correlations with patients' ratings for the parameters of the POSAS (all p-values<0.05). The correlation between subjective assessment using the POSAS and objective assessment using spectrophotometry and ultrasonography was weak. Conclusion In facial skin graft scar assessment in skin cancer patients, the POSAS showed acceptable inter-observer reliability. This tool was more comprehensive and correlated more highly with patients' opinions. PMID:27746642
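The abstract reports "average measure" intraclass correlation coefficients but does not say which ICC model was used. As a minimal sketch, assuming the two-way random-effects, absolute-agreement form for the mean of k raters (ICC(2,k) in Shrout and Fleiss notation), the coefficient can be computed from two-way ANOVA mean squares. The function name and the toy scores are hypothetical, not data from the study.

```python
from statistics import fmean


def icc_2k(ratings: list[list[float]]) -> float:
    """Hypothetical helper: ICC(2,k), two-way random effects, absolute agreement,
    reliability of the mean of k raters.

    ratings[i][j] is the score that rater j gave to subject (scar) i; every
    subject must be scored by every rater.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


if __name__ == "__main__":
    # Made-up scores: rater 2 is always exactly 1 point higher than rater 1.
    print(round(icc_2k([[1, 2], [2, 3], [3, 4]]), 3))  # 0.8
```

Because this form measures absolute agreement, a rater who is systematically one point higher lowers the coefficient even though the two raters rank the scars identically; a consistency-type ICC would not penalize that offset.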
Dysplastic naevus: histological criteria and their inter-observer reproducibility.
Hastrup, N; Clemmensen, O J; Spaun, E; Søndergaard, K
1994-06-01
Forty melanocytic lesions were examined in a pilot study, which was followed by a final series of 100 consecutive melanocytic lesions, in order to evaluate the inter-observer reproducibility of the histological criteria proposed for the dysplastic naevus. The specimens were examined in a blind fashion by four observers. Analysis by kappa statistics showed poor reproducibility of nuclear features, while reproducibility of architectural features was acceptable, improving in the final series. Consequently, we cannot apply the combined criteria of cytological and architectural features with any confidence in the diagnosis of dysplastic naevus, and, until further studies have documented that architectural criteria alone will suffice in the diagnosis of dysplastic naevus, we, as pathologists, shall avoid this term.
Koo, Henry; Leveridge, Mike; Thompson, Charles; Zdero, Rad; Bhandari, Mohit; Kreder, Hans J; Stephen, David; McKee, Michael D; Schemitsch, Emil H
2008-07-01
The purpose of this study was to measure interobserver reliability of 2 classification systems of pelvic ring fractures and to determine whether computed tomography (CT) improves reliability. The reliability of several radiographic findings was also tested. Thirty patients taken from a database at a Level I trauma facility were reviewed. For each patient, 3 radiographs (AP pelvis, inlet, and outlet) and CT scans were available. Six different reviewers (pelvic and acetabular specialist, orthopaedic traumatologist, or orthopaedic trainee) classified the injury according to Young-Burgess and Tile classification systems after reviewing plain radiographs and then after CT scans. The Kappa coefficient was used to determine interobserver reliability of these classification systems before and after CT scan. For plain radiographs, overall Kappa values for the Young-Burgess and Tile classification systems were 0.72 and 0.30, respectively. For CT scan and plain radiographs, the overall Kappa values for the Young-Burgess and Tile classification systems were 0.63 and 0.33, respectively. The pelvis/acetabular surgeons demonstrated the highest level of agreement using both classification systems. For individual questions, the addition of CT did significantly improve reviewer interpretation of fracture stability. The pre-CT and post-CT Kappa values for fracture stability were 0.59 and 0.93, respectively. The CT scan can improve the reliability of assessment of pelvic stability because of its ability to identify anatomical features of injury. The Young-Burgess system may be optimal for the learning surgeon. The Tile classification system is more beneficial for specialists in pelvic and acetabular surgery.
Marshall, Carrie; Mounzer, Rawad; Hall, Matt; Simon, Violette; Centeno, Barbara; Dennis, Katie; Dhillon, Jasreman; Fan, Fang; Khazai, Laila; Klapman, Jason; Komanduri, Srinadh; Lin, Xiaoqi; Lu, David; Mehrotra, Sanjana; Muthusamy, V Raman; Nayar, Ritu; Paintal, Ajit; Rao, Jianyu; Sams, Sharon; Shah, Janak; Watson, Rabindra; Rastogi, Amit; Wani, Sachin
2018-07-01
Despite the widespread use of endoscopic ultrasound-guided fine-needle aspiration (EUS-FNA) to sample pancreatic lesions and the standardization of pancreaticobiliary cytopathologic nomenclature, there are few data on inter-observer agreement among cytopathologists evaluating pancreatic cytologic specimens obtained by EUS-FNA. We developed a scoring system to assess agreement among cytopathologists in overall diagnosis and quantitative and qualitative parameters, and evaluated factors associated with agreement. We performed a prospective study to validate results from our pilot study that demonstrated moderate to substantial inter-observer agreement among cytopathologists for the final cytologic diagnosis. In the first phase, 3 cytopathologists refined criteria for assessment of quantity and quality measures. During phase 2, EUS-FNA specimens of solid pancreatic lesions from 46 patients were evaluated by 11 cytopathologists at 5 tertiary care centers using a standardized scoring tool. Individual quantitative and qualitative measures were scored and an overall cytologic diagnosis was determined. Clinical and EUS parameters were assessed as predictors of unanimous agreement. Inter-observer agreement (IOA) was calculated using multi-rater kappa (κ) statistics, and a logistic regression model was created to identify factors associated with unanimous agreement. The IOA for final diagnoses, based on cytologic analysis, was moderate (κ = 0.56; 95% CI, 0.43-0.70). Kappa values did not increase when the categories of suspicious for malignancy, malignant, and neoplasm were combined. IOA was slight to moderate for individual quantitative (κ = 0.007; 95% CI, -0.03 to 0.04) and qualitative parameters (κ = 0.5; 95% CI, 0.47-0.53). Jaundice was the only factor associated with agreement among all cytopathologists on multivariate analysis (odds ratio for unanimous agreement, 5.3; 95% CI, 1.1-26.89).
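The abstract reports a "multi-rater kappa" for 11 cytopathologists without naming the variant; Fleiss' kappa is one common choice and is assumed here. The sketch takes a subject-by-category table of counts (how many raters placed each specimen in each diagnostic category) and compares the observed pairwise agreement with the agreement expected from the overall category proportions. The function name and input layout are illustrative, not the study's software.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    counts[i][j] = number of raters who assigned subject i to category j.
    """
    N = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    k = len(counts[0])

    # Per-subject agreement: fraction of rater pairs that agree on subject i.
    p_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_i) / N

    # Chance agreement from the overall proportion of ratings in each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

For the study's design, each row would be one of the 46 specimens, each column a diagnostic category, and each row would sum to 11.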
There is a suboptimal level of agreement among cytopathologists in the diagnosis of malignancy based on analysis of EUS-FNA specimens obtained from solid pancreatic masses. Strategies are needed to refine the cytologic criteria for diagnosis of malignancy and enhance tissue acquisition techniques to improve diagnostic reproducibility among cytopathologists. Copyright © 2018 AGA Institute. Published by Elsevier Inc. All rights reserved.
Sabiani, Laura; Le Dû, Renaud; Loundou, Anderson; d'Ercole, Claude; Bretelle, Florence; Boubli, Léon; Carcopino, Xavier
2015-12-01
The objective of the study was to evaluate the intra- and interobserver agreement among obstetric experts in court regarding the retrospective review of abnormal fetal heart rate tracings and obstetrical management of patients with abnormal fetal heart rate during labor. A total of 22 French obstetric experts in court reviewed 30 cases of term deliveries of singleton pregnancies diagnosed with at least 1 hour of abnormal fetal heart rate, including 10 cases with adverse neonatal outcome. The experts reviewed all cases twice within a 3-month interval, with the first review being blinded to neonatal outcome. For each case reviewed, the experts were provided with the obstetric data and copies of the complete fetal heart rate recording and the partogram. The experts were asked to classify the abnormal fetal heart rate tracing and to state whether they agreed with the obstetrical management performed. When they disagreed, the experts were asked whether they concluded that an error had been made and whether they considered the obstetrical management to be the cause of cerebral palsy in the child, if present. Compared with blinded review, the experts were significantly more likely to agree with the obstetric management performed (P < .001) and with the mode of delivery (P < .001) when informed about the neonatal outcome and were less likely to conclude that an error had been made (P < .001) or to establish a link with potential cerebral palsy (P = .003). The experts' intraobserver agreement was mediocre both for the review of abnormal fetal heart rate tracings and for obstetrical management (kappa = 0.46-0.51 and kappa = 0.48-0.53, respectively). The interobserver agreement for the review of abnormal fetal heart rate tracings was low and was not improved by knowledge of the neonatal outcome (kappa = 0.11-0.18).
The interobserver agreement for the interpretation of obstetrical management was also low (kappa = 0.08-0.19) but appeared to be improved by knowledge of the neonatal outcome (kappa = 0.15-0.32). The intra- and interobserver agreement among obstetric experts in court for the review of abnormal fetal heart rate tracing and the appropriateness of obstetrical care is poor, suggesting a lack of objectivity of obstetrical expertise as currently performed in court. Copyright © 2015 Elsevier Inc. All rights reserved.
Cotruta, Bogdan; Gheorghe, Cristian; Iacob, Razvan; Dumbrava, Mona; Radu, Cristina; Bancila, Ion; Becheanu, Gabriel
2017-12-01
Evaluation of the severity and extension of gastric atrophy and intestinal metaplasia is recommended to identify subjects at high risk for gastric cancer. The inter-observer agreement for the assessment of gastric atrophy is reported to be low. The aim of the study was to evaluate the inter-observer agreement for the assessment of the severity and extension of gastric atrophy using oriented and unoriented gastric biopsy samples. Furthermore, the quality of biopsy specimens in oriented and unoriented samples was analyzed. A total of 35 subjects with dyspeptic symptoms who were referred for gastrointestinal endoscopy and agreed to enter the study were prospectively enrolled. The OLGA/OLGIM gastric biopsy protocol was used. From each subject, two sets of biopsies were obtained (four from the antrum, two oriented and two unoriented; two from the gastric incisure, one oriented and one unoriented; four from the gastric body, two oriented and two unoriented). The biopsy samples were oriented using nitrocellulose filters (Endokit®, BioOptica, Milan, Italy). The samples were blindly examined by two experienced pathologists. Inter-observer agreement was evaluated using the kappa statistic for inter-rater agreement. The quality of histopathology specimens, taking into account the identification of the lamina propria, was analyzed in oriented vs. unoriented samples. Samples with detectable lamina propria mucosae were defined as good quality specimens. Categorical data were analyzed using the chi-square test, and a two-sided p value <0.05 was considered statistically significant. A total of 350 biopsy samples were analyzed (175 oriented / 175 unoriented). The kappa index values for oriented/unoriented samples at OLGA stages 0, I, II, III, and IV were 0.62/0.13, 0.70/0.20, 0.61/0.06, 0.62/0.46, and 0.77/0.50, respectively. For OLGIM stages 0, I, II, and III, the kappa index values for oriented/unoriented samples were 0.83/0.83, 0.88/0.89, 0.70/0.88, and 0.83/1, respectively.
No case of OLGIM stage IV was found in the present case series. Good quality histopathology specimens were described in 95.43% of the oriented biopsy samples and in 89.14% of the unoriented biopsy samples (p=0.0275). Orientation of gastric biopsy specimens improves the inter-observer agreement for the assessment of gastric atrophy.
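The specimen-quality comparison can be checked with a Pearson chi-square test on a 2x2 table. The counts below are back-calculated from the reported percentages of 175 samples per arm (95.43% gives 167 good oriented samples, 89.14% gives 156 good unoriented samples), and the test is run without a continuity correction; whether the authors applied one is not stated, so both are assumptions. With these inputs the statistic is about 4.86 and the p value comes out close to the reported 0.0275. The helper name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df the statistic is a squared standard normal, so
    # P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Rows: oriented, unoriented. Columns: good quality, not good quality.
stat, p = chi_square_2x2(167, 175 - 167, 156, 175 - 156)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```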
Veta, Mitko; van Diest, Paul J.; Jiwa, Mehdi; Al-Janabi, Shaimaa; Pluim, Josien P. W.
2016-01-01
Background Tumor proliferation speed, most commonly assessed by counting mitotic figures in histological slide preparations, is an important biomarker for breast cancer. Although mitosis counting is routinely performed by pathologists, it is a tedious and subjective task with poor reproducibility, particularly among non-experts. Inter- and intraobserver reproducibility of mitosis counting can be improved when a strict protocol is defined and followed. Previous studies have examined only the agreement in terms of the mitotic count or the mitotic activity score. Studies of observer agreement at the level of individual objects, which can provide more insight into the procedure, have not been performed thus far. Methods The development of automatic mitosis detection methods has attracted considerable interest in recent years. Automatic image analysis is viewed as a solution to the problem of subjectivity of mitosis counting by pathologists. In this paper we describe the results of an interobserver agreement study between three human observers and an automatic method, and make two unique contributions. For the first time, we present an analysis of the object-level interobserver agreement on mitosis counting. Furthermore, we train an automatic mitosis detection method that is robust with respect to staining appearance variability and compare it with the performance of expert observers on an “external” dataset, i.e. on histopathology images that originate from pathology labs other than the pathology lab that provided the training data for the automatic method. Results The object-level interobserver study revealed that pathologists often do not agree on individual objects, even if this is not reflected in the mitotic count. The disagreement is larger for smaller objects, which suggests that adding a size constraint to the mitosis counting protocol can improve reproducibility.
The automatic mitosis detection method can perform mitosis counting in an unbiased way, with substantial agreement with human experts. PMID:27529701
The Music Attentiveness Screening Assessment, Revised (MASA-R): A Study of Technical Adequacy.
Waldon, Eric G; Lesser, Alexander; Weeden, Lydia; Messick, Emily
2016-01-01
Evidence suggests that attention is an important consideration when designing procedural support interventions for children undergoing distressing medical procedures. As such, the extent to which children can attend to musical stimuli used during music-based procedural support interventions would seem important. The Music Attentiveness Screening Assessment (MASA) was designed to assess a child's ability to attend to musical stimuli, but further revisions were deemed necessary to improve administration, test-retest reliability, and interobserver agreement for the measure's items. This study investigated the technical adequacy of the Music Attentiveness Screening Assessment, Revised (MASA-R), with a non-clinical sample of children aged 4 to 9 years by examining (a) construct validity using comparator instruments measuring auditory attention; (b) test-retest reliability following a two-week delay; and (c) interobserver agreement when administered by two independent examiners. This non-clinical sample included 69 children who were administered both items from MASA-R and two comparator instruments: the Auditory Attention subtest from the NEPSY-II (NII-AA) for children aged 5 to 9 years (n = 47); and the Auditory Attention subtest from the Woodcock-Johnson Tests of Cognitive Abilities, 3rd ed. (WJIII-AA), for children aged 4 years (n = 22). A significant proportion of score variance was shared by both MASA-R items and the comparator measures: R² = .16, F(2, 66) = 6.30, p = .003. MASA-R score estimates with regard to test-retest reliability (Item I, intra-class correlation [ICC] = .88; Item II, ICC = .91) and interobserver agreement (Item I, ICC = .99; Item II, ICC = .98) also fell into acceptable ranges. Estimates of MASA-R score construct validity, test-retest reliability, and interobserver agreement appear improved over its predecessor, MASA.
While findings are promising, additional investigation of its use with a clinical sample is needed before it can be confidently used in pediatrics. © the American Music Therapy Association 2015. All rights reserved.
Sakane, Makoto; Hori, Masatoshi; Onishi, Hiromitsu; Tsuboyama, Takahiro; Ota, Takashi; Tatsumi, Mitsuaki; Ueda, Yutaka; Kimura, Toshihiro; Kimura, Tadashi; Tomiyama, Noriyuki
The aim of this study was to evaluate the diagnostic ability of magnetic resonance imaging (MRI) in premenopausal women with G1 endometrial carcinoma. Twenty-six patients underwent T2-weighted, diffusion-weighted, and dynamic contrast-enhanced 3-T MRI. The degree of myometrial invasion was pathologically classified as no invasion, shallow (3 mm or less), or deeper (more than 3 mm). Two radiologists assessed myometrial invasion on MRI. Diagnostic accuracy, sensitivity, specificity, positive and negative predictive values, AUC, and interobserver agreement were analyzed. For assessing myometrial invasion, the mean accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and AUC were 63%, 42%, 85%, 79%, 47%, and 0.75, respectively. Mean interobserver agreement was fair (κ = 0.36). Shallow invasion was underestimated as no invasion on MRI in all 6 cases. MRI produced false-negative results in half of the patients. The misjudgments tended to occur in patients with shallow invasion.
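The accuracy measures listed above all come from a 2x2 table of MRI calls against pathology. The sketch below assumes invasion has been dichotomized (any invasion vs. none) and uses made-up counts for a single reader; it is not meant to reproduce the study's figures, which are means over two radiologists. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 table: MRI call (positive = invasion) vs. pathology."""
    tp: int  # invasion on MRI and on pathology
    fp: int  # invasion on MRI only
    fn: int  # invasion on pathology only (missed by MRI)
    tn: int  # no invasion on either

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


if __name__ == "__main__":
    table = TwoByTwo(tp=5, fp=1, fn=7, tn=13)  # made-up counts, 26 patients
    for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy"):
        print(name, round(getattr(table, name)(), 2))
```

A large false-negative count, as in the abstract's shallow-invasion cases, lowers sensitivity and NPV while leaving specificity and PPV comparatively high.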
A practical assessment of physician biopsychosocial performance.
Margalit, Alon Pa; Glick, Shimon M; Benbassat, Jochanan; Cohen, Ayala; Margolis, Carmi Z
2007-10-01
A biopsychosocial approach to care seems to improve patient satisfaction and health outcomes. Nevertheless, this approach is not widely practiced, possibly because its precepts have not been translated into observable skills. To identify the skill components of a biopsychosocial consultation and develop a tool for their evaluation. We approached three e-mail discussion groups of family physicians and pooled their responses to the question "what types of observed physician behavior would characterize a biopsychosocial consultation?" We received 35 responses describing 37 types of behavior, all of which seemed to cluster around one of three aspects: patient-centered interview; system-centered and family-centered approach to care; or problem-solving orientation. Using these categories, we developed a nine-item evaluation tool. We used the evaluation tool to score videotaped encounters of patients with two types of doctors: family physicians who were identified by peer ratings as having a highly biopsychosocial orientation (n = 9) or a highly biomedical approach (n = 4); and 44 general practitioners, before and after they had participated in a program that taught a biopsychosocial approach to care. The evaluation tool was found to demonstrate high reliability (alpha = 0.90) and acceptable interobserver variability. The average scores of the physicians with a highly biopsychosocial orientation were significantly higher than those of physicians with a highly biomedical approach. There were significant differences between the scores of the teaching-program participants before and after the program. A biopsychosocial approach to patient care can be characterized using a valid and easy-to-apply evaluation tool.
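The reported reliability (alpha = 0.90) is presumably Cronbach's alpha, an internal-consistency measure across the nine items; the abstract does not spell this out. As a minimal sketch, assuming each row holds one scored encounter and each column one item of the tool, alpha compares the sum of item variances with the variance of the total score. The function name is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[i][j] is encounter i's score on item j."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Population and sample variances give the same alpha here, since the n/(n-1) factor cancels in the ratio.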
A Tool for Measuring Active Learning in the Classroom
Devlin, John W.; Kirwin, Jennifer L.; Qualters, Donna M.
2007-01-01
Objectives To develop a valid and reliable active-learning inventory tool for use in large classrooms and to compare faculty perceptions of active learning using the Active-Learning Inventory Tool. Methods The Active-Learning Inventory Tool was developed from the published literature and validated by national experts in educational research. Reliability was established by trained faculty members who used the Active-Learning Inventory Tool to observe 9 pharmacy lectures. Instructors were then interviewed to elicit their perceptions of active learning. Results Per lecture, there were 13 (range: 4-34) episodes of active learning encompassing 3 (range: 2-5) different types of active learning, and each episode lasted 2.2 minutes (range: 0.6-16). Both interobserver (≥87%) and observer-instructor agreement (≥68%) were high for these outcomes. Conclusions The Active-Learning Inventory Tool is a valid and reliable tool for measuring active learning in the classroom. Future studies are needed to determine the impact of the Active-Learning Inventory Tool on teaching and its usefulness in other disciplines. PMID:17998982
Rammeh, Soumaya; Khadra, Hajer Ben; Znaidi, Nadia Sabbegh; Romdhane, Neila Attia; Najjar, Taoufik; Bouzaidi, Slim; Zermani, Rachida
2014-01-01
Many classification systems are currently used for histological evaluation of the severity of chronic viral hepatitis, including the Ishak and Metavir scores, but there is no consensus classification. The objective of this work was to study the intra- and inter-observer agreement of these two scores in the histopathological analysis of liver biopsies from patients with chronic viral hepatitis B or C. Fifty-nine patients were included in the study: 26 had chronic hepatitis C and 33 had chronic hepatitis B. To investigate inter-observer agreement, the liver biopsies were analyzed separately by two pathologists without a prior consensus reading. The two pathologists then conducted a consensus reading before independently reviewing all cases. Cohen's kappa coefficient was calculated, or Spearman's rho coefficient in case of asymmetry. Before the consensus reading, agreement was moderate for the analysis of histological activity with both scores (Metavir: kappa=0.41, Ishak: rho=0.58). For the analysis of fibrosis, agreement was good with both scores (Metavir: kappa=0.61, Ishak: rho=0.86). The consensus reading improved the reproducibility of activity grading, which became good with both scores (Metavir: kappa=0.77, Ishak: rho=0.76). For fibrosis, improvement was observed with the Ishak score, whose agreement became excellent (kappa=0.81). In conclusion, for routine practice we recommend a combined score (Metavir for activity and Ishak for fibrosis) and a double reading of each biopsy.
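Cohen's kappa, used above for the Metavir comparisons, corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). Below is a minimal Python sketch of the unweighted statistic for two pathologists. The stage labels are hypothetical, and the Spearman fallback that the authors used for asymmetric tables is not shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items placed in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical Metavir fibrosis stages from two pathologists (8 biopsies).
path_1 = ["F1", "F2", "F2", "F3", "F4", "F1", "F0", "F2"]
path_2 = ["F1", "F2", "F3", "F3", "F4", "F1", "F1", "F2"]
kappa = cohens_kappa(path_1, path_2)  # raw agreement 6/8, kappa = 33/49
```

Here raw agreement is 75%, but kappa is about 0.67 because the two raters' shared preference for F1 and F2 would produce some matches by chance.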
Mosmuller, David; Tan, Robin; Mulder, Frans; Bachour, Yara; de Vet, Henrica; Don Griot, Peter
2016-10-01
It is essential to have a reliable assessment method in order to compare the results of cleft lip and palate surgery. In this study, the computer-based program SymNose, a method for quantitative assessment of the nose and lip, was assessed for usability and reliability. Four observers measured the symmetry of the nose and lip twice in 50 six-year-old patients with complete or incomplete unilateral cleft lip and palate. In the frontal view, the asymmetry levels of the nose and upper lip were evaluated; in the basal view, the asymmetry levels of the nose and nostrils were evaluated. The mean inter-observer reliability when each image was traced once or twice was 0.70 and 0.75, respectively. Tracing the photographs with 2 observers and 4 observers gave mean inter-observer scores of 0.86 and 0.92, respectively. The mean intra-observer reliability varied between 0.80 and 0.84. SymNose is a practical and reliable tool for the retrospective assessment of large caseloads of 2D photographs of cleft patients for research purposes. Moderate to high single inter-observer reliability was found. In future research with SymNose, reliable outcomes can be achieved by averaging single tracings by two observers. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Sánchez-Sánchez, M M; Sánchez-Izquierdo, R; Sánchez-Muñoz, E I; Martínez-Yegles, I; Fraile-Gamo, M P; Arias-Rivera, S
2014-01-01
The Glasgow coma scale (GCS) is a common tool used for the neurological assessment of critically ill patients. Despite its widespread use, the GCS has some limitations, as different observers may sometimes rate the same response differently. The aim of this study was to evaluate interobserver agreement among intensive care nurses with a minimum of 3 years of experience, both for the overall GCS and for each of its components. This prospective observational study included 110 neurological and/or neurosurgical patients and was conducted in an 18-bed critical care unit from October 2010 to December 2012. Registered variables: demographic characteristics, reason for admission, overall GCS, and its components. The neurological evaluation was conducted by a minimum of 3 nurses. One of them applied an algorithm and consensus assessment technique, and all of them independently rated the response to stimuli. Interobserver agreement was measured using the intraclass correlation coefficient (ICC) with 95% confidence intervals (CI). The study was approved by the Ethics Committee for Clinical Trials. The ICC (95% CI) for each scale was: overall GCS: 0.989 (0.985-0.992); ocular response: 0.981 (0.974-0.986); verbal response: 0.971 (0.960-0.979); motor response: 0.987 (0.982-0.991). In our cohort of patients, we observed a high level of consistency in the application of both the overall GCS and each of its components. Copyright © 2013 Elsevier España, S.L. y SEEIUC. All rights reserved.
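The abstract reports an ICC with 95% CI for scores from at least 3 nurses but does not name the ICC form. The Python sketch below assumes ICC(2,1) in the Shrout and Fleiss scheme: two-way random effects, absolute agreement, single rater. It is computed from two-way ANOVA mean squares and further assumes that every patient is scored by the same number of nurses with no missing values. Confidence intervals are omitted, and the GCS totals are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score that rater j gives to subject i
    (rows = subjects, columns = raters; complete data, n >= 2, k >= 2).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical GCS totals: 5 patients, each scored by the same 3 nurses.
gcs_totals = [
    [15, 15, 15],
    [8, 8, 9],
    [3, 3, 3],
    [11, 10, 11],
    [6, 6, 6],
]
icc = icc_2_1(gcs_totals)
```

The absolute-agreement form penalizes a nurse who systematically scores one point higher. A consistency form such as ICC(3,1) would not.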
Wielenga, J M; De Vos, R; de Leeuw, R; De Haan, R J
2004-01-01
This study assessed the clinimetric properties and diagnostic quality of a stress measurement scale (the COMFORT scale) in a sample of an open population at the Neonatology Department (Neonatal Intensive Care Unit), Academic Medical Centre/Emma Children's Hospital, Amsterdam, The Netherlands. One clinical expert and 9 observers simultaneously observed ventilated prematurely born babies. Criterion validity was assessed by correlating the COMFORT scale with clinical judgment of the amount of stress. Interobserver reliability was assessed for the clinical judgment as well as for the COMFORT scale. Diagnostic quality was evaluated with a ROC curve. On 19 ventilated prematurely born babies (mean gestational age 30 weeks, mean birth weight 1385 g), one clinical expert and 9 observers made 30 paired observations. The criterion validity of the COMFORT scale was good (Pearson's r of 0.84). The interobserver reliability of the clinical judgment was very good (weighted kappa 0.84). The interobserver reliability of the individual items ranged from good to almost perfect (weighted kappa of 0.64 for muscle tone to 1.00 for heart rate). The reliability of the total COMFORT scale score was satisfactory (intraclass correlation coefficient of 0.94). The diagnostic quality of the COMFORT scale was excellent: at a cut-off point of 20, the sensitivity was 100 percent, the specificity was 77 percent, and the area under the curve (AUC) was 0.95. In this first evaluation, the COMFORT scale appears to be a valid and reliable tool for assessing stress in ventilated prematurely born babies.
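The COMFORT item reliabilities above are weighted kappas, which give partial credit when two observers' ordinal scores are close rather than identical. The abstract does not say whether linear or quadratic weights were used, so the hypothetical helper below supports both. It is a minimal Python sketch with made-up scores, not the authors' analysis.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters' ordinal scores.

    categories: every possible score, ordered lowest first (at least 2).
    weights: "linear" or "quadratic" disagreement weights.
    """
    power = {"linear": 1, "quadratic": 2}[weights]
    k, n = len(categories), len(rater_a)
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rater A score, rater B score).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]                       # rater A marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    observed_dis = expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at extremes
            observed_dis += w * obs[i][j]
            expected_dis += w * row[i] * col[j]
    return 1 - observed_dis / expected_dis


# Hypothetical scores on one 3-point item from two observers.
item_a = [1, 1, 2, 2]
item_b = [1, 2, 2, 3]
kw = weighted_kappa(item_a, item_b, categories=[1, 2, 3])  # linear weights
```

With linear weights, a one-category disagreement on a 3-point item costs half as much as a two-category disagreement.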
Nagata, Yasufumi; Kado, Yuichiro; Onoue, Takeshi; Otani, Kyoko; Nakazono, Akemi; Otsuji, Yutaka; Takeuchi, Masaaki
2018-01-01
Background Left ventricular ejection fraction (LVEF) and global longitudinal strain (GLS) play important roles in the diagnosis and management of cardiac diseases. However, the accuracy and reliability of LVEF and GLS measurements remain unresolved issues. Image quality is one of the most important factors affecting measurement variability. The aim of this study was to investigate whether improved image quality could reduce observer variability. Methods Two sets of three apical images were acquired in 308 subjects using relatively old- and new-generation ultrasound imaging systems (Vivid 7 and Vivid E95). Image quality was assessed by the endocardial border delineation index (EBDI) using a 3-point scoring system. Three observers measured LVEF and GLS, and these values and inter-observer variability were investigated. Results Image quality was significantly better with Vivid E95 (EBDI: 26.8 ± 5.9) than with Vivid 7 (22.8 ± 6.3, P < 0.0001). Regarding the inter-observer variability of LVEF, the r-value, bias, 95% limits of agreement, and intraclass correlation coefficient for Vivid 7 were comparable to those for Vivid E95. The % variabilities were significantly lower for Vivid E95 (5.3–6.5%) than for Vivid 7 (6.5–7.5%). Regarding GLS, all observer variability parameters were better for Vivid E95 than for Vivid 7. Improvements in image quality benefited the reliability of both LVEF and GLS measurements. Multivariate analysis showed that image quality was indeed an important factor in observer variability in the measurement of LVEF and GLS. Conclusions The new-generation ultrasound imaging system offers improved image quality and reduces inter-observer variability in the measurement of LVEF and GLS. PMID:29432198
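The bias and 95% limits of agreement listed among the inter-observer measures follow the Bland-Altman approach: take the mean of the paired differences plus or minus 1.96 times their standard deviation. The Python sketch below assumes two observers and approximately normal differences. The LVEF values are hypothetical, and the abstract's "% variability" measure is not reproduced because it is not defined there.

```python
from statistics import mean, stdev


def bland_altman(obs_1, obs_2):
    """Bias and 95% limits of agreement for two observers' paired values."""
    diffs = [a - b for a, b in zip(obs_1, obs_2)]
    bias = mean(diffs)   # systematic difference between the observers
    sd = stdev(diffs)    # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical LVEF (%) readings of the same four subjects by two observers.
bias, lower, upper = bland_altman([60, 55, 48, 62], [58, 56, 47, 60])
```

Narrower limits of agreement with better images would be one way the image-quality effect described above shows up.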
Maynard, Gregory A; Morris, Timothy A; Jenkins, Ian H; Stone, Sarah; Lee, Joshua; Renvall, Marian; Fink, Ed; Schoenhaus, Robert
2010-01-01
Hospital-acquired (HA) venous thromboembolism (VTE) is a common source of morbidity and mortality. Prophylactic measures are underutilized, and available risk assessment models/protocols have not been prospectively validated. The objectives were to improve VTE prophylaxis, reduce HA VTE, and prospectively validate a VTE risk-assessment model. This observational study was conducted at an academic medical center among adult inpatients on medical/surgical services. The intervention was a simple VTE risk assessment linked to a menu of preferred VTE prophylaxis methods and embedded in order sets, together with education, audit/feedback, and concurrent identification of nonadherence. Randomly sampled inpatient audits determined the percentage of patients with "adequate" VTE prevention. HA VTE cases were identified concurrently via the digital imaging system. Interobserver agreement for VTE risk level and for the judgment of adequate prophylaxis was calculated from 150 random audits. Interobserver agreement among 5 observers was high (kappa score = 0.81 for VTE risk level and 0.90 for the judgment of "adequate" prophylaxis). The percentage of patients on adequate prophylaxis improved in each of the 3 years (58%, 78%, and 93%; P < 0.001) and reached 98% in the last 6 months of 2007; 361 cases of HA VTE occurred over the 3 years. Significant reductions occurred in the risk of HA VTE (risk ratio [RR] = 0.69; 95% confidence interval [CI] = 0.47-0.79) and of preventable HA VTE (RR = 0.14; 95% CI = 0.06-0.31). Using administrative data and chart review, we detected no increase in heparin-induced thrombocytopenia (HIT) or prophylaxis-related bleeding. We prospectively validated a VTE risk-assessment/prevention protocol by demonstrating its ease of use, good interobserver agreement, and effectiveness. Improved VTE prophylaxis resulted in a substantial reduction in HA VTE. © 2010 Society of Hospital Medicine.
Nagata, Yasufumi; Kado, Yuichiro; Onoue, Takeshi; Otani, Kyoko; Nakazono, Akemi; Otsuji, Yutaka; Takeuchi, Masaaki
2018-03-01
Left ventricular ejection fraction (LVEF) and global longitudinal strain (GLS) play important roles in the diagnosis and management of cardiac diseases. However, the accuracy and reliability of LVEF and GLS measurements remain unresolved issues. Image quality is one of the most important factors affecting measurement variability. The aim of this study was to investigate whether improved image quality could reduce observer variability. Two sets of three apical images were acquired in 308 subjects using relatively old- and new-generation ultrasound imaging systems (Vivid 7 and Vivid E95). Image quality was assessed by the endocardial border delineation index (EBDI) using a 3-point scoring system. Three observers measured LVEF and GLS, and these values and inter-observer variability were investigated. Image quality was significantly better with Vivid E95 (EBDI: 26.8 ± 5.9) than with Vivid 7 (22.8 ± 6.3, P < 0.0001). Regarding the inter-observer variability of LVEF, the r-value, bias, 95% limits of agreement, and intraclass correlation coefficient for Vivid 7 were comparable to those for Vivid E95. The % variabilities were significantly lower for Vivid E95 (5.3-6.5%) than for Vivid 7 (6.5-7.5%). Regarding GLS, all observer variability parameters were better for Vivid E95 than for Vivid 7. Improvements in image quality benefited the reliability of both LVEF and GLS measurements. Multivariate analysis showed that image quality was indeed an important factor in observer variability in the measurement of LVEF and GLS. The new-generation ultrasound imaging system offers improved image quality and reduces inter-observer variability in the measurement of LVEF and GLS. © 2018 The authors.
Fetter, Renata Lemos; Bigogno, Fernanda Guedes; de Oliveira, Fernanda Galvão Pasculli; Avesani, Carla Maria
2014-01-01
The 7-point subjective global assessment (7p-SGA) and the malnutrition inflammation score (MIS) are tools commonly applied to assess nutritional status in patients on dialysis. Both were developed in English and require translation into Portuguese to be applied in Brazil. The cross-cultural equivalence process ensures the semantic and measurement equivalence of a translated tool. The aim of this study was to perform the cross-cultural adaptation of the 7p-SGA and MIS to Portuguese. Semantic equivalence was assessed by the back-translation method, rating the degree of similarity between the original instrument and the version back-translated from Portuguese into English. Measurement equivalence was assessed by evaluating internal reliability (Cronbach's α) and interobserver reliability (two observers). One hundred and one elderly patients on hemodialysis (HD) were included. Both instruments showed a high degree of semantic similarity, with results close to the maximum value (7p-SGA 96.8 ± 7.8 and MIS 99.6 ± 1.4). Internal consistency yielded a Cronbach's α of 0.72 for the 7p-SGA and 0.53 for the MIS. The interobserver reproducibility of the 7p-SGA was moderate (intraclass correlation coefficient [ICC] = 0.74 [95% CI: 0.58; 0.84]), while that of the MIS was strong (ICC = 0.88 [95% CI: 0.81; 0.93]). The 7p-SGA and MIS translated into Portuguese can be applied to assess the nutritional status of elderly patients on HD. Studies testing the applicability of these instruments in adult patients on HD and on peritoneal dialysis are still needed.
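Cronbach's α, reported above as internal consistency, compares the sum of the individual item variances with the variance of the total score: α = k/(k - 1) · (1 - Σσ²ᵢ / σ²_total) for k items. Below is a minimal Python sketch on hypothetical item scores. It assumes complete data with one row per patient.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] = score of respondent i on item j."""
    k = len(item_scores[0])                      # number of items (k >= 2)
    items = list(zip(*item_scores))              # one tuple of scores per item
    totals = [sum(row) for row in item_scores]   # each respondent's total score
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Hypothetical scores of three patients on a two-item scale.
alpha = cronbach_alpha([[1, 1], [2, 3], [3, 2]])
```

Population variances are used for both numerator and denominator. Using sample variances throughout gives the same ratio, and therefore the same α.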
Fractal analysis for assessing tumour grade in microscopic images of breast tissue
NASA Astrophysics Data System (ADS)
Tambasco, Mauro; Costello, Meghan; Newcomb, Chris; Magliocco, Anthony M.
2007-03-01
In 2006, breast cancer is expected to continue as the leading form of cancer diagnosed in women, and the second leading cause of cancer mortality in this group. A method that has proven useful for guiding the choice of treatment strategy is the assessment of histological tumor grade. The grading is based upon the mitosis count, nuclear pleomorphism, and tubular formation, and is known to be subject to inter-observer variability. Since cancer grade is one of the most significant predictors of prognosis, errors in grading can affect patient management and outcome. Hence, there is a need to develop a breast cancer-grading tool that is minimally operator dependent to reduce variability associated with the current grading system, and thereby reduce uncertainty that may impact patient outcome. In this work, we explored the potential of a computer-based approach using fractal analysis as a quantitative measure of cancer grade for breast specimens. More specifically, we developed and optimized computational tools to compute the fractal dimension of low- versus high-grade breast sections and found them to be significantly different, 1.3 ± 0.10 versus 1.49 ± 0.10, respectively (Kolmogorov-Smirnov test, p < 0.001). These results indicate that fractal dimension (a measure of morphologic complexity) may be a useful tool for demarcating low- versus high-grade cancer specimens, and has potential as an objective measure of breast cancer grade. Such prognostic value could provide more sensitive and specific information that would reduce inter-observer variability by aiding the pathologist in grading cancers.
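The abstract does not say which fractal-dimension estimator the authors' tools used. Box counting is a common choice, so the Python sketch below assumes it. The function covers a binary mask with boxes of side s, counts the boxes N(s) that touch foreground pixels, and takes the slope of log N(s) against log(1/s). The function name and box sizes are illustrative. Segmenting tissue images into a mask is not shown.

```python
import numpy as np


def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of a 2-D binary mask.

    mask: 2-D boolean array (True = foreground, e.g. a segmented boundary).
    Returns the least-squares slope of log N(s) versus log(1/s).
    """
    mask = np.asarray(mask, dtype=bool)
    counts = []
    for s in box_sizes:
        # Trim the image so it tiles exactly into s x s boxes.
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        tiles = mask[:h, :w].reshape(h // s, s, w // s, s)
        # Count boxes that contain at least one foreground pixel.
        counts.append(tiles.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(1 / np.array(box_sizes)), np.log(counts), 1)
    return slope


# Sanity checks on synthetic masks: a filled square and a straight line.
filled = np.ones((64, 64), dtype=bool)
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True
d_filled = box_counting_dimension(filled)  # close to 2
d_line = box_counting_dimension(line)      # close to 1
```

Real tissue boundaries fall between these extremes. The study's reported values (about 1.3 to 1.5) are in that range.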
McAteer, J; Stone, S; Fuller, C; Charlett, A; Cookson, B; Slade, R; Michie, S
2008-03-01
Previous observational measures of healthcare worker (HCW) hand-hygiene behaviour (HHB) fail to provide adequate standard operating procedures (SOPs), accounts of inter-rater agreement testing, or evidence of sensitivity to change. This study reports the development of an observational tool that addresses these deficiencies. Observational categories were developed systematically, guided by a clinical guideline, previous measures and pilot hand-hygiene behaviour observations (HHOs). The measure, a simpler version of the Geneva tool, consists of HHOs (before and after low-risk, high-risk or unobserved contact), HHBs (soap, alcohol hand rub, no action, unknown), and type of HCW. Inter-observer agreement for each category was assessed by observation of 298 HHOs and HHBs by two independent observers on acute elderly and intensive care units. Raw agreement (%) and kappa were 77% and 0.68 for HHB; 83% and 0.77 for HHO; and 90% and 0.77 for HCW. Inter-observer agreement for the overall compliance of a group of HCWs was assessed by observation of 1191 HHOs and HHBs by two pairs of independent observers. Overall agreement was good (intraclass correlation coefficient = 0.79). Sensitivity to change was examined by autoregressive time-series modelling of longitudinal observations made for 8 months on the intensive therapy unit during an Acinetobacter baumannii outbreak and the subsequent strengthening of infection control measures. Sensitivity to change was demonstrated by a rise in compliance from 80 to 98%, with an odds ratio for increased compliance of 7.00 (95% confidence interval: 4.02-12.2; P < 0.001).
[Real time 3D echocardiography]
NASA Technical Reports Server (NTRS)
Bauer, F.; Shiota, T.; Thomas, J. D.
2001-01-01
Three-dimensional representation of the heart is a long-standing concern. Usually, 3D reconstruction of the cardiac mass is made by successive acquisition of 2D sections, whose spatial localisation and orientation require complex guiding systems. More recently, the concept of volumetric acquisition has been introduced. A matrix-array emitter-receiver probe with parallel data processing provides instantaneous acquisition of a pyramidal 64° × 64° volume. The image is displayed in real time and is composed of 3 planes (planes B and C), which can be displaced in all spatial directions at any time during acquisition. The flexibility of this acquisition system allows volume and mass measurement with greater accuracy and reproducibility, limiting inter-observer variability. Free navigation of the planes of investigation allows reconstruction for qualitative and quantitative analysis of valvular heart disease and other pathologies. Although real time 3D echocardiography is ready for clinical use, some improvements are still necessary to make it more user-friendly. Real time 3D echocardiography could then become the essential tool for the understanding, diagnosis and management of patients.
A Student Assessment Tool for Standardized Patient Simulations (SAT-SPS): Psychometric analysis.
Castro-Yuste, Cristina; García-Cabanillas, María José; Rodríguez-Cornejo, María Jesús; Carnicer-Fuentes, Concepción; Paloma-Castro, Olga; Moreno-Corral, Luis Javier
2018-05-01
The evaluation of the level of clinical competence acquired by students is a complex process that must meet various requirements to ensure its quality. Psychometric analysis of the data collected by the assessment tools used is fundamental to guaranteeing the student's competence level. The aim of this study was to conduct a psychometric analysis of an instrument that assesses clinical competence in nursing students at simulation stations with standardized patients in OSCE-format tests. The construct of clinical competence was operationalized as a set of observable and measurable behaviors, measured by the newly created Student Assessment Tool for Standardized Patient Simulations (SAT-SPS), which comprised 27 items. The categories assigned to the items were 'incorrect or not performed' (0), 'acceptable' (1), and 'correct' (2). The participants were 499 nursing students. Data were collected by two independent observers during the assessment of the students' performance at a four-station OSCE with standardized patients. Descriptive statistics were used to summarize the variables. The difficulty levels and floor and ceiling effects were determined for each item. Reliability was analyzed using internal consistency and inter-observer reliability. The validity analysis considered face validity, content validity, construct validity (through exploratory factor analysis), and criterion validity. Internal reliability and inter-observer reliability were higher than 0.80. The construct validity analysis suggested a three-factor model accounting for 37.1% of the variance. These three factors were named 'Nursing process', 'Communication skills', and 'Safe practice'. A significant correlation was found between the scores obtained and the students' grades in general, as well as with the grades obtained in subjects with clinical content. The assessment tool has proven to be sufficiently reliable and valid for the assessment of the clinical competence of nursing students using standardized patients. 
This tool has three main components: the nursing process, communication skills, and safety management. Copyright © 2018 Elsevier Ltd. All rights reserved.
Novel Tool for Complete Digitization of Paper Electrocardiography Data.
Ravichandran, Lakshminarayan; Harless, Chris; Shah, Amit J; Wick, Carson A; Mcclellan, James H; Tridandapani, Srini
We present a Matlab-based tool to convert electrocardiography (ECG) information from paper charts into digital ECG signals. The tool can be used in long-term retrospective studies of cardiac patients to study evolving features with prognostic value. To perform the conversion, we: 1) detect the graphical grid on ECG charts using grayscale thresholding; 2) digitize the ECG signal based on its contour using a column-wise pixel scan; and 3) use template-based optical character recognition to extract patient demographic information from the paper ECG in order to link the data with the patients' medical records. To validate the digitization technique, we: 1) computed the correlation between the digital signals and the signals digitized from paper ECGs; and 2) measured and compared clinically significant ECG parameters from both the paper-based ECG signals and the digitized ECGs. The validation demonstrates a correlation of 0.85-0.9 between the digital ECG signal and the signal digitized from the paper ECG. The clinical parameters from the paper charts and the digitized signal are highly correlated, with intra-observer and inter-observer correlations of 0.8-0.9 (p < 0.05) and kappa statistics ranging from 0.85 (inter-observer) to 1.00 (intra-observer). The important features of the ECG signal, especially the QRST complex and the associated intervals, are preserved by obtaining the contour from the paper ECG. The differences between the measures of clinically important features extracted from the original signal and from the reconstructed signal are insignificant, highlighting the accuracy of this technique. With this type of ECG digitization tool, retrospective studies of emerging ECG features can be carried out on large databases that rely on paper ECG records. 
In addition, this tool can be used to potentially integrate digitized ECG information with digital ECG analysis programs and with the patient's electronic medical record.
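Step 2 of the conversion above, the column-wise pixel scan, can be sketched in Python. This is a hypothetical illustration and not the authors' Matlab code. It assumes that step 1 has already suppressed the grid, that the region holds a single lead, and that the paper uses the standard 25 mm/s and 10 mm/mV calibration. The `threshold` and `px_per_mm` parameters are assumed values.

```python
import numpy as np


def trace_from_image(gray, threshold=100, px_per_mm=10.0, baseline_row=None):
    """Recover one ECG lead from a grayscale scan by a column-wise pixel scan.

    gray: 2-D uint8 array (0 = black) with the grid already removed or lighter
    than `threshold`. Assumes 25 mm/s and 10 mm/mV paper calibration.
    Returns (time_s, voltage_mV); columns without trace pixels become NaN.
    """
    gray = np.asarray(gray)
    rows = np.arange(gray.shape[0])
    dark = gray < threshold                      # candidate trace pixels
    hits = dark.sum(axis=0)                      # dark pixels per column
    # Centre of the trace in each column = mean row index of its dark pixels.
    with np.errstate(invalid="ignore", divide="ignore"):
        centre = (dark * rows[:, None]).sum(axis=0) / hits
    centre[hits == 0] = np.nan
    if baseline_row is None:
        baseline_row = np.nanmedian(centre)      # crude isoelectric estimate
    # Image rows grow downward, so higher on the paper = more positive voltage.
    voltage_mv = (baseline_row - centre) / px_per_mm / 10.0
    time_s = np.arange(gray.shape[1]) / px_per_mm / 25.0
    return time_s, voltage_mv


# Synthetic 20 x 5 "scan": flat trace on row 10, one 1 mm upward deflection.
img = np.full((20, 5), 255, dtype=np.uint8)
img[10, :] = 0
img[10, 2] = 255
img[0, 2] = 0
t, v = trace_from_image(img)
```

Averaging the dark rows in each column keeps one sample per column. Steep QRS deflections span many rows per column, so a real implementation may need a smarter contour rule than the mean.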
Implementation of a Posted Schedule to Increase Class-Wide Interobserver Agreement Assessment
ERIC Educational Resources Information Center
Doucette, Stefanie; DiGennaro Reed, Florence D.; Reed, Derek D.; Maguire, Helena; Marquardt, Heidi
2012-01-01
The present study investigated the impact of an antecedent intervention in the form of a daily posted schedule on the interobserver agreement (IOA) assessment of educational goals implemented within a classroom at a private school serving individuals with disabilities. During baseline, the percentage of academic goals with interobserver agreement…
Does clinical experience affect the reproducibility of cervical vertebrae maturation method?
Rongo, Roberto; Valleta, Rosa; Bucci, Rosaria; Bonetti, Giulio Alessandri; Michelotti, Ambrosina; D'Antò, Vincenzo
2015-09-01
The aim of this study was to assess the interobserver and intraobserver reproducibility of the cervical vertebrae maturation method (CVMM) among three panels of judges with different levels of orthodontic experience (OE). Fifty lateral cephalograms of good quality with complete visualization of cervical vertebrae 1 to 4 were selected. Thirty clinicians, divided into three groups according to their OE (junior group, JU, OE ≤ 1 year; postgraduate group, PG, 2 ≤ OE ≤ 4 years; specialist group, SP, OE ≥ 7 years), evaluated the cephalograms in two sessions (T1 and T2) 3 weeks apart. Kendall's W and weighted Cohen's kappa (κ) coefficients were calculated to assess interobserver and intraobserver agreement. The level of significance was set at P < .05. For both the interobserver and the intraobserver datasets, the percentage of perfect agreement (PPA) and the number of stages apart for each disagreement were calculated. Kendall's W at T1 was SP = 0.61, PG = 0.70, and JU = 0.87; at T2 it was SP = 0.78, PG = 0.85, and JU = 0.86. The percentage of total interobserver perfect agreement (Inter-PPA) was 42.3% at T1 and 46.3% at T2. The JU group had the highest Cohen's κ coefficient at 0.78, while the PG and SP groups each had a coefficient of 0.64. The percentage of total intraobserver perfect agreement (Intra-PPA) was 54.2%. The reproducibility of the method was not improved by the level of orthodontic experience. The group with the lowest level of orthodontic experience had the best performance.
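Kendall's W, used above to summarize agreement within each panel, measures how far the subjects' rank sums spread from their expected value. It is computed as W = 12S / (m²(n³ - n)) for m judges and n subjects. The minimal Python sketch below omits the correction for tied ranks. CVM stage ratings contain many ties, so a real analysis would need mid-ranks and the tie-corrected denominator. The rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W, without tie correction.

    rankings[j][i] is the rank (1..n) that judge j gives to subject i.
    """
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2          # expected rank sum per subject
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical judges ranking cephalograms by skeletal maturity.
w_perfect = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])  # 1.0
w_split = kendalls_w([[1, 2, 3], [3, 2, 1]])                        # 0.0
```

W ranges from 0 (no concordance) to 1 (identical rankings).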
O'Daniel, Jennifer C; Rosenthal, David I; Garden, Adam S; Barker, Jerry L; Ahamad, Anesa; Ang, K Kian; Asper, Joshua A; Blanco, Angel I; de Crevoisier, Renaud; Holsinger, F Christopher; Patel, Chirag B; Schwartz, David L; Wang, He; Dong, Lei
2007-04-01
The aim of this study was to investigate interobserver variability in the delineation of head-and-neck (H&N) anatomic structures on CT images, including the effects of image artifacts and observer experience. Nine observers (7 radiation oncologists, 1 surgeon, and 1 physician assistant) with varying levels of H&N delineation experience independently contoured H&N gross tumor volumes and critical structures on radiation therapy treatment planning CT images, alongside reference diagnostic CT images, for 4 patients with oropharynx cancer. Image artifacts from dental fillings partially obstructed 3 images. Differences in structure volumes, center-of-volume positions, and boundary positions (1 SD) were measured. In-house software created three-dimensional overlap distributions including all observers. The effects of dental artifacts and observer experience on contouring precision were investigated, and the need for contrast media was assessed. In the absence of artifacts, all 9 participants achieved reasonable precision (1 SD ≤ 3 mm for all boundaries). The structures obscured by dental image artifacts had larger variations as measured by the 3 metrics (1 SD = 8 mm for the cranial/caudal boundary). Experience improved the interobserver consistency of contouring for structures obscured by artifacts (1 SD = 2 mm for the cranial/caudal boundary). Interobserver contouring variability for anatomic H&N structures, specifically oropharyngeal gross tumor volumes and parotid glands, was acceptable in the absence of artifacts. Dental artifacts increased contouring variability, but experienced participants achieved reasonable precision even with artifacts present. With a staging contrast CT image as a reference, delineation on a noncontrast treatment planning CT image can achieve acceptable precision.
Marawar, Satyajit V; Madom, Ian A; Palumbo, Mark; Tallarico, Richard A; Ordway, Nathaniel R; Metkar, Umesh; Wang, Dongliang; Green, Adam; Lavelle, William F
2017-01-01
The treating surgeon's visual assessment of axial MRI images to ascertain the degree of stenosis has a critical impact on surgical decision-making. The purpose of this study was to prospectively analyze the impact of surgeon experience on the inter-observer and intra-observer reliability of assessing the severity of spinal stenosis on MRIs by spine surgeons directly involved in surgical decision-making. Seven fellowship-trained spine surgeons reviewed the MRI studies of 30 symptomatic patients with lumbar stenosis and graded the stenosis in the central canal, the lateral recess, and the foramen from T12-L1 to L5-S1 as none, mild, moderate, or severe. No specific instructions were provided as to what constituted mild, moderate, or severe stenosis. Two surgeons were "senior" (>15 years of practice experience), two were "intermediate" (>4 years of practice experience), and three were "junior" (<1 year of practice experience). The concordance correlation coefficient (CCC) was calculated to assess inter-observer reliability. Seven MRI studies were duplicated and randomly re-read to evaluate intra-observer reliability. Surgeon experience was found to be a strong predictor of inter-observer reliability. Inter-observer reliability in the senior group was significantly higher than in the junior group for assessing central (p<0.001), foraminal (p=0.005), and lateral (p=0.001) stenosis. The senior group also showed significantly higher inter-observer reliability than the intermediate group in assessing foraminal stenosis (p=0.036). For intra-observer reliability, the results were contrary to those for inter-observer reliability. Inter-observer reliability of assessing stenosis on MRIs increases with surgeon experience. The lower intra-observer reliability values in the senior group, although not clearly explained, may be due to the small number of MRIs evaluated and the quality of the MRI images. Level of evidence: Level 3.
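Lin's concordance correlation coefficient, used above for inter-observer reliability, penalizes both scatter and systematic offset between two raters: CCC = 2s_xy / (s_x² + s_y² + (x̄ - ȳ)²). The Python sketch below is a pairwise, two-rater version. It assumes the grades none/mild/moderate/severe are coded 0 to 3. That coding and the ratings are hypothetical, and the abstract does not say which multi-rater extension was used across the seven surgeons.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired ratings."""
    mx, my = fmean(x), fmean(y)
    n = len(x)
    # Population (1/n) moments, as in Lin's original definition.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical stenosis grades (0 = none ... 3 = severe) from two surgeons.
ccc = lins_ccc([1, 2, 3], [2, 3, 4])
```

In this example the two surgeons are perfectly correlated, yet the CCC is only 4/7 because one consistently grades a level higher.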
Storr, Ashleigh; Venetis, Christos A; Cooke, Simon; Kilani, Suha; Ledger, William
2017-02-01
What is the inter-observer and intra-observer agreement between embryologists when selecting a single Day 5 embryo for transfer? The inter-observer and intra-observer agreement between embryologists when selecting a single Day 5 embryo for transfer was generally good, although not optimal, even among experienced embryologists. Previous research on the morphological assessment of early-stage (two pronuclei to Day 3) embryos has shown varying levels of inter-observer and intra-observer agreement. However, single blastocyst transfer is becoming increasingly popular, and there are no published data assessing inter-observer and intra-observer agreement when selecting a single embryo for Day 5 transfer. This was a prospective study involving 10 embryologists working at five different IVF clinics within a single organization between July 2013 and November 2015. The top 10 embryologists were selected based on their yearly Quality Assurance Program scores for blastocyst grading and were asked to morphologically grade all Day 5 embryos and choose a single embryo for transfer in a survey of 100 cases using 2D images. A total of 1000 decisions were therefore assessed. For each case, Day 5 images were shown, followed by a Day 3 and a Day 5 image of the same embryo. Subgroup analyses were also performed based on the following characteristics of the embryologists: level of clinical embryology experience in the laboratory; amount of research experience; and number of days per week spent grading embryos. The agreement between these embryologists and the embryologist who scored the embryos on the actual day of transfer was also evaluated. Inter-observer and intra-observer variability were assessed using the kappa coefficient. This study showed that all 10 embryologists agreed on the embryo chosen for transfer in 50 out of 100 cases. In 93 out of 100 cases, at least 6 out of the 10 embryologists agreed. 
The inter-observer and intra-observer agreement among embryologists when selecting a single Day 5 embryo for transfer was generally good as assessed by the kappa scores (kappa = 0.734, 95% CI: 0.665-0.791 and 0.759, 95% CI: 0.622-0.833, respectively). The subgroup analyses did not substantially alter the inter-observer and intra-observer agreement among embryologists. Including Day 3 images alongside Day 5 images of the same embryos led each embryologist to change their choice at least three times (on average for <10% of cases) and slightly decreased inter-observer and intra-observer agreement (kappa = 0.676, 95% CI: 0.617-0.724 and 0.752, 95% CI: 0.656-0.808, respectively). The assessment of inter-observer agreement on the morphological grading of Day 5 embryos showed only fair-to-moderate agreement, and this was observed across all subgroup analyses. The highest overall kappa coefficient was seen for grading the developmental stage of an embryo (0.513; 95% CI: 0.492-0.538). The findings were similar when the individual embryologists were compared with the embryologist who made the morphological assessments of the available embryos on the actual day of transfer. All embryologists had already completed their training and were working under one organization with similar policies across the five clinics. Therefore, the inter-observer agreement might not be as high between embryologists working in clinics with different policies or with different levels of training. The generally good, although not optimal, uniformity between participating embryologists when selecting a Day 5 embryo for transfer, as well as the surprisingly low agreement when morphologically grading Day 5 embryos, could be improved, potentially resulting in increased pregnancy rates. Future studies need to be directed toward technologies that can help achieve this. None declared. Not applicable. © The Author 2016.
Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
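The headline counts above (all 10 embryologists agreeing in 50 of 100 cases, at least 6 of 10 in 93 cases) come from the size of the largest group of identical choices per case. The sketch below shows that count and, separately, Fleiss' kappa for many raters. The abstract does not say which kappa variant was used; Fleiss' kappa assumes one shared category set (such as developmental-stage grades), which fits morphological grading better than "which embryo" choices whose options differ between cases. The example data are hypothetical.

```python
from collections import Counter


def modal_agreement(choices):
    """How many raters picked the most popular option for a single case."""
    return Counter(choices).most_common(1)[0][1]


def fleiss_kappa(table):
    """Fleiss' kappa; `table` holds one list of category labels per subject.

    Every subject must be rated by the same number (>= 2) of raters.
    """
    n_raters = len(table[0])
    if n_raters < 2 or any(len(row) != n_raters for row in table):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_subjects = len(table)
    counts = [Counter(row) for row in table]
    categories = sorted({label for row in table for label in row})

    # P_i: proportion of agreeing rater pairs on subject i.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # P_e: chance agreement from the pooled category proportions p_j.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical embryo choices by 10 embryologists for three cases.
cases = [["e1"] * 10, ["e1"] * 6 + ["e2"] * 4, ["e3"] * 5 + ["e1"] * 5]
unanimous = sum(modal_agreement(c) == 10 for c in cases)    # 1 case
at_least_six = sum(modal_agreement(c) >= 6 for c in cases)  # 2 cases
```

Counting unanimous and "at least 6 of 10" cases this way reproduces the kind of summary reported in the abstract, while kappa additionally corrects for chance agreement.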
Matsuda, Akira; Kawabata, Hiroshi; Tohyama, Kaoru; Maeda, Tomoya; Araseki, Kayano; Hata, Tomoko; Suzuki, Takahiro; Kayano, Hidekazu; Shimbo, Kei; Usuki, Kensuke; Chiba, Shigeru; Ishikawa, Takayuki; Arima, Nobuyoshi; Nohgawa, Masaharu; Ohta, Akiko; Miyazaki, Yasushi; Nakao, Sinnji; Ozawa, Keiya; Arai, Shunya; Kurokawa, Mineo; Mitani, Kinuko; Takaori-Kondo, Akifumi
2018-06-07
The diagnosis of myelodysplastic syndromes (MDS) is based on morphology and cytogenetics. However, limited information is currently available on the interobserver concordance of the assessment of dysplastic lineages (<10% or ≥10% in bone marrow (BM)). The revised International Prognostic Scoring System (IPSS-R) described a new threshold (2%) for BM blasts, but few data are available on the interobserver concordance of the resulting categories (0-≤2% and >2-<5%). The purpose of the present study was to investigate the assessment of dysplastic lineages and IPSS-R reproducibility. Our study was divided into two steps. In each step, the microscopic examinations were performed separately by two morphologists. For the BM blast categories ≤2% and >2-<5%, interobserver agreement was at least 'moderate' in all pairs (kappa: 0.43-0.90). For dysgranulopoiesis (dysG) and dyserythropoiesis (dysE) in BM, interobserver agreement was at least 'moderate' in all pairs (kappa, dysG: 0.45-0.96; dysE: 0.45-0.81). For dysmegakaryopoiesis (dysMgk) in BM, interobserver agreement was at least moderate in 4 of 5 pairs (kappa: 0.58-1.00) and fair in one pair (kappa: 0.37). We consider that high interobserver concordance may be possible for the BM blast cell count (≤2% or >2-<5%) and for dysplasia (<10% or ≥10%) in each lineage. Copyright © 2018 Elsevier Ltd. All rights reserved.
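To make the two-morphologist comparison concrete, here is a minimal sketch that bins blast percentages into the two IPSS-R categories studied, computes Cohen's kappa for one pair of observers, and attaches the Landis and Koch verbal labels ("fair", "moderate", and so on) that the abstract's wording appears to follow. The function names and the example percentages are hypothetical; the study's actual data and software are not described here.

```python
from collections import Counter


def blast_category(blast_percent):
    """Bin a bone-marrow blast percentage into the two categories studied."""
    if 0 <= blast_percent <= 2:
        return "0-2%"
    if 2 < blast_percent < 5:
        return ">2-<5%"
    raise ValueError("this sketch only bins blast counts below 5%")


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    p_expected = sum(marg_a[c] * marg_b[c] for c in marg_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Verbal label using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical blast counts (%) read by two morphologists on four smears.
obs_1 = [blast_category(p) for p in [1.0, 2.0, 3.5, 4.0]]
obs_2 = [blast_category(p) for p in [0.5, 1.5, 3.0, 2.0]]
k = cohen_kappa(obs_1, obs_2)  # 0.5, i.e. "moderate"
```

With these bands, the abstract's kappa of 0.37 falls under "fair" and 0.43 under "moderate".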
Sağlam, Arzu; Usubütün, Alp; Dolgun, Anıl; Mutter, George L; Salman, M Coşkun; Kurtulan, Olcay; Akyol, Aytekin; Özkan, Eylem Akar; Baykara, Sema; Bülbül, Dilek; Calay, Zerrin; Eren, Funda; Gümürdülü, Derya; Haberal, Nihan; Ilvan, Şennur; Karaveli, Şeyda; Koyuncuoğlu, Meral; Müezzinoğlu, Bahar; Müftüoğlu, Kamil Hakan; Özen, Özlem; Özdemir, Necmettin; Peştereli, Elif; Ulukuş, Çağnur; Zekioğlu, Osman
2017-01-01
Inter-observer differences in the diagnosis of HPV-related cervical lesions are problematic, and gynecologists' responses to these diagnostic entities are not standardized. This study evaluated the diagnostic reproducibility of "cervical intraepithelial neoplasia" (CIN) and "squamous intraepithelial lesion" (SIL) diagnoses. Nineteen pathologists evaluated 66 cases, once using H&E slides and once with immunohistochemical studies (p16, Ki-67 and Pro-ExC). The management response to diagnoses was evaluated amongst 12 gynecologists. Pathologists and gynecologists were also given a questionnaire about how additional information, such as smear results and age, modifies diagnosis and management. We show moderate interobserver diagnostic reproducibility amongst pathologists. The overall kappa values were 0.50 and 0.59 using the CIN and SIL classifications, respectively. The impact of immunohistochemical evaluation on the interpretation of cases varied, and adding immunohistochemistry produced no statistically significant improvement in interobserver diagnostic reproducibility. The choice of treatment method varied amongst gynecologists, and overall concordance was only fair to moderate. The CIN2 diagnostic category had the lowest percentage agreement amongst both pathologists and gynecologists. We showed that pathologists had diagnostic "styles" and gynecologists had management "styles". In summary, each pathologist had different diagnostic tendencies, which were affected not only by histopathology and marker studies but also by the patient management tendencies of the gynecologists with whom the pathologist worked. The two-tiered modified Bethesda system improved diagnostic agreement. We concluded that immunohistochemistry should be used only to resolve problems in selected cases and not for every case.
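The abstract reports that CIN2 had the lowest percentage agreement but does not say how per-category agreement was computed. One common option for many raters is category-specific agreement. For each category, it takes the share of agreeing rater pairs among all rater pairs involving that category. The sketch below assumes that definition, and the diagnoses are hypothetical.

```python
from collections import Counter


def specific_agreement(table):
    """Category-specific agreement for many raters (assumed definition).

    For category j: sum_i n_ij * (n_ij - 1) / sum_i n_ij * (n_i - 1),
    where n_ij is how many raters put case i in category j and n_i is the
    number of raters for case i.
    """
    agree, possible = Counter(), Counter()
    for ratings in table:
        n_i = len(ratings)
        for category, n_ij in Counter(ratings).items():
            agree[category] += n_ij * (n_ij - 1)     # agreeing ordered pairs
            possible[category] += n_ij * (n_i - 1)   # all pairs involving j
    return {c: agree[c] / possible[c] for c in possible if possible[c] > 0}


# Hypothetical diagnoses by three pathologists on two cases.
cases = [["CIN2", "CIN2", "CIN3"], ["CIN3", "CIN3", "CIN3"]]
per_category = specific_agreement(cases)  # {"CIN2": 0.5, "CIN3": 0.75}
```

A middle category such as CIN2 tends to score low on this measure because raters who disagree often split it toward its neighbours.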
Duregon, Eleonora; Fassina, Ambrogio; Volante, Marco; Nesi, Gabriella; Santi, Raffaella; Gatti, Gaia; Cappellesso, Rocco; Dalino Ciaramella, Paolo; Ventura, Laura; Gambacorta, Marcello; Dei Tos, Angelo Paolo; Loli, Paola; Mannelli, Massimo; Mantero, Franco; Berruti, Alfredo; Terzolo, Massimo; Papotti, Mauro
2013-09-01
The pathologic diagnosis of adrenocortical carcinoma (ACC) still needs to be improved, because the renowned Weiss Score (WS) system has poor reproducibility for some parameters and is difficult to apply in borderline cases and in ACC variants. The "reticulin algorithm" (RA) defines malignancy as an altered reticulin framework associated with one of the following three parameters: necrosis, high mitotic rate, or vascular invasion. This study aimed to validate the interobserver reproducibility of reticulin stain evaluation in an unpublished series of 245 adrenocortical tumors (61 adenomas and 184 carcinomas) from 5 Italian centers, classified according to the WS. Eight pathologists reviewed all reticulin-stained slides. After training, a second round of evaluation of discordant cases was performed 10 weeks later. The RA reclassified 67 cases (27%) as adenomas, including 44 with no reticulin alterations and 23 with an altered reticulin framework but lacking the other parameters of the triad. The other 178 cases (73%) were carcinomas according to the above-mentioned criteria. Complete (8/8 pathologists) interobserver agreement was reached in 75% of cases (κ=0.702), irrespective of case derivation, pathologists' experience, and histologic variants, and was further improved when only cases with high WS and clinically malignant behavior were considered. After the training, the overall agreement increased to 86%. We conclude that reticulin staining is a reliable technique and an easy-to-interpret system in adrenocortical tumors; moreover, it has high interobserver reproducibility, which supports the notion of using such a method in the proposed 2-step RA approach for ACC diagnosis.
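The two-step reticulin algorithm described above is a simple decision rule, so it can be sketched directly. This sketch reads "one of the three parameters" as "any one or more". The mitotic-rate cut-off is not given in the abstract, so the caller passes a boolean. The helper for the "complete (8/8) agreement" rate and the example labels are hypothetical.

```python
def reticulin_algorithm(reticulin_altered, necrosis, high_mitotic_rate, vascular_invasion):
    """Classify an adrenocortical tumor with the two-step rule from the abstract.

    Step 1: an intact reticulin framework means adenoma.
    Step 2: an altered framework plus any of the triad means carcinoma.
    `high_mitotic_rate` is a boolean because no cut-off is stated here.
    """
    if not reticulin_altered:
        return "adenoma"
    if necrosis or high_mitotic_rate or vascular_invasion:
        return "carcinoma"
    return "adenoma"  # altered reticulin but none of the triad


def complete_agreement_rate(case_labels):
    """Share of cases on which every pathologist gave the same label."""
    unanimous = sum(len(set(labels)) == 1 for labels in case_labels)
    return unanimous / len(case_labels)


# Hypothetical labels from 8 pathologists on 4 cases: 3 of 4 unanimous.
labels = [["A"] * 8, ["A"] * 7 + ["C"], ["C"] * 8, ["C"] * 8]
rate = complete_agreement_rate(labels)  # 0.75
```

The two adenoma branches mirror the abstract's split of the 67 reclassified adenomas into 44 with no reticulin alterations and 23 with altered reticulin but no triad parameter.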
Bruyn, George A W; Hanova, Petra; Iagnocco, Annamaria; d'Agostino, Maria-Antonietta; Möller, Ingrid; Terslev, Lene; Backhaus, Marina; Balint, Peter V; Filippucci, Emilio; Baudoin, Paul; van Vugt, Richard; Pineda, Carlos; Wakefield, Richard; Garrido, Jesus; Pecha, Ondrej; Naredo, Esperanza
2014-11-01
To develop the first ultrasound scoring system of tendon damage in rheumatoid arthritis (RA) and assess its intraobserver and interobserver reliability. We conducted a Delphi study on ultrasound-defined tendon damage and an ultrasound scoring system of tendon damage in RA among 35 international rheumatologists with experience in musculoskeletal ultrasound. Twelve patients with RA were included and assessed twice by 12 rheumatologist-sonographers. Ultrasound examination for tendon damage in B mode of five wrist extensor compartments (extensor carpi radialis brevis and longus; extensor pollicis longus; extensor digitorum communis; extensor digiti minimi; extensor carpi ulnaris) and one ankle tendon (tibialis posterior) was performed blindly, independently and bilaterally in each patient. Intraobserver and interobserver reliability were calculated using κ coefficients. A three-grade semiquantitative scoring system was agreed upon for scoring tendon damage in B mode. The mean intraobserver reliability for tendon damage scoring was excellent (κ value 0.91). The mean interobserver reliability was good (κ value 0.75). The most reliable were the extensor digiti minimi, the extensor carpi ulnaris, and the tibialis posterior tendons. An ultrasound reference image atlas of tenosynovitis and tendon damage was also developed. Ultrasound is a reproducible tool for evaluating tendon damage in RA. This study strongly supports a new, reliable ultrasound scoring system for tendon damage. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
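Because the tendon damage score is a three-grade ordinal scale, a weighted kappa is one natural way to express reliability. A weighted kappa treats a one-grade disagreement as less serious than a two-grade one. The abstract reports κ without saying whether weights were used, so this is a hedged sketch only. It assumes grades coded 0, 1, 2, and the function name and example readings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_grades=3, power=1):
    """Cohen's weighted kappa for ordinal grades 0..n_grades-1.

    power=1 uses linear disagreement weights, power=2 quadratic ones.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equally long, non-empty rating lists")
    if any(not 0 <= g < n_grades for g in list(rater_a) + list(rater_b)):
        raise ValueError("grades must lie in 0..n_grades-1")

    # Observed contingency table of grade pairs.
    observed = [[0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    rows = [sum(r) for r in observed]
    cols = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]

    # Weighted observed vs. chance-expected disagreement.
    w_obs = w_exp = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            weight = (abs(i - j) / (n_grades - 1)) ** power
            w_obs += weight * observed[i][j]
            w_exp += weight * rows[i] * cols[j] / n
    if w_exp == 0:
        raise ValueError("weighted kappa is undefined for this table")
    return 1 - w_obs / w_exp


# Hypothetical tendon damage grades from two sonographers.
kw = weighted_kappa([0, 0, 1, 1], [0, 1, 1, 2])  # 1/3
```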
Terslev, Lene; Naredo, Esperanza; Aegerter, Philippe; Wakefield, Richard J; Backhaus, Marina; Balint, Peter; Bruyn, George A W; Iagnocco, Annamaria; Jousse-Joulin, Sandrine; Schmidt, Wolfgang A; Szkudlarek, Marcin; Conaghan, Philip G; Filippucci, Emilio
2017-01-01
Objectives To test the reliability of new ultrasound (US) definitions and quantification of synovial hypertrophy (SH) and power Doppler (PD) signal, separately and in combination, in a range of joints in patients with rheumatoid arthritis (RA) using the European League Against Rheumatism–Outcome Measures in Rheumatology (EULAR-OMERACT) combined score for PD and SH. Methods A stepwise approach was used: (1) scoring static images of metacarpophalangeal (MCP) joints in a web-based exercise and subsequently when scanning patients; (2) scoring static images of wrist, proximal interphalangeal joints, knee and metatarsophalangeal joints in a web-based exercise and subsequently when scanning patients using different acquisitions (standardised vs usual practice). Reliability was assessed with kappa coefficients (κ). Results Scoring MCP joints in static images showed substantial intraobserver variability but good to excellent interobserver reliability. In patients, intraobserver reliability was the same for the two acquisition methods. Interobserver reliability for SH (κ=0.87), PD (κ=0.79) and the EULAR-OMERACT combined score (κ=0.86) was better when using a ‘standardised’ scan. For the other joints, the intraobserver reliability was excellent in static images for all scores (κ=0.8–0.97) and the interobserver reliability marginally lower. When using standardised scanning in patients, the intraobserver reliability was good (κ=0.64 for SH and the EULAR-OMERACT combined score, 0.66 for PD) and the interobserver reliability was also good, especially for PD (κ range=0.41–0.92). Conclusion The EULAR-OMERACT score demonstrated moderate-good reliability in MCP joints using a standardised scan and is equally applicable in non-MCP joints. This scoring system should underpin improved reliability and consequently the responsiveness of US in RA clinical trials. PMID:28948984
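The abstract relies on the EULAR-OMERACT combined score but does not restate its grading rule. Assuming the published definition, grade 0 means no SH and no PD. Any other combination takes the higher of the SH and PD grades (0 to 3), with SH required to be present. On that assumption, the score reduces to the sketch below. The rule itself, the function name and the rejection of PD without SH are assumptions to check against the original EULAR-OMERACT publication.

```python
def eular_omeract_combined(sh_grade, pd_grade):
    """Combined SH/PD grade (0-3) under an ASSUMED reading of the published rule:
    0 if there is no SH and no PD, otherwise the higher of the two grades.
    PD without SH is rejected because the assumed rule does not grade it.
    """
    for grade in (sh_grade, pd_grade):
        if grade not in (0, 1, 2, 3):
            raise ValueError("grades must be 0, 1, 2 or 3")
    if sh_grade == 0:
        if pd_grade != 0:
            raise ValueError("PD without SH is not covered by this assumed rule")
        return 0
    return max(sh_grade, pd_grade)


# Hypothetical (SH, PD) gradings for four joints.
joints = [(0, 0), (1, 2), (3, 0), (2, 1)]
combined = [eular_omeract_combined(sh, pd) for sh, pd in joints]  # [0, 2, 3, 2]
```

Reliability of the combined score can then be assessed on these 0 to 3 grades exactly as for SH or PD alone.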
Van den Bosch, T; Valentin, L; Van Schoubroeck, D; Luts, J; Bignardi, T; Condous, G; Epstein, E; Leone, F P; Testa, A C; Van Huffel, S; Bourne, T; Timmerman, D
2012-10-01
To estimate the diagnostic accuracy and interobserver agreement in predicting intracavitary uterine pathology at offline analysis of three-dimensional (3D) ultrasound volumes of the uterus. 3D volumes (unenhanced ultrasound and gel infusion sonography with and without power Doppler, i.e. four volumes per patient) of 75 women presenting with abnormal uterine bleeding at a 'bleeding clinic' were assessed offline by six examiners. The sonologists were asked to provide a tentative diagnosis. A histological diagnosis was obtained by hysteroscopy with biopsy or operative hysteroscopy. Proliferative, secretory or atrophic endometrium was classified as 'normal' histology; endometrial polyps, intracavitary myomas, endometrial hyperplasia and endometrial cancer were classified as 'abnormal' histology. The diagnostic accuracy of the six sonologists with regard to normal/abnormal histology and interobserver agreement were estimated. Intracavitary pathology was diagnosed at histology in 39% of patients. Agreement between the ultrasound diagnosis and the histological diagnosis (normal vs abnormal) ranged from 67 to 83% for the six sonologists. In 45% of cases all six examiners agreed with regard to the presence/absence of intracavitary pathology. The percentage agreement between any two examiners ranged from 65 to 91% (Cohen's κ, 0.31-0.81). The Schouten κ for all six examiners was 0.51 (95% CI, 0.40-0.62), while the highest Schouten κ for any three examiners was 0.69. When analyzing stored 3D ultrasound volumes, agreement between sonologists with regard to classifying the endometrium/uterine cavity as normal or abnormal as well as the diagnostic accuracy varied substantially. Possible actions to improve interobserver agreement and diagnostic accuracy include optimization of image quality and the use of a consistent technique for analyzing the 3D volumes. Copyright © 2012 ISUOG. Published by John Wiley & Sons, Ltd.
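The pairwise figures above (65 to 91% agreement, Cohen's κ 0.31 to 0.81 between any two of the six sonologists) come from comparing every pair of examiners on the same normal/abnormal calls. A minimal sketch of that bookkeeping follows, with hypothetical examiner names and data. The Schouten κ for all six examiners is a different multi-rater statistic and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(x, y):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(x)
    p_o = sum(a == b for a, b in zip(x, y)) / n
    marg_x, marg_y = Counter(x), Counter(y)
    p_e = sum(marg_x[c] * marg_y[c] for c in marg_x) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)


def pairwise_agreement(ratings):
    """Map each pair of raters to (percent agreement, Cohen's kappa)."""
    results = {}
    for a, b in combinations(sorted(ratings), 2):
        x, y = ratings[a], ratings[b]
        p_o = sum(p == q for p, q in zip(x, y)) / len(x)
        results[(a, b)] = (p_o, cohen_kappa(x, y))
    return results


# Hypothetical normal ("n") / abnormal ("ab") calls by three examiners.
ratings = {
    "A": ["n", "n", "ab", "ab"],
    "B": ["n", "n", "ab", "n"],
    "C": ["n", "n", "ab", "ab"],
}
pairs = pairwise_agreement(ratings)
agreement_range = (min(p for p, _ in pairs.values()), max(p for p, _ in pairs.values()))
```

Reporting the minimum and maximum over all pairs gives ranges of the same form as those quoted in the abstract.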
Patange Subba Rao, Sheethal Prasad; Lewis, James; Haddad, Ziad; Paringe, Vishal; Mohanty, Khitish
2014-10-01
The aim of the study was to evaluate the inter-observer reliability and intra-observer reproducibility of the three-column classification and the Schatzker classification systems using 2D and 3D CT models. Fifty-two consecutive patients with tibial plateau fractures were evaluated by five orthopaedic surgeons. All fractures were classified according to the Schatzker and three-column classification systems using x-rays and 2D and 3D CT images. Inter-observer reliability was evaluated in the first round, and intra-observer reliability was determined in a second round 2 weeks later. The average intra-observer reproducibility for the three-column classification ranged from substantial to excellent in all subclassifications, compared with the Schatzker classification. The inter-observer kappa values increased from substantial to excellent for the three-column classification and to moderate for the Schatzker classification. The average values for all categories of the three-column classification were as follows: (I-III) k2D = 0.718, 95% CI 0.554-0.864, p < 0.0001 and k3D = 0.874, 95% CI 0.754-0.890, p < 0.0001. For the Schatzker classification system, the average values for all six categories were as follows: (I-VI) k2D = 0.536, 95% CI 0.365-0.685, p < 0.0001 and k3D = 0.552, 95% CI 0.405-0.700, p < 0.0001. The values are statistically significant. With the three-column classification, inter-observer values were statistically significant in both rounds and indicated excellent agreement. The intra-observer reproducibility for the three-column classification was better than that for the Schatzker classification. The three-column classification seems to be an effective way to characterise and classify fractures of the tibial plateau.
BAGLIO, MICHELLE L.; BAXTER, SUZANNE DOMEL; GUINN, CAROLINE H.; THOMPSON, WILLIAM O.; SHAFFER, NICOLE M.; FRYE, FRANCESCA H. A.
2005-01-01
This article (a) provides a general review of interobserver reliability (IOR) and (b) describes our method for assessing IOR for items and amounts consumed during school meals for a series of studies regarding the accuracy of fourth-grade children's dietary recalls validated with direct observation of school meals. A widely used validation method for dietary assessment is direct observation of meals. Although many studies utilize several people to conduct direct observations, few published studies indicate whether IOR was assessed. Assessment of IOR is necessary to determine that the information collected does not depend on who conducted the observation. Two strengths of our method for assessing IOR are that IOR was assessed regularly throughout the data collection period and that IOR was assessed for foods at the item and amount level instead of at the nutrient level. Adequate agreement among observers is essential to the reasoning behind using observation as a validation tool. Readers are encouraged to question the results of studies that fail to mention and/or to include the results for assessment of IOR when multiple people have conducted observations. PMID:15354155
Smartphone Photography as a Tool to Measure Knee Range of Motion.
Mica, Megan Conti; Wagner, Eric R; Shin, Alexander Y
2018-01-01
The objective of this study was to validate measuring knee range of motion (ROM) from smartphone photography. Thirty-two participants (64 knees) obtained smartphone photographs of knee flexion and extension. Surgeons obtained the same photographs and goniometric measurements of ROM. ROM was measured using Adobe Photoshop. Goniometer versus digital measurements, participant versus surgeon photographs, and interobserver measurements were analyzed. The average difference between goniometer and digital photograph measurements was 5°. The interclass correlation was .642 (L) and .656 (R). The Bland-Altman plots demonstrated that 29/32 digital measurements were within the 95% confidence interval (CI). Participants' versus researchers' photographs averaged a 2° difference. The interclass correlation was .924 (L) and .91 (R). Bland-Altman plots demonstrated that 31/32 measurements were within the 95% CI. Between observers, ROM measurements differed by an average of 5°. The concordance coefficients were .647 (L) and .723 (R). Bland-Altman plots demonstrated that 30 of 32 digital measurements were within the 95% CI. Measuring knee ROM using smartphone digital photography is valid and reliable. (Journal of Surgical Orthopaedic Advances 27(1):52-57, 2018).
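The method comparison above rests on Bland-Altman analysis and a concordance coefficient. Bland-Altman plots conventionally mark the bias ± 1.96 SD of the paired differences (the 95% limits of agreement), which is presumably the interval the abstract calls the "95% CI". The sketch computes those limits and Lin's concordance correlation coefficient. Lin's CCC is an assumption, because the abstract does not name its concordance statistic. The angle data are hypothetical.

```python
import statistics


def bland_altman(a, b, z=1.96):
    """Bias, 95% limits of agreement, and number of differences inside them."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    lower, upper = bias - z * sd, bias + z * sd
    inside = sum(lower <= d <= upper for d in diffs)
    return bias, (lower, upper), inside


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 2 * sxy / (statistics.pvariance(x) + statistics.pvariance(y) + (mx - my) ** 2)


# Hypothetical knee angles (degrees): goniometer vs. photograph.
goniometer = [10, 20, 30, 40]
photo = [8, 21, 29, 38]
bias, (lo, hi), inside = bland_altman(goniometer, photo)  # bias 1 degree, 4 inside
```

Unlike a plain correlation, Lin's CCC drops when one method reads systematically higher than the other, which is why it suits agreement questions.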
Walter, S G; Stadler, T; Thomas, T S; Thomas, W
2018-03-02
To introduce a (semi-)quantitative surgical score for the classification of rotator cuff tears. A total of 146 consecutive patients underwent rotator cuff repair and were assessed using the previously defined Advanced Rotator Cuff Tear Score (ARoCuS) criteria: muscle tendon, size, tissue quality, pattern, and mobilization of the tear. The data set was split into a training set (125 patients) and a testing set (21 patients). The training data set was used to fit a nonlinear predictive model of the tear score based on the ARoCuS criteria, while the testing data served as control. Based on the scoring results, rotator cuff tears were assigned to one of four categories (ΔV I-IV) and received stage-adapted treatment. For statistical analysis, mean values ± standard deviation, intraclass correlation coefficients (ICC) and kappa values were calculated. Overall, 32 patients were classified as ΔV I, 68 as ΔV II and 37 as ΔV III. Nine patients showed ΔV IV tears. Patients in all ΔV groups significantly improved their Constant scores (p < 0.001) and benefited from significant pain reduction after surgery (p < 0.001). To date, ten patients have undergone revision surgery, five of whom were primarily classified as ΔV IV. Kappa values for interobserver reliability ranged between 0.69 and 0.95. The ICC for the ΔV category was 0.95 for interobserver reliability. The ARoCuS facilitates intra-operative decision-making and enables surgeons and researchers to document rotator cuff tears in a standardized and reproducible manner.
Karbalaie, Abdolamir; Abtahi, Farhad; Fatemi, Alimohammad; Etehadtavakol, Mahnaz; Emrani, Zahra; Erlandsson, Björn-Erik
2017-09-01
Nailfold capillaroscopy is a practical method for identifying morphological changes in capillaries, which might reveal relevant information about diseases and health. Capillaroscopy is harmless and seems simple and repeatable. However, there is a lack of established guidelines and instructions for both the acquisition and the interpretation of images, which can lead to various ambiguities. In addition, assessment and interpretation of the acquired images are highly subjective. To overcome some of these problems, this study introduces a new modified technique for assessing nailfold capillary density. The new method, named elliptic broken line (EBL), extends two previously known methods by defining clear criteria for locating the apex of capillaries in different scenarios using a fitted ellipse. A graphical user interface (GUI) was developed for pre-processing, manual assessment of capillary apexes and automatic correction of selected apexes based on the 90° rule. Intra- and inter-observer reliability of EBL and corrected EBL were evaluated. Four independent observers familiar with capillaroscopy assessed 200 nailfold videocapillaroscopy images from healthy subjects and systemic lupus erythematosus patients in two different sessions. After automatic correction of EBL, intra-observer reliability rose from moderate (ICC=0.691) to good (ICC=0.750), and inter-observer reliability from good (ICC=0.753) to good (ICC=0.801). This shows the potential of the method to improve the reliability and repeatability of assessment and motivates further development of an automatic tool for the EBL method. Copyright © 2017 Elsevier Inc. All rights reserved.
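Several abstracts here (capillary density, levator hiatus dimensions, ARoCuS categories) report intraclass correlation coefficients without stating the ICC form. The sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss notation, computed from ANOVA mean squares. Rows are subjects (for example images), columns are observers (or sessions, for intra-observer reliability). The data are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `data` is a list of rows, one per subject, each with one score per rater.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects and >= 2 raters in a full table")
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical capillary counts: observer 2 always reads one higher.
offset = icc_2_1([[1, 2], [2, 3], [3, 4]])  # 2/3
```

Because this form measures absolute agreement, a constant offset between observers lowers the ICC even when their rankings match perfectly; a consistency-type ICC would not penalise it.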
van Veelen, G A; Schweitzer, K J; van der Vaart, C H
2013-11-01
To evaluate the reliability of measurements of the levator hiatus and levator-urethra gap (LUG) using three/four-dimensional (3D/4D) transperineal ultrasound in women during their first pregnancy and 6 months postpartum, and to assess the learning process for these measurements. An inexperienced observer was taught to perform measurements of the levator hiatus and LUG by an experienced observer. After training, 3D/4D ultrasound volume datasets of 40 women in the first trimester were analyzed by these two observers. Another training session then took place and both observers repeated the analyses of the same volume datasets. Finally, analyses of 40 volume datasets of the women 6 months postpartum were performed by both observers. Intra- and interobserver reliability were determined by intraclass correlation coefficients (ICC) with 95% CIs. For levator hiatal measurements, in the women during their first pregnancy the interobserver reliability was substantial to almost perfect after both the first and second training session (ICC, 0.62-0.83 and 0.71-0.89, respectively, for anteroposterior diameter, transverse diameter and area at rest, on contraction and on Valsalva) and the intraobserver reliability was substantial to almost perfect for both observers. For these measurements performed once the women had delivered, interobserver reliability was moderate to almost perfect. For LUG measurements performed during pregnancy, interobserver reliability was slight to moderate after the first training session (ICC, 0.14-0.54), but improved after the second training session (ICC, 0.38-0.71), and intraobserver reliability was moderate to substantial for the experienced observer and slight to moderate for the inexperienced observer. For these measurements performed when the women had delivered, interobserver reliability was fair to moderate. The levator hiatus and LUG can be measured reliably using 3D/4D ultrasound in primigravid and primiparous women. 
The technique to measure dimensions of the levator hiatus requires limited teaching, but LUG measurements are more difficult and require more extensive training. Copyright © 2013 ISUOG. Published by John Wiley & Sons Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saha, Ashirbani, E-mail: as698@duke.edu; Grimm, La
Purpose: To assess the interobserver variability of readers when outlining breast tumors in MRI, study the reasons behind the variability, and quantify the effect of the variability on algorithmic imaging features extracted from breast MRI. Methods: Four readers annotated breast tumors from the MRI examinations of 50 patients from one institution, using a bounding box to indicate each tumor. All of the annotated tumors were biopsy-proven cancers. The similarity of bounding boxes was analyzed using Dice coefficients. An automatic tumor segmentation algorithm was used to segment tumors from the readers' annotations. The segmented tumors were then compared between readers using Dice coefficients as the similarity metric. Cases showing high interobserver variability (average Dice coefficient <0.8) after segmentation were analyzed by a panel of radiologists to identify the reasons for the low level of agreement. Furthermore, an imaging feature quantifying tumor and breast tissue enhancement dynamics was extracted from each segmented tumor for a patient. Pearson's correlation coefficients were computed between the features for each pair of readers to assess the effect of the annotation on the feature values. Finally, the authors quantified the extent of variation in feature values caused by each of the individual reasons for low agreement. Results: The average agreement between readers in terms of bounding-box overlap (Dice coefficient) was 0.60. Automatic tumor segmentation improved the average Dice coefficient in 92% of the cases, to an average of 0.77. The mean agreement between readers, expressed as the correlation coefficient for the imaging feature, was 0.96. Conclusions: There is moderate variability between readers when identifying the rectangular outline of breast tumors on MRI. This variability is alleviated by automatic segmentation of the tumors.
Furthermore, the moderate interobserver variability in terms of the bounding box does not translate into considerable variability in the assessment of enhancement dynamics. The authors propose some additional ways to further reduce the interobserver variability.
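The Dice coefficient used above is 2|A∩B| / (|A| + |B|). For axis-aligned bounding boxes, the intersection is itself a box, so the overlap can be computed from corner coordinates alone. The sketch below uses a hypothetical (lower corner, upper corner) box format that works in 2D or 3D. It also averages Dice over all reader pairs; that averaging scheme is an assumption about how a per-case "average Dice" could be formed.

```python
from itertools import combinations


def box_size(lo, hi):
    """Area (2D) or volume (3D) of a box; empty extents count as zero."""
    size = 1.0
    for a, b in zip(lo, hi):
        size *= max(0.0, b - a)
    return size


def dice_boxes(box_a, box_b):
    """Dice = 2|A∩B| / (|A| + |B|) for two axis-aligned boxes (lo, hi)."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    inter_lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    inter_hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    total = box_size(lo_a, hi_a) + box_size(lo_b, hi_b)
    if total == 0:
        raise ValueError("both boxes are empty")
    return 2.0 * box_size(inter_lo, inter_hi) / total


def mean_pairwise_dice(boxes):
    """Average Dice over every pair of readers' boxes for one tumor."""
    pairs = list(combinations(boxes, 2))
    return sum(dice_boxes(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2D boxes from three readers: ((x0, y0), (x1, y1)).
box_1 = ((0, 0), (2, 2))
box_2 = ((1, 0), (3, 2))  # shifted by half a width: Dice 0.5 with box_1
case_mean = mean_pairwise_dice([box_1, box_1, box_2])  # (1 + 0.5 + 0.5) / 3
```

Under the abstract's threshold, a case like this one (mean Dice below 0.8) would be flagged for panel review.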
Dhutia, Harshil; Malhotra, Aneil; Yeo, Tee Joo; Ster, Irina Chis; Gabus, Vincent; Steriotis, Alexandros; Dores, Helder; Mellor, Greg; García-Corrales, Carmen; Ensam, Bode; Jayalapan, Viknesh; Ezzat, Vivienne Anne; Finocchiaro, Gherardo; Gati, Sabiha; Papadakis, Michael; Tome-Esteban, Maria; Sharma, Sanjay
2017-08-01
Preparticipation screening for cardiovascular disease in young athletes with electrocardiography is endorsed by the European Society of Cardiology and several major sporting organizations. One of the concerns of the ECG as a screening test in young athletes relates to the potential for variation in interpretation. We investigated the degree of variation in ECG interpretation in athletes and its financial impact among cardiologists of differing experience. Eight cardiologists (4 with experience in screening athletes) each reported 400 ECGs of consecutively screened young athletes according to the 2010 European Society of Cardiology recommendations, Seattle criteria, and refined criteria. Cohen κ coefficient was used to calculate interobserver reliability. Cardiologists proposed secondary investigations after ECG interpretation, the costs of which were based on the UK National Health Service tariffs. Inexperienced cardiologists were more likely to classify an ECG as abnormal compared with experienced cardiologists (odds ratio, 1.44; 95% confidence interval, 1.03-2.02). Modification of ECG interpretation criteria improved interobserver reliability for categorizing an ECG as abnormal from poor (2010 European Society of Cardiology recommendations; κ=0.15) to moderate (refined criteria; κ=0.41) among inexperienced cardiologists; however, interobserver reliability was moderate for all 3 criteria among experienced cardiologists (κ=0.40-0.53). Inexperienced cardiologists were more likely to refer athletes for further evaluation compared with experienced cardiologists (odds ratio, 4.74; 95% confidence interval, 3.50-6.43) with poorer interobserver reliability (κ=0.22 versus κ=0.47). Interobserver reliability for secondary investigations after ECG interpretation ranged from poor to fair among inexperienced cardiologists (κ=0.15-0.30) and fair to moderate among experienced cardiologists (κ=0.21-0.46). 
The cost of cardiovascular evaluation per athlete was $175 (95% confidence interval, $142-$228) and $101 (95% confidence interval, $83-$131) for inexperienced and experienced cardiologists, respectively. Interpretation of the ECG in athletes and the resultant cascade of investigations are highly physician-dependent, even in experienced hands, with important downstream financial implications, emphasizing the need for formal training and standardized diagnostic pathways. © 2017 American Heart Association, Inc.
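The comparisons of inexperienced and experienced cardiologists above are expressed as odds ratios with 95% confidence intervals. The abstract does not say how these were estimated. Because several cardiologists read the same ECGs, the readings are not independent, and a model that accounts for that clustering would normally be preferred. The unadjusted 2x2 version below, with a Woolf (log-scale) interval, only illustrates what such figures mean. The counts are hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a, b: 'abnormal' / 'normal' calls in group 1 (e.g. inexperienced readers);
    c, d: the same in group 2 (e.g. experienced readers).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts of ECGs called abnormal / normal by each group.
or_, lo, hi = odds_ratio_woolf(20, 10, 10, 20)  # OR = 4.0
```

The interval is symmetric on the log scale, so its bounds multiply to the squared odds ratio.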
Eechaute, Christophe; Vaes, Peter; Van Aerschot, Lieve; Asman, Sara; Duquet, William
2007-01-18
The assessment of outcomes from the patient's perspective is increasingly recognized in health care. In patients with chronic ankle instability, too, the degree of present impairments, disabilities and participation problems should be documented from the patient's perspective. The decision about which patient-assessed instrument is most appropriate for clinical practice should be based on systematic reviews. In the past, only rating scales constructed for patients with acute ligament injuries have been systematically reviewed. The aim of this study was to systematically review the clinimetric qualities of patient-assessed instruments designed for patients with chronic ankle instability. A computerized literature search of Medline, Embase, Cinahl, Web of Science, Sport Discus and the Cochrane Controlled Trial Register was performed to identify eligible instruments. Two reviewers independently evaluated the clinimetric qualities of the selected instruments using a criteria list. The inter-observer reliability of both the selection procedure and the clinimetric evaluation was calculated using modified kappa coefficients. The inter-observer reliability of the selection procedure was excellent (k = .86). Four instruments met the eligibility criteria: the Ankle Joint Functional Assessment Tool (AJFAT), the Functional Ankle Outcome Score (FAOS), the Foot and Ankle Disability Index (FADI) and the Functional Ankle Ability Measure (FAAM). The inter-observer reliability of the quality assessment was substantial to excellent (k between .64 and .88). Test-retest reliability was demonstrated for the FAOS, the FADI and the FAAM but not for the AJFAT. The FAOS and the FAAM met the criteria for content validity and construct validity. Internal consistency was not sufficiently demonstrated for any of the instruments studied. Floor and ceiling effects were assessed for the FAOS, and ceiling effects were present for all of its subscales.
Responsiveness was demonstrated for the AJFAT, FADI and FAAM. A minimal clinically important difference (MCID) was presented only for the FAAM. The FADI and the FAAM can be considered the most appropriate patient-assessed tools to quantify functional disabilities in patients with chronic ankle instability. The clinimetric qualities of the FAAM need to be further demonstrated in a specific population of patients with chronic ankle instability.
Pilic, Denisa; Höfs, Carolin; Weitmann, Sandra; Nöh, Frank; Fröhlich, Thorsten; Skopnik, Heino; Köhler, Henrik; Wenzl, Tobias G; Schmidt-Choudhury, Anjona
2011-09-01
Assessment of intra- and interobserver agreement in multiple intraluminal impedance (MII) measurement between investigators from different institutions. Twenty-four 18- to 24-hour MII tracings were randomly chosen from 4 different institutions (6 per center). Software-aided automatic analysis was performed. Each result was validated by 2 independent investigators from the 4 different centers (4 investigator combinations). For intraobserver agreement, 6 measurements were analyzed twice by the same investigator. Agreement between investigators was calculated using the Cohen kappa coefficient. Interobserver agreement: 13 measurements showed perfect agreement (kappa > 0.8), 9 substantial agreement (kappa 0.61-0.8), 1 moderate agreement (kappa 0.41-0.6), and 1 fair agreement (kappa 0.11-0.4). The median kappa value was 0.83. Intraobserver agreement: 5 tracings showed perfect and 1 showed substantial agreement; the median kappa value was 0.88. Most measurements showed substantial to perfect intra- and interobserver agreement. Still, we found a few outliers, presumably caused by poorer signal quality in some tracings rather than by observer dependence. Analysis results may be improved by using a standard analysis protocol, a standardized method for judging tracing quality, better training options for method users, and more interaction between investigators from different institutions.
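The agreement labels above come from fixed kappa cut-offs. A small sketch, assuming the bands exactly as written in this abstract (including the 0.11 lower bound for "fair"), that labels per-tracing kappa values and reports their median; the values are hypothetical, not the study's data. Note that these labels differ from the widely used Landis and Koch scheme, which calls 0.81-1.00 "almost perfect" and 0.21-0.40 "fair".

```python
from statistics import median

# Exclusive lower bounds and labels as quoted in the abstract above.
KAPPA_BANDS = [
    (0.8, "perfect"),      # kappa > 0.8
    (0.6, "substantial"),  # 0.61-0.8
    (0.4, "moderate"),     # 0.41-0.6
    (0.1, "fair"),         # 0.11-0.4
]


def kappa_band(kappa):
    """Return the agreement label for one kappa value."""
    for lower, label in KAPPA_BANDS:
        if kappa > lower:
            return label
    return "below fair"  # range not covered by the abstract's bands (assumed label)


def summarize_kappas(kappas):
    """Count tracings per agreement band and return the median kappa."""
    counts = {}
    for k in kappas:
        label = kappa_band(k)
        counts[label] = counts.get(label, 0) + 1
    return counts, median(kappas)


# Hypothetical per-tracing kappa values.
print(summarize_kappas([0.92, 0.85, 0.75, 0.55, 0.30]))
# ({'perfect': 2, 'substantial': 1, 'moderate': 1, 'fair': 1}, 0.75)
```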
AO Distal Radius Fracture Classification: Global Perspective on Observer Agreement.
Jayakumar, Prakash; Teunis, Teun; Giménez, Beatriz Bravo; Verstreken, Frederik; Di Mascio, Livio; Jupiter, Jesse B
2017-02-01
Background The primary objective of this study was to test interobserver reliability when a large international group of surgeons classified fractures by consensus into AO types and groups. Secondarily, we assessed differences in inter- and intraobserver agreement for the AO classification in relation to geographical location, level of training, and subspecialty. Methods A randomized set of radiographic and computed tomographic images from a consecutive series of 96 distal radius fractures (DRFs), treated between October 2010 and April 2013, was classified on two occasions by an invited group of participants using an electronic web-based portal. Results Interobserver reliability was substantial for AO type A fractures but fair and moderate for type B and C fractures, respectively. No difference was observed by location, except for an apparent difference between participants from India and Australia in classifying type B fractures. No statistically significant associations were observed between interobserver agreement and level of training, and no differences were shown between subspecialties. Intra-rater reproducibility was "substantial" for fracture types and "fair" for fracture groups, with no differences by location, training level, or specialty. Conclusion Using large international groups of raters may improve the definition of reliability and reproducibility of this classification and support decisions about which classification system to use. Level of Evidence Level III.
AO Distal Radius Fracture Classification: Global Perspective on Observer Agreement
Jayakumar, Prakash; Teunis, Teun; Giménez, Beatriz Bravo; Verstreken, Frederik; Di Mascio, Livio; Jupiter, Jesse B.
2016-01-01
Background The primary objective of this study was to test interobserver reliability when a large international group of surgeons classified fractures by consensus into AO types and groups. Secondarily, we assessed differences in inter- and intraobserver agreement for the AO classification in relation to geographical location, level of training, and subspecialty. Methods A randomized set of radiographic and computed tomographic images from a consecutive series of 96 distal radius fractures (DRFs), treated between October 2010 and April 2013, was classified on two occasions by an invited group of participants using an electronic web-based portal. Results Interobserver reliability was substantial for AO type A fractures but fair and moderate for type B and C fractures, respectively. No difference was observed by location, except for an apparent difference between participants from India and Australia in classifying type B fractures. No statistically significant associations were observed between interobserver agreement and level of training, and no differences were shown between subspecialties. Intra-rater reproducibility was “substantial” for fracture types and “fair” for fracture groups, with no differences by location, training level, or specialty. Conclusion Using large international groups of raters may improve the definition of reliability and reproducibility of this classification and support decisions about which classification system to use. Level of Evidence Level III. PMID:28119795
Bennett, Rebecca J; Taljaard, Dunay S; Olaithe, Michelle; Brennan-Jones, Chris; Eikelboom, Robert H
2017-09-18
The purpose of this study is to raise awareness of interobserver concordance and of the differences between interobserver reliability and agreement when evaluating the responsiveness of a clinician-administered survey and, specifically, to demonstrate the clinical implications of data types (nominal/categorical, ordinal, interval, or ratio) and statistical index selection (for example, Cohen's kappa, Krippendorff's alpha, or intraclass correlation). In this prospective cohort study, 3 clinical audiologists, who were masked to each other's scores, administered the Practical Hearing Aid Skills Test-Revised to 18 adult owners of hearing aids. Interobserver concordance was examined using a range of reliability and agreement statistical indices. The importance of selecting statistical measures of concordance was demonstrated with a worked example, in which the level of interobserver concordance ranged from "no agreement" to "almost perfect agreement" depending on the data type and statistical index selected. This study demonstrates that the methodology used to evaluate survey score concordance can influence the statistical results obtained and thus affect clinical interpretations.
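To illustrate the abstract's point that the choice of index can change the verdict, the sketch below scores the same ordinal ratings with unweighted and linearly weighted Cohen's kappa. The four-item toy ratings are hypothetical, and Krippendorff's alpha and the intraclass correlation, also named in the abstract, are not shown.

```python
def weighted_kappa(a, b, categories, weighting="linear"):
    """Cohen's kappa for two raters with optional linear disagreement weights.

    weighting="none"   -> ordinary unweighted kappa (every disagreement costs 1)
    weighting="linear" -> a disagreement costs |i - j| / (k - 1) on the ordinal scale
    """
    if weighting not in ("none", "linear"):
        raise ValueError("weighting must be 'none' or 'linear'")
    k = len(categories)
    pos = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Joint distribution of the two raters' scores, as proportions.
    joint = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        joint[pos[x]][pos[y]] += 1 / n
    row = [sum(joint[i]) for i in range(k)]                       # rater A marginals
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    def cost(i, j):
        if weighting == "none":
            return 0.0 if i == j else 1.0
        return abs(i - j) / (k - 1)

    observed = sum(cost(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(cost(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - observed / expected


# Hypothetical 0-3 skill scores from two audiologists on four items.
scores_a = [0, 1, 2, 3]
scores_b = [1, 2, 3, 2]
print(round(weighted_kappa(scores_a, scores_b, [0, 1, 2, 3], "none"), 2))    # -0.33
print(round(weighted_kappa(scores_a, scores_b, [0, 1, 2, 3], "linear"), 2))  # 0.11
```

Treated as nominal data, the raters never match, so kappa is negative; treated as ordinal data, every disagreement is only one scale point, so the weighted kappa is positive.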
van Doorn, Sascha C; Hazewinkel, Y; East, James E; van Leerdam, Monique E; Rastogi, Amit; Pellisé, Maria; Sanduleanu-Dascalescu, Silvia; Bastiaansen, Barbara A J; Fockens, Paul; Dekker, Evelien
2015-01-01
The Paris classification is an international classification system for describing polyp morphology. Thus far, the validity and reproducibility of this classification have not been assessed. We aimed to determine the interobserver agreement for the Paris classification among seven Western expert endoscopists. A total of 85 short endoscopic video clips depicting polyps were created and assessed by seven expert endoscopists according to the Paris classification. After a digital training module, the same 85 polyps were assessed again. We calculated interobserver agreement using Fleiss' kappa and as the proportion of pairwise agreement. Interobserver agreement for the Paris classification among the seven experts was moderate, with a Fleiss kappa of 0.42 and a mean pairwise agreement of 67%. The proportion of lesions assessed as "flat" by the experts ranged from 13 to 40% (P<0.001). After the digital training, interobserver agreement did not change (kappa 0.38, pairwise agreement 60%). Our study is the first to validate the Paris classification for polyp morphology. We demonstrated only moderate interobserver agreement among international Western experts for this classification system. Our data suggest that, in its current version, the use of this classification system in daily practice is questionable and that it is unsuitable for comparative endoscopic research. We therefore suggest that a simplified version of the classification system be introduced.
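The two statistics named in the abstract can be sketched as follows: Fleiss' kappa for a fixed number of raters per item, and the proportion of agreeing rater pairs. The polyp labels are hypothetical, and the sketch assumes every polyp was rated by all raters with no missing values.

```python
from collections import Counter
from itertools import combinations


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings holds one list of labels per item, same number of raters each."""
    n_items, m = len(ratings), len(ratings[0])
    if m < 2 or any(len(item) != m for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        p_bar += sum(c * (c - 1) for c in counts.values()) / (m * (m - 1))
    p_bar /= n_items
    # Chance agreement from the pooled share of each category.
    p_e = sum((t / (n_items * m)) ** 2 for t in totals.values())
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def mean_pairwise_agreement(ratings):
    """Share of all rater pairs, over all items, that chose the same label."""
    agree = total = 0
    for item in ratings:
        for x, y in combinations(item, 2):
            agree += x == y
            total += 1
    return agree / total


# Hypothetical Paris labels from three endoscopists for four polyps.
polyps = [
    ["0-Is", "0-Is", "0-Is"],
    ["0-Is", "0-IIa", "0-Is"],
    ["0-IIa", "0-IIa", "0-IIa"],
    ["0-Ip", "0-Is", "0-Ip"],
]
print(round(fleiss_kappa(polyps), 3), round(mean_pairwise_agreement(polyps), 3))  # 0.455 0.667
```

With complete data, the mean pairwise agreement equals the average per-item agreement term inside Fleiss' kappa; kappa then discounts it for the agreement expected from how often each category was used overall.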
Toro, Brigitte; Nester, Christopher J; Farren, Pauline C
2007-03-01
To evaluate the inter- and intraobserver repeatability of the Salford Gait Tool (SF-GT), a new observation-based gait assessment tool for evaluating sagittal plane cerebral palsy (CP) gait. Masked comparative evaluation. University in the United Kingdom. A convenience sample of 23 pediatric physical therapists with varying degrees of clinical experience recruited from the Greater Manchester area. Participants viewed videotapes of the sagittal plane gait of 13 children and used the SF-GT to analyze their 13 different gait styles on 2 occasions. Eleven children had hemiplegic, diplegic, or quadriplegic CP and 2 were neurologically intact. Inter- and intraobserver repeatability of hip, knee, and ankle joint positions at 6 different phases of the gait cycle. The SF-GT demonstrated good interobserver (77%) and intraobserver (75%) repeatability. We have established that the SF-GT is a repeatable clinical assessment tool that pediatric physical therapists can use to guide the diagnosis, treatment planning, and evaluation of interventions for sagittal plane gait deviations in CP.
Vinke, Elisabeth J; Eyding, Jens; de Korte, Chris L; Slump, Cornelis H; van der Hoeven, Johannes G; Hoedemaekers, Cornelia W E
2017-12-01
Ultrasound perfusion imaging (UPI) can be used to quantify cerebral perfusion. In a neuro-intensive care setting, repeated measurements are required to evaluate changes in cerebral perfusion and to monitor therapy. The aim of this study was to determine the repeatability of UPI in the quantification of cerebral perfusion. UPI measurement of cerebral perfusion was performed three times in healthy patients. The coefficients of variation (CVs) of the three bolus injections were calculated for both time- and volume-derived perfusion parameters in the macro- and microcirculation. Overall, the time-dependent UPI parameters had the lowest CVs in both the macro- and microcirculation. The volume-related parameters had poorer repeatability, especially in the microcirculation. Both intra-observer and inter-observer variability were low. Although UPI is a promising tool for the bedside measurement of cerebral perfusion, the technique requires improvement before it can be implemented in routine clinical practice. Copyright © 2017 World Federation for Ultrasound in Medicine and Biology. Published by Elsevier Inc. All rights reserved.
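A minimal sketch of the repeatability calculation, assuming the CV is the sample SD of the three bolus-derived values divided by their mean, and that per-subject CVs are simply averaged; the abstract does not specify the pooling, and the values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample SD divided by the mean, in percent."""
    return 100 * stdev(values) / mean(values)


def mean_within_subject_cv(measurements):
    """Average of per-subject CVs; each inner list holds one subject's repeated values."""
    return mean(coefficient_of_variation(v) for v in measurements)


# Hypothetical time-derived perfusion parameter from three bolus injections in two subjects.
boluses = [[10.0, 12.0, 11.0], [20.0, 21.0, 19.0]]
print(round(mean_within_subject_cv(boluses), 2))
```

Averaging per-subject CVs is only one convention; a root-mean-square pooling of the CVs is another, and the two can differ when variability is uneven across subjects.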
Levegrün, Sabine; Pöttgen, Christoph; Jawad, Jehad Abu; Berkovic, Katharina; Hepp, Rodrigo; Stuschke, Martin
2013-02-01
To evaluate megavoltage computed tomography (MVCT)-based image guidance with helical tomotherapy in patients with vertebral tumors by analyzing factors influencing interobserver variability, considered a quality criterion of image guidance. Five radiation oncologists retrospectively registered 103 MVCTs in 10 patients to planning kilovoltage CTs by rigid transformations with 4 degrees of freedom. Interobserver variabilities were quantified using the standard deviations (SDs) of the distributions of the correction vector components about the observers' fraction mean. To assess intraobserver variabilities, registrations were repeated after ≥4 weeks. Residual deviations after setup correction due to uncorrectable rotational errors and elastic deformations were determined at 3 craniocaudal target positions. To separate observer-related variations in minimizing these residual deviations across the 3-dimensional MVCT from image resolution effects, 2-dimensional registrations were performed in 30 single transverse and sagittal MVCT slices. Axial and longitudinal MVCT image resolutions were quantified. For comparison, the image resolution of kilovoltage cone-beam CTs (CBCTs) and the interobserver variability in registrations of 43 CBCTs were determined. Axial MVCT image resolution is 3.9 lp/cm (line pairs per cm). Longitudinal MVCT resolution is 6.3 mm, assessed as the full-width at half-maximum of thin objects in MVCTs with the finest pitch. Longitudinal CBCT resolution is better (full-width at half-maximum, 2.5 mm for CBCTs with 1-mm slices). In MVCT registrations, interobserver variability in the craniocaudal direction (SD 1.23 mm) is significantly larger than in the lateral and ventrodorsal directions (SD 0.84 and 0.91 mm, respectively) and significantly larger than in CBCT alignments (SD 1.04 mm). Intraobserver variabilities are significantly smaller than the corresponding interobserver variabilities (variance ratio [VR] 1.8-3.1).
Compared with 3-dimensional registrations, 2-dimensional registrations have significantly smaller interobserver variability in the lateral and ventrodorsal directions (VR 3.8 and 2.8, respectively) but not in the craniocaudal direction (VR 0.75). Tomotherapy image guidance precision is affected by image resolution and residual deviations after setup correction. Eliminating the effect of residual deviations yields small interobserver variabilities with submillimeter precision in the axial plane. In contrast, interobserver variability in the craniocaudal direction is dominated by the poorer longitudinal MVCT image resolution. Residual deviations after image guidance exist and need to be considered when dose gradients ultimately achievable with image guided radiation therapy techniques are analyzed. Copyright © 2013 Elsevier Inc. All rights reserved.
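The interobserver SD described above (deviations of each observer's correction from the observers' mean for the same fraction) can be sketched as below. Pooling the deviations over all fractions, the degrees-of-freedom correction, and computing the variance ratio as the squared ratio of two SDs are assumptions; the correction values are hypothetical.

```python
from math import sqrt


def interobserver_sd(corrections):
    """Pooled SD of observer deviations about each fraction's observer mean.

    corrections[f][o] is one correction-vector component (mm) chosen by
    observer o for fraction f; every fraction must have the same observers.
    """
    deviations = []
    for per_fraction in corrections:
        fraction_mean = sum(per_fraction) / len(per_fraction)
        deviations.extend(x - fraction_mean for x in per_fraction)
    # One degree of freedom is spent on each fraction mean (assumed pooling).
    dof = len(deviations) - len(corrections)
    return sqrt(sum(d * d for d in deviations) / dof)


def variance_ratio(sd_larger, sd_smaller):
    """Ratio of two variances, e.g. interobserver vs. intraobserver."""
    return (sd_larger / sd_smaller) ** 2


# Hypothetical craniocaudal corrections (mm): 2 fractions x 3 observers.
cc = [[1.0, 2.0, 3.0], [0.0, 0.0, 3.0]]
print(round(interobserver_sd(cc), 3))  # 1.414
```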
Gastritis staging: interobserver agreement by applying OLGA and OLGIM systems.
Isajevs, Sergejs; Liepniece-Karele, Inta; Janciauskas, Dainius; Moisejevs, Georgijs; Putnins, Viesturs; Funka, Konrads; Kikuste, Ilze; Vanags, Aigars; Tolmanis, Ivars; Leja, Marcis
2014-04-01
Atrophic gastritis remains a difficult histopathological diagnosis with low interobserver agreement. The aim of our study was to compare gastritis staging and interobserver agreement between general and expert gastrointestinal (GI) pathologists using the Operative Link for Gastritis Assessment (OLGA) and the Operative Link on Gastric Intestinal Metaplasia (OLGIM). We enrolled 835 patients undergoing upper endoscopy. Two general and two expert GI pathologists graded biopsy specimens according to the Sydney classification, and the stage of gastritis was assessed with the OLGA and OLGIM systems. Using OLGA, 280 (33.4 %) patients had gastritis (stage I-IV), whereas with OLGIM this was 167 (19.9 %). OLGA stage III-IV gastritis was observed in 25 patients, whereas OLGIM stage III-IV was found in 23 patients. Interobserver agreement between expert GI pathologists for atrophy in the antrum, incisura angularis, and corpus was moderate (kappa = 0.53, 0.57 and 0.41, respectively, p < 0.0001), but almost perfect for intestinal metaplasia (kappa = 0.82, 0.80 and 0.81, respectively, p < 0.0001). However, interobserver agreement between general pathologists was poor for atrophy but moderate for intestinal metaplasia. OLGIM staging provided the highest interobserver agreement, but a substantial proportion of potentially high-risk individuals would be missed if only OLGIM staging were applied. Therefore, we recommend using a combination of OLGA and OLGIM for staging chronic gastritis.
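Both systems combine an antrum score and a corpus score (0 = none to 3 = severe) in the same 4 × 4 staging matrix: OLGA applies it to atrophy and OLGIM to intestinal metaplasia. The sketch below encodes the matrix as we understand it from the published OLGA table; verify it against the original publication before relying on it. It also assumes the antrum score already combines the antral and incisura angularis biopsies, and the patient scores are hypothetical.

```python
# Staging matrix indexed [antrum score][corpus score]; scores run from
# 0 (none) to 3 (severe). Reproduced from our reading of the published OLGA
# table: verify before clinical use. OLGIM uses the same layout with
# intestinal metaplasia scores instead of atrophy scores.
STAGE_MATRIX = [
    [0, 1, 2, 2],  # antrum 0
    [1, 1, 2, 3],  # antrum 1
    [2, 2, 3, 4],  # antrum 2
    [3, 3, 4, 4],  # antrum 3
]
STAGE_NAMES = ["0", "I", "II", "III", "IV"]


def gastritis_stage(antrum_score, corpus_score):
    """Return the stage (0-IV) for one pair of compartment scores."""
    if antrum_score not in range(4) or corpus_score not in range(4):
        raise ValueError("scores must be integers from 0 to 3")
    return STAGE_NAMES[STAGE_MATRIX[antrum_score][corpus_score]]


def is_high_risk(stage):
    """Treat stages III-IV as high risk (the grouping reported in the abstract)."""
    return stage in ("III", "IV")


# Hypothetical patient: moderate atrophy in antrum and corpus,
# mild intestinal metaplasia in the antrum only.
olga = gastritis_stage(2, 2)
olgim = gastritis_stage(1, 0)
print(olga, olgim, is_high_risk(olga), is_high_risk(olgim))  # III I True False
```

In this hypothetical case, OLGA places the patient in stage III while OLGIM gives stage I, which is one way a potentially high-risk patient could be missed if only OLGIM were applied.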
Caning, M M; Thisted, D L A; Amer-Wählin, I; Laier, G H; Krebs, L
2018-05-17
To examine interobserver agreement in intrapartum cardiotocography (CTG) classification in women undergoing a trial of labor after cesarean section (TOLAC) at term with or without complete uterine rupture. Nineteen blinded and independent Danish obstetricians assessed CTG tracings from 47 women (174 individual pages) with a complete uterine rupture during TOLAC and 37 women (133 individual pages) with no uterine rupture during TOLAC. Individual pages with CTG tracings lasting at least 20 min were evaluated by three different assessors, and each page was counted as an individual case. The tracings were analyzed according to the modified version of the International Federation of Gynecology and Obstetrics (FIGO) guidelines developed for use with STAN (ST analysis). The occurrence of defined abnormalities was recorded, and the tracings were classified as normal, suspicious, pathological, or preterminal. Interobserver agreement was evaluated using Fleiss' kappa. Agreement on the classification of a preterminal CTG was almost perfect. Interobserver agreement on normal, suspicious or pathological CTG was moderate to substantial. For the presence of severe variable decelerations, agreement was moderate. No statistically significant difference in interobserver agreement was found between the classification of tracings from women undergoing TOLAC with and without complete uterine rupture. Interobserver agreement on the classification of CTG tracings from high-risk deliveries during TOLAC is best for the assessment of a preterminal CTG and poorest for the identification of severe variable decelerations.
Validity of a smartphone protractor to measure sagittal parameters in adult spinal deformity.
Kunkle, William Aaron; Madden, Michael; Potts, Shannon; Fogelson, Jeremy; Hershman, Stuart
2017-10-01
Smartphones have become an integral tool in the daily life of health-care professionals (Franko 2011). Their ease of use and wide availability often make smartphones the first tool surgeons use to perform measurements. Smartphone-based measurement has been validated for certain orthopedic pathologies (Shaw 2012; Quek 2014; Milanese 2014; Milani 2014) but never for assessing sagittal parameters in adult spinal deformity (ASD). This study was designed to assess the validity, reproducibility, precision, and efficiency of using a smartphone protractor application to measure sagittal parameters commonly measured in ASD assessment and surgical planning. This study aimed to (1) determine the validity of smartphone protractor applications, (2) determine the intra- and interobserver reliability of smartphone protractor applications when used to measure sagittal parameters in ASD, (3) determine the efficiency of using a smartphone protractor application to measure sagittal parameters, and (4) elucidate whether a physician's level of experience affects the reliability or validity of using a smartphone protractor application to measure sagittal parameters in ASD. An experimental validation study was carried out. Thirty standard 36″ standing lateral radiographs were examined. Three separate measurements were performed using a marker and protractor; then, at a separate time point, three separate measurements were performed using a smartphone protractor application for all 30 radiographs. The first 10 radiographs were then re-measured two more times, for a total of three measurements with both the smartphone protractor and the marker and protractor. The parameters included lumbar lordosis, pelvic incidence, and pelvic tilt. Three raters performed all measurements: a junior level orthopedic resident, a senior level orthopedic resident, and a fellowship-trained spinal deformity surgeon.
All data, including the time to perform the measurements, were recorded, and statistical analysis was performed to determine intra- and interobserver reliability, as well as accuracy, efficiency, and precision. Intraclass correlation coefficients (ICCs) were calculated using R (version 3.3.2, 2016) to determine the degree of intra- and interobserver reliability. High intra- and interobserver reliability was observed between the junior resident, senior resident, and attending surgeon when using the smartphone protractor application, with interobserver and intraobserver ICCs greater than 0.909 and 0.874, respectively. High inter- and intraobserver reliability was also seen between the junior resident, senior resident, and attending surgeon when a marker and protractor were used, with interobserver and intraobserver ICCs greater than 0.909 and 0.807, respectively. The lumbar lordosis, pelvic incidence, and pelvic tilt values were accurately measured by all three raters, with excellent inter- and intraobserver ICC values. When the first 10 radiographs were re-measured at different time points, a high degree of precision was noted. Measurements performed using the smartphone application were consistently faster than those using a marker and protractor; this difference was statistically significant (p<.05). Adult spinal deformity radiographic parameters can be measured accurately, precisely, reliably, and more efficiently using a smartphone protractor application than with a standard protractor and wax pencil. A high degree of intra- and interobserver reliability was seen between the residents and attending surgeon, indicating that measurements made with a smartphone protractor are unaffected by an observer's level of experience. As a result, smartphone protractors may be used when planning ASD surgery. Copyright © 2017 Elsevier Inc. All rights reserved.
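The abstract does not say which ICC form was computed in R. As one common choice for several raters measuring the same radiographs, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from the two-way ANOVA mean squares; the 6 × 4 table of scores is illustrative only.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data[i][j] is the measurement of subject i by rater j; no missing values.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares with one observation per cell.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores: 6 subjects (rows) measured by 4 raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))  # 0.29
```

The low value here reflects large systematic differences between raters; a consistency-type ICC, which ignores those rater offsets, would be higher on the same table.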
Lambron, Julien; Rakotonjanahary, Josué; Loisel, Didier; Frampas, Eric; De Carli, Emilie; Delion, Matthieu; Rialland, Xavier; Toulgoat, Frédérique
2016-02-01
Magnetic resonance (MR) images from children with optic pathway glioma (OPG) are complex. We initiated this study to evaluate the accuracy of MR imaging (MRI) interpretation and to propose a simple and reproducible imaging classification for MRI. We randomly selected 140 MRIs from among 510 MRIs performed on 104 children diagnosed with OPG in France from 1990 to 2004. These images were reviewed independently by three radiologists (F.T., 15 years of experience in neuroradiology; D.L., 25 years of experience in pediatric radiology; and J.L., 3 years of experience in radiology) using a classification derived from the Dodge and modified Dodge classifications. Intra- and interobserver reliabilities were assessed using the Bland-Altman method and the kappa coefficient. These reviews allowed the definition of reliable criteria for MRI interpretation. The reviews showed intraobserver variability and large discrepancies among the three radiologists (kappa coefficient varying from 0.11 to 1). These variabilities were too large for the interpretation to be considered reproducible over time or among observers. A consensual analysis, taking into account all observed variabilities, allowed the development of a definitive interpretation protocol. Using this revised protocol, we observed consistent intra- and interobserver results (kappa coefficient varying from 0.56 to 1). The mean interobserver difference for the solid portion of the tumor with contrast enhancement was 0.8 cm³ (limits of agreement = -16 to 17). We propose simple and precise rules for improving the accuracy and reliability of MRI interpretation for children with OPG. Further studies will be necessary to investigate the possible prognostic value of this approach.
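The Bland-Altman limits of agreement reported above are conventionally the mean interobserver difference plus or minus 1.96 standard deviations of the differences. A minimal sketch, with hypothetical tumor volumes for two readers:

```python
from statistics import mean, stdev


def bland_altman(reader_a, reader_b):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical enhancing-tumor volumes (cm³) measured by two readers.
volumes_a = [10.0, 20.0, 30.0, 40.0]
volumes_b = [9.0, 21.0, 28.0, 40.0]
bias, (lower, upper) = bland_altman(volumes_a, volumes_b)
print(bias, round(lower, 2), round(upper, 2))  # 0.5 -2.03 3.03
```

A small bias with wide limits, as in the abstract's 0.8 cm³ against limits of -16 to 17, means the readers agree on average but can differ considerably for an individual tumor.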
Novel Tool for Complete Digitization of Paper Electrocardiography Data
Harless, Chris; Shah, Amit J.; Wick, Carson A.; McClellan, James H.
2013-01-01
Objective: We present a Matlab-based tool to convert electrocardiography (ECG) information from paper charts into digital ECG signals. The tool can be used in long-term retrospective studies of cardiac patients to study evolving features with prognostic value. Methods and procedures: To perform the conversion, we: 1) detect the graphical grid on ECG charts using grayscale thresholding; 2) digitize the ECG signal based on its contour using a column-wise pixel scan; and 3) use template-based optical character recognition to extract patient demographic information from the paper ECG in order to link the data with the patients' medical records. To validate the digitization technique: 1) correlations between the digital signals and the signals digitized from paper ECGs are computed, and 2) clinically significant ECG parameters are measured and compared between the paper-based ECG signals and the digitized ECG. Results: The validation demonstrates a correlation of 0.85–0.9 between the digital ECG signal and the signal digitized from the paper ECG. There is a high correlation between the clinical parameters obtained from the paper charts and from the digitized signal, with intra-observer and inter-observer correlations of 0.8–0.9 (p < 0.05) and kappa statistics ranging from 0.85 (inter-observer) to 1.00 (intra-observer). Conclusion: The important features of the ECG signal, especially the QRST complex and the associated intervals, are preserved by obtaining the contour from the paper ECG. The differences between the measures of clinically important features extracted from the original signal and the reconstructed signal are insignificant, highlighting the accuracy of this technique.
Clinical impact: With this type of ECG digitization tool, studies of emerging ECG features can be carried out retrospectively on large databases that rely on paper ECG records. In addition, this tool could potentially be used to integrate digitized ECG information with digital ECG analysis programs and with the patient's electronic medical record. PMID:26594601
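Step 2 of the method (a column-wise pixel scan of the trace contour) can be sketched as follows. This is not the authors' Matlab tool: it assumes a grayscale image stored as a list of rows, trace pixels darker than a fixed threshold and grid pixels lighter than it, and a known baseline row and pixels-per-millivolt scale. The parameter names (threshold, baseline_row, px_per_mv) are hypothetical.

```python
def digitize_trace(image, threshold=100, baseline_row=None, px_per_mv=10.0):
    """Column-wise pixel scan of a grayscale ECG strip.

    image: list of rows, 0 = black ... 255 = white.
    Pixels darker than `threshold` are taken as the trace; lighter pixels
    (paper background and printed grid) are ignored.
    Returns one amplitude in mV per column, or None where no trace pixel exists.
    """
    height, width = len(image), len(image[0])
    if baseline_row is None:
        baseline_row = (height - 1) / 2  # assume the isoelectric line is centred
    signal = []
    for col in range(width):
        rows = [r for r in range(height) if image[r][col] < threshold]
        if not rows:
            signal.append(None)  # gap in the trace
            continue
        centre = sum(rows) / len(rows)  # middle of the trace's thickness in this column
        # Row indices grow downward, so invert the sign for upward-positive voltage.
        signal.append((baseline_row - centre) / px_per_mv)
    return signal


# Toy 5 x 4 strip: 255 = paper, 200 = grid line, 0 = trace.
strip = [
    [255, 200, 255, 255],
    [255, 200, 0, 255],
    [0, 200, 255, 255],
    [255, 0, 255, 255],
    [255, 200, 255, 255],
]
print(digitize_trace(strip, px_per_mv=1.0))  # [0.0, -1.0, 1.0, None]
```

Taking the mean row of the dark pixels is one simple way to collapse a thick trace to a single sample per column; steep QRS segments span many rows per column and may need more careful handling.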
Jensen, Mads R; Birkballe, Susanne; Nørregaard, Susan; Karlsmark, Tonny
2012-07-01
Tissue dielectric constant (TDC) measurement may become an important tool in the clinical evaluation of chronic lower extremity swelling in women; however, several factors are known to influence TDC measurements, and few comparative data on healthy lower extremities are available. Thirty-four healthy women volunteered. Age, BMI, moisturizer use and hair removal were recorded. Three blinded investigators performed TDC measurements in a randomized sequence at clearly marked locations on the foot, the ankle and the lower leg. The effective measuring depth was 2.5 mm. The mean TDC was 37.8 ± 5.5 (mean ± SD) on the foot, 29.0 ± 3.1 on the ankle and 30.5 ± 3.9 on the lower leg. TDC was highly dependent on measuring site (P<0.001) but did not vary significantly between investigators (P=0.127). Neither age, BMI, hair removal nor moisturizer use had any significant effect on the lower leg TDC. Intraclass correlation coefficients were 0.77 for the foot, 0.94 for the ankle and 0.94 for the lower leg. The TDC on the foot was significantly higher than the ankle and lower leg values. Foot measurements should be interpreted cautiously because of questionable interobserver agreement. The interobserver agreement was high for lower leg and ankle measurements. Neither age, BMI, hair removal nor moisturizer use had any significant effect on the lower leg TDC. TDC values of 35.2 for the ankle and 38.3 for the lower leg are suggested as upper normal reference limits in women. © 2012 The Authors Clinical Physiology and Functional Imaging © 2012 Scandinavian Society of Clinical Physiology and Nuclear Medicine.
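The abstract does not state how the upper reference limits were derived, but both values coincide with the reported mean plus two standard deviations:

```latex
\begin{aligned}
\text{ankle:}     &\quad 29.0 + 2 \times 3.1 = 35.2 \\
\text{lower leg:} &\quad 30.5 + 2 \times 3.9 = 38.3
\end{aligned}
```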
DeAngelis, Lisa M.; Brandes, Alba A.; Peereboom, David M.; Galanis, Evanthia; Lin, Nancy U.; Soffietti, Riccardo; Macdonald, David R.; Chamberlain, Marc; Perry, James; Jaeckle, Kurt; Mehta, Minesh; Stupp, Roger; Muzikansky, Alona; Pentsova, Elena; Cloughesy, Timothy; Iwamoto, Fabio M.; Tonn, Joerg-Christian; Vogelbaum, Michael A.; Wen, Patrick Y.; van den Bent, Martin J.; Reardon, David A.
2017-01-01
Background. The Macdonald criteria and the Response Assessment in Neuro-Oncology (RANO) criteria define radiologic parameters to classify therapeutic outcome among patients with malignant glioma and specify that clinical status must be incorporated and prioritized for overall assessment, but neither provides specific parameters to do so. We hypothesized that a standardized metric to measure neurologic function will permit more effective overall response assessment in neuro-oncology. Methods. An international group of physicians including neurologists, medical oncologists, radiation oncologists, and neurosurgeons with expertise in neuro-oncology drafted the Neurologic Assessment in Neuro-Oncology (NANO) scale as an objective and quantifiable metric of neurologic function evaluable during a routine office examination. The scale was subsequently tested in a multicenter study to determine its overall reliability, inter-observer variability, and feasibility. Results. The NANO scale is a quantifiable evaluation of 9 relevant neurologic domains based on direct observation and testing conducted during routine office visits. The score defines overall response criteria. A prospective, multinational study noted a >90% inter-observer agreement rate, with kappa statistics ranging from 0.35 to 0.83 (fair to almost perfect agreement), and a median assessment time of 4 minutes (interquartile range, 3–5). Conclusion. The NANO scale provides an objective clinician-reported outcome of neurologic function with high inter-observer agreement. It is designed to combine with radiographic assessment to provide an overall assessment of outcome for neuro-oncology patients in clinical trials and in daily practice. Furthermore, it complements existing patient-reported outcomes and cognition testing to combine for a global clinical outcome assessment of well-being among brain tumor patients. PMID:28453751
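A >90% agreement rate alongside a kappa as low as 0.35 is not contradictory: when most patients fall in the same category, chance agreement is high and kappa is pulled down. A minimal sketch with hypothetical dichotomized scores for one domain rated by two clinicians; the abstract does not say whether this effect explains its own kappa range.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of patients on whom both clinicians gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores (0 = normal, 1 = abnormal) for 20 patients;
# most patients are normal on this domain, so chance agreement is high.
clinician_1 = [0] * 17 + [1, 1, 0]
clinician_2 = [0] * 17 + [1, 0, 1]
print(percent_agreement(clinician_1, clinician_2))      # 0.9
print(round(cohen_kappa(clinician_1, clinician_2), 2))  # 0.44
```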
Baek, Hye Jin; Kim, Dong Wook; Ryu, Ji Hwa; Lee, Yoo Jin
2013-09-01
No previous study has compared the diagnostic accuracy of an experienced radiologist with that of a trainee for nasal bone fractures. To compare the diagnostic accuracy of conventional radiography and computed tomography (CT) for the identification of nasal bone fractures and to evaluate the interobserver reliability between a staff radiologist and a trainee. A total of 108 patients who underwent conventional radiography and CT after acute nasal trauma were included in this retrospective study. Two readers, a staff radiologist and a second-year resident, independently assessed the imaging studies. Of the 108 patients, a nasal bone fracture was confirmed in 88 (81.5%). Non-depressed fractures outnumbered depressed fractures. In nine (10.2%) patients, nasal bone fractures were identified only on conventional radiography, including three depressed and six non-depressed fractures. For both readers, CT was more accurate than conventional radiography for the identification of nasal bone fractures (P <0.05). All diagnostic indices of the experienced radiologist were similar to or higher than those of the trainee, and κ statistics showed moderate agreement between the two diagnostic tools for both readers. There was no statistically significant difference in interobserver reliability between the two imaging modalities for the identification of nasal bone fractures. For the identification of nasal bone fractures, CT was significantly superior to conventional radiography. Although the staff radiologist showed better values than the trainee in identifying nasal bone fractures and in differentiating depressed from non-depressed fractures, there was no statistically significant difference between the radiologist and the trainee in the interpretation of conventional radiography and CT.
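The "diagnostic indices" compared between readers are presumably derived from a 2 × 2 table of each reader's calls against the confirmed diagnosis. A minimal sketch; the split of the 88 fracture and 20 non-fracture cases into correct and incorrect calls is hypothetical.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Standard accuracy measures from a 2 x 2 table of reader calls vs. the reference diagnosis."""
    return {
        "sensitivity": tp / (tp + fn),  # fractures correctly called
        "specificity": tn / (tn + fp),  # non-fractures correctly cleared
        "ppv": tp / (tp + fp),          # positive calls that were fractures
        "npv": tn / (tn + fn),          # negative calls that were non-fractures
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical reader: 80 of 88 fractures detected, 16 of 20 non-fractures correctly cleared.
for name, value in diagnostic_indices(tp=80, fp=4, fn=8, tn=16).items():
    print(f"{name}: {value:.3f}")
```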
Conte, Gian Marco; Castellano, Antonella; Altabella, Luisa; Iadanza, Antonella; Cadioli, Marcello; Falini, Andrea; Anzalone, Nicoletta
2017-04-01
Dynamic susceptibility contrast MRI (DSC) and dynamic contrast-enhanced MRI (DCE) are useful tools in the diagnosis and follow-up of brain gliomas; nevertheless, both techniques leave open the issue of data reproducibility. We evaluated the reproducibility of data obtained using two different commercial software packages for perfusion map calculation and analysis, as one potential source of variability is the software itself. DSC and DCE analyses from 20 patients with gliomas were tested for both intrasoftware reproducibility (as intraobserver and interobserver reproducibility) and intersoftware reproducibility, as well as for the impact of different postprocessing choices [vascular input function (VIF) selection and deconvolution algorithms] on the quantification of the perfusion biomarkers plasma volume (Vp), volume transfer constant (Ktrans) and rCBV. Data reproducibility was evaluated with the intraclass correlation coefficient (ICC) and Bland-Altman analysis. For all biomarkers, intra- and interobserver reproducibility showed almost perfect agreement within each software package, whereas intersoftware ICCs ranged from 0.311 to 0.577, suggesting fair to moderate agreement; Bland-Altman analysis showed high dispersion of the data, confirming these findings. Comparisons of different VIF estimation methods for DCE biomarkers resulted in ICCs of 0.636 for Ktrans and 0.662 for Vp; comparison of two deconvolution algorithms in DSC resulted in an ICC of 0.999. Using a single software package ensures very good intraobserver and interobserver reproducibility. Caution should be taken when comparing data obtained using different software or different postprocessing within the same software, as reproducibility is then no longer guaranteed.
Reliability of Two Smartphone Applications for Radiographic Measurements of Hallux Valgus Angles.
Mattos E Dinato, Mauro Cesar; Freitas, Marcio de Faria; Milano, Cristiano; Valloto, Elcio; Ninomiya, André Felipe; Pagnano, Rodrigo Gonçalves
The objective of the present study was to assess the reliability of 2 smartphone applications compared with the traditional goniometer technique for measurement of radiographic angles in hallux valgus and the time required for analysis with the different methods. The radiographs of 31 patients (52 feet) with a diagnosis of hallux valgus were analyzed. Four observers, 2 with >10 years' experience in foot and ankle surgery and 2 in-training surgeons, measured the hallux valgus angle and intermetatarsal angle using a manual goniometer technique and 2 smartphone applications (Hallux Angles and iPinPoint). The interobserver and intermethod reliability were estimated using intraclass correlation coefficients (ICCs), and the time required for measurement of the angles among the 3 methods was compared using the Friedman test. A very good or good interobserver reliability was found among the 4 observers measuring the hallux valgus angle and intermetatarsal angle using the goniometer (ICC 0.913 and 0.821, respectively) and iPinPoint (ICC 0.866 and 0.638, respectively). Using the Hallux Angles application, a very good interobserver reliability was found for measurements of the hallux valgus angle (ICC 0.962) and intermetatarsal angle (ICC 0.935) only among the more experienced observers. The time required for the measurements was significantly shorter for the measurements using both smartphone applications compared with the goniometer method. One smartphone application (iPinPoint) was reliable for measurements of the hallux valgus angles by either experienced or nonexperienced observers. The use of these tools might save time in the evaluation of radiographic angles in the hallux valgus. Copyright © 2016 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
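The abstract does not state which ICC form was used. As a hedged illustration, the sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater ICC (Shrout and Fleiss ICC(2,1)), from a subjects-by-observers table; the function name `icc_2_1` and the example angles are hypothetical:

```python
def icc_2_1(ratings):
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement.

    ratings[i][j] is the value given to subject i by observer j;
    every subject must be rated by every observer (no missing cells).
    """
    n = len(ratings)      # subjects (e.g. feet)
    k = len(ratings[0])   # observers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 observers")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between observers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical hallux valgus angles (degrees): 4 feet x 3 observers.
angles = [[32, 30, 33], [25, 26, 24], [41, 39, 42], [18, 20, 19]]
print(round(icc_2_1(angles), 3))
```

Because this form measures absolute agreement, a constant offset between observers lowers the ICC; a consistency-type ICC would ignore such offsets.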
Odland, Audun; Server, Andres; Saxhaug, Cathrine; Breivik, Birger; Groote, Rasmus; Vardal, Jonas; Larsson, Christopher; Bjørnerud, Atle
2015-11-01
Volumetric magnetic resonance imaging (MRI) is now widely available and routinely used in the evaluation of high-grade gliomas (HGGs). Ideally, volumetric measurements should be included in this evaluation. However, manual tumor segmentation is time-consuming and suffers from inter-observer variability. Thus, tools for semi-automatic tumor segmentation are needed. To present a semi-automatic method (SAM) for segmentation of HGGs and to compare this method with manual segmentation performed by experts. The inter-observer variability among experts manually segmenting HGGs using volumetric MRIs was also examined. Twenty patients with HGGs were included. All patients underwent surgical resection prior to inclusion. Each patient underwent several MRI examinations during and after adjuvant chemoradiation therapy. Three experts performed manual segmentation. The results of tumor segmentation by the experts and by the SAM were compared using Dice coefficients and kappa statistics. A relatively close agreement was seen among two of the experts and the SAM, while the third expert disagreed considerably with the other experts and the SAM. An important reason for this disagreement was a different interpretation of contrast enhancement as either surgically-induced or glioma-induced. The time required for manual tumor segmentation was an average of 16 min per scan. Editing of the tumor masks produced by the SAM required an average of less than 2 min per sample. Manual segmentation of HGG is very time-consuming and using the SAM could increase the efficiency of this process. However, the accuracy of the SAM ultimately depends on the expert doing the editing. Our study confirmed a considerable inter-observer variability among experts defining tumor volume from volumetric MRIs. © The Foundation Acta Radiologica 2014.
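The Dice coefficient used above measures the overlap between two segmentations: twice the shared volume divided by the sum of the two volumes, ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch, assuming each tumor mask is represented as a set of voxel coordinates; the representation, the function name `dice`, and the voxels are illustrative and not the study's software:

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient of two masks given as sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel sets (x, y, slice) from an expert and from a semi-automatic method.
expert = {(10, 4, 7), (10, 5, 7), (11, 5, 7), (11, 6, 7)}
semi_auto = {(10, 5, 7), (11, 5, 7), (11, 6, 7), (12, 6, 7)}
print(dice(expert, semi_auto))  # 3 shared voxels, 4 + 4 total -> 0.75
```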
Area of ischemia assessed by physicians and software packages from myocardial perfusion scintigrams
2014-01-01
Background: The European Society of Cardiology recommends that patients with >10% area of ischemia should receive revascularization. We investigated inter-observer variability in the extent of ischemic defects reported by different physicians and by different software tools, and whether inter-observer variability was reduced when the physicians were provided with a computerized suggestion of the defects. Methods: Twenty-five myocardial perfusion single photon emission computed tomography (SPECT) patients who were regarded as ischemic according to the final report were included. Eleven physicians in nuclear medicine delineated the extent of the ischemic defects. After at least two weeks, they delineated the defects again, this time with a suggested defect delineation from EXINI Heart™ (EXINI). Summed difference scores and ischemic extent values were obtained from four software programs. Results: The median extent values varied between 8% and 34% across the 11 physicians and between 9% and 16% across the software programs. For all 25 patients, mean extent obtained from EXINI was 17.0% (± standard deviation (SD) 14.6%). Mean extent for physicians was 22.6% (± 15.6%) for the first delineation and 19.1% (± 14.9%) for the evaluation in which they were provided with the computerized suggestion. The intraclass correlation coefficient (ICC) increased from 0.56 (95% confidence interval (CI) 0.41-0.72) for the first delineation to 0.81 (95% CI 0.71-0.90) for the second, and the SD between physicians was 7.8 (first) and 5.9 (second delineation). Conclusions: There was large variability in the estimated ischemic defect size obtained both from different physicians and from different software packages. When the physicians were provided with a suggested delineation, the inter-observer variability decreased significantly. PMID:24479846
Vision 20/20: perspectives on automated image segmentation for radiotherapy.
Sharp, Gregory; Fritscher, Karl D; Pekar, Vladimir; Peroni, Marta; Shusharina, Nadya; Veeraraghavan, Harini; Yang, Jinzhong
2014-05-01
Due to rapid advances in radiation therapy (RT), especially image guidance and treatment adaptation, fast and accurate segmentation of medical images is a very important part of the treatment. Manual delineation of target volumes and organs at risk is still the standard routine in most clinics, even though it is time-consuming and prone to intra- and interobserver variation. Automated segmentation methods seek to reduce the delineation workload and unify the definition of organ boundaries. In this paper, the authors review the current autosegmentation methods most relevant to RT applications. The authors outline the methods' strengths and limitations and propose strategies that could lead to wider acceptance of autosegmentation in routine clinical practice. The authors conclude that autosegmentation technology in RT planning is currently an efficient tool that provides clinicians with a good starting point for review and adjustment. Modern hardware platforms, including GPUs, allow most autosegmentation tasks to be completed within a few minutes. In the near future, improvements in CT-based autosegmentation tools will be achieved through standardization of imaging and contouring protocols. In the longer term, the authors expect wider use of multimodality approaches and a better understanding of the correlation of imaging with biology and pathology. PMID:24784366
Dawson, Lauren C.; Dewey, Cate E.; Stone, Elizabeth A.; Mosley, Cornelia I.; Guerin, Michele T.; Niel, Lee
2017-01-01
Successful prevention, recognition, and treatment of pain are integral to ensuring veterinary patient welfare. A canine and feline welfare assessment tool, incorporating verbal interviews with veterinarians using open-ended questions, was developed to assess pain management practices that safeguard and improve patient welfare. The tool was evaluated in 30 companion- and mixed-animal veterinary clinics in Ontario in order to assess its reliability, feasibility, and validity, while also benchmarking current practices. Responses were analyzed according to a scoring scheme developed based on published literature and expert opinion. Based on weighted kappa statistics, interview scoring had substantial inter-observer (Kw = 0.83, 0.73) and near-perfect intra-observer (Kw = 0.92) agreement, which suggests that the tool reliably collects information about pain management practices. Interviews were completed at all recruited clinics, which indicates high feasibility for the methods. Validity could not be assessed, as participants were reluctant to share information about analgesic administration from their clinical records. Descriptive results indicated areas for which many veterinarians are acting in accordance with best practices for pain management, such as pre-emptive and post-surgical analgesia for ovariohysterectomy patients, and post-surgical care instructions. Areas that offer opportunity for enhancement were also highlighted, e.g., training veterinary staff to recognize signs of pain and duration of analgesia in ovariohysterectomy patients after discharge. Overall, based on this limited sample, most veterinarians appear to be effectively managing their patients’ pain, although areas with opportunity for enhancement were also identified. Further research is needed to assess trends in a broader sample of participants. PMID:29081584
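Weighted kappa (Kw), reported above, gives partial credit for near-miss scores on an ordinal scale by weighting each disagreement by how far apart the two scores are. The abstract does not say which weighting scheme was used, so this sketch offers linear and quadratic weights as options; the function name `weighted_kappa` and the example scores are hypothetical:

```python
def weighted_kappa(r1, r2, n_categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    Scores must be integers 0 .. n_categories - 1. Disagreement weights are
    |i - j| / (K - 1) (linear) or its square (quadratic).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equal-length, non-empty rating lists")
    if n_categories < 2 or weighting not in ("linear", "quadratic"):
        raise ValueError("need >= 2 categories and 'linear' or 'quadratic' weights")
    k, n = n_categories, len(r1)
    observed = [[0.0] * k for _ in range(k)]  # joint proportions of score pairs
    for a, b in zip(r1, r2):
        observed[a][b] += 1 / n
    marg1 = [sum(observed[i]) for i in range(k)]                       # rater 1
    marg2 = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater 2

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    exp_dis = sum(weight(i, j) * marg1[i] * marg2[j] for i, j in cells)
    if exp_dis == 0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1 - obs_dis / exp_dis


# Hypothetical 0-3 scores from two interviewers for 8 clinic responses.
interviewer_1 = [3, 2, 2, 1, 0, 3, 2, 1]
interviewer_2 = [3, 2, 1, 1, 0, 3, 3, 1]
print(round(weighted_kappa(interviewer_1, interviewer_2, 4), 2))
```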
van Hamersvelt, Robbert W; Willemink, Martin J; Takx, Richard A P; Eikendal, Anouk L M; Budde, Ricardo P J; Leiner, Tim; Mol, Christian P; Isgum, Ivana; de Jong, Pim A
2014-07-01
To determine inter-observer and inter-examination variability for aortic valve calcification (AVC) and mitral valve and annulus calcification (MC) in low-dose unenhanced ungated lung cancer screening chest computed tomography (CT). We included 578 lung cancer screening trial participants who were examined by CT twice within 3 months to follow indeterminate pulmonary nodules. On these CTs, AVC and MC were measured in cubic millimetres. One hundred CTs were examined by five observers to determine the inter-observer variability. Reliability was assessed by kappa statistics (κ) and intra-class correlation coefficients (ICCs). Variability was expressed as the mean difference ± standard deviation (SD). Inter-examination reliability was excellent for AVC (κ = 0.94, ICC = 0.96) and MC (κ = 0.95, ICC = 0.90). Inter-examination variability was 12.7 ± 118.2 mm³ for AVC and 31.5 ± 219.2 mm³ for MC. Inter-observer reliability ranged from κ = 0.68 to κ = 0.92 for AVC and from κ = 0.20 to κ = 0.66 for MC. Inter-observer ICC was 0.94 for AVC and ranged from 0.56 to 0.97 for MC. Inter-observer variability ranged from -30.5 ± 252.0 mm³ to 84.0 ± 240.5 mm³ for AVC and from -95.2 ± 210.0 mm³ to 303.7 ± 501.6 mm³ for MC. AVC can be quantified with excellent reliability on ungated unenhanced low-dose chest CT, but manual detection of MC can be subject to substantial inter-observer variability. Lung cancer screening CT may be used for detection and quantification of cardiac valve calcifications.
• Low-dose unenhanced ungated chest computed tomography can detect cardiac valve calcifications.
• However, calcified cardiac valves are not reported by most radiologists.
• Inter-observer and inter-examination variability of aortic valve calcifications is sufficient for longitudinal studies.
• Volumetric measurement variability of mitral valve and annulus calcifications is substantial.
Mosmuller, David G M; Mennes, Lisette M; Prahl, Charlotte; Kramer, Gem J C; Disse, Melissa A; van Couwelaar, Gijs M; Niessen, Frank B; Griot, J P W Don
2017-09-01
This study describes the development of the Cleft Aesthetic Rating Scale, a simple and reliable photographic reference scale for the assessment of nasolabial appearance in patients with complete unilateral cleft lip and palate. A blind retrospective analysis of photographs of cleft lip and palate patients was performed with this new rating scale at VU Medical Center Amsterdam and the Academic Center for Dentistry of Amsterdam, in patients with complete unilateral cleft lip and palate aged 6 years. Photographs that showed the highest interobserver agreement in earlier assessments were selected for the photographic reference scale. Rules were attached to the rating scale to provide a guideline for the assessment and to improve interobserver reliability. Cropped photographs revealing only the nasolabial area were assessed by six observers using the new Cleft Aesthetic Rating Scale in two separate sessions. Photographs of 62 children (6 years of age; 44 boys and 18 girls) were assessed. The interobserver reliability for the nose and lip together, measured with the intraclass correlation coefficient, was 0.62. Internal consistency, measured with Cronbach's alpha, was .91. The estimated reliability for three observers, obtained with the Spearman-Brown formula, was .84. This study presents a new, easy-to-use, and reliable scoring system with a photographic reference scale.
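The Spearman-Brown formula mentioned above projects the reliability of the mean score of k observers from the reliability r of a single observer: ρ_k = k·r / (1 + (k - 1)·r). A one-function sketch; the function name `spearman_brown` and the input value 0.5 are illustrative, not taken from the study:

```python
def spearman_brown(r_single, k):
    """Projected reliability of the average of k observers (Spearman-Brown)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical single-observer reliability of 0.5, projected to larger panels.
for k in (1, 2, 3, 6):
    print(k, round(spearman_brown(0.5, k), 3))
```

Projected reliability rises with k but with diminishing returns, which is why averaging over a small panel of observers can already yield a markedly more reliable score.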
Jabs, Douglas A; Dick, Andrew; Doucette, John T; Gupta, Amod; Lightman, Susan; McCluskey, Peter; Okada, Annabelle A; Palestine, Alan G; Rosenbaum, James T; Saleem, Sophia M; Thorne, Jennifer; Trusko, Brett
2018-02-01
To evaluate the interobserver agreement among uveitis experts on the diagnosis of specific uveitic diseases. This was an interobserver agreement analysis. Five committees, each composed of 9 individuals and working in parallel, reviewed cases from a preliminary database of 25 uveitic diseases, collected by disease, and voted independently online on whether each case was the disease in question. The agreement statistic, κ, was calculated for the 36 pairwise comparisons for each disease, and a mean κ was calculated for each disease. After the independent online voting, committee consensus conference calls using nominal group techniques reviewed all cases that had not achieved supermajority agreement (>75%) on the diagnosis in the online voting, in an attempt to reach supermajority agreement. A total of 5766 cases for the 25 diseases were evaluated. The overall mean κ for the entire project was 0.39, with disease-specific values ranging from 0.23 to 0.79. After the formalized consensus conference calls addressing cases that did not achieve supermajority agreement in the online voting, supermajority agreement was reached overall on approximately 99% of cases, with disease-specific values ranging from 96% to 100%. Agreement among uveitis experts on diagnosis is moderate at best but can be improved by discussion among them. These data suggest the need for validated and widely used classification criteria in the field of uveitis. Copyright © 2017 Elsevier Inc. All rights reserved.
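The procedure above, with one κ per pair of committee members (36 pairs for 9 voters) averaged into a mean κ per disease, can be sketched as follows for yes/no votes. This is an illustrative reconstruction from the abstract, assuming unweighted Cohen's κ for each pair; the function names and the votes are hypothetical:

```python
from itertools import combinations


def cohen_kappa_binary(a, b):
    """Cohen's kappa for two raters giving yes (1) / no (0) votes on the same cases."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)  # chance agreement from each rater's yes rate
    if p_exp == 1:
        raise ValueError("kappa is undefined when both raters give one identical answer")
    return (p_obs - p_exp) / (1 - p_exp)


def mean_pairwise_kappa(votes):
    """Average Cohen's kappa over all rater pairs; votes[r] is rater r's vote list."""
    pairs = list(combinations(range(len(votes)), 2))
    kappas = [cohen_kappa_binary(votes[i], votes[j]) for i, j in pairs]
    return sum(kappas) / len(kappas), len(pairs)


# Hypothetical votes of 9 committee members on 6 cases (1 = "is the disease").
votes = [
    [1, 1, 0, 1, 0, 1], [1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 1], [1, 1, 0, 1, 0, 0], [0, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1], [1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 0, 1],
]
mean_k, n_pairs = mean_pairwise_kappa(votes)
print(n_pairs, round(mean_k, 2))  # n_pairs is 36 for 9 raters
```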
Agreement in the assessment of metastatic spine disease using scoring systems.
Arana, Estanislao; Kovacs, Francisco M; Royuela, Ana; Asenjo, Beatriz; Pérez-Ramírez, Ursula; Zamora, Javier
2015-04-01
To assess variability in the use of the Tomita and modified Bauer scores in spine metastases. Clinical data and imaging from 90 patients with biopsy-proven spinal metastases were provided to 83 specialists from 44 hospitals. The spinal levels involved and the Tomita and modified Bauer scores for each case were determined twice by each clinician, with a minimum interval of 6 weeks. Clinicians were blinded to every evaluation. The kappa statistic was used to assess intra- and inter-observer agreement. Subgroup analyses were performed according to clinicians' specialty (medical oncology, neurosurgery, radiology, orthopedic surgery and radiation oncology), years of experience (⩽7, 8-13, ⩾14), and type of hospital (four levels). For metastases identification, intra-observer agreement was "substantial" (0.60
Patra, S; Gomm, E M W; Macipe, M; Bailey, C
2009-08-01
To assess the quality and accuracy of primary grading in the Bristol and Weston diabetic retinopathy screening programme and to set standards for future interobserver agreement reports. A prospective audit of 213 image sets from six fully trained primary graders in the Bristol and Weston diabetic retinopathy screening programme was carried out over a 4-week period. All the images graded by the primary graders were regraded by an expert grader blinded to the primary grading results and the identity of the primary grader. The interobserver agreement between primary graders and the blinded expert grader and the corresponding Kappa coefficient was determined for overall grading, referable, non-referable and ungradable disease. The audit standard was set at 80% for interobserver agreement with a Kappa coefficient of 0.7. The interobserver agreement bettered the audit standard of 80% in all the categories. The Kappa coefficient was substantial (0.7) for the overall grading results and ranged from moderate to substantial (0.59-0.65) for referable, non-referable and ungradable disease categories. The main recommendation of the audit was to provide refresher training for the primary graders with focus on ungradable disease. The audit demonstrated an acceptable level of quality and accuracy of primary grading in the Bristol and Weston diabetic retinopathy screening programme and provided a standard against which future interobserver agreement can be measured for quality assurance within a screening programme. Diabet. Med. 26, 820-823 (2009).
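The audit compares percent agreement and Cohen's κ between primary and expert grading against fixed standards (80% and 0.7). A minimal sketch of that check from a primary-by-expert table of grade counts; the category set, the counts, and the function names are hypothetical, while the default thresholds are the audit's stated standards:

```python
def agreement_and_kappa(table):
    """Percent agreement and Cohen's kappa from a square table of counts.

    table[i][j] = number of image sets graded i by the primary grader
    and j by the expert grader.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / total
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_marg, col_marg))  # chance agreement
    if p_exp == 1:
        raise ValueError("kappa undefined when all grades fall in one category")
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


def meets_audit_standard(table, min_agreement=0.80, min_kappa=0.70):
    """True if both percent agreement and kappa reach the audit thresholds."""
    p_obs, kappa = agreement_and_kappa(table)
    return p_obs >= min_agreement and kappa >= min_kappa


# Hypothetical counts; rows = primary grader, columns = expert grader,
# categories: non-referable, referable, ungradable.
counts = [[100, 6, 3],
          [5, 50, 2],
          [4, 3, 12]]
print(agreement_and_kappa(counts), meets_audit_standard(counts))
```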
Razek, Ahmed Abdel Khalek Abdel; Shamaa, Sameh; Lattif, Mahmoud Abdel; Yousef, Hanan Hamid
2017-01-01
To assess the inter-observer agreement of whole-body computed tomography (WBCT) in staging and response assessment of lymphoma according to the Lugano classification. A retrospective analysis was conducted of 115 consecutive patients with lymphoma (45 females, 70 males; mean age 46 years). Patients underwent WBCT with a 64-row multidetector CT scanner for staging and for response assessment after a complete course of chemotherapy. Image analysis was performed by 2 reviewers according to the Lugano classification for staging and response assessment. The overall inter-observer agreement of WBCT in staging of lymphoma was excellent (κ = 0.90, percent agreement = 94.9%). There was excellent inter-observer agreement for stage I (κ = 0.93, percent agreement = 96.4%), stage II (κ = 0.90, percent agreement = 94.8%), stage III (κ = 0.89, percent agreement = 94.6%) and stage IV (κ = 0.88, percent agreement = 94%). The overall inter-observer agreement in response assessment after a complete course of treatment was excellent (κ = 0.91, percent agreement = 95.8%). There was excellent inter-observer agreement for progressive disease (κ = 0.94, percent agreement = 97.1%), stable disease (κ = 0.90, percent agreement = 95%), partial response (κ = 0.96, percent agreement = 98.1%) and complete response (κ = 0.87, percent agreement = 93.3%). We conclude that WBCT is a reliable and reproducible imaging modality for staging and treatment assessment in lymphoma according to the Lugano classification.
Segmentation precision of abdominal anatomy for MRI-based radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Noel, Camille E.; Zhu, Fan; Lee, Andrew Y.
2014-10-01
The limited soft tissue visualization provided by computed tomography, the standard imaging modality for radiotherapy treatment planning and daily localization, has motivated studies on the use of magnetic resonance imaging (MRI) for better characterization of treatment sites, such as the prostate and head and neck. However, no studies have been conducted on MRI-based segmentation for the abdomen, a site that could greatly benefit from enhanced soft tissue targeting. We investigated the interobserver and intraobserver precision in segmentation of abdominal organs on MR images for treatment planning and localization. Manual segmentation of 8 abdominal organs was performed by 3 independent observers on MR images acquired from 14 healthy subjects. Observers repeated segmentation 4 separate times for each image set. Interobserver and intraobserver contouring precision was assessed by computing 3-dimensional overlap (Dice coefficient [DC]) and distance to agreement (Hausdorff distance [HD]) of segmented organs. The mean and standard deviation of intraobserver and interobserver DC and HD values were DC_intraobserver = 0.89 ± 0.12, HD_intraobserver = 3.6 ± 1.5 mm, DC_interobserver = 0.89 ± 0.15, and HD_interobserver = 3.2 ± 1.4 mm. Overall, metrics indicated good interobserver/intraobserver precision (mean DC > 0.7, mean HD < 4 mm). Results suggest that MRI offers good segmentation precision for abdominal sites. These findings support the utility of MRI for abdominal planning and localization, as emerging MRI technologies, techniques, and onboard imaging devices are beginning to enable MRI-based radiotherapy.
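The Hausdorff distance reported above is the largest distance from a point on one contour to the nearest point on the other, taken in both directions. A brute-force sketch, assuming each organ surface is given as a list of 3-D points in millimetres; the function names and points are illustrative, and the all-pairs loop is only practical for modest point sets:

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical contour points (mm) of the same organ from two observers.
obs1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 8.0, 0.0)]
obs2 = [(0.0, 1.0, 0.0), (10.0, 0.0, 0.0), (10.0, 11.0, 0.0)]
print(hausdorff(obs1, obs2))  # 3.0, set by (10, 8, 0) vs (10, 11, 0)
```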
Kamishima, Tamotsu; Tanimura, Kazuhide; Henmi, Mihoko; Narita, Akihiro; Sakamoto, Fumihiko; Terae, Satoshi; Shirato, Hiroki
2009-05-01
The objective of this study was to assess interobserver uncertainties in power Doppler (PD) examination of the fingers of patients with rheumatoid arthritis (RA), by separating the source of the discrepancy into (1) acquisition of the images and (2) criteria for assessment of the images. Twenty patients who had been diagnosed with RA were enrolled in this study. Ultrasound examinations were performed by one inexperienced and two experienced sonographers. Interobserver variation was measured using a conventional semiquantitative image grading scale. Interobserver variation of the quantitative PD (QPD) index (the summation of the colored pixels in a region of interest) was also assessed. The agreement was higher between the two experienced sonographers (kappa value of 0.8) than between experienced and inexperienced sonographers (kappa value, 0.6-0.7) in the semiquantitative image grading scale. Results suggest that the difference in the assessment on the image grading scale was due more to the difference in the acquisition of the images than to variations in the grading criteria between sonographers. An excellent relationship was noted between the image grading scale and the QPD index for Doppler signal with a Spearman's coefficient of rank correlation of 0.83 (P < 0.0001). Interobserver discrepancies in the image grading and QPD index methods were due more to the difference in the acquisition of the image than to the grading criteria used. The QPD index seems to be as reliable as the image grading scale with reasonable interobserver agreement between experienced sonographers.
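Spearman's rank correlation, used above to relate the image grading scale to the QPD index, is the Pearson correlation computed on ranks, with tied values given their average rank (relevant for a grading scale with few levels). A self-contained sketch; the function names, grades, and QPD values are hypothetical:

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: semiquantitative grade (0-3) and QPD index per joint.
grades = [0, 1, 1, 2, 3, 2, 0, 3]
qpd = [40, 310, 150, 900, 2600, 1200, 0, 1900]
print(round(spearman_rho(grades, qpd), 2))
```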
Liu, Chao; Cai, Hong-Xin; Zhang, Jian-Feng; Ma, Jian-Jun; Lu, Yin-Jiang; Fan, Shun-Wu
2014-03-01
The high-intensity zone (HIZ) on magnetic resonance imaging (MRI) has been studied for more than 20 years, but its diagnostic value in low back pain (LBP) is limited by its high incidence in asymptomatic subjects. Little effort has been made to improve the objective assessment of HIZ. The aims were to develop quantitative measurements for HIZ, to estimate their intra- and interobserver reliability, and to clarify differences in the signal intensity of HIZ between patients with and without LBP. This was a measurement reliability and prospective comparative study of a consecutive series of patients with LBP seen between June 2010 and May 2011 (group A) and a successive series of asymptomatic controls during the same period (group B). Outcome measures were the incidence of HIZ; quantitative measures, including area of the disc, area and signal intensity of HIZ, and the magnetic resonance imaging index; and intraclass correlation coefficients (ICCs) for intra- and interobserver reliability. On the basis of HIZ criteria, a series of quantitative dimension and signal intensity measures was developed for assessing HIZ. Two experienced spine surgeons traced the region of interest twice within 4 weeks to assess intra- and interobserver reliability. The quantitative variables were compared between groups A and B. There were 72 patients with LBP and 79 asymptomatic controls enrolled in this study. The prevalence of HIZ in group A and group B was 45.8% and 20.2%, respectively. Intraobserver agreement was excellent for the quantitative measures (ICC = 0.838-0.977), as was interobserver reliability (ICC = 0.809-0.935). The mean signal of HIZ in group A was significantly brighter than in group B (57.55±14.04% vs. 45.61±7.22%, p < .001). There was no statistically significant difference in the area of the disc or of the HIZ between the two groups. The magnetic resonance imaging index was higher in group A than in group B (3.94±1.71 vs. 3.06±1.50), but with a p value of .050.
A series of quantitative measurements for HIZ was established and demonstrated excellent intra- and interobserver reliability. The signal intensity of HIZ differed between patients with and without LBP, with a significantly brighter signal observed in symptomatic subjects. Copyright © 2014 Elsevier Inc. All rights reserved.
Verma, Nupur; Hippe, Daniel S; Robinson, Jeffrey D
2016-12-01
Peer review is an important and necessary part of radiology, and there are several options for performing the peer review process. This study examines the reproducibility of peer review by comparing two scoring systems. American Board of Radiology-certified radiologists from various practice environments and subspecialties were recruited to score deidentified examinations on a web-based PACS with two scoring systems, RADPEER and Cleareview. The scores were analyzed quantitatively for interrater agreement. Interobserver variability was high for both the RADPEER and Cleareview scoring systems: interobserver agreement (kappa) was 0.17-0.23 for RADPEER and 0.10-0.16 for Cleareview. Interrater agreement did not differ significantly between the RADPEER and Cleareview systems (p = 0.07-0.27). Kappa values were low for the Cleareview subscores for missed findings (0.26), satisfaction of search (0.17), and inadequate interpretation of findings (0.12). Our study confirms the previous report of low interobserver agreement in the peer review process, with low agreement for both the RADPEER and the Cleareview scoring systems.
NASA Astrophysics Data System (ADS)
Jaspers, Mariëlle E.; Maltha, Ilse M.; Klaessens, John H.; Vet, Henrica C.; Verdaasdonk, Rudolf M.; Zuijlen, Paul P.
2016-02-01
In burn wounds, early discrimination between the different depths plays an important role in the treatment strategy. The remaining vasculature in the wound determines its healing potential. Non-invasive measurement tools that can identify the vascularization are therefore considered to be of high diagnostic importance. Thermography is a non-invasive technique that can accurately measure the temperature distribution over a large skin or tissue area; the temperature is a measure of the perfusion of that area. The aim of this study was to investigate the clinimetric properties (i.e. reliability and validity) of thermography for measuring burn wound depth. In a cross-sectional study of 50 burn wounds in 35 patients, the inter-observer reliability of thermography and its validity against Laser Doppler Imaging were studied. ROC curve analyses were used to determine the ΔT cut-off points for different burn wound depths. The inter-observer reliability, expressed as an intraclass correlation coefficient of 0.99, was excellent. In terms of validity, a ΔT cut-off point of 0.96°C (sensitivity 71%; specificity 79%) differentiates between superficial partial-thickness and deep partial-thickness burns. A ΔT cut-off point of -0.80°C (sensitivity 70%; specificity 74%) could differentiate between deep partial-thickness and full-thickness burn wounds. This study demonstrates that thermography is a reliable method for assessing burn wound depth. In addition, thermography was reasonably able to discriminate among different burn wound depths, indicating its potential use as a diagnostic tool in clinical burn practice.
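The abstract does not say how the ΔT cut-off points were chosen from the ROC curves. One common criterion, assumed here, is the threshold that maximizes Youden's J = sensitivity + specificity - 1. The sketch also assumes that the deeper category shows lower ΔT (a wound is called deep when ΔT ≤ threshold); that direction, the function name `youden_cutoff`, and the data are all hypothetical:

```python
def youden_cutoff(values, is_deep, deep_below=True):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    values: one measurement per wound (e.g. delta-T in degrees C);
    is_deep: True for wounds in the deeper category per the reference test.
    A wound is called deep when value <= threshold (or >= if deep_below=False).
    """
    n_pos = sum(is_deep)
    n_neg = len(is_deep) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need wounds from both categories")
    best = None
    for t in sorted(set(values)):  # every observed value is a candidate threshold
        called = [(v <= t) if deep_below else (v >= t) for v in values]
        tp = sum(c and d for c, d in zip(called, is_deep))
        tn = sum((not c) and (not d) for c, d in zip(called, is_deep))
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]


# Hypothetical delta-T values (degrees C) and reference categories.
delta_t = [1.8, 1.2, 0.9, 1.5, 0.4, -0.3, 0.7, 0.1, 1.1, -0.6]
deep = [False, False, True, False, True, True, False, True, False, True]
print(youden_cutoff(delta_t, deep))
```

When several thresholds tie on J, this sketch keeps the lowest one; other tie-breaking rules are equally defensible.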
Galli, Marco; Ciriello, Vincenzo; Menghi, Amerigo; Aulisa, Angelo G; Rabini, Alessia; Marzetti, Emanuele
2013-06-01
To assess the interobserver concordance of the joint line tenderness (JLT) and McMurray tests, and to determine their diagnostic efficiency for the detection of meniscal lesions. This was a prospective observational study in the orthopedics outpatient clinic of a university hospital, including 60 patients with suspected nonacute meniscal lesions who underwent knee arthroscopy. Patients were examined by 3 independent observers with graded levels of experience (>10 years, 3 years, and 4 months of practice). Interobserver concordance was assessed with Cohen-Fleiss κ statistics. Accuracy, negative and positive predictive values for prevalences from 10% to 90%, positive (LR+) and negative (LR-) likelihood ratios, and the Bayesian posttest probability after a positive or negative result were also determined. The diagnostic value of the 2 tests combined was assessed by logistic regression. Arthroscopy was used as the reference test. No interobserver concordance was found for the JLT. The McMurray test showed higher interobserver concordance, which improved when judgments by the least experienced examiner were discarded. For the whole series studied by the "best" examiner (an experienced orthopedist), the values were: (1) JLT: sensitivity, 62.9%; specificity, 50%; LR+, 1.26; LR-, .74; (2) McMurray: sensitivity, 34.3%; specificity, 86.4%; LR+, 2.52; LR-, .76. Combining the 2 tests offered no advantage over the McMurray test alone. The JLT alone is of little clinical usefulness. A negative McMurray test does not modify the pretest probability of a meniscal lesion, whereas a positive result has fair predictive value. Hence, in a patient with a suspected meniscal lesion, a positive McMurray test indicates that arthroscopy should be performed. In case of a negative result, further examinations, including imaging, are needed. Copyright © 2013 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
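The likelihood ratios and posttest probabilities above follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity, and posttest odds = pretest odds × LR. A short sketch (the function names are ours) that reproduces the reported JLT and McMurray ratios from the stated sensitivities and specificities; the 50% pretest probability in the last line is hypothetical:

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a binary test."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def posttest_probability(pretest, lr):
    """Bayes via odds: post-test odds = pre-test odds x likelihood ratio."""
    post_odds = pretest / (1 - pretest) * lr
    return post_odds / (1 + post_odds)


# Sensitivity and specificity of the "best" examiner, as reported above.
for name, sens, spec in [("JLT", 0.629, 0.50), ("McMurray", 0.343, 0.864)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(name, round(lr_pos, 2), round(lr_neg, 2))
# prints: JLT 1.26 0.74, then McMurray 2.52 0.76

# Hypothetical pre-test probability of 50%, followed by a positive McMurray test.
print(round(posttest_probability(0.50, 2.52), 2))
```

An LR- close to 1, as for the McMurray test here, barely moves the probability, which matches the abstract's point that a negative result does not modify the pretest probability.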
Siafarikas, F; Staer-Jensen, J; Braekken, I H; Bø, K; Engh, M Ellström
2013-03-01
To evaluate the learning process for acquiring three- and four-dimensional (3D/4D) transperineal ultrasound volumes of the levator hiatus (LH) dimensions at rest, during pelvic floor muscle (PFM) contraction and on Valsalva maneuver, and for analyzing the ultrasound volumes, as well as to perform an interobserver reliability study between two independent ultrasound examiners. This was a prospective study including 22 women. We monitored the learning process of an inexperienced examiner (IE) performing 3D/4D transperineal ultrasonography and analyzing the volumes. The examination included acquiring volumes during three PFM contractions and three Valsalva maneuvers. LH dimensions were determined in the axial plane. The learning process was documented by estimating agreement between the IE and an experienced examiner (E) using the intraclass correlation coefficient. Agreement was calculated in blocks of 10 ultrasound examinations and analyzed volumes. After the learning process was complete the interobserver reliability for the technique was calculated between these two independent examiners. For offline analysis of the first 10 ultrasound volumes obtained by E, good to very good agreement between E and IE was achieved for all LH measurements except for the left and right levator-urethra gap and pubic arc. For the next 10 analyzed volumes, agreement improved for all LH measurements. Volumes that had been obtained by IE and E were then re-evaluated by IE, and good to very good agreement was found for all LH measurements indicating consistency in volume acquisition. The interobserver reliability study showed excellent ICC values (ICC, 0.81-0.97) for all LH measurements except the pubic arc (ICC = 0.67). 3D/4D transperineal ultrasound is a reliable technique that can be learned in a short period of time. Copyright © 2012 ISUOG. Published by John Wiley & Sons, Ltd.
Duong, Luc; Cheriet, Farida; Labelle, Hubert; Cheung, Kenneth M C; Abel, Mark F; Newton, Peter O; McCall, Richard E; Lenke, Lawrence G; Stokes, Ian A F
2009-08-01
Interobserver and intraobserver reliability study for the identification of the Lenke classification lumbar modifier by a panel of experts compared with a computer algorithm. To measure the variability of the Lenke classification lumbar modifier and determine if computer assistance using 3-dimensional spine models can improve the reliability of classification. The lumbar modifier has been proposed to subclassify Lenke scoliotic curve types into A, B, and C on the basis of the relationship between the central sacral vertical line (CSVL) and the apical lumbar vertebra. Landmarks for identification of the CSVL have not been clearly defined, and the reliability of the actual CSVL position and of lumbar modifier selection has never been tested independently. Therefore, the value of the lumbar modifier for curve classification remains unknown. The preoperative radiographs of 68 patients with adolescent idiopathic scoliosis with a Lenke type 1 curve were measured manually twice, 6 months apart, by 6 members of the Scoliosis Research Society 3-dimensional classification committee. Intraobserver and interobserver reliability was quantified using the percentage of agreement and kappa statistics. In addition, the lumbar curve of all subjects was reconstructed in 3 dimensions using a stereoradiographic technique and submitted to a computer algorithm that inferred the lumbar modifier from measurements of the pedicles. Interobserver agreement for the first trial showed a mean kappa value of 0.56. Second-trial agreement was higher, with a mean kappa value of 0.64. Intraobserver agreement had a mean kappa value of 0.69. The computer algorithm successfully identified the lumbar curve type and agreed with the observers in up to 93% of cases. Agreement between and within observers for the Lenke lumbar modifier is only moderate to substantial with manual methods.
Computer assistance with 3-dimensional models of the spine has the potential to decrease this variability.
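Percentage agreement and kappa can be computed for each pair of observers. The abstract does not say how the mean kappa across the 6 observers was formed; the sketch below assumes it is the average of Cohen's kappa over all observer pairs. The ratings use the lumbar modifier labels A, B and C but are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Proportion of cases on which two observers gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each observer's own label frequencies.
    p_chance = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)  # undefined if p_chance == 1


def mean_pairwise_kappa(observers: list[list[str]]) -> float:
    """Average Cohen's kappa over every pair of observers (assumed definition)."""
    pairs = list(combinations(observers, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical lumbar modifier calls by three observers on five radiographs.
obs1 = list("AABBC")
obs2 = list("AABCC")
obs3 = list("ABBCC")
print(round(percent_agreement(obs1, obs2), 2), round(cohen_kappa(obs1, obs2), 3))
print(round(mean_pairwise_kappa([obs1, obs2, obs3]), 3))
```

In the first pair the observers agree on 80% of radiographs, but kappa is lower because about a third of that agreement would be expected by chance from their label frequencies alone.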
Maroules, Christopher D; Hamilton-Craig, Christian; Branch, Kelley; Lee, James; Cury, Roberto C; Maurovich-Horvat, Pál; Rubinshtein, Ronen; Thomas, Dustin; Williams, Michelle; Guo, Yanshu; Cury, Ricardo C
The Coronary Artery Disease Reporting and Data System (CAD-RADS) provides a lexicon and standardized reporting system for coronary CT angiography. To evaluate inter-observer agreement of the CAD-RADS among a panel of early career and expert readers. Four early career and four expert cardiac imaging readers prospectively and independently evaluated 50 coronary CT angiography cases using the CAD-RADS lexicon. All readers assessed image quality using a five-point Likert scale, with mean Likert score ≥4 designating high image quality, and <4 designating moderate/low image quality. All readers were blinded to medical history and invasive coronary angiography findings. Inter-observer agreement for CAD-RADS assessment categories and modifiers was assessed using the intra-class correlation coefficient (ICC) and Fleiss' kappa (κ). The impact of reader experience and image quality on inter-observer agreement was also examined. Inter-observer agreement for CAD-RADS assessment categories was excellent (ICC 0.958, 95% CI 0.938-0.974, p < 0.0001). Agreement among expert readers (ICC 0.925, 95% CI 0.884-0.954) was marginally stronger than for early career readers (ICC 0.904, 95% CI 0.852-0.941), both p < 0.0001. High image quality was associated with stronger agreement than moderate image quality (ICC 0.944, 95% CI 0.886-0.974 vs. ICC 0.887, 95% CI 0.775-0.95, both p < 0.0001). While excellent inter-observer agreement was observed for modifiers S (stent) and G (bypass graft) (both κ = 1.0), only fair agreement (κ = 0.40) was observed for modifier V (high-risk plaque). Inter-observer reproducibility of CAD-RADS assessment categories and modifiers is excellent, except for high-risk plaque (modifier V), which demonstrates fair agreement. These results suggest CAD-RADS is feasible for clinical implementation. Copyright © 2017. Published by Elsevier Inc.
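Fleiss' kappa, used above for the categorical CAD-RADS modifiers, extends chance-corrected agreement to more than two readers. The sketch below assumes every case was rated by the same number of readers (eight in this study) and takes, for each case, the number of readers who chose each category. The counts in the example and the function name are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters m (m >= 2).
    """
    n_subjects = len(counts)
    m = sum(counts[0])
    if any(sum(row) != m for row in counts):
        raise ValueError("every subject needs the same number of ratings")

    # Per-subject agreement: proportion of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_subjects * m)) ** 2 for t in totals)
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical: 8 readers calling "modifier V present / absent" on 4 cases.
cases = [[8, 0], [6, 2], [1, 7], [4, 4]]
print(round(fleiss_kappa(cases), 2))
```

Cases with a 4/4 split contribute little pairwise agreement, so a handful of borderline plaques can pull kappa for a modifier down to the "fair" range even when most cases are read unanimously.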
Toluidine Blue 0.05% Vital Staining for the Diagnosis of Ocular Surface Squamous Neoplasia in Kenya.
Gichuhi, Stephen; Macharia, Ephantus; Kabiru, Joy; Zindamoyen, Alain M'bongo; Rono, Hilary; Ollando, Ernest; Wanyonyi, Leonard; Wachira, Joseph; Munene, Rhoda; Onyuma, Timothy; Jaoko, Walter G; Sagoo, Mandeep S; Weiss, Helen A; Burton, Matthew J
2015-11-01
Clinical features are unreliable for distinguishing ocular surface squamous neoplasia (OSSN) from benign conjunctival lesions. To evaluate the adverse effects, accuracy, and interobserver variation of toluidine blue 0.05% vital staining in distinguishing OSSN, confirmed by histopathology, from other conjunctival lesions. Cross-sectional study in Kenya from July 2012 through July 2014 of 419 adults with suspicious conjunctival lesions. Pregnant and breastfeeding women were excluded. Comprehensive ophthalmic slitlamp examination was conducted. Vital staining with toluidine blue 0.05% aqueous solution was performed before surgery. Initial safety testing, looking for corneal toxicity on histology, was conducted on large tumors scheduled for exenteration before smaller tumors were tested. Participants were asked about pain or discomfort after staining, and the cornea was evaluated at the slitlamp for epithelial defects. Lesions were photographed before and after staining. Diagnosis was confirmed by histopathology. Six examiners assessed photographs from a subset of 100 consecutive participants for staining and made a diagnosis of OSSN vs non-OSSN. Staining was compared with histopathology to estimate sensitivity, specificity, and predictive values. Adverse effects were enumerated. Interobserver agreement was estimated using the κ statistic. A total of 143 of 419 participants (34%) had OSSN by histopathology. The median age of all participants was 37 years (interquartile range, 32-45 years) and 278 (66%) were female. A total of 322 of the 419 participants had positive staining while 2 of 419 were equivocal. There was no histological evidence of corneal toxicity. Mild discomfort was reported by 88 (21%) and mild superficial punctate keratopathy was seen in 7 (1.7%). For detecting OSSN, toluidine blue had a sensitivity of 92% (95% CI, 87%-96%), specificity of 31% (95% CI, 25%-36%), positive predictive value of 41% (95% CI, 35%-46%), and negative predictive value of 88% (95% CI, 80%-94%).
Interobserver agreement was substantial for staining (κ = 0.76) and moderate for diagnosis (κ = 0.40). With the high sensitivity and low specificity for OSSN compared with histopathology among patients with conjunctival lesions, toluidine blue 0.05% vital staining is a good screening tool. However, it is not a good diagnostic tool owing to a high frequency of false-positives. The high negative predictive value suggests that a negative staining result indicates that OSSN is relatively unlikely.
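Sensitivity, specificity and the predictive values come from the 2×2 table of staining result against histopathology, and the predictive values also depend on the prevalence of OSSN. The sketch below computes all four from counts, and separately derives PPV and NPV from the reported sensitivity (92%), specificity (31%) and prevalence (143/419). The function names and the counts in the test-style example are illustrative, and because the reported inputs are rounded, the derived values are only approximate.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Metrics from a 2x2 table of index test vs. reference standard."""
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by sensitivity, specificity and prevalence (Bayes)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


ppv, npv = predictive_values(0.92, 0.31, 143 / 419)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

With these rounded inputs the sketch gives a PPV of about 0.41 and an NPV of about 0.88, close to the reported 41% and 88%. At this prevalence, the many false positives produced by the low specificity are what keep the PPV low.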
Osterhoff, G; Amiri, S; Unno, F; Dodd, A; Guy, P; O'Brien, P J; Lefaivre, K A
2015-08-01
Minimal-invasive placement of screws into the posterior column of the acetabulum (PC) is challenging. Due to the saddle-shaped curvature of the medial cortical border of the PC, the standard fluoroscopic views of the pelvis cannot provide the desired safety during screw insertion. The aim of this study was to define a view tangentially to the medial cortex of the PC and to evaluate its accuracy and inter-observer reproducibility. Radio-dense markers on the medial cortex of the PC along the axis of a PC screw were brought in line and landmarks of the new "Down the PC" view were determined. Kirschner wires were placed into the PC of a pelvis composite model and five pelvic cadaver specimens in a total of 34 different correct and incorrect positions. Based on either only the "Down the PC" view, only the standard views, or a combination of both, three fellowship-trained orthopaedic surgeons had to decide if the inserted wires were in bone in the posterior column or had exited cortex, and if they penetrated the acetabulum. Sensitivity, specificity, and the intra-class correlation coefficient were calculated. A view using three radiographic landmarks (pelvic brim, medial cortical wall of the body of the ischium, ischial spine) was found. Sensitivity and specificity to detect perforation out of the bone were 1.00 and 0.97 for the "Down the PC" view, 0.46 and 0.97 if only the standard views were used, and 1.00 and 0.95 for a combination of both. Sensitivity and specificity to detect intra-articular wire placement were 1.00 and 0.96 for the "Down the PC" view, 0.72 and 0.95 if only the standard views were used, and 0.94 and 0.99 for a combination of both. Inter-observer agreement using only the "Down the PC" view was excellent with an ICC of 0.92 for perforation and ICC of 0.82 for intra-articular wire placement. The "Down the PC" view is a useful addendum in the orthopaedic trauma surgeon's tool box. 
Using simple landmarks, the view is easy to reproduce and shows excellent accuracy and inter-observer agreement for detecting medial perforation or intra-articular implant position. Copyright © 2015 Elsevier Ltd. All rights reserved.
Diagnosing Nodular Regenerative Hyperplasia of the Liver Is Thwarted by Low Interobserver Agreement.
Jharap, Bindia; van Asseldonk, Dirk P; de Boer, Nanne K H; Bedossa, Pierre; Diebold, Joachim; Jonker, A Mieke; Leteurtre, Emmanuelle; Verheij, Joanne; Wendum, Dominique; Wrba, Fritz; Zondervan, Pieter E; Colombel, Jean-Frédéric; Reinisch, Walter; Mulder, Chris J J; Bloemena, Elisabeth; van Bodegraven, Adriaan A
2015-01-01
Nodular regenerative hyperplasia (NRH) of the liver is associated with several diseases and drugs. Clinical symptoms of NRH may vary from absence of symptoms to full-blown (non-cirrhotic) portal hypertension. However, diagnosing NRH is challenging. The objective of this study was to determine inter- and intraobserver agreement on the histopathologic diagnosis of NRH. Liver specimens (n=48) previously diagnosed as NRH were reviewed for the presence of NRH by seven pathologists without prior knowledge of the original diagnosis or clinical background. Most of the liver specimens were from patients with inflammatory bowel disease using thiopurines. Histopathologic features contributing to NRH were also assessed. Criteria for NRH were modified by consensus and subsequently validated. Interobserver agreement was evaluated by using the standard kappa index. After review, definite NRH, inconclusive NRH and no NRH were found in 35% (23-40%), 21% (13-27%) and 44% (38-56%), respectively (median, IQR). The median interobserver agreement for NRH was poor (κ = 0.20, IQR 0.14-0.28). The intraobserver variability on NRH ranged between 14% and 71%. After modification of the criteria and exclusion of biopsies with technical shortcomings, the interobserver agreement on the diagnosis of NRH was fair (κ = 0.45). The interobserver agreement on the histopathologic diagnosis of NRH was poor, even when assessed by well-experienced liver pathologists. Modification of the NRH criteria by consensus and exclusion of poor-quality biopsies raised interobserver agreement to a fair level. The main conclusion of this study is that NRH is a clinicopathologic diagnosis that cannot reliably be based on histopathology alone.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levegruen, Sabine, E-mail: sabine.levegruen@uni-due.de; Poettgen, Christoph; Abu Jawad, Jehad
Purpose: To evaluate megavoltage computed tomography (MVCT)-based image guidance with helical tomotherapy in patients with vertebral tumors by analyzing factors influencing interobserver variability, considered a quality criterion of image guidance. Methods and Materials: Five radiation oncologists retrospectively registered 103 MVCTs in 10 patients to planning kilovoltage CTs by rigid transformations in 4 degrees of freedom. Interobserver variabilities were quantified using the standard deviations (SDs) of the distributions of the correction vector components about the observers' fraction mean. To assess intraobserver variabilities, registrations were repeated after ≥4 weeks. Residual deviations after setup correction due to uncorrectable rotational errors and elastic deformations were determined at 3 craniocaudal target positions. To differentiate observer-related variations in minimizing these residual deviations across the 3-dimensional MVCT from image resolution effects, 2-dimensional registrations were performed in 30 single transverse and sagittal MVCT slices. Axial and longitudinal MVCT image resolutions were quantified. For comparison, image resolution of kilovoltage cone-beam CTs (CBCTs) and interobserver variability in registrations of 43 CBCTs were determined. Results: Axial MVCT image resolution is 3.9 lp/cm. Longitudinal MVCT resolution amounts to 6.3 mm, assessed as full-width at half-maximum of thin objects in MVCTs with finest pitch. Longitudinal CBCT resolution is better (full-width at half-maximum, 2.5 mm for CBCTs with 1-mm slices). In MVCT registrations, interobserver variability in the craniocaudal direction (SD 1.23 mm) is significantly larger than in the lateral and ventrodorsal directions (SD 0.84 and 0.91 mm, respectively) and significantly larger compared with CBCT alignments (SD 1.04 mm). Intraobserver variabilities are significantly smaller than corresponding interobserver variabilities (variance ratio [VR] 1.8-3.1).
Compared with 3-dimensional registrations, 2-dimensional registrations have significantly smaller interobserver variability in the lateral and ventrodorsal directions (VR 3.8 and 2.8, respectively) but not in the craniocaudal direction (VR 0.75). Conclusion: Tomotherapy image guidance precision is affected by image resolution and residual deviations after setup correction. Eliminating the effect of residual deviations yields small interobserver variabilities with submillimeter precision in the axial plane. In contrast, interobserver variability in the craniocaudal direction is dominated by the poorer longitudinal MVCT image resolution. Residual deviations after image guidance exist and need to be considered when dose gradients ultimately achievable with image guided radiation therapy techniques are analyzed.
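The interobserver variability above is the spread of each observer's correction about the observers' mean for the same fraction. The abstract does not give the exact estimator; as an assumption, the sketch below pools the squared deviations over all fractions with k - 1 degrees of freedom per fraction (k observers), and forms the variance ratio (VR) as the ratio of two variances. The correction values and function names are hypothetical.

```python
import math


def interobserver_sd(corrections: list[list[float]]) -> float:
    """Pooled SD of observer corrections about each fraction's observer mean.

    corrections[f][o] is the correction (mm) that observer o chose for
    fraction f; all fractions must be registered by the same observers.
    """
    n_fractions = len(corrections)
    n_observers = len(corrections[0])
    squared_dev = 0.0
    for fraction in corrections:
        fraction_mean = sum(fraction) / n_observers
        squared_dev += sum((c - fraction_mean) ** 2 for c in fraction)
    # One mean is estimated per fraction, hence k - 1 degrees of freedom each.
    return math.sqrt(squared_dev / (n_fractions * (n_observers - 1)))


def variance_ratio(sd_a: float, sd_b: float) -> float:
    """Ratio of variances, e.g. interobserver vs. intraobserver."""
    return (sd_a / sd_b) ** 2


# Hypothetical craniocaudal corrections (mm): 3 fractions x 5 observers.
cc = [[1.0, 2.5, 0.5, 1.5, 2.0],
      [-0.5, 0.5, 1.0, -1.0, 0.0],
      [3.0, 2.0, 4.0, 2.5, 3.5]]
print(round(interobserver_sd(cc), 2))
```

Measuring deviations about each fraction's own mean removes the true day-to-day setup error from the statistic, so what remains reflects how differently the observers align the same image.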
Campbell, Amelia; Owen, Rebecca; Brown, Elizabeth; Pryor, David; Bernard, Anne; Lehman, Margot
2015-08-01
Cone beam computerised tomography (CBCT) enables soft tissue visualisation to optimise matching in the post-prostatectomy setting, but is associated with inter-observer variability. This study assessed the accuracy and consistency of automated soft tissue localisation using XVI's dual registration tool (DRT). Sixty CBCT images from ten post-prostatectomy patients were matched using: (i) the DRT and (ii) manual soft tissue registration by six radiation therapists (RTs). Shifts in the three Cartesian planes were recorded. The accuracy of the match was determined by comparing shifts to matches performed by two genitourinary radiation oncologists (ROs). A Bland-Altman method was used to assess the 95% limits of agreement (LoA). A clinical threshold of 3 mm was used to define equivalence between methods of matching. The 95% LoA between DRT-ROs in the superior/inferior, left/right and anterior/posterior directions were -2.21 to +3.18 mm, -0.77 to +0.84 mm, and -1.52 to +4.12 mm, respectively. The 95% LoA between RTs-ROs in the superior/inferior, left/right and anterior/posterior directions were -1.89 to +1.86 mm, -0.71 to +0.62 mm and -2.8 to +3.43 mm, respectively. Five DRT CBCT matches (8.33%) were outside the 3-mm threshold, all in the setting of bladder underfilling or rectal gas. The mean matching time was 82 s for manual matching versus 65 s for the DRT. XVI's DRT is comparable with RTs manually matching soft tissue on CBCT. The DRT can minimise RT inter-observer variability; however, involuntary bladder and rectal filling can influence the tool's accuracy, highlighting the need for RT evaluation of the DRT match. © 2015 The Royal Australian and New Zealand College of Radiologists.
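In the usual Bland-Altman formulation, the 95% limits of agreement are the mean paired difference ± 1.96 standard deviations of the differences. The sketch below computes them for one axis, together with the share of matches whose difference exceeds the 3 mm clinical threshold. The shift values and function names are hypothetical, and the study may have handled the repeated images per patient differently.

```python
import statistics


def limits_of_agreement(method_a: list[float], method_b: list[float]) -> tuple[float, float, float]:
    """Bland-Altman bias and 95% limits of agreement for paired measurements.

    Returns (bias, lower, upper): bias is the mean of (a - b), and the limits
    are bias -/+ 1.96 sample standard deviations of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def share_beyond(method_a: list[float], method_b: list[float], threshold_mm: float = 3.0) -> float:
    """Fraction of paired shifts whose absolute difference exceeds the threshold."""
    diffs = [abs(a - b) for a, b in zip(method_a, method_b)]
    return sum(d > threshold_mm for d in diffs) / len(diffs)


# Hypothetical superior/inferior shifts (mm): automated tool vs. oncologist match.
tool = [1.2, -0.4, 2.8, 0.0, 4.1, -1.0]
oncologist = [0.8, -0.2, 1.9, 0.3, 0.6, -1.3]
bias, low, high = limits_of_agreement(tool, oncologist)
print(f"bias {bias:.2f} mm, LoA {low:.2f} to {high:.2f} mm")
print(f"{share_beyond(tool, oncologist):.1%} of matches beyond 3 mm")
```

A single large outlier (here, the 3.5 mm difference) widens the limits considerably, which mirrors the abstract's observation that the out-of-threshold DRT matches clustered in cases of bladder underfilling or rectal gas.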
Eijgenraam, Susanne M; Boselie, Toon F M; Sieben, Judith M; Bastiaenen, Caroline H G; Willems, Paul C; Arts, Jacobus J; Lataster, Arno
2017-02-01
The amount of vertebral rotation in the axial plane is of key importance in the prognosis and treatment of adolescent idiopathic scoliosis (AIS). Current methods to determine vertebral rotation are either designed for use on analogue plain radiographs and not useful in digital images, or lack measurement precision and are therefore less suitable for the follow-up of rotation in AIS patients. This study aimed to develop a digital X-ray software tool with high measurement precision to determine vertebral rotation in AIS, and to assess its (concurrent) validity and reliability. In this study a combination of basic science and reliability methodology applied in both laboratory and clinical settings was used. Software was developed using the algorithm of the Perdriolle torsion meter for analogue AP plain radiographs of the spine. The software was then assessed for (1) concurrent validity and (2) intra- and interobserver reliability. Plain radiographs of both human cadaver vertebrae and outpatient AIS patients were used. Concurrent validity was measured by two independent observers, both experienced in the assessment of plain radiographs. Reliability measurements were performed by three independent spine surgeons. The Pearson correlation of the software with the analogue Perdriolle torsion meter was 0.98 for mid-thoracic vertebrae, 0.97 for low-thoracic vertebrae and 0.97 for lumbar vertebrae. Measurement exactness of the software was within 5° in 62% of cases and within 10° in 97% of cases. The intraclass correlation coefficient (ICC) for inter-observer reliability was 0.92 (0.91-0.95), and the ICC for intra-observer reliability was 0.96 (0.94-0.97). We developed a digital X-ray software tool to determine vertebral rotation in AIS with substantial concurrent validity and reliability, which may be useful for the follow-up of vertebral rotation in AIS patients. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Bell, L. R.; Dowling, J. A.; Pogson, E. M.; Metcalfe, P.; Holloway, L.
2017-01-01
Accurate, efficient auto-segmentation methods are essential for the clinical efficacy of adaptive radiotherapy delivered with highly conformal techniques. Current atlas-based auto-segmentation techniques are adequate in this respect but fail to account for inter-observer variation. An atlas-based segmentation method that incorporates inter-observer variation is proposed. This method is validated for a whole breast radiotherapy cohort containing 28 CT datasets with CTVs delineated by eight observers. To optimise atlas accuracy, the cohort was divided into categories by mean body mass index and laterality, with atlases generated for each category in a leave-one-out approach. Observer CTVs were merged and thresholded to generate an auto-segmentation model representing both inter-observer and inter-patient differences. For each category, the atlas was registered to the left-out dataset to enable propagation of the auto-segmentation from atlas space. Auto-segmentation time was recorded. The segmentation was compared to the gold-standard contour using the Dice similarity coefficient (DSC) and mean absolute surface distance (MASD). Comparison with the smallest and largest CTV was also made. This atlas-based auto-segmentation method incorporating inter-observer variation was shown to be efficient (<4 min) and accurate for whole breast radiotherapy, with good agreement (DSC >0.7, MASD <9.3 mm) between the auto-segmented contours and CTV volumes.
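The abstract does not give the threshold used when merging the observer CTVs. As a minimal sketch, assuming contours are stored as sets of voxel indices and that a voxel enters the merged model when at least half of the observers included it, the merge step and the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), can be written as below. MASD, which requires surface distances, is omitted; the function names and voxel data are hypothetical.

```python
from collections import Counter

Voxel = tuple[int, int, int]


def consensus_mask(observer_masks: list[set[Voxel]], threshold: float = 0.5) -> set[Voxel]:
    """Voxels included by at least `threshold` of the observers (assumed rule)."""
    votes = Counter(voxel for mask in observer_masks for voxel in mask)
    n_observers = len(observer_masks)
    return {voxel for voxel, count in votes.items() if count / n_observers >= threshold}


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical CTVs from three observers on a tiny 1-D strip of voxels.
obs = [{(x, 0, 0) for x in range(0, 6)},
       {(x, 0, 0) for x in range(1, 7)},
       {(x, 0, 0) for x in range(2, 8)}]
model = consensus_mask(obs)
print(sorted(model))
print([round(dice(model, o), 2) for o in obs])
```

Raising the threshold toward 1 shrinks the merged model toward the region all observers agree on, while lowering it grows the model toward the union of the contours, which is one way to span the smallest and largest CTVs the abstract compares against.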
Hosseinpour-Feizi, Hojjat; Soleimanpour, Jafar; Sales, Jafar Ganjpour; Arzroumchilar, Ali
2011-01-01
Purpose: The aim of this study was to investigate the interobserver agreement of the Lenke and King classifications for adolescent idiopathic scoliosis, and to compare the results of surgery performed based on classification of the scoliosis according to each of these classification systems. Methods: The study was conducted in Shohada Hospital in Tabriz, Iran, between 2009 and 2010. First, a reliability assessment was undertaken to assess interobserver agreement of the Lenke and King classifications for adolescent idiopathic scoliosis. Second, postoperative efficacy and safety of surgery performed based on the Lenke and King classifications were compared. Kappa coefficients of agreement were calculated to assess the agreement. Outcomes were compared using bivariate tests and repeated measures analysis of variance. Results: A low to moderate interobserver agreement was observed for the King classification; the Lenke classification yielded mostly high agreement coefficients. The outcome of surgery was not found to be substantially different between the two systems. Conclusion: Based on the results, the Lenke classification method seems advantageous. This conclusion takes into account that the Lenke classification provides details of curvature in different anatomical planes to describe the severity of scoliosis precisely, that it has higher interobserver agreement scores, and that it leads to postoperative results noninferior to those of the King classification method. PMID:22267934
High resolution microendoscopy for classification of colorectal polyps.
Chang, S S; Shukla, R; Polydorides, A D; Vila, P M; Lee, M; Han, H; Kedia, P; Lewis, J; Gonzalez, S; Kim, M K; Harpaz, N; Godbold, J; Richards-Kortum, R; Anandasabapathy, S
2013-07-01
It can be difficult to distinguish adenomas from benign polyps during routine colonoscopy. High resolution microendoscopy (HRME) is a novel method for imaging colorectal mucosa with subcellular detail. HRME criteria for the classification of colorectal neoplasia have not been previously described. Study goals were to develop criteria to characterize HRME images of colorectal mucosa (normal, hyperplastic polyps, adenomas, cancer) and to determine the accuracy and interobserver variability for the discrimination of neoplastic from non-neoplastic polyps when these criteria were applied by novice and expert microendoscopists. Two expert pathologists created consensus HRME image criteria using images from 68 patients with polyps who had undergone colonoscopy plus HRME. Using these criteria, HRME expert and novice microendoscopists were shown a set of training images and then tested to determine accuracy and interobserver variability. Expert microendoscopists identified neoplasia with sensitivity, specificity, and accuracy of 67% (95% confidence interval [CI] 58%-75%), 97% (94%-100%), and 87%, respectively. Nonexperts achieved sensitivity, specificity, and accuracy of 73% (66%-80%), 91% (80%-100%), and 85%, respectively. Overall, neoplasia was identified with sensitivity 70% (65%-76%), specificity 94% (87%-100%), and accuracy 85%. Kappa values were 0.86 for experts, 0.72 for nonexperts, and 0.78 overall. Using the new criteria, observers achieved high specificity and substantial interobserver agreement for distinguishing benign polyps from neoplasia. Increased expertise in HRME imaging improves accuracy. This low-cost microendoscopic platform may be an alternative to confocal microendoscopy in lower-resource or community-based settings.
Intra- and inter-observer agreement in histological assessment of canine soft tissue sarcoma.
Yap, F W; Rasotto, R; Priestnall, S L; Parsons, K J; Stewart, J
2017-12-01
The diagnosis of canine soft tissue sarcoma (STS) is based on histological assessment. Assessment of criteria such as degree of differentiation, necrosis score and mitotic score gives rise to a final tumour grade, which is important in the recommendation of treatment and prognosis of patients. Previously diagnosed cases of STS were independently assessed by three board-certified veterinary pathologists. Participating pathologists were blinded to the original results. For the intra-observer study, the cases were assessed by a single pathologist six months apart and slides were randomized between readings. For the inter-observer study, the whole case series was assessed by a single pathologist before being passed on to the next pathologist. Intraclass correlation coefficients (ICC) and Fleiss' kappa (κ) were used to measure intra-observer (single observer) and inter-observer agreement, respectively. Strong agreement was observed for the intra-observer assessment in necrosis score, mitotic score, total score and tumour grading (ICC 0.78 to 0.91). The intra-observer agreement for differentiation score was rated perfect (ICC 1.00). The agreement between pathologists for the diagnosis and grading of canine STS was moderate (κ = 0.60 and 0.43, respectively). Histological assessment of canine STS had high reproducibility by an individual pathologist. The agreement on diagnosis and grading of canine STS between pathologists was moderate. Future studies are required to investigate further assessment criteria to improve the specificity of STS diagnosis and the accuracy of STS grading in dogs. © 2017 John Wiley & Sons Ltd.
Maduz, Roman; Kugelmeier, Patrick; Meili, Severin; Döring, Robert; Meier, Christoph; Wahl, Peter
2017-04-01
The Abbreviated Injury Scale (AIS) and the Injury Severity Score (ISS) are increasingly widely used to assess trauma burden and to perform interhospital benchmarking through trauma registries. Since 2015, public resource allocation in Switzerland is even supposed to be derived from such data. As every trauma centre is responsible for its own coding and data input, this study aims to evaluate the interobserver reliability of AIS and ISS coding. Interobserver reliability of the AIS and ISS is analysed in a cohort of 50 consecutive severely injured patients treated in 2012 at our institution, coded retrospectively by 3 independent and specifically trained observers. With a cutoff of ISS ≥16, only 38/50 patients (76%) were uniformly identified as polytraumatised or not. Raising the cutoff to ≥20 increased this to 41/50 patients (82%). A difference in the AIS of ≥1 was present in 261 (16%) of possible codes. After excluding uninjured body regions, which formed the vast majority, identical AIS severity values were attributed by all observers in 67/193 (35%) body regions, or in 318/579 (55%) possible observer pairings. Injury severity is all too often identified neither correctly nor consistently when using the AIS. This leads to wrong identification of severely injured patients using the ISS. Improving consistency of coding through centralisation is recommended before scores based on the AIS are used for interhospital benchmarking and resource allocation in the treatment of severely injured patients. Copyright © 2017. Published by Elsevier Ltd.
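The ISS is derived from AIS codes: it is the sum of the squares of the highest AIS severity in each of the three most severely injured of six ISS body regions, and it is set to the maximum of 75 when any injury is coded AIS 6. The sketch below (the region labels and function name are illustrative) shows how a one-level AIS disagreement between two coders can move a hypothetical patient across the ISS ≥16 cutoff used above.

```python
ISS_REGIONS = {"head_neck", "face", "chest", "abdomen", "extremities", "external"}


def injury_severity_score(injuries: list[tuple[str, int]]) -> int:
    """ISS from (ISS body region, AIS severity 1-6) pairs."""
    worst: dict[str, int] = {}
    for region, ais in injuries:
        if region not in ISS_REGIONS:
            raise ValueError(f"unknown ISS body region: {region!r}")
        if not 1 <= ais <= 6:
            raise ValueError(f"AIS severity must be 1-6, got {ais}")
        worst[region] = max(worst.get(region, 0), ais)  # highest AIS per region
    if 6 in worst.values():
        return 75  # any AIS 6 injury sets the ISS to its maximum
    top_three = sorted(worst.values(), reverse=True)[:3]
    return sum(ais * ais for ais in top_three)


# Hypothetical patient coded by two observers who differ by one AIS level.
coder_1 = [("chest", 3), ("abdomen", 2), ("extremities", 2)]
coder_2 = [("chest", 2), ("abdomen", 2), ("extremities", 2)]
for name, coding in [("coder 1", coder_1), ("coder 2", coder_2)]:
    iss = injury_severity_score(coding)
    print(name, iss, "polytrauma" if iss >= 16 else "not polytrauma")
```

Coder 1 obtains an ISS of 17 and coder 2 an ISS of 12, so the same patient is classified as polytraumatised by one observer and not by the other. Because the ISS squares the AIS values, a single-level difference in the most severely injured region has a disproportionate effect.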
Schreiter, V; Steffen, I; Huebner, H; Bredow, J; Heimann, U; Kroencke, T J; Poellinger, A; Doellinger, F; Buchert, R; Hamm, B; Brenner, W; Schreiter, N F
2015-01-01
The purpose of this study was to evaluate the reproducibility of a new software-based analysis system for ventilation/perfusion single-photon emission computed tomography/computed tomography (V/P SPECT/CT) in patients with pulmonary emphysema and to compare it with visual interpretation. Nineteen patients (mean age: 68.1 years) with pulmonary emphysema who underwent V/P SPECT/CT were included. Data were analysed by two independent observers by visual interpretation (VI) and with the software-based analysis system (SBAS). For SBAS, PMOD version 3.4 (PMOD Technologies Ltd, Zurich, Switzerland) was used to assess counts and volume per lung lobe/per lung and to calculate the count density per lung, lobe ratio of counts and ratio of count density. VI was performed using a visual scale to assess the mean counts per lung lobe. Interobserver variability and association for SBAS and VI were analysed using Spearman's rho correlation coefficient. In SBAS, interobserver correlation was high for perfusion (rho: 0.982, 0.957, 0.90, 0.979) and ventilation (rho: 0.972, 0.924, 0.941, 0.936) for count/count density per lobe and ratio of counts/count density. In VI, interobserver correlation was clear for perfusion (rho: 0.655) and weak for ventilation (rho: 0.458). SBAS provides more reproducible measures than VI for the relative tracer uptake in V/P SPECT/CTs in patients with pulmonary emphysema. However, SBAS needs further improvement before routine clinical use.
Koh, D-M; Collins, D J; Wallace, T; Chau, I; Riddell, A M
2012-07-01
To compare the diagnostic accuracy of gadolinium-ethoxybenzyl-diethylenetriaminepentaacetic acid (Gd-EOB-DTPA)-enhanced MRI, diffusion-weighted MRI (DW-MRI) and a combination of both techniques for the detection of colorectal hepatic metastases. Seventy-two patients with suspected colorectal liver metastases underwent Gd-EOB-DTPA MRI and DW-MRI. Two independent radiologists retrospectively reviewed three image sets, each including unenhanced T1- and T2-weighted images: a Gd-EOB-DTPA image set, a DW-MRI image set and a combined image set. Each lesion detected was scored for size, location and likelihood of metastasis, and compared with surgery and follow-up imaging. Diagnostic accuracy was compared using receiver operating characteristic analysis, and interobserver agreement was assessed with kappa statistics. 417 lesions (310 metastases, 107 benign) were found in 72 patients. For both readers, diagnostic accuracy using the combined image set was higher [area under the curve (Az)=0.96, 0.97] than with the Gd-EOB-DTPA image set (Az=0.86, 0.89) or the DW-MRI image set (Az=0.93, 0.92). Using the combined image set improved identification of liver metastases compared with the Gd-EOB-DTPA image set (p<0.001) or the DW-MRI image set (p<0.001). There was very good interobserver agreement for lesion classification (κ=0.81-0.88). Combining DW-MRI with Gd-EOB-DTPA-enhanced T1-weighted MRI significantly improved the detection of colorectal liver metastases.
Aminoff, Bechor Z; Purits, Elena; Noy, Shlomo; Adunsky, Abraham
2004-01-01
Assessment of suffering is extremely important in dying end-stage dementia patients (ESDP). We have developed and examined the reliability and validity of the Mini-Suffering State Examination (MSSE) in 103 consecutive bedridden ESDP. Main outcome measures included inter-observer reliability and concurrent validity. Reliability of the MSSE questionnaire was satisfactory, with Cronbach alpha values of 0.735 and 0.718 for the two physicians (Ph-1, Ph-2), respectively. The kappa agreement coefficient was 0.791. There was high agreement for seven items (kappa 0.882-0.972) and substantial agreement for the other three items (kappa 0.621-0.682) of the MSSE. The MSSE was validated against the Comfort Assessment in Dying with Dementia (CAD-EOLD) scale, yielding a significant Pearson correlation (r=-0.796, P<0.001). We conclude that the MSSE scale is a reliable and valid clinical tool, recommended for evaluating the severity of the patient's condition and the level of suffering of ESDP. Use of the MSSE may improve medical management and facilitate communication between patients and caregivers.
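Cronbach's alpha, reported above for each physician's MSSE ratings, compares the sum of the item variances with the variance of the total score: α = k/(k - 1) · (1 - Σ var(item) / var(total)), where k is the number of items. A minimal sketch follows; the item scores are hypothetical, and the MSSE's actual items and scoring are not reproduced here.

```python
import statistics


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j][i] is the score of patient i on item j."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # per-patient total score
    item_variance = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


# Hypothetical 0/1 item scores for five patients on three items.
scores = [[1, 0, 1, 1, 0],
          [1, 0, 1, 0, 0],
          [1, 1, 1, 1, 0]]
print(round(cronbach_alpha(scores), 2))
```

When items tend to rise and fall together across patients, the total-score variance exceeds the sum of the item variances and alpha approaches 1. Alpha measures internal consistency within one rater's scores, which is why the study reports kappa separately for agreement between the two physicians.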
Chuong, Anh Minh; Corno, Lucie; Beaussier, Hélène; Boulay-Coletta, Isabelle; Millet, Ingrid; Hodel, Jérôme; Taourel, Patrice; Chatellier, Gilles; Zins, Marc
2016-07-01
Purpose To determine whether adding unenhanced computed tomography (CT) to contrast material-enhanced CT improves the diagnostic performance of decreased bowel wall enhancement as a sign of ischemia complicating mechanical small bowel obstruction (SBO). Materials and Methods This retrospective study was approved by the institutional review board, which waived the requirement for informed consent. Two gastrointestinal radiologists independently performed retrospective assessments of 164 unenhanced and contrast-enhanced CT studies from 158 consecutive patients (mean age, 71.2 years) with mechanical SBO. The reference standard was the intraoperative and/or histologic diagnosis (in 80 cases) or results from clinical follow-up in patients who did not undergo surgery (84 cases). Decreased bowel wall enhancement was evaluated first on contrast-enhanced images alone and then, 1 month later, on both unenhanced and contrast-enhanced images. Diagnostic performance of decreased bowel wall enhancement and confidence in the diagnosis were compared between the two readings by using McNemar and Wilcoxon signed rank tests. Interobserver agreement was assessed by using κ statistics and compared with bootstrapping. Results Ischemia was diagnosed in 41 of 164 (25%) episodes of SBO. For both observers, adding unenhanced images improved the sensitivity of decreased bowel wall enhancement (observer 1: 46.3% [19 of 41] vs 65.8% [27 of 41], P = .02; observer 2: 56.1% [23 of 41] vs 63.4% [26 of 41], P = .45), the Youden index (from 0.41 to 0.58 for observer 1 and from 0.42 to 0.61 for observer 2), and the confidence score (P < .001 for both). Specificity significantly increased for observer 2 (84.5% [104 of 123] vs 94.3% [116 of 123], P = .002), and interobserver agreement significantly increased, from moderate (κ = 0.48) to excellent (κ = 0.89; P < .0001).
Conclusion Adding unenhanced CT to contrast-enhanced CT improved the sensitivity, diagnostic confidence, and interobserver agreement of the diagnosis of ischemia, a complication of mechanical SBO, on the basis of decreased bowel wall enhancement. (©) RSNA, 2016.
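The Youden index used above combines sensitivity and specificity as J = sensitivity + specificity − 1. The following minimal sketch computes all three values from a 2 × 2 table. The function name and the counts are hypothetical and are not the study's data:

```python
def youden_index(tp, fn, tn, fp):
    """Return (sensitivity, specificity, Youden's J) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)  # diseased cases correctly flagged
    specificity = tn / (tn + fp)  # non-diseased cases correctly cleared
    return sensitivity, specificity, sensitivity + specificity - 1


# Hypothetical counts for illustration only.
se, sp, j = youden_index(tp=30, fn=10, tn=90, fp=10)
print(se, sp, round(j, 2))  # 0.75 0.9 0.65
```

J ranges from 0 for a test no better than chance to 1 for a perfect test, so a rising J indicates a combined gain in sensitivity and specificity.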
Sassowsky, Manfred; Gut, Philipp; Hölscher, Tobias; Hildebrandt, Guido; Müller, Arndt-Christian; Najafi, Yousef; Kohler, Götz; Kranzbühler, Helmut; Guckenberger, Matthias; Zwahlen, Daniel R; Azinwi, Ngwa C; Plasswilm, Ludwig; Takacs, Istvan; Reuter, Christiane; Sumila, Marcin; Manser, Peter; Ost, Piet; Böhmer, Dirk; Pilop, Christiane; Aebersold, Daniel M; Ghadjar, Pirus
2013-11-01
Different international target volume delineation guidelines exist and different treatment techniques are available for salvage radiation therapy (RT) for recurrent prostate cancer, but less is known regarding their respective applicability in clinical practice. A randomized phase III trial testing 64 Gy vs 70 Gy salvage RT was accompanied by an intensive quality assurance program including a site-specific and study-specific questionnaire and a dummy run (DR). Target volume delineation was performed according to the European Organisation for the Research and Treatment of Cancer guidelines, and a DR-based treatment plan was established for 70 Gy. Major and minor protocol deviations were noted, interobserver agreement of delineated target contours was assessed, and dose-volume histogram (DVH) parameters of different treatment techniques were compared. Thirty European centers participated, 43% of which were using 3-dimensional conformal RT (3D-CRT), with the remaining centers using intensity modulated RT (IMRT) or volumetric modulated arc technique (VMAT). The first submitted version of the DR contained major deviations in 21 of 30 (70%) centers, mostly caused by an inappropriately defined or missing prostate bed (PB). All but 5 centers completed the DR successfully with their second submitted version. The interobserver agreement of the PB was moderate and was improved by the DR review, as indicated by an increased κ value (0.59 vs 0.55), mean sensitivity (0.64 vs 0.58), volume of total agreement (3.9 vs 3.3 cm(3)), and decrease in the union volume (79.3 vs 84.2 cm(3)). Rectal and bladder wall DVH parameters of IMRT and VMAT vs 3D-CRT plans were not significantly different. The interobserver agreement of PB delineation was moderate but was improved by the DR. Major deviations could be identified for the majority of centers. The DR improved the participating centers' familiarity with the trial protocol. Copyright © 2013 Elsevier Inc. All rights reserved.
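The dummy-run analysis above summarizes multi-observer contours using two quantities. The volume of total agreement counts voxels that every observer included, and the union volume counts voxels that any observer included. The following is a minimal sketch that assumes each contour is available as a set of voxel indices with a known voxel volume. Both the representation and the function name are hypothetical. The trial's own κ and sensitivity computations are not reproduced here:

```python
def agreement_volumes(contours, voxel_volume_cm3):
    """contours: list of sets of (x, y, z) voxel indices, one set per observer.

    Returns (volume of total agreement, union volume) in cm^3.
    """
    if not contours:
        raise ValueError("need at least one contour")
    common = set.intersection(*contours)  # voxels every observer included
    union = set.union(*contours)          # voxels any observer included
    return len(common) * voxel_volume_cm3, len(union) * voxel_volume_cm3


# Two hypothetical observers' contours on a tiny grid.
obs_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
obs_2 = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
print(agreement_volumes([obs_1, obs_2], voxel_volume_cm3=0.5))  # (1.0, 2.0)
```

Better agreement moves these two volumes closer together: the common volume grows while the union shrinks, which matches the direction of change reported after the review.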
Kawaguchi, Yurika Maria Fogaça; Nawa, Ricardo Kenji; Figueiredo, Thais Borgheti; Martins, Lourdes; Pires-Neto, Ruy Camargo
2016-01-01
Objective: To translate the Perme Intensive Care Unit Mobility Score and the ICU Mobility Scale (IMS) into Portuguese, creating versions that are cross-culturally adapted for use in Brazil, and to determine the interobserver agreement and reliability for both versions. Methods: The processes of translation and cross-cultural validation consisted of the following steps: preparation, translation, reconciliation, synthesis, back-translation, review, approval, and pre-test. The Portuguese-language versions of both instruments were then used by two researchers to evaluate critically ill ICU patients. Weighted kappa statistics and Bland-Altman plots were used to verify interobserver agreement for the two instruments. In each of the domains of the instruments, interobserver reliability was evaluated with Cronbach's alpha coefficient. The correlation between the instruments was assessed by Spearman's correlation test. Results: The study sample comprised 103 patients (56 [54%] male), with a mean age of 52 ± 18 years. The main reason for ICU admission (in 44%) was respiratory failure. Both instruments showed excellent interobserver agreement (κ > 0.90) and reliability (α > 0.90) in all domains. Interobserver bias was low for the IMS and the Perme Score (−0.048 ± 0.350 and −0.06 ± 0.73, respectively). The 95% limits of agreement for the same instruments ranged from −0.73 to 0.64 and −1.50 to 1.36, respectively. There was also a strong positive correlation between the two instruments (r = 0.941; p < 0.001). Conclusions: In their versions adapted for use in Brazil, both instruments showed high interobserver agreement and reliability. PMID:28117473
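Bland-Altman analysis, used above alongside weighted kappa, estimates the limits of agreement as bias ± 1.96 × SD of the between-observer differences. The following is a minimal sketch, and both function names are hypothetical. Plugging in the IMS summary reported above (−0.048 ± 0.350) gives roughly −0.73 to 0.64, which matches the reported interval:

```python
from statistics import mean, stdev


def bland_altman(obs_a, obs_b, z=1.96):
    """Return (bias, lower, upper) limits of agreement for paired scores."""
    if len(obs_a) != len(obs_b) or len(obs_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(obs_a, obs_b)]
    bias = mean(diffs)  # mean difference between observers
    sd = stdev(diffs)   # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd


def limits_from_summary(bias, sd, z=1.96):
    """Limits of agreement from a reported bias and SD of differences."""
    return bias - z * sd, bias + z * sd


low, high = limits_from_summary(-0.048, 0.350)
print(round(low, 2), round(high, 2))  # -0.73 0.64
```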
Amini, Michael H; Sykes, Joshua B; Olson, Stephen T; Smith, Richard A; Mauck, Benjamin M; Azar, Frederick M; Throckmorton, Thomas W
2015-03-01
The severity of elbow arthritis is one of many factors that surgeons must evaluate when considering treatment options for a given patient. Elbow surgeons have historically used the Broberg and Morrey (BM) and Hastings and Rettig (HR) classification systems to radiographically stage the severity of post-traumatic arthritis (PTA) and primary osteoarthritis (OA). We aimed to compare the intraobserver and interobserver reliability between systems for patients with either PTA or OA. The radiographs of 45 patients were evaluated at least 2 weeks apart by 6 evaluators of different levels of training. Intraobserver and interobserver reliability were calculated by Spearman correlation coefficients with 95% confidence intervals. Agreement was considered almost perfect for coefficients >0.80 and substantial for coefficients of 0.61 to 0.80. In patients with both PTA and OA, intraobserver reliability and interobserver reliability were substantial, with no difference between classification systems. There were no significant differences in intraobserver or interobserver reliability between attending physicians and trainees for either classification system (all P > .10). The presence of fracture implants did not affect reliability in the BM system but did substantially worsen reliability in the HR system (intraobserver P = .04 and interobserver P = .001). The BM and HR classifications both showed substantial intraobserver and interobserver reliability for PTA and OA. Training level differences did not affect reliability for either system. Both trainees and fellowship-trained surgeons may easily and reliably apply each classification system to the evaluation of primary elbow OA and PTA, although the HR system was less reliable in the presence of fracture implants. Copyright © 2015 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
Diagnosing Nodular Regenerative Hyperplasia of the Liver Is Thwarted by Low Interobserver Agreement
Jharap, Bindia; van Asseldonk, Dirk P.; de Boer, Nanne K. H.; Bedossa, Pierre; Diebold, Joachim; Jonker, A. Mieke; Leteurtre, Emmanuelle; Verheij, Joanne; Wendum, Dominique; Wrba, Fritz; Zondervan, Pieter E.; Colombel, Jean-Frédéric; Reinisch, Walter; Mulder, Chris J. J.; Bloemena, Elisabeth; van Bodegraven, Adriaan A.
2015-01-01
Background and Aims Nodular regenerative hyperplasia (NRH) of the liver is associated with several diseases and drugs. Clinical symptoms of NRH may vary from absence of symptoms to full-blown (non-cirrhotic) portal hypertension. However, diagnosing NRH is challenging. The objective of this study was to determine inter- and intraobserver agreement on the histopathologic diagnosis of NRH. Methods Liver specimens (n=48) previously diagnosed as NRH were reviewed for the presence of NRH by seven pathologists without prior knowledge of the original diagnosis or clinical background. The majority of the liver specimens were from thiopurine-using inflammatory bowel disease patients. Histopathologic features contributing to NRH were also assessed. Criteria for NRH were modified by consensus and subsequently validated. Interobserver agreement was evaluated by using the standard kappa index. Results After review, definite NRH, inconclusive NRH and no NRH were found in 35% (23-40%), 21% (13-27%) and 44% (38-56%), respectively (median, IQR). The median interobserver agreement for NRH was poor (κ = 0.20, IQR 0.14-0.28). The intraobserver variability on NRH ranged between 14% and 71%. After modification of the criteria and exclusion of biopsies with technical shortcomings, the interobserver agreement on the diagnosis of NRH was fair (κ = 0.45). Conclusions The interobserver agreement on the histopathologic diagnosis of NRH was poor, even when assessed by well-experienced liver pathologists. Modification of the criteria of NRH based on consensus effort and exclusion of biopsies of poor quality raised interobserver agreement to a fair level. The main conclusion of this study is that NRH is a clinicopathologic diagnosis that cannot reliably be based on histopathology alone. PMID:26054009
Høyer, Christian; Pavar, Susanne; Pedersen, Begitte H; Biurrun Manresa, José A; Petersen, Lars J
2013-08-01
Mercury-in-silastic strain gauge plethysmography (SGP) is a well-established technique for blood flow and blood pressure measurements. The aim of this study was to examine (i) the possible influence of clinical clues, e.g. the presence of wounds and color changes during blood pressure measurements, and (ii) intra- and inter-observer variation of curve interpretation for segmental blood pressure measurements. A total of 204 patients with known or suspected peripheral arterial disease (PAD) were included in a diagnostic accuracy trial. Toe and ankle pressures were measured in both limbs, and primary observers analyzed a total of 804 pressure curve sets. The SGP curves were later reanalyzed separately by two observers blinded to clinical clues. Intra- and inter-observer agreement was quantified using Cohen's kappa and reliability was quantified using intra-class correlation coefficients, coefficients of variation, and Bland-Altman analysis. Overall agreement on patient diagnostic classification (PAD/not PAD) was 202/204 (99.0%) for intra-observer readings (κ = 0.969, p < 0.001) and 201/204 (98.5%) for inter-observer readings (κ = 0.953, p < 0.001). Reliability analysis showed excellent correlation between blinded versus non-blinded and inter-observer readings for determination of absolute segmental pressures (all intraclass correlation coefficients ≥ 0.984). The coefficient of variation for determination of absolute segmental blood pressure ranged from 2.9-3.4% for blinded/non-blinded data and from 3.8-5.0% for inter-observer data. This study shows a low inter-observer variation among experienced laboratory technicians for reading strain gauge curves. The low variation between blinded/non-blinded readings indicates that SGP measurements are minimally biased by clinical clues.
Radiographic classifications in Perthes disease
Huhnstock, Stefan; Svenningsen, Svein; Merckoll, Else; Catterall, Anthony; Terjesen, Terje; Wiig, Ola
2017-01-01
Background and purpose Different radiographic classifications have been proposed for prediction of outcome in Perthes disease. We assessed whether the modified lateral pillar classification would provide more reliable interobserver agreement and prognostic value compared with the original lateral pillar classification and the Catterall classification. Patients and methods 42 patients (38 boys) with Perthes disease were included in the interobserver study. Their mean age at diagnosis was 6.5 (3–11) years. 5 observers classified the radiographs in 2 separate sessions according to the Catterall classification, the original and the modified lateral pillar classifications. Interobserver agreement was analysed using weighted kappa statistics. We assessed the associations between the classifications and femoral head sphericity at 5-year follow-up in 37 non-operatively treated patients in a crosstable analysis (Gamma statistics for ordinal variables, γ). Results The original lateral pillar and Catterall classifications showed moderate interobserver agreement (kappa 0.49 and 0.43, respectively) while the modified lateral pillar classification had fair agreement (kappa 0.40). The original lateral pillar classification was strongly associated with the 5-year radiographic outcome, with a mean γ correlation coefficient of 0.75 (95% CI: 0.61–0.95) among the 5 observers. The modified lateral pillar and Catterall classifications showed moderate associations (mean γ correlation coefficient 0.55 [95% CI: 0.38–0.66] and 0.64 [95% CI: 0.57–0.72], respectively). Interpretation The Catterall classification and the original lateral pillar classification had sufficient interobserver agreement and association to late radiographic outcome to be suitable for clinical use. Adding the borderline B/C group did not increase the interobserver agreement or prognostic value of the original lateral pillar classification. PMID:28613966
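The study above combines two statistics: weighted kappa measures agreement on the ordinal classes, and Goodman-Kruskal gamma measures association with the 5-year outcome. The following is a minimal sketch of both. It assumes linear agreement weights, because the abstract does not state which weighting was used. The function names and data are hypothetical:

```python
from itertools import combinations


def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear agreement weights for ordered categories."""
    k = len(categories)
    if k < 2 or not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need >= 2 categories and equal, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def weight(i, j):
        # Full credit on the diagonal, decreasing linearly with class distance.
        return 1 - abs(i - j) / (k - 1)

    p_o = sum(weight(index[a], index[b]) for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = [0] * k, [0] * k
    for a, b in zip(rater_a, rater_b):
        freq_a[index[a]] += 1
        freq_b[index[b]] += 1
    p_e = sum(weight(i, j) * freq_a[i] * freq_b[j]
              for i in range(k) for j in range(k)) / (n * n)
    if p_e == 1:
        raise ValueError("weighted kappa is undefined for this table")
    return (p_o - p_e) / (1 - p_e)


def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) / (concordant + discordant); tied pairs ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


print(linear_weighted_kappa([1, 2, 3, 1], [1, 3, 3, 2], [1, 2, 3]))  # 0.5
print(round(goodman_kruskal_gamma([1, 2, 3, 4], [1, 3, 2, 4]), 3))   # 0.667
```

Weighting gives partial credit when observers pick adjacent classes, which suits ordinal schemes such as lateral pillar groups.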
Broekstra, Dieuwke C; Lanting, Rosanne; Werker, Paul M N; van den Heuvel, Edwin R
2015-08-01
Dupuytren disease (DD) is a fibrosing disease affecting the palmar aponeurosis, and is mostly treated by surgery based on measurement of severity of flexion contracture of the fingers. Literature on the reliability of these measurements is scarce. This study aimed to determine the intra- and inter-observer agreement of four variables for diagnosing DD, determining severity of contracture, and disease extent. One of these is a new measure, the area of nodules and cords, intended to quantify disease extent in early disease stages. An agreement study (n = 54) was performed by two trained investigators. Agreement was calculated per finger, based on an intraclass correlation coefficient (ICC) using a latent variable model on subjects for diagnosis and Tubiana stage. For total passive extension deficit (TPED) and the area of nodules and cords, agreement was calculated with an ICC using a one-way random effects model with subject as random effect. Inter-observer agreement was very good for diagnosing DD (ICC: 95.5%-99.9%) and good to very good for classifying Tubiana stage (ICC: 73.5%-94.9%). Agreements for area and TPED were moderate (middle finger) to very good (ICC: 48.4%-98.6% and 45.0%-99.5%, respectively). Intra-observer agreement was slightly higher on average than inter-observer agreement. Overall, the intra- and inter-observer agreement in diagnosing DD, and determining the severity of flexion contracture is high. Also, the newly introduced variable area of nodules and cords has high intra- and inter-observer agreement, indicating that it is suitable to measure disease extent. Copyright © 2015 Elsevier Ltd. All rights reserved.
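For the one-way random-effects model described above, the single-measure ICC is (MSB − MSW) / (MSB + (k − 1) × MSW). Here MSB and MSW are the between-subject and within-subject mean squares, and k is the number of measurements per subject. The following minimal sketch uses a hypothetical function name and data. It does not reproduce the latent-variable model the authors used for diagnosis and Tubiana stage:

```python
def icc_oneway(ratings):
    """ICC(1,1): one-way random effects, single measure.

    ratings: one row per subject, each row holding that subject's k measurements.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 measurements per subject, equal for all")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical TPED-like readings: two observers, three subjects.
print(round(icc_oneway([[1, 2], [2, 1], [3, 3]]), 3))  # 0.636
```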
Kerkhof, M; Hagenbeek, R E; van der Kallen, B F W; Lycklama À Nijeholt, G J; Dirven, L; Taphoorn, M J B; Vos, M J
2016-10-01
Conventional magnetic resonance imaging (MRI) has limited value for differentiation of true tumor progression and pseudoprogression in treated glioblastoma multiforme (GBM). Perfusion weighted imaging (PWI) may be helpful in the differentiation of these two phenomena. Here interobserver variability in routine radiological evaluation of GBM patients is assessed using MRI, including PWI. Three experienced neuroradiologists evaluated MR scans of 28 GBM patients during temozolomide chemoradiotherapy at three time points: preoperative (MR1) and postoperative (MR2) MR scan and the follow-up MR scan after three cycles of adjuvant temozolomide (MR3). Tumor size was measured both on T1 post-contrast and T2 weighted images according to the Response Assessment in Neuro-Oncology criteria. PW images of MR3 were evaluated by visual inspection of relative cerebral blood volume (rCBV) color maps and by quantitative rCBV measurements of enhancing areas with highest rCBV. Image interpretability of PW images was also scored. Finally, the neuroradiologists gave a conclusion on tumor status, based on the interpretation of both T1 and T2 weighted images (MR1, MR2 and MR3) in combination with PWI (MR3). Interobserver agreement on visual interpretation of rCBV maps was good (κ = 0.63) but poor on quantitative rCBV measurements and on interpretability of perfusion images (intraclass correlation coefficient 0.37 and κ = 0.23, respectively). Interobserver agreement on the overall conclusion of tumor status was moderate (κ = 0.48). Interobserver agreement on the visual interpretation of PWI color maps was good. However, overall interpretation of MR scans (using both conventional and PW images) showed considerable interobserver variability. Therefore, caution should be applied when interpreting MRI results during chemoradiation therapy. © 2016 EAN.
Interobserver Reliability of the Total Body Score System for Quantifying Human Decomposition.
Dabbs, Gretchen R; Connor, Melissa; Bytheway, Joan A
2016-03-01
Several authors have tested the accuracy of the Total Body Score (TBS) method for quantifying decomposition, but none have examined the reliability of the method as a scoring system by testing interobserver error rates. Sixteen participants used the TBS system to score 59 observation packets including photographs and written descriptions of 13 human cadavers in different stages of decomposition (postmortem interval: 2-186 days). Data analysis used a two-way random model intraclass correlation in SPSS (v. 17.0). The TBS method showed "almost perfect" agreement between observers, with average absolute correlation coefficients of 0.990 and average consistency correlation coefficients of 0.991. While the TBS method may have sources of error, scoring reliability is not one of them. Individual component scores were examined, and the influences of education and experience levels were investigated. Overall, the trunk component scores were the least concordant. Suggestions are made to improve the reliability of the TBS method. © 2016 American Academy of Forensic Sciences.
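The "absolute" and "consistency" coefficients above correspond to the two definitions of the two-way intraclass correlation. Absolute agreement penalizes systematic offsets between observers, whereas consistency ignores them. The following minimal sketch computes single-measure values using McGraw and Wong's ICC(A,1) and ICC(C,1) formulas. The function name and data are hypothetical. SPSS also reports average-measure versions, which this sketch does not compute:

```python
def icc_two_way(ratings):
    """Single-measure two-way ICCs; rows are subjects, columns are raters."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 raters and equal-length rows")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    absolute = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    return {"absolute": absolute, "consistency": consistency}


# Rater 2 always scores exactly one point higher than rater 1.
result = icc_two_way([[1, 2], [2, 3], [3, 4]])
print(round(result["consistency"], 3), round(result["absolute"], 3))  # 1.0 0.667
```

A constant offset between observers leaves consistency perfect but lowers absolute agreement. This is why the two coefficients can differ even for highly concordant scorers.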
Interobserver Agreement on First-Stage Conversation Analytic Transcription
ERIC Educational Resources Information Center
Roberts, Felicia; Robinson, Jeffrey D.
2004-01-01
This investigation assesses interobserver agreement on conversation analytic (CA) transcription. Four professional CA transcribers spent a maximum of 3 hours transcribing 2.5 minutes of a previously unknown, naturally occurring, mundane telephone call. Researchers unitized transcripts into words, sounds, silences, inbreaths, outbreaths, and laugh…
Proximal humeral fracture classification systems revisited.
Majed, Addie; Macleod, Iain; Bull, Anthony M J; Zyto, Karol; Resch, Herbert; Hertel, Ralph; Reilly, Peter; Emery, Roger J H
2011-10-01
This study evaluated several classification systems and expert surgeons' anatomic understanding of these complex injuries based on a consecutive series of patients. We hypothesized that current proximal humeral fracture classification systems, regardless of imaging methods, are not sufficiently reliable to aid clinical management of these injuries. Complex fractures in 96 consecutive patients were investigated by generation of rapid sequence prototyping models from computed tomography Digital Imaging and Communications in Medicine (DICOM) imaging data. Four independent senior observers were asked to classify each model using 4 classification systems: Neer, AO, Codman-Hertel, and a prototype classification system by Resch. Interobserver and intraobserver κ coefficient values were calculated for the overall classification system and for selected classification items. The κ coefficient values for the interobserver reliability were 0.33 for Neer, 0.11 for AO, 0.44 for Codman-Hertel, and 0.15 for Resch. Interobserver reliability κ coefficient values were 0.32 for the number of fragments and 0.30 for the anatomic segment involved using the Neer system, 0.30 for the AO type (A, B, C), and 0.53, 0.48, and 0.08 for the Resch impaction/distraction, varus/valgus and flexion/extension subgroups, respectively. Three-part fractures showed low reliability for the Neer and AO systems. Currently available evidence suggests that the fracture classifications in use have poor intra- and inter-observer reliability regardless of the imaging modality used, which makes these injuries difficult to treat and also hampers scientific research. This study was undertaken to evaluate the reliability of several systems using rapid sequence prototype models. Overall interobserver κ values represented slight to moderate agreement. The most reliable interobserver scores were found with the Codman-Hertel classification, followed by elements of Resch's trial system. The AO system had the lowest values.
The higher interobserver reliability values for the Codman-Hertel system showed that it is the only comprehensive fracture description studied, whereas the novel classification by Resch showed clear definition with respect to varus/valgus and impaction/distraction angulation. Copyright © 2011 Journal of Shoulder and Elbow Surgery Board of Trustees. All rights reserved.
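The verbal labels above ("slight to moderate") follow the type of kappa benchmarks popularized by Landis and Koch (1977). The following minimal sketch assumes those cut-offs, and the function name is hypothetical. Other abstracts in this collection use different wording; for example, one describes κ = 0.45 as "fair":

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Overall interobserver kappas reported above for each system.
for system, k in [("Neer", 0.33), ("AO", 0.11), ("Codman-Hertel", 0.44), ("Resch", 0.15)]:
    print(system, k, landis_koch_label(k))
```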
The Ottawa knee rules - a useful clinical decision tool.
Yao, Kaihan; Haque, Tasneem
2012-04-01
Acute knee injuries are a common presentation in the primary care setting. The Ottawa knee rules provide guidance on how to identify which cases of knee injury require radiographic investigation. This article describes the Ottawa knee rules and outlines their sensitivity, reproducibility and application in the clinical setting. The Ottawa knee rules are a valuable tool for clinicians in the routine management of acute knee injuries. Studies show that they are highly sensitive at identifying patients with fractures of the knee and have a high degree of interobserver agreement and reproducible results. Application of the Ottawa knee rules in appropriate clinical scenarios may reduce the number of unnecessary radiographs ordered, streamlining patient throughput and allowing for significant cost savings. Although designed for use in adults, some studies have suggested that the Ottawa knee rules may also be applicable to the paediatric population.
2013-01-01
Background Accurate prediction of Helicobacter pylori infection status on endoscopic images can contribute to early detection of gastric cancer, especially in Asia. We identified the diagnostic yield of endoscopy for H. pylori infection at various endoscopist career levels and the effect of two years of training on diagnostic yield. Methods A total of 77 consecutive patients who underwent endoscopy were analyzed. H. pylori infection status was determined by histology, serology, and the urea breath test and categorized as H. pylori-uninfected, -infected, or -eradicated. Distinctive endoscopic findings were judged by six physicians at different career levels: beginner (<500 endoscopies), intermediate (1500–5000), and advanced (>5000). Diagnostic yield and inter- and intra-observer agreement on H. pylori infection status were evaluated. Values for the two beginners were compared before and after two years of training. The kappa (K) statistic was used to calculate agreement. Results For all physicians, the diagnostic yield was 88.9% for H. pylori-uninfected, 62.1% for H. pylori-infected, and 55.8% for H. pylori-eradicated. Intra-observer agreement for H. pylori infection status was good (K > 0.6) for all physicians, while inter-observer agreement was lower (K = 0.46) for beginners than for intermediate and advanced (K > 0.6). For all physicians, good inter-observer agreement in endoscopic findings was seen for atrophic change (K = 0.69), regular arrangement of collecting venules (K = 0.63), and hemorrhage (K = 0.62). For beginners, the diagnostic yield of H. pylori-infected/eradicated status and inter-observer agreement of endoscopic findings were improved after two years of training. Conclusions The diagnostic yield of endoscopic diagnosis was high for H. pylori-uninfected cases, but was low for H. pylori-eradicated cases. In beginners, daily training on endoscopic findings improved the low diagnostic yield. PMID:23947684
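The study above compares agreement among six endoscopists, but the abstract does not say how multi-rater kappa was computed. Fleiss' kappa is one common choice. It works from the number of raters who assign each subject to each category. The following is a minimal sketch of that method, offered only as an illustrative assumption. The function name and counts are hypothetical:

```python
def fleiss_kappa(counts):
    """counts[i][j]: number of raters who placed subject i in category j."""
    if not counts:
        raise ValueError("need at least one subject")
    m = sum(counts[0])  # raters per subject; must be equal for every subject
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of raters")
    n_subjects, n_cats = len(counts), len(counts[0])
    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_subject = [(sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts]
    p_bar = sum(p_subject) / n_subjects
    # Overall category proportions determine the chance agreement.
    p_cat = [sum(row[j] for row in counts) / (n_subjects * m) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 3 raters classify 4 patients as uninfected / infected / eradicated.
counts = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [2, 1, 0]]
print(round(fleiss_kappa(counts), 3))  # 0.745
```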
Bourdel, Nicolas; Modaffari, Paola; Tognazza, Enrica; Pertile, Riccardo; Chauvet, Pauline; Botchorishivili, Revaz; Savary, Dennis; Pouly, Jean Luc; Rabischong, Benoit; Canis, Michel
2016-12-01
Hysteroscopic reliability may be influenced by the experience of the operator and by a lack of morphological diagnostic criteria for endometrial malignant pathologies. The aim of this study was to evaluate the diagnostic accuracy and the inter-observer agreement (IOA) in the management of abnormal uterine bleeding (AUB) among gynecologists with different levels of experience. Each gynecologist, without any other clinical information, was asked to evaluate the anonymous video recordings of 51 consecutive patients who underwent hysteroscopy and endometrial resection for AUB. Expert (>500 hysteroscopies), senior (20-499 procedures) and junior (≤19 procedures) gynecologists were asked to judge endometrial macroscopic appearance (benign, suspicious or frankly malignant). They also had to propose the histological diagnosis (atrophic or proliferative endometrium; simple, glandulocystic or atypical endometrial hyperplasia and endometrial carcinoma). Observers could indicate when the quality of a recording was not good enough for adequate assessment. IOA (k coefficient), sensitivity, specificity, predictive value and the likelihood ratio were calculated. Five expert, five senior and six junior gynecologists were involved in the study. Considering endometrial cancer and endometrial atypical hyperplasia, sensitivity and specificity were respectively 55.5 % and 84.5 % for juniors, 66.6 % and 81.2 % for seniors and 86.6 % and 87.3 % for experts. Concerning endometrial macroscopic appearance, IOA was poor for juniors (k = 0.10) and fair for seniors and experts (k = 0.23 and 0.22, respectively). IOA was poor for juniors and experts (k = 0.18 and 0.20, respectively) and fair for seniors (k = 0.30) in predicting the histological diagnosis. Sensitivity improves with the observer's experience, but inter-observer agreement and reproducibility of hysteroscopy for endometrial malignancies are unsatisfactory regardless of the level of expertise.
Therefore, an accurate and complete endometrial sampling is still needed.
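The hysteroscopy study reports sensitivity, specificity, predictive values and likelihood ratios. The following minimal sketch computes all of them from a 2 × 2 table. The function name and counts are hypothetical and are not the study's data:

```python
import math


def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of test result vs reference standard."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        # Likelihood ratios; reported as infinite when the denominator is zero.
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else math.inf,
        "lr_negative": (1 - sensitivity) / specificity if specificity > 0 else math.inf,
    }


summary = diagnostic_summary(tp=30, fp=10, fn=10, tn=90)
print({name: round(value, 3) for name, value in summary.items()})
```

Unlike predictive values, likelihood ratios do not depend on disease prevalence in the sample. This makes them easier to carry over to settings with a different case mix.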
Lee, Su Hyun; Cho, Nariya; Chang, Jung Min; Koo, Hye Ryoung; Kim, Jin You; Kim, Won Hwa; Bae, Min Sun; Yi, Ann; Moon, Woo Kyung
2013-10-28
Purpose To determine whether two-view shear-wave elastography (SWE) improves the performance of radiologists in differentiating benign from malignant breast masses compared with single-view SWE. Materials and Methods This prospective study was conducted with institutional review board approval, and written informed consent was obtained. B-mode ultrasonographic (US) and orthogonal SWE images were obtained for 219 breast masses (136 benign and 83 malignant; mean size, 14.8 mm) in 219 consecutive women (mean age, 47.9 years; range, 20-78 years). Five blinded radiologists independently assessed the likelihood of malignancy for three data sets: B-mode US alone, B-mode US and single-view SWE, and B-mode US and two-view SWE. Interobserver agreement regarding Breast Imaging Reporting and Data System (BI-RADS) category and the area under the receiver operating characteristic curve (AUC) of each data set were compared. Results Interobserver agreement was moderate (κ = 0.560 ± 0.015 [standard error of the mean]) for BI-RADS category assessment with B-mode US alone. When SWE was added to B-mode US, five readers showed substantial interobserver agreement (κ = 0.629 ± 0.017 for single-view SWE; κ = 0.651 ± 0.014 for two-view SWE). The mean AUC of B-mode US was 0.870 (range, 0.855-0.884). The AUC of B-mode US and two-view SWE (average, 0.928; range, 0.904-0.941) was higher than that of B-mode US and single-view SWE (average, 0.900; range, 0.890-0.920), with statistically significant differences for three readers (P ≤ .003). Conclusion The performance of radiologists in differentiating benign from malignant breast masses was improved when B-mode US was combined with two-view SWE compared with that when B-mode US was combined with single-view SWE. © RSNA, 2013 Supplemental material: S1.
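The AUC values compared above can be read as the probability that a randomly chosen malignant mass receives a higher suspicion score than a randomly chosen benign one, with ties counted as one half. This is the Mann-Whitney formulation of the empirical ROC area. The following minimal sketch uses hypothetical scores. It computes the nonparametric empirical AUC only and does not reproduce the study's statistical comparison between data sets:

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """Empirical ROC AUC: share of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0   # malignant case scored higher: correct ordering
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical 5-point suspicion scores.
malignant = [3, 4, 5]
benign = [1, 2, 3]
print(round(auc_mann_whitney(malignant, benign), 3))  # 0.944
```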
Lee, Su Hyun; Cho, Nariya; Chang, Jung Min; Koo, Hye Ryoung; Kim, Jin You; Kim, Won Hwa; Bae, Min Sun; Yi, Ann; Moon, Woo Kyung
2014-02-01
To determine whether two-view shear-wave elastography (SWE) improves the performance of radiologists in differentiating benign from malignant breast masses compared with single-view SWE. This prospective study was conducted with institutional review board approval, and written informed consent was obtained. B-mode ultrasonographic (US) and orthogonal SWE images were obtained for 219 breast masses (136 benign and 83 malignant; mean size, 14.8 mm) in 219 consecutive women (mean age, 47.9 years; range, 20-78 years). Five blinded radiologists independently assessed the likelihood of malignancy for three data sets: B-mode US alone, B-mode US and single-view SWE, and B-mode US and two-view SWE. Interobserver agreement regarding Breast Imaging Reporting and Data System (BI-RADS) category and the area under the receiver operating characteristic curve (AUC) of each data set were compared. Interobserver agreement was moderate (κ = 0.560 ± 0.015 [standard error of the mean]) for BI-RADS category assessment with B-mode US alone. When SWE was added to B-mode US, five readers showed substantial interobserver agreement (κ = 0.629 ± 0.017 for single-view SWE; κ = 0.651 ± 0.014 for two-view SWE). The mean AUC of B-mode US was 0.870 (range, 0.855-0.884). The AUC of B-mode US and two-view SWE (average, 0.928; range, 0.904-0.941) was higher than that of B-mode US and single-view SWE (average, 0.900; range, 0.890-0.920), with statistically significant differences for three readers (P ≤ .003). The performance of radiologists in differentiating benign from malignant breast masses was improved when B-mode US was combined with two-view SWE compared with that when B-mode US was combined with single-view SWE. © RSNA, 2013
Fosbøl, M; Reving, S; Petersen, E H; Rossing, P; Lajer, M; Zerahn, B
2017-01-01
To investigate whether including quantitative data on blood flow distribution, compared with visual qualitative evaluation alone, improves the reliability and diagnostic performance of 99mTc-hydroxymethylene diphosphate three-phase bone scintigraphy (TPBS) in patients suspected of having Charcot neuropathic osteoarthropathy (CNO) of the foot. A retrospective cohort study of TPBS performed on 148 patients with suspected acute CNO referred from a single specialized diabetes care centre. The quantitative blood flow distribution was calculated based on the method described by Deutsch et al. All scintigraphies were re-evaluated twice by independent, blinded observers, with and without quantitative data on blood flow distribution at ankle and focus level. The diagnostic validity of TPBS was determined by subsequent review of clinical data and radiological examinations. A total of 90 patients (61%) had confirmed diagnosis of CNO. The sensitivity, specificity and accuracy of three-phase bone scintigraphy without/with quantitative data were 89%/88%, 58%/62% and 77%/78%, respectively. The intra-observer agreement improved significantly by adding quantitative data in the evaluation (kappa value 0.79/0.94). The interobserver agreement was not significantly improved. Adding quantitative data on blood flow distribution in the interpretation of TPBS improves intra-observer agreement, whereas no difference in interobserver variation was observed. The sensitivity of TPBS in the diagnosis of CNO is high, but its specificity is limited. Diagnostic performance does not improve using quantitative data in the evaluation. This may be due to the reference intervals applied in the study or the absence of a proper gold standard diagnostic procedure for comparison. © 2015 Scandinavian Society of Clinical Physiology and Nuclear Medicine. Published by John Wiley & Sons Ltd.
Jerez-Molina, Carmen; Lázaro-Alcay, Juan J; Ullán-de la Fuente, Ana M
2017-10-17
Cross-cultural adaptation into Spanish of the Induction Compliance Checklist (ICC) for assessing children's behaviour during induction of anaesthesia. A descriptive cross-sectional observational study was conducted on a sample of 81 children aged 2 to 12 years who underwent surgery in an ambulatory surgery unit of a paediatric hospital in Barcelona. The tool was adapted by translation and back-translation, and the scale's validity and reliability were analysed. Face validity of the tool was ensured through a discussion group, and inter-observer reliability was evaluated, yielding an intraclass correlation index of r = 0.956. The ICC scale validated for the Spanish population can be an effective tool for the presurgical evaluation of activities carried out to minimise children's anxiety. The ICC is an easy-to-use scale that operating room staff can complete in one minute, and it can provide important information about children's behaviour, specifically during induction. Copyright © 2017 Elsevier España, S.L.U. All rights reserved.
Sánchez Expósito, Judit; Leal Costa, César; Díaz Agea, José Luis; Carrillo Izquierdo, María Dolores; Jiménez Rodríguez, Diana
2018-02-01
The aim of this study was to analyse the communication skills of students in interactions with simulated critically-ill patients using a new assessment tool, to study the relationships between communication skills, teamwork and clinical skills, and to analyse the psychometric properties of the tool. A cross-sectional study was conducted to assess the communication skills of 52 students with critically-ill patients, using a new measurement tool to score video recordings of simulated clinical scenarios. The 52 students obtained low scores on their skills in communicating with patients. The measuring instrument showed good inter-observer agreement (ICC between 0.71 and 0.90), and the validity analysis yielded a positive correlation (p<0.01). The results provide evidence that nursing students lack skills when communicating with critically ill patients in simulated scenarios. The measuring instrument was found to be valid and reliable for assessing nursing students through clinical simulation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Moreno-Montañés, Javier; Antón, Vanesa; Antón, Alfonso; Larrosa, José M; Martinez-de-la-Casa, José María; Rebolleda, Gema; Ussa, Fernando; García-Granero, Marta
2017-04-01
It is important to evaluate intraobserver and interobserver agreement using visual field (VF) testing and optical coherence tomography (OCT) software in order to understand whether the use of this software is sufficient to detect glaucoma progression and to make decisions regarding its treatment. To evaluate agreement in VF and OCT software among 5 glaucoma specialists. The printout pages from VF progression software and OCT progression software from 100 patients were randomized, and the 5 glaucoma specialists subjectively and independently evaluated them for glaucoma. Each image was classified as having no progression, questionable progression, or progression. The principal investigator had previously classified the patients as without variability (normal) or with high variability among tests (difficult). Using both software packages, the specialists also evaluated whether the glaucoma damage had progressed and whether a treatment change was needed. One month later, the same observers reevaluated the patients in a different order to determine intraobserver reproducibility. Intraobserver and interobserver agreement was estimated using κ statistics and the Gwet second-order agreement coefficient. The agreement was compared with other factors. Of the 100 observed patients, half were male and all were white; the mean (SD) age was 69.7 (14.1) years. Intraobserver agreement was substantial to almost perfect for VF software (overall κ [95% CI], 0.59 [0.46-0.72] to 0.87 [0.79-0.96]) and similar for OCT software (overall κ [95% CI], 0.59 [0.46-0.71] to 0.85 [0.76-0.94]). Interobserver agreement among the 5 glaucoma specialists with the VF progression software was moderate (κ, 0.48; 95% CI, 0.41-0.55) and similar to that with the OCT progression software (κ, 0.52; 95% CI, 0.44-0.59). Interobserver agreement was substantial in images classified as having no progression but only fair in those classified as having questionable glaucoma progression or glaucoma progression.
Interobserver agreement was fair regarding questions about glaucoma progression (κ, 0.39; 95% CI, 0.32-0.48) and consideration of treatment changes (κ, 0.39; 95% CI, 0.32-0.48). The factors associated with agreement were the glaucoma stage and case difficulty. There was substantial intraobserver agreement but moderate interobserver agreement among glaucoma specialists using 2 glaucoma progression software packages. These data suggest that these glaucoma progression software packages are insufficient to obtain high interobserver agreement with either device, except in patients with no progression. The low agreement regarding progression or treatment changes suggests that neither software program, used in isolation, is sufficient for decision making.
Reliability analysis for digital adolescent idiopathic scoliosis measurements.
Kuklo, Timothy R; Potter, Benjamin K; O'Brien, Michael F; Schroeder, Teresa M; Lenke, Lawrence G; Polly, David W
2005-04-01
Analysis of adolescent idiopathic scoliosis (AIS) requires a thorough clinical and radiographic evaluation to completely assess the three-dimensional deformity. Recently, these radiographic parameters have been analyzed for reliability and reproducibility following manual measurements; however, most of these parameters have not been analyzed with regard to digital measurements. The purpose of this study is to determine the intra- and interobserver reliability of common scoliosis radiographic parameters using a digital software measurement program. Thirty sets of preoperative (posteroanterior [PA], lateral, and side-bending [SB]) and postoperative (PA and lateral) radiographs were analyzed by three independent observers on two separate occasions using a software measurement program (PhDx, Albuquerque, NM). Coronal measures included main thoracic (MT) and thoracolumbar-lumbar (TL/L) Cobb, SB MT Cobb, MT and TL/L apical vertical translation (AVT), C7 to center sacral vertical line (CSVL), T1 tilt, LIV tilt, disk below lowest instrumented vertebra (LIV), coronal balance, and Risser, whereas sagittal measures included T2-T5, T5-T12, T2-T12, T10-L2, T12-S1, and sagittal balance. Analysis of variance for repeated measures or Cohen three-way kappa correlation coefficient analysis was performed as appropriate to calculate the intra- and interobserver reliability for each parameter. The majority of the radiographic parameters assessed demonstrated good or excellent intra- and interobserver reliability. 
The relationship of the LIV to the CSVL (intraobserver κ = 0.48-0.78, fair to excellent; interobserver κ = 0.34-0.41, fair to poor), interobserver measurement of AVT (rho = 0.49-0.73, low to good), Risser grade (intraobserver rho = 0.41-0.97, low to excellent; interobserver rho = 0.60-0.70, fair to good), intraobserver measurement of the angulation of the disk inferior to the LIV (rho = 0.53-0.88, fair to good), apical Nash-Moe vertebral rotation (intraobserver rho = 0.50-0.85, fair to good; interobserver rho = 0.53-0.59, fair), and especially regional thoracic kyphosis from T2 to T5 (intraobserver rho = 0.22-0.65, poor to fair; interobserver rho = 0.33-0.47, low) demonstrated lesser reliability. In general, preoperative measures demonstrated greater reliability than postoperative measures, and coronal angular measures were more reliable than sagittal measures. Most common radiographic parameters for AIS assessment demonstrated good or excellent reliability for digital measurement and can be recommended for routine clinical and academic use. Preoperative assessments and coronal measures may be more reliable than postoperative and sagittal measurements. The reliability of digital measurements will be increasingly important as digital radiographic viewing becomes commonplace.
Braun, Martin; Kirsten, Robert; Rupp, Niels J; Moch, Holger; Fend, Falko; Wernert, Nicolas; Kristiansen, Glen; Perner, Sven
2013-05-01
Quantification of protein expression based on immunohistochemistry (IHC) is an important step for translational research and clinical routine. Several manual ('eyeballing') scoring systems are used to semi-quantify protein expression based on chromogenic intensities and distribution patterns. However, manual scoring systems are time-consuming and subject to significant intra- and interobserver variability. The aim of our study was to explore whether new image analysis software is sufficient as an alternative tool to quantify protein expression. For IHC experiments, one nucleus-specific marker (i.e., ERG antibody), one cytoplasm-specific marker (i.e., SLC45A3 antibody), and one marker expressed in both compartments (i.e., TMPRSS2 antibody) were chosen. Stainings were applied on TMAs containing tumor material from 630 prostate cancer patients. A pathologist visually quantified all IHC stainings in a blinded manner, applying a four-step scoring system. For digital quantification, image analysis software (Tissue Studio v.2.1, Definiens AG, Munich, Germany) was applied to obtain a continuous spectrum of average staining intensity. For each of the three antibodies we found a strong correlation between the manual protein expression score and the score of the image analysis software. Spearman's rank correlation coefficient was 0.94, 0.92, and 0.90 for ERG, SLC45A3, and TMPRSS2, respectively (p<0.01). Our data suggest that the image analysis software Tissue Studio is a powerful tool for quantification of protein expression in IHC stainings. Further, since the digital analysis is precise and reproducible, computer-supported protein quantification might help to overcome intra- and interobserver variability and increase the objectivity of IHC-based protein assessment.
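Spearman's rank correlation, used above to compare a four-step manual score with a continuous software intensity, is the Pearson correlation of the two variables' ranks. Tied values, which are frequent on a four-step scale, each receive the average of the ranks they span. The minimal sketch below makes that explicit. The function names and the example values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical cores: manual four-step score (0-3) vs. software mean intensity.
manual = [0, 1, 1, 2, 3, 3]
software = [0.05, 0.20, 0.18, 0.41, 0.77, 0.90]
print(round(spearman_rho(manual, software), 3))
```

Ranking first makes the coefficient depend only on order. This suits comparing an ordinal score with a continuous measurement on a different scale.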
Simoneit, Céline; Heuwieser, Wolfgang; Arlt, Sebastian P
2012-01-01
This study's objective was to determine respondents' inter-observer agreement on a detailed checklist to evaluate three exemplars (one case report, one randomized controlled study without blinding, and one blinded, randomized controlled study) of the scientific literature in the field of bovine reproduction. Fourteen international scientists in the field of animal reproduction were provided with the three articles, three copies of the checklist, and a supplementary explanation. Overall, 13 responded to more than 90% of the items. Overall repeatability between respondents using Fleiss's κ was 0.35 (fair agreement). Combining the "strongly agree" and "agree" responses and the "strongly disagree" and "disagree" responses increased κ to 0.49 (moderate agreement). Evaluation of information given in the three articles on housing of the animals (35% identical answers) and preconditions or pretreatments (42%) varied widely. Even though the overall repeatability was fair, repeatability concerning the important categories was high (e.g., level of agreement=98%). Our data show that the checklist is a reasonable and practical supporting tool to assess the quality of publications. Therefore, it may be used in teaching and practicing evidence-based veterinary medicine. It can support training in systematic and critical appraisal of information and in clinical decision making.
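Fleiss's κ extends chance-corrected agreement to more than two raters. It compares the mean proportion of agreeing rater pairs per item with the agreement expected from the overall category frequencies. The minimal sketch below assumes every item is rated by the same number of respondents. The abstract notes that not all respondents answered every item, and this sketch does not handle missing ratings. The second call mirrors the study's step of merging "strongly agree" with "agree" and "strongly disagree" with "disagree" by summing count columns. The function name and the ratings, on a hypothetical four-point scale, are illustrative only.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] is how many raters placed subject i in category j; every
    subject must be rated by the same number of raters (n >= 2).
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2 or any(sum(r) != n_raters or len(r) != n_categories for r in counts):
        raise ValueError("every subject needs the same n >= 2 raters and categories")
    n_subjects = len(counts)
    # Per-subject agreement: proportion of rater pairs that agree on subject i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Share of all ratings that fall in each category, for chance agreement.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 4 respondents rate 4 checklist items on
# [strongly disagree, disagree, agree, strongly agree].
ratings = [[0, 0, 1, 3], [0, 0, 3, 1], [3, 1, 0, 0], [1, 3, 0, 0]]
print(round(fleiss_kappa(ratings), 3))  # 0.333
# Collapse to [disagree, agree] by summing adjacent columns.
collapsed = [[r[0] + r[1], r[2] + r[3]] for r in ratings]
print(round(fleiss_kappa(collapsed), 3))  # 1.0
```

In this toy table, the respondents disagree only about the strength of their opinion, not its direction. Collapsing the scale therefore raises κ sharply, which is the same direction of change the study reports.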
Damasio, Maria Beatrice; Malattia, Clara; Tanturri de Horatio, Laura; Mattiuz, Chiara; Pistorio, Angela; Bracaglia, Claudia; Barbuti, Domenico; Boavida, Peter; Juhan, Karen Lambot; Ording, Lil Sophie Mueller; Rosendahl, Karen; Martini, Alberto; Magnano, GianMichele; Tomà, Paolo
2012-09-01
MRI is a sensitive tool for the evaluation of synovitis in juvenile idiopathic arthritis (JIA). The purpose of this study was to introduce a novel MRI-based score for synovitis in children and to examine its inter- and intraobserver variability in a multi-centre study. Wrist MRI was performed in 76 children with JIA. On postcontrast 3-D spoiled gradient-echo and fat-suppressed T2-weighted spin-echo images, joint recesses were scored for the degree of synovial enhancement, effusion and overall inflammation independently by two paediatric radiologists. Total-enhancement and inflammation-synovitis scores were calculated. Interobserver agreement was poor to moderate for enhancement and inflammation in all recesses, except in the radioulnar and radiocarpal joints. Intraobserver agreement was good to excellent. For enhancement and inflammation scores, mean differences (95 % CI) between observers were -1.18 (-4.79 to 2.42) and -2.11 (-6.06 to 1.83). Intraobserver variability (reader 1) was 0 (-1.65 to 1.65) and 0.02 (-1.39 to 1.44). Intraobserver agreement was good. Except for the radioulnar and radiocarpal joints, interobserver agreement was not acceptable. Therefore, the proposed scoring system requires further refinement.
Bennett, R J; Jayakody, D M P; Eikelboom, R H; Taljaard, D S; Atlas, M D
2016-02-01
To investigate the ability of cochlear implant (CI) recipients to physically handle and care for their hearing implant device(s) and to identify factors that may influence skills. To assess device management skills, a clinical survey was developed and validated on a clinical cohort of CI recipients. Survey development and validation. A prospective convenience cohort design study. Specialist hearing implant clinic. Forty-nine post-lingually deafened, adult CI recipients, at least 12 months postoperative. Survey test-retest reliability, interobserver reliability and responsiveness. Correlations between management skills and participant demographic, audiometric, clinical outcomes and device factors. The Cochlear Implant Management Skills survey was developed, demonstrating high test-retest reliability (0.878), interobserver reliability (0.972) and responsiveness to intervention (skills training) [t(20) = -3.913, P = 0.001]. Cochlear Implant Management Skills survey scores ranged from 54.69% to 100% (mean: 83.45%, sd: 12.47). No associations were found between handling skills and participant factors. This is the first study to demonstrate a range in cochlear implant device handling skills in CI recipients and offers clinicians and researchers a tool to systematically and objectively identify shortcomings in CI recipients' device handling skills. © 2015 John Wiley & Sons Ltd.
Automated brain computed tomographic densitometry of early ischemic changes in acute stroke
Stoel, Berend C.; Marquering, Henk A.; Staring, Marius; Beenen, Ludo F.; Slump, Cornelis H.; Roos, Yvo B.; Majoie, Charles B.
2015-01-01
The Alberta Stroke Program Early CT score (ASPECTS) scoring method is frequently used for quantifying early ischemic changes (EICs) in patients with acute ischemic stroke in clinical studies. However, reported interobserver agreement varies and is often limited. Therefore, our goal was to develop and evaluate an automated brain densitometric method. It divides CT scans of the brain into ASPECTS regions using atlas-based segmentation. EICs are quantified by comparing the brain density between contralateral sides. This method was optimized and validated using CT data from 10 and 63 patients, respectively. The automated method was validated against manual ASPECTS, stroke severity at baseline and clinical outcome after 7 to 10 days (NIH Stroke Scale, NIHSS) and 3 months (modified Rankin Scale, mRS). Manual and automated ASPECTS showed similar and statistically significant correlations with baseline NIHSS (R=−0.399 and −0.277, respectively) and with follow-up mRS (R=−0.256 and −0.272), but not with follow-up NIHSS. Agreement between automated and consensus ASPECTS reading was similar to the interobserver agreement of manual ASPECTS (differences <1 point in 73% of cases). The automated ASPECTS method could, therefore, be used as a supplementary tool to assist manual scoring. PMID:26158082
Interobserver error involved in independent attempts to measure cusp base areas of Pan M1s
Bailey, Shara E; Pilbrow, Varsha C; Wood, Bernard A
2004-01-01
Cusp base areas measured from digitized images increase the amount of detailed quantitative information one can collect from post-canine crown morphology. Although this method is gaining wide usage for taxonomic analyses of extant and extinct hominoids, the techniques for digitizing images and taking measurements differ between researchers. The aim of this study was to investigate interobserver error in order to help assess the reliability of cusp base area measurement within extant and extinct hominoid taxa. Two of the authors measured individual cusp base areas and total cusp base area of 23 maxillary first molars (M1) of Pan. From these, relative cusp base areas were calculated. No statistically significant interobserver differences were found for either absolute or relative cusp base areas. On average, the hypocone and paracone showed the least interobserver error (< 1%), whereas the protocone and metacone showed the most (2.6–4.5%). We suggest that the larger measurement error in the metacone/protocone is due primarily to weakly defined fissure patterns, the presence of accessory occlusal features, or both. Overall, levels of interobserver error are similar to those found for intraobserver error. The results of our study suggest that if certain prescribed standards are employed, cusp and crown base areas measured by different individuals can be pooled into a single database. PMID:15447691
Tsili, Athina C; Ntorkou, Alexandra; Astrakas, Loukas; Xydis, Vasilis; Tsampalas, Stavros; Sofikitis, Nikolaos; Argyropoulou, Maria I
2017-04-01
To evaluate the difference in apparent diffusion coefficient (ADC) measurements at diffusion-weighted (DW) magnetic resonance imaging using differently shaped regions of interest (ROIs) in testicular germ cell neoplasms (TGCNs), the diagnostic ability of differently shaped ROIs in differentiating seminomas from nonseminomatous germ cell neoplasms (NSGCNs), and the interobserver variability. Thirty-three TGCNs were retrospectively evaluated. Patients underwent MR examinations, including DWI, on a 1.5-T MR system. Two observers measured mean tumor ADCs using four distinct ROI methods: round, square, freehand, and multiple small round ROIs. The intraclass correlation coefficient was analyzed to assess interobserver variability. Statistical analysis was used to compare mean ADC measurements among observers, methods and histologic types. All ROI methods showed excellent interobserver agreement, with excellent correlation (P<0.001). Multiple small ROIs provided the lowest mean ADC in TGCNs. Seminomas had lower mean ADC than NSGCNs for each ROI method (P<0.001). Round ROI proved the most accurate method in characterizing TGCNs. Interobserver agreement in ADC measurement is excellent, irrespective of the ROI shape. Multiple small round ROIs and round ROI proved the more accurate methods for ADC measurement in the characterization of TGCNs and in the differentiation between seminomas and NSGCNs, respectively. Copyright © 2017 Elsevier B.V. All rights reserved.
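The abstract reports an intraclass correlation coefficient but does not say which ICC form was used. As an assumption, the sketch below computes two common Shrout-Fleiss single-rater forms from a two-way ANOVA on a complete subjects × observers table. ICC(2,1) is the two-way random-effects, absolute-agreement form. ICC(3,1) is the two-way mixed, consistency form. The function name and the ADC values in the usage lines are hypothetical.

```python
def icc_two_way(data):
    """Shrout-Fleiss ICC(2,1) and ICC(3,1) from an n_subjects x k_raters table."""
    n = len(data)
    k = len(data[0]) if data else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # ICC(2,1): systematic rater offsets (ms_cols) count against agreement.
    icc2 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    # ICC(3,1): rater offsets are ignored (consistency only).
    icc3 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return {"ICC(2,1)": icc2, "ICC(3,1)": icc3}


# Hypothetical mean ADC values for five tumors measured by two observers.
adc = [[0.71, 0.74], [0.95, 0.93], [1.10, 1.15], [0.62, 0.60], [0.88, 0.90]]
print(icc_two_way(adc))
```

The two forms differ only in whether a constant offset between observers lowers the coefficient. Reporting which form was used matters when comparing ICCs across studies.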
Lai, Jeffrey K C; Robertson, Patricia L; Goh, Christine; Szer, Jeff
2018-02-01
To evaluate the intraobserver and interobserver agreement for bone marrow burden (BMB) scores for individual examinations and for the change in BMB score over time in the same patient. A total of 119 sets of MR images of the lumbar spine and femora from 60 patients with Gaucher disease were included. Each set of MR images was scored using the BMB score independently by two experienced MSK radiologists. One radiologist performed a second read four weeks later. Intraobserver and interobserver agreement was assessed using Bland-Altman analysis and weighted kappa scores. BMB scores (n=119) demonstrated fair intraobserver agreement (weighted kappa=0.53) with a mean difference of -0.20 and 95% limits of agreement (LOA) of (-3.41, 3.01). Interobserver agreement was poor, with weighted kappa 0.28, mean difference of -0.16 and 95% LOA of (-4.45, 4.11). Change in BMB scores over time (n=59) demonstrated poor/fair intraobserver agreement (weighted kappa 0.41, mean difference -0.20 and 95% LOA (-4.35, 3.94)). Interobserver agreement was poor (weighted kappa 0.25, mean difference -0.12 with wide 95% LOA (-6.23, 5.99)). Significant interobserver, and to a lesser extent intraobserver, variation occurs with blinded BMB scoring of Gaucher disease. Copyright © 2016 Elsevier Inc. All rights reserved.
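Bland-Altman analysis, as used here, summarises paired scores by two quantities. The first is the mean difference between readers, also called the bias. The second is the 95% limits of agreement, conventionally the mean difference ± 1.96 standard deviations of the differences. The minimal sketch below uses hypothetical scores and a hypothetical function name. It does not reproduce the study's weighted-kappa analysis.

```python
from statistics import mean, stdev


def bland_altman(reader_a, reader_b):
    """Return (mean difference, (lower LOA, upper LOA)) for paired scores.

    Differences are reader_a - reader_b; the 95% limits of agreement are
    mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(reader_a) != len(reader_b) or len(reader_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical BMB scores for four examinations read by two radiologists.
bias, (lower, upper) = bland_altman([2, 4, 6, 8], [1, 4, 5, 9])
print(bias, round(lower, 2), round(upper, 2))  # 0.25 -1.63 2.13
```

A bias near zero can coexist with wide limits, as in the interobserver results above. The limits, not the mean difference, show how far two single readings may disagree.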
The role of miRNAs in endometrial cancer.
Vasilatou, Diamantina; Sioulas, Vasileios D; Pappa, Vasiliki; Papageorgiou, Sotirios G; Vlahos, Nikolaos F
2015-01-01
miRNAs are small noncoding RNAs that regulate gene expression at the post-transcriptional level. Since their discovery, miRNAs have been associated with every cell function, including malignant transformation and metastasis. Endometrial cancer is the most common gynecologic malignancy. However, interobserver agreement on histological typing still needs improvement, as do individualized therapeutic approaches. This article summarizes the role of miRNAs in endometrial cancer pathogenesis and treatment.
Park, Juhyun; Kang, Minyong; Jeong, Chang Wook; Oh, Sohee; Lee, Jeong Woo; Lee, Seung Bae; Son, Hwancheol; Jeong, Hyeon; Cho, Sung Yong
2015-08-01
The modified Seoul National University Renal Stone Complexity scoring system (S-ReSC-R) for retrograde intrarenal surgery (RIRS) was developed as a tool to predict stone-free rate (SFR) after RIRS. We externally validated the S-ReSC-R. We retrospectively reviewed 159 patients who underwent RIRS. The S-ReSC-R was assigned from 1 to 12 according to the location and number of sites involved. Stone-free status was defined as no evidence of stones or only clinically insignificant residual fragments smaller than 2 mm. Interobserver and test-retest reliabilities were evaluated. Statistical performance of the prediction model was assessed by its predictive accuracy, predictive probability, and clinical usefulness. Overall SFR was 73.0%. The SFRs were 86.7%, 70.2%, and 48.6% in low-score (1-2), intermediate-score (3-4), and high-score (5-12) groups, respectively (p<0.001). External validation of S-ReSC-R revealed an area under the curve (AUC) of 0.731 (95% CI 0.650-0.813). The AUC of the three-tiered S-ReSC-R was 0.701 (95% CI 0.609-0.794). The calibration plot showed that the predicted probability of SFR was comparable to the observed frequency. The Hosmer-Lemeshow goodness-of-fit test yielded a p-value of 0.01 for the S-ReSC-R and 0.90 for the three-tiered S-ReSC-R. Interobserver and test-retest reliabilities showed an almost perfect level of agreement. The present study demonstrated the predictive value of the S-ReSC-R for SFR following RIRS in an independent cohort. Interobserver and test-retest reliabilities confirmed that the S-ReSC-R was reliable and valid.
Baek, Hye Jin; Kim, Dong Wook; Ryu, Ji Hwa; Lee, Yoo Jin
2013-01-01
Background There has been no study to compare the diagnostic accuracy of an experienced radiologist with a trainee in nasal bone fracture. Objectives To compare the diagnostic accuracy between conventional radiography and computed tomography (CT) for the identification of nasal bone fractures and to evaluate the interobserver reliability between a staff radiologist and a trainee. Patients and Methods A total of 108 patients who underwent conventional radiography and CT after acute nasal trauma were included in this retrospective study. Two readers, a staff radiologist and a second-year resident, independently assessed the results of the imaging studies. Results Of the 108 patients, the presence of a nasal bone fracture was confirmed in 88 (81.5%) patients. The number of non-depressed fractures was higher than the number of depressed fractures. In nine (10.2%) patients, nasal bone fractures were only identified on conventional radiography, including three depressed and six non-depressed fractures. CT was more accurate as compared to conventional radiography for the identification of nasal bone fractures as determined by both readers (P <0.05), all diagnostic indices of an experienced radiologist were similar to or higher than those of a trainee, and κ statistics showed moderate agreement between the two diagnostic tools for both readers. There was no statistical difference in the assessment of interobserver reliability for both imaging modalities in the identification of nasal bone fractures. Conclusion For the identification of nasal bone fractures, CT was significantly superior to conventional radiography. Although a staff radiologist showed better values in the identification of nasal bone fracture and differentiation between depressed and non-depressed fractures than a trainee, there was no statistically significant difference in the interpretation of conventional radiography and CT between a radiologist and a trainee. PMID:24348599
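Unweighted Cohen's κ applies to two readers, or, as in the comparison above, to two imaging modalities read by the same person. It compares the observed agreement p_o with the agreement p_e expected from each rater's own label frequencies: κ = (p_o - p_e)/(1 - p_e). The minimal sketch below uses a hypothetical function name and hypothetical fracture calls.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' labels on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of cases given the same label.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical fracture calls ("F" = fracture, "N" = no fracture) on 10 cases.
staff = ["F", "F", "F", "F", "F", "F", "N", "N", "N", "N"]
trainee = ["F", "F", "F", "F", "F", "N", "N", "N", "N", "F"]
print(round(cohen_kappa(staff, trainee), 3))  # 0.583
```

Here the readers agree on 80% of cases. Because 52% agreement would be expected by chance from their label frequencies alone, κ is noticeably lower than the raw percentage.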
Taffin, Elien Rl; Paepe, Dominique; Campos, Miguel; Duchateau, Luc; Goris, Nesya; De Roover, Katrien; Daminet, Sylvie
2016-11-01
Objectives The Karnofsky score (KS) modified for cats, a scoring system to rate health and quality of life (QOL) in cats, is used in clinical trials, but its reliability and validity are yet to be determined. The present study aims to evaluate the scientific robustness of the KS when adapted for use in a hospital setting. Methods A list of variables to consider during the physical examination, which informs the clinician's score (CS) part of the KS, was added and clinicians were allowed to choose a score anywhere between 0 and 50. The Karnofsky QOL questionnaire was adapted for use in a hospital setting. F-tests with Bonferroni correction and Spearman rank correlation coefficients were used to evaluate reliability and validity of the KS to assess the health and wellbeing of cats in a hospital setting. The records of 54 feline immunodeficiency virus-positive cats, which were recruited for a clinical trial and hospitalised for 6 weeks, were reviewed. Four veterinarians scored the CS, and one veterinarian and a veterinary nurse assessed the QOL score. Results Mean absolute difference between observers was significantly larger for the CS than for the QOL score (P<0.001), and two veterinarians scored significantly higher than the remaining two veterinarians (P<0.001). Inter-observer correlation for the CS ranged from 0.45 to 0.75. For the QOL score, the absolute difference between observers was small, no significant difference was found between observers and a high degree of inter-observer correlation was noted (r = 0.91). Conclusions and relevance The results indicate low inter-observer reliability for the CS, requiring additional modifications to this part of the KS. The QOL score seems more reliable, and the questionnaire may serve as a reliable tool in the assessment of QOL in cats in a hospital setting. Consequently, further adaptation of the KS is mandatory when simultaneous assessment of both the cat's clinical health and perceived wellbeing is required.
Serel Arslan, S; Demir, N; Karaduman, A A
2017-02-01
This study aimed to develop a scale called the Tongue Thrust Rating Scale (TTRS), which categorises tongue thrust in children by its severity during swallowing, and to investigate its validity and reliability. The study describes the developmental phase of the TTRS and presents its content and criterion-based validity and its interobserver and intra-observer reliability. For content validation, seven experts assessed the steps in the scale over two Delphi rounds. Two physical therapists evaluated videos of 50 children with cerebral palsy (mean age, 57·9 ± 16·8 months), using the TTRS to test criterion-based validity, interobserver and intra-observer reliability. The Karaduman Chewing Performance Scale (KCPS) and Drooling Severity and Frequency Scale (DSFS) were used for criterion-based validity. All the TTRS steps were deemed necessary. The content validity index was 0·857. A very strong positive correlation was found between two examinations by one physical therapist, which indicated intra-observer reliability (r = 0·938, P < 0·001). A very strong positive correlation was also found between the TTRS scores of two physical therapists, indicating interobserver reliability (r = 0·892, P < 0·001). There was also a strong positive correlation between the TTRS and KCPS (r = 0·724, P < 0·001) and a very strong positive correlation between the TTRS scores and DSFS (r = 0·822 and r = 0·755; P < 0·001). These results demonstrated the criterion-based validity of the TTRS. The TTRS is a valid, reliable and clinically easy-to-use functional instrument to document the severity of tongue thrust in children. © 2016 John Wiley & Sons Ltd.
Nayak, Lakshmi; DeAngelis, Lisa M; Brandes, Alba A; Peereboom, David M; Galanis, Evanthia; Lin, Nancy U; Soffietti, Riccardo; Macdonald, David R; Chamberlain, Marc; Perry, James; Jaeckle, Kurt; Mehta, Minesh; Stupp, Roger; Muzikansky, Alona; Pentsova, Elena; Cloughesy, Timothy; Iwamoto, Fabio M; Tonn, Joerg-Christian; Vogelbaum, Michael A; Wen, Patrick Y; van den Bent, Martin J; Reardon, David A
2017-05-01
The Macdonald criteria and the Response Assessment in Neuro-Oncology (RANO) criteria define radiologic parameters to classify therapeutic outcome among patients with malignant glioma and specify that clinical status must be incorporated and prioritized for overall assessment. But neither provides specific parameters to do so. We hypothesized that a standardized metric to measure neurologic function will permit more effective overall response assessment in neuro-oncology. An international group of physicians including neurologists, medical oncologists, radiation oncologists, and neurosurgeons with expertise in neuro-oncology drafted the Neurologic Assessment in Neuro-Oncology (NANO) scale as an objective and quantifiable metric of neurologic function evaluable during a routine office examination. The scale was subsequently tested in a multicenter study to determine its overall reliability, inter-observer variability, and feasibility. The NANO scale is a quantifiable evaluation of 9 relevant neurologic domains based on direct observation and testing conducted during routine office visits. The score defines overall response criteria. A prospective, multinational study noted a >90% inter-observer agreement rate with kappa statistic ranging from 0.35 to 0.83 (fair to almost perfect agreement), and a median assessment time of 4 minutes (interquartile range, 3-5). The NANO scale provides an objective clinician-reported outcome of neurologic function with high inter-observer agreement. It is designed to combine with radiographic assessment to provide an overall assessment of outcome for neuro-oncology patients in clinical trials and in daily practice. Furthermore, it complements existing patient-reported outcomes and cognition testing to combine for a global clinical outcome assessment of well-being among brain tumor patients. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. 
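Several abstracts here use verbal labels such as "fair" and "almost perfect" for κ. These follow, or resemble, the widely used Landis and Koch (1977) bands: below 0 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect. The minimal helper below maps a κ value to its band. The function name is hypothetical. Applied to the NANO range above (0.35 to 0.83), it reproduces "fair to almost perfect".

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


for k in (0.35, 0.83):
    print(k, landis_koch_label(k))
```

Not every study uses these bands. For example, the Gaucher disease study above describes a weighted kappa of 0.53 as "fair", which this scale would label moderate. Labels should therefore be compared across studies only together with the numeric κ.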
Astrup, Helene; Kåsin, Britt Marlene; Andersen, Lene Frost
2015-01-01
Background High-quality, Web-based dietary assessment tools for children are needed to reduce cost and improve user-friendliness when studying children’s dietary practices. Objective To evaluate the first Web-based dietary assessment tool for children in Norway, the Web-based Food Record (WebFR), by comparing children’s true school lunch intake with recordings in the WebFR, using direct unobtrusive observation as the reference method. Methods A total of 117 children aged 8-9 years from Bærum, Norway, were recruited from September to December 2013. Children completed 4 days of recordings in the WebFR, with parental assistance, and were observed during school lunch in the same period by 3 observers. Interobserver reliability assessments were satisfactory. Match, omission, and intrusion rates were calculated to assess the quality of the recordings in the WebFR for different food categories, and for all foods combined. Logistic regression analyses were used to investigate whether body mass index (BMI), parental educational level, parental ethnicity or family structure were associated with having a “low match rate” (≤70%). Results Bread and milk were recorded with less bias than spreads, fruits, and vegetables. Mean (SD) for match, omission, and intrusion rates for all foods combined were 73% (27%), 27% (27%), and 19% (26%), respectively. Match rates were statistically significantly associated with parental educational level (low education 52% [32%] versus high 77% [24%], P=.008) and parental ethnicity (non-Norwegian 57% [28%] versus others 75% [26%], P=.04). Only parental ethnicity remained statistically significant in the logistic regression model, showing an adjusted odds ratio of 6.9 and a 95% confidence interval between 1.3 and 36.4. Conclusions Compared with similar studies, our results indicate that the WebFR performs in line with, or better than, most comparable tools, yet enhancements could further improve the WebFR. PMID:26680744
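The abstract does not define the three rates. The minimal sketch below assumes the definitions common in children's dietary-reporting validation studies, each expressed as a percentage:

- match rate = matches/(matches + omissions)
- omission rate = omissions/(matches + omissions)
- intrusion rate = intrusions/(matches + intrusions)

Under these definitions, match and omission rates sum to 100%, which is consistent with the reported 73% and 27%. The function name and the food items are hypothetical. Items are compared as a set per lunch, so portion sizes are ignored.

```python
def recording_rates(observed, reported):
    """Match, omission and intrusion rates (percent) for one meal.

    observed: items seen eaten by the observer (reference method).
    reported: items the child recorded.
    Items are compared as sets, so portion sizes and duplicates are ignored.
    """
    observed, reported = set(observed), set(reported)
    if not observed:
        raise ValueError("match and omission rates need at least one observed item")
    matches = len(observed & reported)
    omissions = len(observed - reported)
    intrusions = len(reported - observed)
    match_rate = 100 * matches / (matches + omissions)
    omission_rate = 100 * omissions / (matches + omissions)
    # Intrusion rate is relative to everything reported; undefined if nothing was reported.
    intrusion_rate = 100 * intrusions / (matches + intrusions) if reported else None
    return match_rate, omission_rate, intrusion_rate


observed = {"bread", "milk", "cheese", "apple"}
reported = {"bread", "milk", "apple", "juice"}
print(recording_rates(observed, reported))  # (75.0, 25.0, 25.0)
```

Omissions and intrusions have different denominators. This is why the reported intrusion rate (19%) need not mirror the omission rate (27%).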
Spine Instability Neoplastic Score: agreement across different medical and surgical specialties.
Arana, Estanislao; Kovacs, Francisco M; Royuela, Ana; Asenjo, Beatriz; Pérez-Ramírez, Úrsula; Zamora, Javier
2016-05-01
Spinal instability is an acknowledged complication of spinal metastases; despite recently suggested criteria, it is not clearly defined in the literature. This study aimed to assess intra- and interobserver agreement when the Spine Instability Neoplastic Score (SINS) is used by all physicians involved in the management of spinal metastases. An independent multicenter reliability study of the recently created SINS was carried out with a panel of medical oncologists, neurosurgeons, radiologists, orthopedic surgeons, and radiation oncologists. Ninety patients with biopsy-proven spinal metastases and magnetic resonance imaging, reviewed at the multidisciplinary tumor board of our institution, were included. The intraclass correlation coefficient (ICC) was used for SINS score agreement. The Fleiss kappa statistic was used to assess agreement on the location of the most affected vertebral level; agreement on the SINS category ("stable," "potentially stable," or "unstable"); and overall agreement with the classification established by the tumor board. Clinical data and imaging were provided to 83 specialists in 44 hospitals across 14 Spanish regions. No assessment criteria were pre-established. Each clinician assessed the SINS score twice, with a minimum 6-week interval. Clinicians were blinded to assessments made by other specialists and to their own previous assessment. Subgroup analyses were performed by clinicians' specialty, experience (≤7, 8-13, ≥14 years), and hospital category (four levels according to size and complexity). This study was supported by the Kovacs Foundation. Intra- and interobserver agreement on the location of the most affected levels was "almost perfect" (κ>0.94). Intra-observer agreement on the SINS score was "excellent" (ICC=0.77), whereas interobserver agreement was "moderate" (ICC=0.55). Intra-observer agreement on the SINS category was "substantial" (κ=0.61), whereas interobserver agreement was "moderate" (κ=0.42).
Overall agreement with the tumor board classification was "substantial" (κ=0.61). Results were similar across specialties, years of experience, and hospital category. Agreement on the assessment of metastatic spine instability is moderate. The SINS can help improve communication among clinicians in oncology care. Copyright © 2015 Elsevier Inc. All rights reserved.
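Fleiss' kappa generalizes chance-corrected agreement to more than two raters, which is why it suits a large panel like this one. The sketch below is a plain-Python version of the standard formula. It assumes every subject is rated by the same number of raters and treats the three SINS categories as unordered. `fleiss_kappa` is a hypothetical helper, not the study's software.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    counts[i][j] = number of raters who assigned subject i to category j.
    Raises ZeroDivisionError if every rating falls in a single category.
    """
    if not counts:
        raise ValueError("no subjects")
    N, n, k = len(counts), sum(counts[0]), len(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Share of all N*n assignments that fell into each category
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Proportion of agreeing rater pairs for each subject, then their mean
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical: 3 patients, 4 raters; columns are
# "stable", "potentially stable", "unstable".
print(fleiss_kappa([[4, 0, 0], [1, 3, 0], [0, 1, 3]]))
```

Because Fleiss' kappa ignores category order, a "stable" versus "unstable" disagreement counts the same as "stable" versus "potentially stable".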
Yue, Dong; Fan Rong, Cheng; Ning, Cai; Liang, Hu; Ai Lian, Liu; Ru Xin, Wang; Ya Hong, Luo
2018-07-01
Background Evaluation of hip arthroplasty is challenging on computed tomography (CT). Virtual monochromatic spectral (VMS) images with metal artifact reduction software (MARs) in spectral CT can reduce artifacts and improve image quality. Purpose To evaluate the effects of VMS images and MARs on metal artifact reduction in patients with unilateral hip arthroplasty. Material and Methods Thirty-five patients underwent dual-energy CT. Four sets of VMS images without MARs and four sets of VMS images with MARs were obtained. Artifact index (AI), CT number, and SD value were assessed at the periprosthetic region and the pelvic organs. The scores of two observers for the different images and their inter-observer agreement were evaluated. Results The AIs in 120 and 140 keV images were significantly lower than those in 80 and 100 keV images. The AIs of the periprosthetic region in VMS images with MARs were significantly lower than those in VMS images without MARs, while the AIs of pelvic organs were not significantly different. VMS images with MARs improved the accuracy of CT numbers for the periprosthetic region. Inter-observer agreement was good for all images. VMS images with MARs at 120 and 140 keV had higher subjective scores and could improve the image quality, leading to reliable diagnosis of prosthesis-related problems. Conclusion VMS images with MARs at 120 and 140 keV could significantly reduce the artifacts from hip arthroplasty and improve the image quality at the periprosthetic region but had no obvious advantage for pelvic organs.
Have levels of evidence improved the quality of orthopaedic research?
Cunningham, Brian P; Harmsen, Samuel; Kweon, Chris; Patterson, Jason; Waldrop, Robert; McLaren, Alex; McLemore, Ryan
2013-11-01
Since 2003 many orthopaedic journals have adopted grading systems for levels of evidence (LOE). It is unclear if the quality of orthopaedic literature has changed since LOE was introduced. We asked three questions: (1) Have the overall number and proportion of Level I and II studies increased in the orthopaedic literature since the introduction of LOE? (2) Is a similar pattern seen in individual orthopaedic subspecialty journals? (3) What is the interobserver reliability of grading LOE? We assigned LOE to therapeutic studies published in 2000, 2005, and 2010 in eight major orthopaedic subspecialty journals. Number and proportion of Level I and II publications were determined. Data were evaluated using log-linear models. Twenty-six reviewers (13 residents and 13 attendings) graded LOE of 20 blinded therapeutic articles from the Journal of Bone and Joint Surgery for 2009. Interobserver agreement relative to the Journal of Bone and Joint Surgery was assessed using a weighted kappa. The total number of Level I and II publications in subspecialty journals increased from 150 in 2000 to 239 in 2010. The proportion of high-quality publications increased with time (p < 0.001). All subspecialty journals other than the Journal of Pediatric Orthopaedics and the Journal of Orthopaedic Trauma showed a similar behavior. Average weighted kappa was 0.791 for residents and 0.842 for faculty (p = 0.209). The number and proportion of Level I and II publications have increased. LOE can be graded reliably with high interobserver agreement. The number and proportion of high-level studies should continue to increase.
Kwon, Mi-Ri; Kim, Chan Kyo; Kim, Jae-Hun
2017-11-01
To investigate the variability of diffusion-weighted imaging (DWI) interpretation of Prostate Imaging Reporting and Data System (PI-RADS) version 2 (v2) in evaluating prostate cancer (PCa). 154 patients with PCa underwent multiparametric 3T MRI, followed by radical prostatectomy. DWI with different b values (b = 0, 100, 1000 and 1500 s/mm²) was obtained. Using the PI-RADS v2, two radiologists independently scored suspicious lesions in each patient and compared DWI at b = 1000 s/mm² (DWI₁₀₀₀) with DWI at b = 1500 s/mm² (DWI₁₅₀₀). On DWI₁₀₀₀ and DWI₁₅₀₀, the intermethod and interobserver agreements of DWI scores were excellent in all patients (κ ≥ 0.873). For peripheral zone and transition zone DWI scores separately, both observers showed excellent intermethod agreement between DWI₁₀₀₀ and DWI₁₅₀₀ (κ ≥ 0.897), and interobserver agreement for DWI₁₀₀₀ and DWI₁₅₀₀ was good to excellent (κ ≥ 0.796). For estimating clinically significant cancer, the areas under the receiver operating characteristic curves of DWI₁₀₀₀ and DWI₁₅₀₀ were 0.710 and 0.724 for observer 1 (p = 0.11), and 0.649 and 0.656 for observer 2 (p = 0.12), respectively. PI-RADS v2 scoring at 3T shows excellent agreement between DWI₁₀₀₀ and DWI₁₅₀₀ in evaluating PCa, with excellent inter-observer agreement. Advance in knowledge: DWI using b = 1000 s/mm² instead of b = 1500 s/mm² reduces examination time or image distortion and improves the signal-to-noise ratio.
An Instrument to Assess the Obesogenic Environment of Child Care Centers
ERIC Educational Resources Information Center
Ward, Dianne; Hales, Derek; Haverly, Katie; Marks, Julie; Benjamin, Sara; Ball, Sarah; Trost, Stewart
2008-01-01
Objectives: To describe protocol and interobserver agreements of an instrument to evaluate nutrition and physical activity environments at child care. Methods: Interobserver data were collected from 9 child care centers, through direct observation and document review (17 observer pairs). Results: Mean agreement between observer pairs was 87.26%…
Chen, Jian; Zhang, Yan-Ming; Song, Ze-Zhou; Fu, Yan-Fei; Geng, Yu
2018-04-10
The interobserver agreement in the assessment of the grade of carotid plaque neovascularization by contrast-enhanced ultrasonography is poorly established. We examined 140 carotid plaques in 66 patients (all patients had bilateral plaques, and 8 patients had 2 plaques on one side). We performed conventional and contrast-enhanced ultrasonography to analyze the presence of carotid plaque neovascularization, which was graded by two independent observers whose interobserver agreement (κ) was evaluated according to the thickness of carotid plaque. For all carotid plaques, the mean κ was 0.689 (95% confidence interval 0.604-0.774). It was 0.689 (0.569-0.808), 0.637 (0.487-0.787), and 0.740 (0.585-0.896), respectively for carotid plaques with maximal thickness <2 mm, from 2 mm to 3 mm, and >3 mm. The interobserver agreement for assessing carotid plaque neovascularization by using contrast-enhanced ultrasonography is substantial and acceptable for research purposes, regardless of the maximal thickness of the plaque. © 2018 Wiley Periodicals, Inc.
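For two observers, the usual chance-corrected statistic is Cohen's kappa: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected from each observer's marginal grade frequencies. The abstract does not say whether weights were applied to the neovascularization grades, so this minimal sketch assumes the unweighted form. `cohen_kappa` and the example grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers rating the same items.

    Raises ZeroDivisionError if both observers use one identical grade throughout.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same grade by both observers
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two observers' marginal proportions
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical neovascularization grades (0, 1, 2) for six plaques
print(cohen_kappa([0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 1, 0]))
```

Confidence intervals like those reported (for example 0.604-0.774) require a standard error for κ, which this sketch does not compute.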
An automated A-value measurement tool for accurate cochlear duct length estimation.
Iyaniwura, John E; Elfarnawany, Mai; Ladak, Hanif M; Agrawal, Sumit K
2018-01-22
There has been renewed interest in the cochlear duct length (CDL) for preoperative cochlear implant electrode selection and postoperative generation of patient-specific frequency maps. The CDL can be estimated by measuring the A-value, which is defined as the length between the round window and the furthest point on the basal turn. Unfortunately, there is significant intra- and inter-observer variability when these measurements are made clinically. The objective of this study was to develop an automated A-value measurement algorithm to improve accuracy and eliminate observer variability. Clinical and micro-CT images of 20 cadaveric cochlea specimens were acquired. The micro-CT of one sample was chosen as the atlas, and A-value fiducials were placed onto that image. Image registration (rigid affine and non-rigid B-spline) was applied between the atlas and the 19 remaining clinical CT images. The registration transform was applied to the A-value fiducials, and the A-value was then automatically calculated for each specimen. High-resolution micro-CT images of the same 19 specimens were used to measure the gold standard A-values for comparison against the manual and automated methods. The registration algorithm achieved excellent qualitative overlap between the atlas and target images. The automated method eliminated the observer variability and the systematic underestimation by experts. Manual measurement of the A-value on clinical CT had a mean error of 9.5 ± 4.3% compared to micro-CT, and this improved to an error of 2.7 ± 2.1% using the automated algorithm. Both the automated and manual methods correlated significantly with the gold standard micro-CT A-values (r = 0.70, p < 0.01 and r = 0.69, p < 0.01, respectively). An automated A-value measurement tool using atlas-based registration methods was successfully developed and validated. The automated method eliminated the observer variability and improved accuracy as compared to manual measurements by experts.
This open-source tool has the potential to benefit cochlear implant recipients in the future.
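The "mean error ± SD" figures compare each clinical-CT A-value against its micro-CT gold standard. The abstract does not define the error metric, so this sketch assumes a per-specimen percent error relative to the gold standard. It reports the absolute error (mean ± SD) alongside the mean signed error, which would expose a systematic underestimation like the one described for the experts. `percent_error_summary` is a hypothetical name.

```python
from statistics import mean, stdev


def percent_error_summary(estimates, gold):
    """Summarize A-value estimates against gold-standard (micro-CT) values.

    Returns the mean and SD of the absolute percent error, plus the mean
    signed percent error (negative values indicate systematic underestimation).
    """
    if len(estimates) != len(gold) or len(gold) < 2:
        raise ValueError("need at least two paired values")
    signed = [100 * (e - g) / g for e, g in zip(estimates, gold)]
    absolute = [abs(s) for s in signed]
    return {
        "mean_abs_pct": mean(absolute),
        "sd_abs_pct": stdev(absolute),   # sample SD across specimens
        "mean_signed_pct": mean(signed),
    }


# Hypothetical A-values in mm: manual estimates vs micro-CT gold standard
print(percent_error_summary([9.0, 7.6], [10.0, 8.0]))
```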
de Carvalho, Rogério Mendonca; Perez, Maria Del Carmen Janerio; Miranda, Fausto
2012-10-01
Traditional volumetry based on Archimedes' principle is the gold standard for the measurement of limb volume, but the routine use of this technique is discouraged because of several disadvantages. The purpose of this study was to evaluate intraobserver and interobserver reliability of direct measurements of wrist-hand volume using a new communicating vessels volumeter based on Pascal's law. A reliability study was conducted. To evaluate the reliability of the communicating vessels volumeter in generating measurements, 30 hands of 15 participants (9 women, 6 men) were measured 3 times each by 3 observers, totaling 270 volumetric results. Measurement time was short (mean, 3 minutes 42 seconds). The intraclass correlation coefficient (ICC) was .9977 for observer 1 and .9976 for observers 2 and 3. The interobserver ICC was .9998. The standard error of measurement was about 3 mL for all observers; the interobserver result was 1 mL. The interrater coefficient of variation (CV) was 1.15% for the series of 9 measurements collected for each segment; the intrarater CV was 1.20%. Limitations: No swollen hands were measured, and measurements were not compared with the gold standard technique. Thus, accuracy of the new volumeter was not determined in this study. A new device has been developed for plethysmography of the extremities, and the results of its use to measure the volume of the wrist-hand segment were reliable in both intraobserver and interobserver analyses.
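Two of the reported quantities can be sketched directly. The first is a coefficient of variation (SD / mean × 100) for each hand's series of repeated volumes, averaged across hands. The second is a standard error of measurement derived from an ICC. The abstract gives neither formula, so both are assumptions: the sample SD is used for the CV, and SEM = SD × √(1 − ICC) is only one common convention (another uses the square root of the within-subject mean square). The helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_cv_percent(series_per_segment):
    """Mean coefficient of variation (%) over segments.

    series_per_segment: one list of repeated volume readings (mL) per hand,
    e.g. 9 readings from 3 observers x 3 repetitions. Uses the sample SD.
    """
    cvs = [100 * stdev(s) / mean(s) for s in series_per_segment]
    return mean(cvs)


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


# Hypothetical readings (mL) for two hands
print(mean_cv_percent([[400, 404, 396], [350, 352, 349]]))
print(sem_from_icc(sd=40.0, icc=0.9977))
```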
Wang, Qingle; Zhang, Zhiyong; Shan, Fei; Shi, Yuxin; Xing, Wei; Shi, Liangrong; Zhang, Xingwei
2017-09-01
This study was conducted to assess intra-observer and inter-observer agreement for the measurement of dual-input whole-tumor computed tomography perfusion (DCTP) in patients with lung cancer. A total of 88 patients with a proven diagnosis of primary lung cancer who had undergone DCTP were grouped in two ways: (i) by size, into nodules (diameter ≤3 cm) and masses (diameter >3 cm), and (ii) into tumors with and without air density. Pulmonary flow, bronchial flow, and pulmonary index were measured in each group. Intra-observer and inter-observer agreement for measurement was assessed using the intraclass correlation coefficient, within-subject coefficient of variation, and Bland-Altman analysis. In all lung cancers, the reproducibility coefficient for intra-observer agreement (range 26.1-38.3%) was superior to that for inter-observer agreement (range 38.1-81.2%). Further analysis revealed lower agreement for nodules than for masses. Additionally, inner-air density reduced both intra- and inter-observer agreement for lung cancer. The intra-observer agreement for measuring lung cancer DCTP was satisfactory, while the inter-observer agreement was limited. The effects of tumor size and inner-air density on agreement, especially between two observers, should be emphasized. In future, automatic computer-aided segmentation of tumor perfusion values should be developed. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
Høyer, C; Paludan, J P D; Pavar, S; Biurrun Manresa, J A; Petersen, L J
2014-03-01
To assess the intra- and inter-observer variation in laser Doppler flowmetry curve reading for measurement of toe and ankle pressures. A prospective single-blinded diagnostic accuracy study was conducted on 200 patients with known or suspected peripheral arterial disease (PAD), with a total of 760 curve sets produced. The first curve reading for this study was performed by laboratory technologists blinded to clinical clues and previous readings at least 3 months after the primary data sampling. The pressure curves were later reassessed following another period of at least 3 months. Observer agreement in diagnostic classification according to TASC-II criteria was quantified using Cohen's kappa. Reliability was quantified using intraclass correlation coefficients, coefficients of variation, and Bland-Altman analysis. The overall agreement in diagnostic classification (PAD/not PAD) was 173/200 (87%) for intra-observer (κ = .858) and 175/200 (88%) for inter-observer data (κ = .787). Reliability analysis confirmed excellent correlation for both intra- and inter-observer data (ICC all ≥.931). The coefficients of variation ranged from 2.27% to 6.44% for intra-observer and 2.39% to 8.42% for inter-observer data. Subgroup analysis showed lower observer variation for reading of toe pressures in patients with diabetes and/or chronic kidney disease than in patients not diagnosed with these conditions. Bland-Altman plots showed higher variation in toe pressure readings than in ankle pressure readings. This study shows substantial intra- and inter-observer agreement in diagnostic classification and reading of absolute pressures when using laboratory technologists as observers. The study emphasises that observer variation in curve reading is an important factor in the overall reproducibility of the method. Our data suggest that diabetes and chronic kidney disease influence toe pressure reproducibility. Copyright © 2013 European Society for Vascular Surgery.
Published by Elsevier Ltd. All rights reserved.
Zonnebeld, Niek; Maas, Tommy M G; Huberts, Wouter; van Loon, Magda M; Delhaas, Tammo; Tordoir, Jan H M
2017-11-01
Although clinical guidelines on arteriovenous fistula (AVF) creation advocate minimum luminal arterial and venous diameters, assessed by duplex ultrasonography (DUS), the clinical value of routine DUS examination is under debate. DUS might be an insufficiently repeatable and/or reproducible imaging modality because of its operator dependency. The present study aimed to assess intra- and inter-observer agreement of DUS examination in support of AVF surgery planning. Ten patients with end-stage renal disease were included to assess intra- and inter-observer agreement of pre-operative DUS measurements. All measurements were performed by two trained and experienced vascular technicians, blinded to measurement readings. From the routine DUS protocol, representative measurements (venous diameters, and arterial diameters and volume flow in the upper arm and forearm) were selected. For intra-observer agreement the measurements were performed in triplicate, with the probe released from the skin between measurements. Intraclass correlation coefficients were calculated for intra- and inter-observer agreement, and Bland-Altman plots were used to display mean measurement differences and limits of agreement graphically. Ten patients (6 male, 59.4±19.7 years) consented to participate, and all predefined measurements were obtained. Intraclass correlation coefficients for intra-observer agreement of diameter measurements were at least 0.90 (95% CI 0.74-0.97; radial artery). Inter-observer agreement was at least 0.83 (0.46-0.96; lateral diameter upper arm cephalic vein). The Bland-Altman plots showed acceptable mean measurement differences and limits of agreement. In experienced hands, excellent intra- and inter-observer agreement can be reached for the discrete pre-operative DUS measurements advocated in clinical guidelines. DUS is therefore a reliable imaging modality to support AVF surgery planning. The content of DUS protocols, however, needs further standardisation.
Copyright © 2017 European Society for Vascular Surgery. Published by Elsevier Ltd. All rights reserved.
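A Bland-Altman analysis summarizes paired readings by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. This minimal sketch computes only those summary numbers; a plot would place each difference against the pair's mean. `bland_altman` is a hypothetical helper, and the z = 1.96 multiplier assumes roughly normally distributed differences.

```python
from statistics import mean, stdev


def bland_altman(x, y, z=1.96):
    """Bias and 95% limits of agreement for paired readings.

    x, y: the same quantity measured by observer 1 and observer 2.
    Differences are taken as x - y; limits are bias +/- z * SD(differences).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical cephalic vein diameters (mm) from two technicians
print(bland_altman([3.0, 3.5, 4.0, 2.5], [2.8, 3.6, 3.7, 2.5]))
```

With only ten patients, as here, the limits of agreement are themselves imprecise. Reporting their confidence intervals is a common addition, but it is not shown.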
Chen, Frank; Cen, Steven; Palmer, Suzanne
2017-09-01
To evaluate interobserver agreement with the use of and the positive predictive value (PPV) of Prostate Imaging Reporting and Data System version 2 (PI-RADS v2) for the localization of intermediate- and high-grade prostate cancers on multiparametric magnetic resonance imaging (mpMRI). In this retrospective, institutional review board-approved study, 131 consecutive patients who had mpMRI followed by transrectal ultrasound-MR imaging fusion-guided biopsy of the prostate were included. Two readers who were blinded to initial mpMRI reports, clinical data, and pathologic outcomes reviewed the MR images, identified all prostate lesions, and scored each lesion based on the PI-RADS v2. Interobserver agreement was assessed by intraclass correlation coefficient (ICC), and PPV was calculated for each PI-RADS category. PI-RADS v2 was found to have a moderate level of interobserver agreement between two readers of varying experience, with ICC of 0.74, 0.72, and 0.67 for all lesions, peripheral zone lesions, and transitional zone lesions, respectively. Despite only moderate interobserver agreement, the calculated PPV in the detection of intermediate- and high-grade prostate cancers for each PI-RADS category was very similar between the two readers, with approximate PPV of 0%, 12%, 64%, and 87% for PI-RADS categories 2, 3, 4, and 5, respectively. In our study, PI-RADS v2 has only moderate interobserver agreement, a similar finding in studies of the original PI-RADS and in initial studies of PI-RADS v2. Despite this, PI-RADS v2 appears to be a useful system to predict significant prostate cancer, with PI-RADS scores correlating well with the likelihood of intermediate- and high-grade cancers. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Brunner, Alexander; Gühring, Markus; Schmälzle, Traude; Weise, Kuno; Badke, Andreas
2009-01-01
Evaluation of the kyphosis angle in thoracic and lumbar burst fractures is often used to indicate surgical procedures. The kyphosis angle can be measured as vertebral, segmental or local kyphosis according to the method of Cobb. The vertebral, segmental and local kyphosis angles according to the method of Cobb were measured on 120 lateral X-rays and sagittal computed tomography images of 60 thoracic and 60 lumbar burst fractures by 3 independent observers on 2 separate occasions. Osteoporotic fractures were excluded. The intra- and interobserver reliability of these angles on X-ray and computed tomography was evaluated using the intraclass correlation coefficient (ICC). The segmental kyphosis showed the highest reproducibility, followed by the vertebral kyphosis. On X-ray, segmental kyphosis showed “excellent” inter- and intraobserver reliabilities for thoracic fractures (ICC = 0.826, 0.802) and “good” to “excellent” inter- and intraobserver reliabilities for lumbar fractures (ICC = 0.790, 0.803). On computed tomography, segmental kyphosis showed “excellent” inter- and intraobserver reliabilities for both thoracic (ICC = 0.824, 0.801) and lumbar fractures (ICC = 0.874, 0.835). Between the two imaging modalities (X-ray and computed tomography), significant differences were found in interobserver reliability for vertebral kyphosis measured on lumbar fracture X-rays (p = 0.035) and for local kyphosis measured on thoracic fracture X-rays (p = 0.010). Between the two fracture locations (thoracic and lumbar), significant differences were found only in interobserver reliability for local kyphosis measured on computed tomography (p = 0.045) and in intraobserver reliability for vertebral kyphosis measured on X-rays (p = 0.024).
“Good” to “excellent” inter- and intraobserver reliabilities for vertebral, segmental and local kyphosis on X-ray make these angles helpful tools for indicating surgical procedures. For practical use on lateral X-rays, we emphasize determining the segmental kyphosis because this angle has the highest reproducibility. “Good” to “excellent” inter- and intraobserver reliabilities for these three angles were also found on computed tomography. Therefore, these three angles also appear generally usable on computed tomography. Further studies are needed to correlate the results on lateral X-ray and computed tomography directly. PMID:19953277
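The abstract reports ICCs but not which ICC form was used. As an illustration only, this sketch implements ICC(2,1) of Shrout and Fleiss, a two-way random-effects, absolute-agreement, single-measurement form often chosen for inter-observer reliability, from the two-way ANOVA mean squares. `icc_2_1` is a hypothetical name, and other ICC forms would give different values on the same data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings[i][j] = value (e.g. kyphosis angle in degrees) for subject i
    from observer j. Every subject must be rated by every observer.
    """
    if len(ratings) < 2 or len(ratings[0]) < 2:
        raise ValueError("need at least 2 subjects and 2 observers")
    n, k = len(ratings), len(ratings[0])
    if any(len(r) != k for r in ratings):
        raise ValueError("every subject needs a rating from every observer")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), observers (columns) and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Six subjects rated by four judges (Shrout and Fleiss, 1979)
print(icc_2_1([[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
               [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]))
```

The example data are the six-subject, four-judge table from Shrout and Fleiss, for which they report ICC(2,1) ≈ 0.29. That makes it a convenient check of the arithmetic; it has nothing to do with the kyphosis data.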
Nuances of Morphology in Myelodysplastic Diseases in the Age of Molecular Diagnostics.
Shaver, Aaron C; Seegmiller, Adam C
2017-10-01
Morphologic dysplasia is an important factor in diagnosis of myelodysplastic syndrome (MDS). However, the role of dysplasia is changing as new molecular genetic and genomic technologies take a more prominent place in diagnosis. This review discusses the role of morphology in the diagnosis of MDS and its interactions with cytogenetic and molecular testing. Recent changes in diagnostic criteria have attempted to standardize approaches to morphologic diagnosis of MDS, recognizing significant inter-observer variability in assessment of dysplasia. Definitive correlates between cytogenetic/molecular and morphologic findings have been described in only a small set of cases. However, these genetic and morphologic tools do play a complementary role in the diagnosis of both MDS and other myeloid neoplasms. Diagnosis of MDS requires a multi-factorial approach, utilizing both traditional morphologic and newer molecular genetic techniques. Understanding these tools, and the interplay between them, is crucial in the modern diagnosis of myeloid neoplasms.
Smartphone photography utilized to measure wrist range of motion.
Wagner, Eric R; Conti Mica, Megan; Shin, Alexander Y
2018-02-01
The purpose was to determine whether smartphone photography is a reliable tool for measuring wrist movement. Smartphones were used to take digital photos of both wrists in 32 normal participants (64 wrists) at extremes of wrist motion. The smartphone measurements were compared with clinical goniometry measurements. There was a very high correlation between the clinical goniometry and smartphone measurements, as the concordance coefficients were high for radial deviation, ulnar deviation, wrist extension and wrist flexion. The Pearson coefficients also demonstrated the high precision of the smartphone measurements. The Bland-Altman plots showed that 29-31 of 32 smartphone measurements were within the 95% confidence interval of the clinical measurements for all positions of the wrists. There was high reliability between photographs taken by the volunteers and by the researcher, as well as high inter-observer reliability. Smartphone digital photography is a reliable and accurate tool for measuring wrist range of motion. Level of evidence: II.
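The abstract distinguishes concordance coefficients (agreement) from Pearson coefficients (precision). Assuming "concordance coefficient" means Lin's concordance correlation coefficient, the sketch below shows the difference: ρc = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²), using 1/n moments as in Lin's definition, penalizes any systematic offset between methods, while Pearson's r does not. `lins_ccc` is a hypothetical helper.

```python
from statistics import fmean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two methods.

    x, y: paired measurements (e.g. goniometer vs smartphone angles, degrees).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n          # population variance of x
    syy = sum((b - my) ** 2 for b in y) / n          # population variance of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical wrist extension angles (degrees): a constant +10 degree offset
goniometer = [60, 70, 80]
smartphone = [70, 80, 90]
print(lins_ccc(goniometer, smartphone))  # Pearson r is 1 here, but CCC is lower
```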
Inter-observer variation in identifying mammals from their tracks at enclosed track plate stations
William J. Zielinski; Fredrick V. Schlexer
2009-01-01
Enclosed track plate stations are a common method to detect mammalian carnivores. Studies rely on these data to make inferences about geographic range, population status and detectability. Despite their popularity, there has been no effort to document inter-observer variation in identifying the species that leave their tracks. Four previous field crew leaders...
Chanques, Gérald; Ely, E Wesley; Garnier, Océane; Perrigault, Fanny; Eloi, Anaïs; Carr, Julie; Rowan, Christine M; Prades, Albert; de Jong, Audrey; Moritz-Gasser, Sylvie; Molinari, Nicolas; Jaber, Samir
2018-03-01
One third of patients admitted to an intensive care unit (ICU) will develop delirium. However, delirium is under-recognized by bedside clinicians without the use of delirium screening tools, such as the Intensive Care Delirium Screening Checklist (ICDSC) or the Confusion Assessment Method for the ICU (CAM-ICU). The CAM-ICU was updated in 2014 to improve its use by clinicians throughout the world. It had never been validated against the new reference standard, the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5). We conducted a prospective psychometric study in a 16-bed medical-surgical ICU of a French academic hospital to measure the diagnostic performance of the 2014 updated CAM-ICU compared to the DSM-5 as the reference standard. We included consecutive adult patients with a Richmond Agitation Sedation Scale (RASS) ≥ -3, without preexisting cognitive disorders, psychosis or cerebral injury. Delirium was independently assessed by neuropsychological experts using an operationalized approach to DSM-5, by investigators using the CAM-ICU and the ICDSC, by bedside clinicians and by ICU patients. The sensitivity, specificity, positive and negative predictive values were calculated considering neuropsychologist DSM-5 assessments as the reference standard (primary endpoint). CAM-ICU inter-observer agreement, as well as that between delirium diagnosis methods and the reference standard, was summarized using κ coefficients, which were subsequently compared using the Z-test. Delirium was diagnosed by experts in 38% of the 108 patients included for analysis. The CAM-ICU had a sensitivity of 83%, a specificity of 100%, a positive predictive value of 100% and a negative predictive value of 91%.
Compared to the reference standard, the CAM-ICU had significantly (p < 0.05) higher agreement (κ = 0.86 ± 0.05) than the physicians', residents' and nurses' diagnoses (κ = 0.65 ± 0.09; 0.63 ± 0.09; 0.61 ± 0.09, respectively), as well as the patients' own impressions of feeling delirious (κ = 0.02 ± 0.11). Differences between the ICDSC (κ = 0.69 ± 0.07) and CAM-ICU were not significant (p = 0.054). The CAM-ICU demonstrated high reliability for inter-observer agreement (κ = 0.87 ± 0.06). The 2014 updated version of the CAM-ICU is valid according to DSM-5 criteria and reliable regarding inter-observer agreement in a research setting. Delirium remains under-recognized by bedside clinicians.
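The four diagnostic-performance measures come from a 2×2 table that cross-tabulates the screening tool (here, the CAM-ICU) against the reference standard (the neuropsychologists' DSM-5 assessment). This minimal sketch shows the standard definitions. `diagnostic_performance` is a hypothetical helper, and the counts in the example are invented round numbers, not the study's data.

```python
def diagnostic_performance(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table vs a reference standard."""
    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when no cases in that cell group

    return {
        "sensitivity": ratio(tp, tp + fn),  # reference-positive patients flagged by the tool
        "specificity": ratio(tn, tn + fp),  # reference-negative patients cleared by the tool
        "ppv": ratio(tp, tp + fp),          # tool-positive results that are true cases
        "npv": ratio(tn, tn + fn),          # tool-negative results that are true non-cases
    }


# Invented counts for illustration only
print(diagnostic_performance(tp=40, fp=10, fn=10, tn=40))
```

A specificity and PPV of 100%, as reported for the CAM-ICU, correspond to a table with no false positives (fp = 0).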
Tollafield, David R
2017-01-01
The management of plantar corns and callus has a low cost-benefit with reduced prioritisation in healthcare. The distinction between the types of keratin lesions that form corns and callus has attracted limited interest. Observation is imperative to improving diagnostic predictions, and a number of studies point to some confusion as to how best to achieve this. The use of photographic observation has been proposed to improve our understanding of intractable keratin lesions. Students from a podiatry school reviewed photographs in which plantar keratin lesions were divided into four nominal groups: light callus (Grade 1), heavy defined callus (Grade 2), concentric keratin plugs (Grade 3) and callus with deeper density changes under the forefoot (Grade 4). A group of 'experts' drawn from qualified podiatrists validated the students' observer-rated responses. Cohen's weighted kappa statistic (κ) was used to measure inter-observer reliability. First-year students (unskilled) performed less well when viewing photographs (κ = 0.33) than third-year students (semi-skilled, κ = 0.62). The experts performed better than the students (κ = 0.88), consistent with wound care models in other studies. Improved annotation of clinical features, supported by classification of keratin-based lesions and combined with patient outcome tools, could improve the scientific rationale for prioritising patient care. Problems associated with photographic assessment involve trying to differentiate similar lesions without the benefit of direct palpation. Direct observation of callus with and without debridement requires further investigation alongside the model proposed in this paper.
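Weighted kappa gives partial credit for near-misses on an ordered scale, so rating a Grade 2 lesion as Grade 3 costs less than rating it as Grade 4. The abstract does not say whether linear or quadratic weights were used, so this sketch supports both. It computes κ_w = 1 − Σ w·p_obs / Σ w·p_exp, with disagreement weights that are 0 on the diagonal and 1 for the most distant grades. `weighted_kappa` is a hypothetical helper.

```python
def weighted_kappa(a, b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered scale.

    categories lists the grades in order, e.g. [1, 2, 3, 4].
    weights: "linear" or "quadratic" disagreement weights.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    if len(categories) < 2 or weights not in ("linear", "quadratic"):
        raise ValueError("need >= 2 categories and 'linear' or 'quadratic' weights")
    k, n = len(categories), len(a)
    pos = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions: obs[i][j] = share rated i by A and j by B
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[pos[x]][pos[y]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant grades
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected


# Hypothetical: a student's grades vs an expert's for six photographs
print(weighted_kappa([1, 2, 3, 4, 2, 3], [1, 2, 4, 4, 2, 2], [1, 2, 3, 4]))
```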
Sánchez, Guillermo; Nova, John; Arias, Nilsa; Peña, Bibiana
2008-12-01
The Fitzpatrick phototype scale has been used to determine skin sensitivity to ultraviolet light. The reliability of this scale in estimating sensitivity permits risk evaluation of skin cancer based on phototype. Reliability and changes in intra- and inter-observer concordance were determined for the Fitzpatrick phototype scale after the assessment methods for establishing the phototype were standardized. An analytical study of intra- and inter-observer concordance was performed. The Fitzpatrick phototype scale was standardized using focus group methodology. To determine intra- and inter-observer agreement, the weighted kappa statistical method was applied. The standardization effect was measured using the equal kappa contrast hypothesis and the Wald test for dependent measurements. The phototype scale was applied to 155 patients over 15 years of age who were assessed four times by two independent observers. The sample was drawn from patients of the Centro Dermatológico Federico Lleras Acosta. During the pre-standardization phase, the baseline and six-week inter-observer weighted kappa values were 0.31 and 0.40, respectively. The intra-observer kappa values for observers A and B were 0.47 and 0.51, respectively. After the standardization process, the baseline and six-week inter-observer weighted kappa values were 0.77 and 0.82, respectively. Intra-observer kappa coefficients for observers A and B were 0.78 and 0.82. Statistically significant differences were found between coefficients before and after standardization (p<0.001) in all comparisons. Following a standardization exercise, the Fitzpatrick phototype scale yielded reliable, reproducible and consistent results.
Interobserver delineation variation in lung tumour stereotactic body radiotherapy
Persson, G F; Nygaard, D E; Hollensen, C; Munck af Rosenschöld, P; Mouritsen, L S; Due, A K; Berthelsen, A K; Nyman, J; Markova, E; Roed, A P; Roed, H; Korreman, S; Specht, L
2012-01-01
Objectives In radiotherapy, delineation uncertainties are important as they contribute to systematic errors and can lead to geographical miss of the target. For margin computation, standard deviations (SDs) of all uncertainties must be included as SDs. The aim of this study was to quantify the interobserver delineation variation for stereotactic body radiotherapy (SBRT) of peripheral lung tumours using a cross-sectional study design. Methods 22 consecutive patients with 26 tumours were included. Positron emission tomography/CT scans were acquired for planning of SBRT. Three oncologists and three radiologists independently delineated the gross tumour volume. The interobserver variation was calculated as a mean of multiple SDs of distances to a reference contour, and calculated for the transversal plane (SDtrans) and craniocaudal (CC) direction (SDcc) separately. Concordance indexes and volume deviations were also calculated. Results Median tumour volume was 13.0 cm3, ranging from 0.3 to 60.4 cm3. The mean SDtrans was 0.15 cm (SD 0.08 cm) and the overall mean SDcc was 0.26 cm (SD 0.15 cm). Tumours with pleural contact had a significantly larger SDtrans than tumours surrounded by lung tissue. Conclusions The interobserver delineation variation was very small in this systematic cross-sectional analysis, although significantly larger in the CC direction than in the transversal plane, stressing that anisotropic margins should be applied. This study is the first to make a systematic cross-sectional analysis of delineation variation for peripheral lung tumours referred for SBRT, establishing the evidence that interobserver variation is very small for these tumours. PMID:22919015
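The interobserver variation described here is a mean of SDs: at each point of a reference contour, take the SD of the observers' distances to that contour, then average over points. This minimal Python sketch assumes the distances have already been measured and stored as an observers-by-points table (a hypothetical layout); it would be run separately on transversal and craniocaudal distances to give SDtrans and SDcc.

```python
import statistics

def interobserver_sd(distances):
    """Mean, over reference-contour points, of the between-observer SD of distances.

    distances[o][p] is the signed distance (cm) from observer o's contour to the
    reference contour at reference point p (hypothetical input layout).
    """
    n_points = len(distances[0])
    per_point_sd = [
        statistics.stdev([observer[p] for observer in distances])  # needs >= 2 observers
        for p in range(n_points)
    ]
    return statistics.mean(per_point_sd)
```

How the reference contour and the point correspondences are built is not described in the abstract and is left outside this sketch.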
Error in geometric morphometric data collection: Combining data from multiple sources.
Robinson, Chris; Terhune, Claire E
2017-09-01
This study compares two- and three-dimensional morphometric data to determine the extent to which intra- and interobserver and intermethod error influence the outcomes of statistical analyses. Data were collected five times for each method and observer on 14 anthropoid crania using calipers, a MicroScribe, and 3D models created from NextEngine and microCT scans. ANOVA models were used to examine variance in the linear data at the level of genus, species, specimen, observer, method, and trial. Three-dimensional data were analyzed using geometric morphometric methods; principal components analysis was employed to examine how trials of all specimens were distributed in morphospace and Procrustes distances among trials were calculated and used to generate UPGMA trees to explore whether all trials of the same individual grouped together regardless of observer or method. Most variance in the linear data was at the genus level, with greater variance at the observer than method levels. In the 3D data, interobserver and intermethod error were similar to intraspecific distances among Callicebus cupreus individuals, with interobserver error being higher than intermethod error. Generally, taxa separate well in morphospace, with different trials of the same specimen typically grouping together. However, trials of individuals in the same species overlapped substantially with one another. Researchers should be cautious when compiling data from multiple methods and/or observers, especially if analyses are focused on intraspecific variation or closely related species, as in these cases, patterns among individuals may be obscured by interobserver and intermethod error. Conducting interobserver and intermethod reliability assessments prior to the collection of data is recommended. © 2017 Wiley Periodicals, Inc.
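A Procrustes distance measures the shape difference that remains between two landmark configurations after differences in location, scale and rotation are removed. The NumPy sketch below computes a partial Procrustes distance (both configurations scaled to unit centroid size, optimal rotation found by SVD, reflections excluded); the abstract does not name the Procrustes variant or software used, so this choice is an assumption.

```python
import numpy as np

def procrustes_distance(a, b):
    """Partial Procrustes distance between two landmark configurations (n_landmarks x dim)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove location, then scale each configuration to unit centroid size.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # Rotation R minimising ||a - b R|| comes from the SVD of b^T a.
    u, _, vt = np.linalg.svd(b.T @ a)
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))  # flip one axis if needed to forbid reflections
    rotation = u @ vt
    return float(np.linalg.norm(a - b @ rotation))
```

A matrix of these distances among all trials could then be clustered with average linkage, which is UPGMA (for instance SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` on a condensed distance vector), to check whether trials of the same specimen group together.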
Are distal radius fracture classifications reproducible? Intra and interobserver agreement.
Belloti, João Carlos; Tamaoki, Marcel Jun Sugawara; Franciozi, Carlos Eduardo da Silveira; Santos, João Baptista Gomes dos; Balbachevsky, Daniel; Chap Chap, Eduardo; Albertoni, Walter Manna; Faloppa, Flávio
2008-05-01
Various classification systems have been proposed for fractures of the distal radius, but the reliability of these classifications is seldom addressed. For a fracture classification to be useful, it must provide prognostic significance, interobserver reliability and intraobserver reproducibility. The aim here was to evaluate the intraobserver and interobserver agreement of distal radius fracture classifications. This was a validation study on interobserver and intraobserver reliability. It was developed in the Department of Orthopedics and Traumatology, Universidade Federal de São Paulo - Escola Paulista de Medicina. X-rays from 98 cases of displaced distal radius fracture were evaluated by five observers: one third-year orthopedic resident (R3), one sixth-year undergraduate medical student (UG6), one radiologist physician (XRP), one orthopedic trauma specialist (OT) and one orthopedic hand surgery specialist (OHS). The radiographs were classified on three different occasions (times T1, T2 and T3) using the Universal (Cooney), Arbeitsgemeinschaft für Osteosynthesefragen/Association for the Study of Internal Fixation (AO/ASIF), Frykman and Fernández classifications. The kappa coefficient (κ) was applied to assess the degree of agreement. Among the three occasions, the highest mean intraobserver κ was observed in the Universal classification (0.61), followed by Fernández (0.59), Frykman (0.55) and AO/ASIF (0.49). The interobserver agreement was unsatisfactory in all classifications. The Fernández classification showed the best agreement (0.44) and the Frykman classification the worst (0.26). The low agreement levels observed in this study suggest that there is still no classification method with high reproducibility.
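The abstract reports a mean intraobserver kappa over three occasions without stating how it was averaged. This Python sketch assumes the mean of unweighted Cohen's kappas over every pair of occasions (T1-T2, T1-T3, T2-T3) for one observer; the function names and label format are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(x, y):
    """Unweighted Cohen's kappa for two sets of nominal labels on the same items."""
    n = len(x)
    p_observed = sum(a == b for a, b in zip(x, y)) / n
    count_x, count_y = Counter(x), Counter(y)
    # Chance agreement from the product of the two marginal distributions.
    p_chance = sum(count_x[c] * count_y[c] for c in count_x) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)  # undefined if p_chance == 1

def mean_intraobserver_kappa(occasions):
    """Average Cohen's kappa over every pair of occasions for a single observer."""
    pairs = list(combinations(occasions, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

Here `occasions` would hold one list of fracture classes per reading session, each in the same radiograph order.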
External validation of Global Evaluative Assessment of Robotic Skills (GEARS).
Aghazadeh, Monty A; Jayaratna, Isuru S; Hung, Andrew J; Pan, Michael M; Desai, Mihir M; Gill, Inderbir S; Goh, Alvin C
2015-11-01
We demonstrate the construct validity, reliability, and utility of Global Evaluative Assessment of Robotic Skills (GEARS), a clinical assessment tool designed to measure robotic technical skills, in an independent cohort using an in vivo animal training model. Using a cross-sectional observational study design, 47 voluntary participants were categorized as experts (>30 robotic cases completed as primary surgeon) or trainees. The trainee group was further divided into intermediates (≥5 but ≤30 cases) and novices (<5 cases). All participants completed a standardized in vivo robotic task in a porcine model. Task performance was evaluated by two expert robotic surgeons and self-assessed by the participants using the GEARS assessment tool. The Kruskal-Wallis test was used to compare GEARS performance scores to determine construct validity; Spearman's rank correlation measured interobserver reliability; and Cronbach's alpha was used to assess internal consistency. Performance evaluations were completed for nine experts and 38 trainees (14 intermediate, 24 novice). Experts demonstrated superior performance compared with intermediates and novices, overall and in all individual domains (p < 0.0001). In comparing intermediates and novices, the overall performance difference trended toward significance (p = 0.0505), while the individual domains of efficiency and autonomy were significantly different between groups (p = 0.0280 and 0.0425, respectively). Interobserver reliability between expert ratings was confirmed by a strong correlation (r = 0.857, 95 % CI [0.691, 0.941]). Expert and participant scores showed less agreement (r = 0.435, 95 % CI [0.121, 0.689] and r = 0.422, 95 % CI [0.081, 0.0672]). Internal consistency was excellent for experts and participants (α = 0.96, 0.98, 0.93). In an independent cohort, GEARS was able to differentiate between different robotic skill levels, demonstrating excellent construct validity. 
As a standardized assessment tool, GEARS maintained consistency and reliability for an in vivo robotic surgical task and may be applied for skills evaluation in a broad range of robotic procedures.
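Cronbach's alpha compares the sum of the item (here, GEARS domain) variances with the variance of the total score: α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_total). A minimal sketch using Python's statistics module, assuming a complete subjects-by-domains score table (the layout is hypothetical) and sample variances throughout:

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha; scores[s][i] is subject s's score on item (domain) i."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [statistics.variance([row[i] for row in scores]) for i in range(k)]
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

One such table per rater (each expert, and the participants' self-assessments) would give one alpha per rater group.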
Dorniak, Karolina; Heiberg, Einar; Hellmann, Marcin; Rawicz-Zegrzda, Dorota; Wesierska, Maria; Galaska, Rafal; Sabisz, Agnieszka; Szurowska, Edyta; Dudziak, Maria; Hedström, Erik
2016-05-26
Pulse wave velocity (PWV) is a biomarker for arterial stiffness, clinically assessed by applanation tonometry (AT). Increased use of phase-contrast cardiac magnetic resonance (CMR) imaging allows PWV assessment with minor additions to the routine protocol. The aims were to investigate the acquired temporal resolution needed for accurate and precise measurements of CMR-PWV, and to develop a tool for CMR-PWV measurements. Computer phantoms were generated for PWV = 2-20 m/s based on human CMR-PWV data. The PWV measurements were performed in 13 healthy young subjects and 13 patients at risk for cardiovascular disease. The CMR-PWV was measured by through-plane phase-contrast CMR in the ascending aorta and at the diaphragm level. Centre-line aortic distance was determined between flow planes. The AT-PWV was assessed within 2 h after CMR. Three observers (CMR experience: 15, 4, and <1 year) determined CMR-PWV. The developed tool was based on the flow-curve foot transit time for PWV quantification. Computer phantoms showed bias 0.27 ± 0.32 m/s for a temporal resolution of at least 30 ms. Intraobserver variability for CMR-PWV was 0 ± 0.03 m/s (15 years), -0.04 ± 0.33 m/s (4 years), and -0.02 ± 0.30 m/s (<1 year). Interobserver variability for CMR-PWV was below 0.02 ± 0.38 m/s. The AT-PWV overestimated CMR-PWV by 1.1 ± 0.7 m/s in healthy young subjects and 1.6 ± 2.7 m/s in patients. An acquired temporal resolution of at least 30 ms should be used to obtain accurate and precise thoracic aortic phase-contrast CMR-PWV. A new freely available research tool was used to measure PWV in healthy young subjects and in patients, showing low intra- and interobserver variability, even for less experienced CMR observers.
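The tool is described as using the foot of the flow curve, but the abstract does not say how the foot is located. One common option, assumed in this Python sketch, is the intersecting-tangents approach: extend the tangent at the steepest upslope back to the pre-upstroke baseline, then divide the centre-line distance between the two planes by the foot-to-foot transit time. Function names are hypothetical, and no filtering or temporal interpolation is applied.

```python
def foot_time(times, flow):
    """Foot of the systolic upstroke via intersecting tangents: the tangent at the
    steepest rise is extended back to the baseline (minimum flow before that rise)."""
    slopes = [
        (flow[i + 1] - flow[i]) / (times[i + 1] - times[i]) for i in range(len(flow) - 1)
    ]
    steepest = max(range(len(slopes)), key=slopes.__getitem__)
    baseline = min(flow[: steepest + 1])
    return times[steepest] - (flow[steepest] - baseline) / slopes[steepest]

def pulse_wave_velocity(distance_m, times, flow_proximal, flow_distal):
    """PWV (m/s) = centre-line distance between planes / foot-to-foot transit time."""
    transit_s = foot_time(times, flow_distal) - foot_time(times, flow_proximal)
    return distance_m / transit_s
```

Here `flow_proximal` would be the ascending-aorta curve and `flow_distal` the curve at the diaphragm, both sampled at the same `times`; with coarse sampling the foot estimate degrades, which is the effect the phantom experiments examine.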
Pavlovic, Chris; Futamatsu, Hideki; Angiolillo, Dominick J; Guzman, Luis A; Wilke, Norbert; Siragusa, Daniel; Wludyka, Peter; Percy, Robert; Northrup, Martin; Bass, Theodore A; Costa, Marco A
2007-04-01
The purpose of this study is to evaluate the accuracy of semiautomated analysis of contrast-enhanced magnetic resonance angiography (MRA) in patients who have undergone standard angiographic evaluation for peripheral vascular disease (PVD). Magnetic resonance angiography is an important tool for evaluating PVD. Although this technique is both safe and noninvasive, the accuracy and reproducibility of quantitative measurements of disease severity using MRA in the clinical setting have not been fully investigated. 43 lesions in 13 patients who underwent both MRA and digital subtraction angiography (DSA) of iliac and common femoral arteries within 6 months were analyzed using quantitative magnetic resonance angiography (QMRA) and quantitative vascular analysis (QVA). Analysis was repeated by a second operator, and by the same operator approximately 1 month later. QMRA underestimated percent diameter stenosis (%DS) by 2.47% compared with measurements made with QVA. Limits of agreement between the two methods were +/- 9.14%. Interobserver variability in measurements of %DS was +/- 12.58% for QMRA and +/- 10.04% for QVA. Intraobserver variability of %DS was +/- 4.6% for QMRA and +/- 8.46% for QVA. QMRA displays a high level of agreement with QVA when used to determine stenosis severity in iliac and common femoral arteries. Similar levels of interobserver and intraobserver variability are present with each method. Overall, QMRA represents a useful method to quantify the severity of PVD.
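Method-comparison results such as a mean bias and "limits of agreement" are usually obtained with a Bland-Altman analysis of paired differences. This Python sketch assumes the conventional limits of bias ± 1.96 × SD of the differences; the abstract does not state which multiplier was used, and the input lists are hypothetical paired %DS values.

```python
import statistics

def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd
```

Passing the same lesions measured by QMRA and QVA would give the between-method bias and limits; passing two operators' readings from one method would give an interobserver version of the same summary.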
Heineman, Kirsten R; Bos, Arend F; Hadders-Algra, Mijna
2008-04-01
A reliable and valid instrument to assess neuromotor condition in infancy is a prerequisite for early detection of developmental motor disorders. We developed a video-based assessment of motor behaviour, the Infant Motor Profile (IMP), to evaluate motor abilities, movement variability, ability to select motor strategies, movement symmetry, and fluency. The IMP consists of 80 items and is applicable in children from 3 to 18 months. The present study aimed to test intra- and interobserver reliability and concurrent validity of the IMP with the Alberta Infant Motor Scale (AIMS) and Touwen neurological examination. The study group consisted of 40 low-risk term (median gestational age [GA] 40 wks, range 38-42 wks) and 40 high-risk preterm infants (median GA 29.6 wks, range 26-33 wks) with corrected ages 4 to 18 months (31 females, 49 males). Intra- and interobserver agreement of the IMP were satisfactory (Spearman's rho=0.9). Concurrent validity of IMP and AIMS was good (Spearman's rho=0.8, p<0.005). The IMP was able to differentiate between infants with normal neurological condition, simple minor neurological dysfunction (MND), complex MND, and abnormal neurological condition (p<0.005). This means that the IMP may be a promising tool to evaluate neurological integrity during infancy, a suggestion that needs confirmation by means of assessment of larger groups of infants with heterogeneous neurological conditions.
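Spearman's rho is the Pearson correlation of the ranks, with tied values given the average of the ranks they span. A minimal pure-Python sketch, assuming paired total scores from two observers (or two sessions) for the same infants; the data layout is hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Note that a high rho shows that two observers rank infants consistently; it does not by itself show that they give the same scores.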
de Vries, Merlijn W; Visscher, Corine; Delwel, Suzanne; van der Steen, Jenny T; Pieper, Marjoleine J C; Scherder, Erik J A; Achterberg, Wilco P; Lobbezoo, Frank
2016-01-01
Objectives. The aim of this study was to establish the reliability of the "chewing" subscale of the OPS-NVI, a novel tool designed to estimate the presence and severity of orofacial pain in nonverbal patients. Methods. The OPS-NVI consists of 16 items for observed behavior, classified into four categories, and a subjective estimate of pain. Two observers used the OPS-NVI on 237 video clips of people with dementia in Dutch nursing homes during their meal to observe their behavior and to estimate the intensity of orofacial pain. Six weeks later, the same observers rated the video clips a second time. Results. Floor and ceiling effects were found for some items, which were therefore excluded from the statistical analyses. The categories that included the remaining items (n = 6) showed reliability ranging from fair-to-good to excellent (interobserver reliability, ICC: 0.40-0.47; intraobserver reliability, ICC: 0.40-0.92). Conclusions. The "chewing" subscale of the OPS-NVI showed fair-to-good to excellent interobserver and intraobserver reliability in this dementia population. This study contributes to the validation process of the OPS-NVI as a whole and stresses the need for further assessment of the reliability of the OPS-NVI in subjects who might already show signs of orofacial pain.
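The abstract reports ICCs without naming the ICC form. Purely as an illustration, this Python sketch computes Shrout and Fleiss's ICC(2,1) (two-way random effects, absolute agreement, single rater) from a complete subjects-by-raters table; for intraobserver reliability the two rating sessions can play the role of the "raters". Which form the study actually used is an assumption here.

```python
def icc_2_1(ratings):
    """ICC(2,1); ratings[s][r] is the score for subject s from rater r (no missing values)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
```

Because ICC(2,1) penalises systematic differences between raters, it is usually lower than the consistency form ICC(3,1) on the same data.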
Engkvist, I L; Hagberg, M; Wigaeus-Hjelm, E; Menckel, E; Ekenvall, L
1995-06-01
No documented strategy, including preventive strategies, for the systematic investigation of overexertion back accidents among nursing personnel has yet been published. One aim of the present study was to develop standardized instruments for the systematic investigation of back accidents among nursing personnel in order to develop preventive strategies. Another aim was to produce a screening tool that could easily be used to identify potential overexertion back accident hazards. Two structured interview protocols were developed, one for the injured person and one for the supervisor. An ergonomics checklist was designed for the most important spaces according to accident statistics (patient's room, corridor and toilet), plus one for 'other space', e.g., X-ray and treatment rooms. The instruments were developed through frequent discussions and adjustments in a task force of researchers and occupational health personnel. The protocols were tested in two steps before a final version was established. The construct validity and interobserver reliability of the checklist were tested by ten ergonomists, who checked a patient's room, a toilet and a corridor with some known hazards. The construct validity agreement was 90% in 19 of 26 items in the checklist. The interobserver reliability had the same figures as the validity for all items in the checklist. The interview protocols and checklist appear to be suitable for the systematic investigation of overexertion back accidents.
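One simple reading of an item-level "validity agreement" is the share of ergonomists whose checklist answer matches the known hazard status of that item. This Python sketch assumes that reading and a 0.9 threshold; neither the exact definition nor the function names come from the study.

```python
def item_agreement(responses, reference):
    """Share of raters whose answer matches the known answer, per checklist item.

    responses[r][i]: rater r's answer to item i; reference[i]: known answer for item i.
    """
    n_raters = len(responses)
    return [
        sum(rater[i] == known for rater in responses) / n_raters
        for i, known in enumerate(reference)
    ]

def items_at_or_above(agreement, threshold=0.9):
    """Number of items whose agreement reaches the (assumed) threshold."""
    return sum(share >= threshold for share in agreement)
```

Replacing `reference` with the modal answer across raters would turn the same calculation into a simple interobserver agreement per item.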
Mine, Benjamin; Tancredi, Illario; Aljishi, Ali; Alghamdi, Faisal; Beltran, Margarita; Herchuelz, Maxime; Lubicz, Boris
2016-06-01
To compare contrast-enhanced MR angiography (CE-MRA) and DSA for the follow-up of intracranial aneurysms (IAs) treated with the Woven EndoBridge embolization system DL (WEB DL; Sequent Medical, Aliso Viejo, California, USA). We retrospectively identified all patients treated with a WEB DL between November 2010 and February 2013 in 2 hospitals. IA occlusion was graded on follow-up CE-MRA by 4 independent readers and on DSA by 2 readers reaching a consensus. Interobserver agreement for MRA and intertechnique agreement were evaluated by calculating linear weighted κ. Fifteen patients with 16 IAs were included. The mean delay between MRA and DSA was 2 months (range 0-16 months). Interobserver agreement for MRA was substantial to almost perfect (κ=0.686-0.921; mean κ=0.809). Intertechnique agreement was moderate to substantial (κ=0.579-0.724; mean κ=0.669). Only three out of five inadequately occluded IAs were detected by MRA. CE-MRA is a useful tool for the follow-up of IAs treated with a WEB DL. However, early follow-up with DSA remains mandatory to detect inadequately occluded IAs. Published by the BMJ Publishing Group Limited.
ERIC Educational Resources Information Center
Mudford, Oliver C.; Taylor, Sarah Ann; Martin, Neil T.
2009-01-01
We reviewed all research articles in 10 recent volumes of the "Journal of Applied Behavior Analysis (JABA)": Vol. 28(3), 1995, through Vol. 38(2), 2005. Continuous recording was used in the majority (55%) of the 168 articles reporting data on free-operant human behaviors. Three methods for reporting interobserver agreement (exact agreement,…
Torabian, Kian; Lezzar, Dalia; Piety, Nathaniel Z; George, Alex; Shevkoplyas, Sergey S
2017-09-20
Sickle cell anemia (SCA) is a genetic blood disorder that is particularly lethal in early childhood. Universal newborn screening programs and subsequent early treatment are known to drastically reduce under-five SCA mortality. However, in resource-limited settings, cost and infrastructure constraints limit the effectiveness of laboratory-based SCA screening programs. To address this limitation, our laboratory previously developed a low-cost, equipment-free, point-of-care, paper-based SCA test. Here, we improved the stability and performance of the test by replacing sodium hydrosulfite (HS), a key reducing agent in the hemoglobin solubility buffer that is not stable in aqueous solutions, with sodium metabisulfite (MS). The MS formulation of the test was compared with the HS formulation in a laboratory setting by inexperienced users (n = 3) to determine the visual limit of detection (LOD), readout time, diagnostic accuracy, intra- and inter-observer agreement, and shelf life. The MS test was found to have a 10% sickle hemoglobin LOD, a 21-min readout time, 97.3% sensitivity and 99.5% specificity for SCA, almost perfect intra- and inter-observer agreement, and at least 24 weeks of shelf stability at room temperature, and it could be packaged into self-contained, distributable test kits composed of off-the-shelf disposable components and food-grade reagents with a total cost of only $0.21 (USD).
Verhaart, René F; Fortunati, Valerio; Verduijn, Gerda M; van der Lugt, Aad; van Walsum, Theo; Veenland, Jifke F; Paulides, Margarethus M
2014-12-01
In current clinical practice, head and neck (H&N) hyperthermia treatment planning (HTP) is based solely on computed tomography (CT) images. Magnetic resonance imaging (MRI) provides superior soft-tissue contrast over CT. The purpose of the authors' study is to investigate the relevance of using MRI in addition to CT for patient modeling in H&N HTP. CT and MRI scans were acquired for 11 patients in an immobilization mask. Three observers manually segmented the following thermo-sensitive tissues on CT, MRI T1-weighted (MRI-T1w), and MRI T2-weighted (MRI-T2w) images: cerebrum, cerebellum, brainstem, myelum, sclera, lens, vitreous humor, and the optic nerve. For these tissues, which are used for patient modeling in H&N HTP, the interobserver variation of manual tissue segmentation in CT and MRI was quantified with the mean surface distance (MSD). Next, the authors compared the impact of CT-based and combined CT and MRI-based patient models on the predicted temperatures. For each tissue, the modality that led to the lowest observer variation was selected and inserted into the combined CT and MRI based patient model (CT and MRI) after deformable image registration. In addition, a patient model with a detailed segmentation of brain tissues (including white matter, gray matter, and cerebrospinal fluid) was created (CT and MRIdb). To quantify the relevance of MRI-based segmentation for H&N HTP, the authors compared the predicted maximum temperatures in the segmented tissues (Tmax) and the corresponding specific absorption rate (SAR) of the patient models based on (1) CT, (2) CT and MRI, and (3) CT and MRIdb. In MRI, a similar or reduced interobserver variation was found compared with CT (maximum of median MSD in CT: 0.93 mm, MRI-T1w: 0.72 mm, MRI-T2w: 0.66 mm). Only for the optic nerve was the interobserver variation significantly lower in CT than in MRI (median MSD in CT: 0.58 mm, MRI-T1w: 1.27 mm, MRI-T2w: 1.40 mm). 
Patient models based on CT (Tmax: 38.0 °C) and on CT and MRI (Tmax: 38.1 °C) resulted in similar simulated temperatures, while CT and MRIdb (Tmax: 38.5 °C) resulted in significantly higher temperatures. The SAR corresponding to these temperatures did not differ significantly. Although MR imaging reduces the interobserver variation in most tissues, it does not affect simulated local tissue temperatures. However, the improved soft-tissue contrast provided by MRI allows a detailed brain segmentation to be generated, which has a strong impact on the predicted local temperatures and hence may improve simulation-guided hyperthermia.
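Definitions of the mean surface distance vary between papers. A common symmetric form averages, over both surfaces, the distance from each surface point to the nearest point on the other surface; this brute-force Python sketch assumes that definition, with surfaces given as lists of (x, y, z) points in mm (O(n·m) time, adequate only for small point sets; `math.dist` needs Python 3.8+).

```python
import math

def mean_surface_distance(surface_a, surface_b):
    """Symmetric mean of nearest-neighbour distances between two surface point sets."""
    def directed(points, others):
        # Average distance from each point to its nearest neighbour on the other surface.
        return sum(min(math.dist(p, q) for q in others) for p in points) / len(points)
    return (directed(surface_a, surface_b) + directed(surface_b, surface_a)) / 2
```

Computing this for every pair of observers and taking the median would give a per-tissue interobserver summary of the kind reported above; a k-d tree would be the usual speed-up for full-resolution segmentations.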
Endodontic radiography: who is reading the digital radiograph?
Tewary, Shalini; Luzzo, Joseph; Hartwell, Gary
2011-07-01
Digital radiographic imaging systems have undergone tremendous improvements since their introduction. Advantages of digital radiographs over conventional films include lower radiation doses, instantaneous images, easy archiving and sharing of images, and manipulation of several radiographic properties that might help in diagnosis. A total of 6 observers, including 2 endodontic residents, 3 endodontists, and 1 oral radiologist, evaluated 150 molar digital periapical radiographs to determine which of the following conditions existed: normal periapical tissue, widened periodontal ligament, or presence of periapical radiolucency. The evaluators had full control over the radiograph's parameters in the Planmeca Dimaxis software program. All images were viewed on the same computer monitor under ideal viewing conditions. The same 6 observers evaluated the same 150 digital images 3 months later. The data were analyzed to determine how well the evaluators agreed with each other (interobserver agreement) in the 2 rounds of observations and with themselves (intraobserver agreement). Fleiss kappa statistical analysis was used to measure the level of agreement among multiple raters. The overall Fleiss kappa value for interobserver agreement was 0.34 (P < .001) for the first round of interpretation and 0.35 (P < .001) for the second round. This corresponds to fair (0.2-0.4) agreement among the 6 raters at both observation periods. A weighted kappa analysis was used to determine intraobserver agreement, which showed moderate agreement on average. The results indicate that the interpretation of a dental radiograph is subjective, irrespective of whether conventional or digital radiographs are used. The factors that appeared to have the most impact were the examiner's years of experience and the operator's familiarity with a given digital system. 
Copyright © 2011 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
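Fleiss' kappa extends chance-corrected agreement to a fixed number of raters per subject by working from a subjects-by-categories count table. A minimal Python sketch of the standard formula; the table layout is an assumption, and every subject must be read by the same number of raters.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] is how many raters put subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    total = n_subjects * n_raters
    # Overall proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

For this design, each radiograph would contribute a row such as `[4, 1, 1]` (hypothetical: four observers read normal, one widened periodontal ligament, one radiolucency), with one table per observation round.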
Garbe, Claus; Eigentler, Thomas K; Bauer, Jürgen; Blödorn-Schlicht, Norbert; Cerroni, Lorenzo; Fend, Falko; Hantschke, Markus; Kurschat, Peter; Kutzner, Heinz; Metze, Dieter; Mielke, Volker; Preßler, Harald; Reusch, Michael; Reusch, Ursula; Stadler, Rudolf; Tronnier, Michael; Yazdi, Amir; Metzler, Gisela
2016-09-01
In 2009, the AJCC issued a revised melanoma staging system. In addition to tumor thickness and ulceration, the mitotic rate was introduced as the third major prognostic parameter for the classification of primary cutaneous melanoma. Given that, according to the 2009 AJCC classification, the detection of one or more dermal tumor mitoses leads to upstaging (from stage Ia to Ib) of melanomas with a tumor thickness of ≤ 1.0 mm, we set out to investigate the reproducibility of this new parameter. To assess interobserver reliability, 17 dermatopathologists and pathologists, all well versed in the diagnosis of cutaneous melanoma, analyzed the mitotic rate in 15 thin primary cutaneous melanomas (mean tumor thickness 0.91 mm) using identical slides. Mitotic rates were determined on H&E- and phosphohistone H3 (Ser10)-stained samples. To ascertain intraobserver reliability, five of these examiners reevaluated the samples after more than one year without knowledge of their previous assessment. Interobserver reliability of the mitotic rate in thin primary melanomas is disappointing and independent of whether H&E or immunohistochemically stained samples are used (kappa values: 0.088 [H&E] and 0.154 [IH], respectively). Kappa values improved to 0.345 (H&E) and 0.403 (IH) when using a cutoff of 0/1 vs. 2+ mitoses. Similarly unsatisfactory, kappa values for intraobserver reliability ranged from 0.18 to 0.348, depending on the individual examiner. Given the unsatisfactory reproducibility and large variations in assessing the mitotic rate, it remains a matter of debate whether this diagnostic parameter should play a role in therapeutic decisions. © 2016 Deutsche Dermatologische Gesellschaft (DDG). Published by John Wiley & Sons Ltd.
Erdoğan, Zeynep; Abdülrezzak, Ümmühan; Silov, Güler; Özdal, Ayşegül; Turhal, Özgül
2014-01-01
Objective: The aim of this study was to investigate the variability in the interpretation of parenchymal abnormalities and to assess differences in the interpretation of routine renal scintigraphic findings on the posterior view of technetium-99m dimercaptosuccinic acid (pvDMSA) scans and the parenchymal phase of technetium-99m mercaptoacetyltriglycine (ppMAG3) scans, using standard criteria to standardize reporting, allow semiquantitative evaluation and obtain more accurate correlation. Materials and Methods: Two experienced nuclear medicine physicians independently and retrospectively interpreted the pvDMSA scans of 204 and the ppMAG3 scans of 102 pediatric patients. Comparisons were made by visual inspection of pvDMSA and ppMAG3 scans using a grading system modified from Itoh et al., in which anatomical damage of the renal parenchyma was classified into six types: Grade 0-V. Agreement rates were calculated with Kendall correlation (tau-b) analysis. Results: Excellent agreement was found between the two observers for DMSA grade readings (DMSA-GR) (tau-b = 0.827) and good agreement for MAG3 grade readings (MAG3-GR) (tau-b = 0.790). Most clear parenchymal lesions detected on pvDMSA and ppMAG3 scans were identified equally by both observers. Studies with negative or minimal lesions reduced the degree of correlation for both DMSA-GR and MAG3-GR. Conclusion: Our grading system can be used to standardize reports. We conclude that standardizing the criteria and terminology used in interpretation may result in higher interobserver consistency and may also improve the otherwise low interobserver reproducibility and objectivity of renal scintigraphy reports. PMID:24761059
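Kendall's tau-b compares every pair of studies: a pair is concordant if both observers order it the same way and discordant otherwise, with a correction in the denominator for tied grades, which are common on a six-level scale. A minimal O(n²) Python sketch, assuming Grades 0-V are coded as integers 0 to 5 (the coding is an assumption):

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b between two observers' ordinal grades for the same studies."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2 and y1 == y2:
            continue  # tied on both: contributes to neither factor of the denominator
        if x1 == x2:
            tied_x_only += 1
        elif y1 == y2:
            tied_y_only += 1
        elif (x1 < x2) == (y1 < y2):
            concordant += 1
        else:
            discordant += 1
    n_cd = concordant + discordant
    return (concordant - discordant) / math.sqrt((n_cd + tied_x_only) * (n_cd + tied_y_only))
```

The result is undefined when one observer gives every study the same grade, a situation that studies with negative or minimal lesions move towards.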
Kamphaus, A; Rapp, M; Wessel, L M; Buchholz, M; Massalme, E; Schneidmüller, D; Roeder, C; Kaiser, M M
2015-04-01
There are two child-specific fracture classification systems for long bone fractures: the AO classification of pediatric long-bone fractures (PCCF) and the LiLa classification of pediatric fractures of long bones (LiLa classification). Neither is yet as widely established as the adult AO classification for long bone fractures. During a period of 12 months, all long bone fractures in children were documented and classified according to the LiLa classification by experts and non-experts. Intraobserver and interobserver reliability were calculated according to Cohen (kappa). A total of 408 fractures were classified. The intraobserver reliability for location (skeleton and bone segment) showed almost perfect agreement (κ = 0.91-0.95), as did that for morphology (joint/shaft fracture) (κ = 0.87-0.93). Because fracture displacement was judged differently in the second classification round, the intraobserver reliability of the whole classification showed moderate agreement (κ = 0.53-0.58). Interobserver reliability showed moderate agreement (κ = 0.55), often because of the low quality of the X-rays. Further differences arose from difficulty in identifying the precise transition from metaphysis to diaphysis. The LiLa classification is suitable and, in most cases, user-friendly for classifying long bone fractures in children. Its reliability is higher than that of established fracture-specific classifications and comparable to that of the AO classification of pediatric long bone fractures. Some mistakes were due to the low quality of the X-rays and some to difficulties in classifying the fractures themselves. Improvements include a more precise definition of the metaphysis and of the type of displacement. Overall, the LiLa classification should still be considered an alternative for classifying pediatric long bone fractures.
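The verbal labels used here ("almost perfect", "moderate") match the widely used Landis and Koch (1977) benchmarks for kappa, although the abstract does not name the scale it follows. A small Python helper, assuming those cut-offs:

```python
def landis_koch_label(kappa):
    """Verbal benchmark for a kappa value following Landis and Koch (1977)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
```

These bands are conventions rather than statistical thresholds, so two studies can attach different words to the same kappa.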
A critical appraisal of vertebral fracture assessment in paediatrics.
Kyriakou, Andreas; Shepherd, Sheila; Mason, Avril; Faisal Ahmed, S
2015-12-01
There is a need to improve our understanding of the clinical utility of vertebral fracture assessment (VFA) in paediatrics, and this requires a thorough evaluation of its readability, reproducibility, and accuracy for identifying VF. VFA was performed independently by two observers in 165 children and adolescents with a median age of 13.4 years (range, 3.6-18). In 20 of these subjects, VFA was compared with lateral vertebral morphometry assessment on lateral spine X-ray (LVM). 1528 (84%) of the vertebrae were adequately visualised by both observers for VFA. Interobserver agreement in vertebral readability was 94% (kappa, 0.73 [95% CI, 0.68, 0.73]). 93% of the non-readable vertebrae were located between T6 and T9. Interobserver agreement per-vertebra for the presence of VF was 99% (kappa, 0.85 [95% CI, 0.79, 0.91]). Interobserver agreement per-subject was 91% (kappa, 0.78 [95% CI, 0.66, 0.87]). Per-vertebra agreement between LVM and VFA was 95% (kappa 0.79 [95% CI, 0.62, 0.92]) and per-subject agreement was 95% (kappa, 0.88 [95% CI, 0.58, 1.0]). Accepting LVM as the gold standard, VFA had a positive predictive value (PPV) of 90% and a negative predictive value (NPV) of 95% in per-vertebra analysis and a PPV of 100% and NPV of 93% in per-subject analysis. VFA reaches an excellent level of agreement between observers and a high level of accuracy in identifying VF in a paediatric population. The readability of vertebrae in the mid-thoracic region is suboptimal, and interpretation at this level should be undertaken with caution. Copyright © 2015 Elsevier Inc. All rights reserved.
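PPV and NPV, like sensitivity and specificity, come from a 2 × 2 table of the index test (VFA) against the reference standard (LVM), counted per vertebra or per subject. A minimal Python sketch; the counts in the usage line are hypothetical, not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures of an index test against a reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # reference-positive cases the test detects
        "specificity": tn / (tn + fp),  # reference-negative cases the test clears
        "ppv": tp / (tp + fp),          # test positives that are truly positive
        "npv": tn / (tn + fn),          # test negatives that are truly negative
    }
```

For example, `diagnostic_metrics(tp=9, fp=1, fn=1, tn=19)` describes a made-up table. Unlike sensitivity and specificity, PPV and NPV depend on how common fractures are in the sample, so they do not transfer directly to populations with a different prevalence.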
International perception of lung sounds: a comparison of classification across some European borders
Aviles-Solis, Juan Carlos; Vanbelle, Sophie; Halvorsen, Peder A; Francis, Nick; Cals, Jochen W L; Andreeva, Elena A; Marques, Alda; Piirilä, Päivi; Pasterkamp, Hans; Melbye, Hasse
2017-01-01
Introduction: Lung auscultation is helpful in the diagnosis of lung and heart diseases; however, the diagnostic value of lung sounds may be questioned because of interobserver variation. This variation may also hinder clinical research aimed at generating evidence-based knowledge about the role of chest auscultation in a modern clinical setting. Recording and visually displaying lung sounds is a method that is both repeatable and feasible in large samples, and the aim of this study was to evaluate interobserver agreement using this method. Methods: With a microphone in a stethoscope tube, we collected digital recordings of lung sounds from six sites on the chest surface in 20 subjects aged 40 years or older with and without lung and heart diseases. A total of 120 recordings and their spectrograms were independently classified by 28 observers from seven different countries. We used absolute agreement and kappa coefficients to explore interobserver agreement in classifying crackles and wheezes within and between subgroups of four observers. Results: When evaluating agreement on crackles (inspiratory or expiratory) in each subgroup, observers agreed on between 65% and 87% of the cases. Conger's kappa ranged from 0.20 to 0.58, and four out of seven groups reached a kappa of ≥0.49. In the classification of wheezes, we observed a probability of agreement between 69% and 99.6% and kappa values from 0.09 to 0.97. Four out of seven groups reached a kappa of ≥0.62. Conclusions: The kappa values we observed ranged widely but, when its limitations are addressed, we find the method of recording and presenting lung sounds with spectrograms sufficient for both clinical use and research. Standardisation of terminology across countries would improve international communication on lung auscultation findings. PMID:29435344
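Conger's kappa extends Cohen's kappa to more than two observers. It averages observed agreement and chance agreement over every pair of observers, and it computes chance agreement from each observer's own marginal frequencies. The sketch below assumes complete data (every observer classifies every recording) and nominal categories. The `conger_kappa` name and the example ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations

def conger_kappa(ratings):
    """Conger's kappa; ratings[r][i] is the category rater r gave item i."""
    if len(ratings) < 2:
        raise ValueError("need at least two raters")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("rating lists must be non-empty and equal in length")
    pairs = list(combinations(range(len(ratings)), 2))
    margins = [Counter(r) for r in ratings]
    p_o = p_e = 0.0
    for r, s in pairs:
        # Observed agreement for this pair of raters.
        p_o += sum(x == y for x, y in zip(ratings[r], ratings[s])) / n_items
        # Chance agreement from each rater's own marginal frequencies.
        p_e += sum(margins[r][c] * margins[s][c] for c in margins[r]) / n_items**2
    p_o /= len(pairs)
    p_e /= len(pairs)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical crackle calls (1 = present) by four observers on six recordings.
subgroup = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
]
print(round(conger_kappa(subgroup), 2))
```

With exactly two raters this reduces to Cohen's kappa. Fleiss' kappa, by contrast, pools all raters' marginals when it estimates chance agreement.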
Huang, Qi-Fang; Wei, Fang-Fei; Zhang, Zhen-Yu; Raaijmakers, Anke; Asayama, Kei; Thijs, Lutgarde; Yang, Wen-Yi; Mujaj, Blerim; Allegaert, Karel; Verhamme, Peter; Struijker-Boudier, Harry A J; Li, Yan; Staessen, Jan A
2018-03-10
Retinal microvascular traits predict adverse health outcomes. The Singapore I Vessel Assessment (SIVA) software improved automated postprocessing of retinal photographs. In addition to microvessel caliber, it generates measures of arteriolar and venular geometry. Few studies have addressed the reproducibility of SIVA measurements across a wide age range. In the current study, two blinded graders read images obtained by nonmydriatic retinal photography twice in twenty 11-year-old children, born prematurely (n = 10) or at term (n = 10), and in 60 adults (age range, 18.9-86.1 years). Compared with term children, former preterm children had lower microvessel diameter and disorganized vessel geometry, with no differences in intraobserver and interobserver variability. Among adults, microvessel caliber decreased with age and blood pressure, and arteriolar geometry was inversely correlated with female sex and age. Intraobserver differences estimated by the Bland-Altman method did not reach significance for any measurement. Across measurements, median reproducibility (RM), expressed as a percentage of the average trait value, was 8.8% in children (median intraclass correlation coefficient [ICC], 0.94) and 8.0% (0.97) in adults. Likewise, interobserver differences did not reach significance, with an RM (ICC) of 10.6% (0.85) in children and 10.4% (0.93) in adults. Reproducibility was best for microvessel caliber (intraobserver/interobserver RM, 4.7%/6.0%; ICC, 0.98/0.96), worst for venular geometry (17.0%/18.8%; 0.93/0.84), and intermediate for arteriolar geometry (10.9%/14.9%; 0.95/0.86). SIVA produces repeatable measures of the retinal microvasculature in former preterm and term children and in adults, demonstrating its usability from childhood to old age.
Vinod, Shalini K; Min, Myo; Jameson, Michael G; Holloway, Lois C
2016-06-01
Inter-observer variability (IOV) in target volume and organ-at-risk (OAR) delineation is a potential source of error in radiation therapy treatment. The aim of this study was to identify interventions shown to reduce IOV in volume delineation. The Medline and PubMed databases were queried using various keywords to identify articles that evaluated IOV in target or OAR delineation by multiple (>2) observers. The search was limited to English-language articles published from 1 January 2000 to 31 December 2014. Reference lists of identified articles were scrutinised to identify relevant studies. Studies were included if they reported IOV in contouring before and after an intervention, including the use of additional or alternative imaging. Fifty-six studies were identified. These were grouped into evaluation of guidelines (n = 9), teaching (n = 9), provision of an autocontour (n = 7) and the impact of imaging (n = 31) on IOV. Guidelines significantly reduced IOV in 7/9 studies. Teaching interventions reduced IOV in 8/9 studies, significantly so in 4. The provision of an autocontour improved consistency of contouring in 6/7 studies, significantly so in 5. The effect of additional imaging on IOV was variable. Pre-operative CT was useful in reducing IOV in contouring breast and liver cancers, PET scans in lung cancer, rectal cancer and lymphoma, and MRI scans in contouring OARs in head and neck cancers. Inter-observer variability in volume delineation can be reduced with the use of guidelines, provision of autocontours and teaching. Multimodality imaging is useful for certain tumour sites. © 2016 The Royal Australian and New Zealand College of Radiologists.
Office-Based Point of Care Testing (IgA/IgG-Deamidated Gliadin Peptide) for Celiac Disease.
Lau, Michelle S; Mooney, Peter D; White, William L; Rees, Michael A; Wong, Simon H; Hadjivassiliou, Marios; Green, Peter H R; Lebwohl, Benjamin; Sanders, David S
2018-06-19
Celiac disease (CD) is common yet under-detected. A point of care test (POCT) may improve CD detection. We aimed to assess the diagnostic performance of an IgA/IgG-deamidated gliadin peptide (DGP)-based POCT for CD detection, its acceptability to patients, and the inter-observer variability of the POCT results. From 2013 to 2017, we prospectively recruited patients referred to secondary care with gastrointestinal symptoms, anemia and/or weight loss (group 1), and patients with self-reported gluten sensitivity and unknown CD status (group 2). All patients had concurrent POCT, IgA-tissue transglutaminase (IgA-TTG), IgA-endomysial antibodies (IgA-EMA), total IgA levels, and duodenal biopsies. Five hundred patients completed acceptability questionnaires, and inter-observer variability of the POCT results was compared among five clinical staff for 400 cases. Group 1: 1000 patients, 58.5% female, age 16-91, median age 57. Forty-one patients (4.1%) were diagnosed with CD. The sensitivities of the POCT, IgA-TTG, and IgA-EMA were 82.9%, 78.1%, and 70.7%; the specificities were 85.4%, 96.3%, and 99.8%. Group 2: 61 patients, 83% female, age 17-73, median age 35. The POCT had 100% sensitivity and negative predictive value in detecting CD in group 2. Most patients preferred the POCT to venepuncture (90.4% vs. 2.8%). There was good inter-observer agreement on the POCT results, with a Fleiss kappa coefficient of 0.895. The POCT had sensitivity comparable to that of serology and correctly identified all CD cases in a gluten-sensitive cohort. However, its low specificity may increase unnecessary investigations. Despite the advantages of convenience and rapid results, it may not add significant value to case finding in an office-based setting.
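Fleiss' kappa summarizes agreement when every subject is rated by the same number of raters, as with the five staff reading each POCT. A minimal sketch follows. Its input is a subjects-by-categories table of rating counts. The example table (10 subjects, 14 raters, 5 categories) is a generic illustration, not POCT data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters putting subject i in category j.

    Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    # Mean per-subject agreement: share of rater pairs that agree.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the pooled category proportions.
    total = n_subjects * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(len(counts[0]))
    )
    return (p_bar - p_e) / (1 - p_e)

example_counts = [
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0], [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(example_counts), 3))  # 0.21
```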
Clinical application of qualitative assessment for breast masses in shear-wave elastography.
Gweon, Hye Mi; Youk, Ji Hyun; Son, Eun Ju; Kim, Jeong-Ah
2013-11-01
To evaluate the interobserver agreement and the diagnostic performance of various qualitative features in shear-wave elastography (SWE) for breast masses. A total of 153 breast lesions in 152 women who underwent B-mode ultrasound and SWE before biopsy were included. Qualitative analysis in SWE was performed using two different classifications: E values (Ecol, 6-point color score; Ehomo, homogeneity score; and Esha, shape score) and a four-color pattern classification. Two radiologists reviewed five data sets: B-mode ultrasound, SWE, and the combination of both for E values and for the four-color pattern. BI-RADS categories were assessed for the B-mode and combined sets. Interobserver agreement was assessed using weighted κ statistics. Areas under the receiver operating characteristic curve (AUC), sensitivity, and specificity were analyzed. Interobserver agreement was substantial for Ecol (κ=0.79), Ehomo (κ=0.77) and four-color pattern (κ=0.64), and moderate for Esha (κ=0.56). The better-performing qualitative features were Ecol and four-color pattern (AUCs, 0.932 and 0.925) compared with Ehomo and Esha (AUCs, 0.857 and 0.864; P<0.05). The diagnostic performance of B-mode ultrasound (AUC, 0.950) was not significantly different from that of the combined sets with E values and with four-color pattern (AUCs, 0.962 and 0.954). When all qualitative values were negative, leading to a downgrade of the BI-RADS category, specificity increased significantly from 16.5% to 56.1% (E value) and 57.0% (four-color pattern) (P<0.001) without improvement in sensitivity. The qualitative SWE features were highly reproducible and showed good diagnostic performance in suspicious breast masses. Adding qualitative SWE to B-mode ultrasound increased specificity in decision making for biopsy recommendation. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
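Weighted κ gives partial credit when two readers choose adjacent ordinal categories, which suits scores such as the 6-point Ecol. The abstract does not state which weighting scheme was used, so the sketch below supports both linear and quadratic weights. Categories must be coded 0 to k-1, and the example scores are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters on ordinal codes 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2 or weighting not in ("linear", "quadratic"):
        raise ValueError("need >= 2 categories and linear or quadratic weights")
    n, k = len(rater_a), n_categories
    power = 1 if weighting == "linear" else 2
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 for exact agreement, 1 for the largest distance.
        return (abs(i - j) / (k - 1)) ** power

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - obs_dis / exp_dis

# Hypothetical Ecol scores (1-6 recoded as 0-5) from two readers on eight lesions.
reader_1 = [0, 1, 3, 5, 4, 2, 5, 0]
reader_2 = [0, 2, 3, 4, 4, 2, 5, 1]
print(round(weighted_kappa(reader_1, reader_2, 6), 2))
```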
Yoon, Soon Ho; Goo, Jin Mo; Jung, Julip; Hong, Helen; Park, Eun Ah; Lee, Chang Hyun; Lee, Youkyung; Jin, Kwang Nam; Choo, Ji Yung; Lee, Nyoung Keun
2014-01-01
Objective: To evaluate the technical feasibility, performance, and interobserver agreement of a computer-aided classification (CAC) system for regional ventilation at two-phase xenon-enhanced CT in patients with chronic obstructive pulmonary disease (COPD). Materials and Methods: Thirty-eight patients with COPD underwent two-phase xenon ventilation CT, yielding wash-in (WI) and wash-out (WO) xenon images. The regional ventilation in structural abnormalities was visually categorized into four patterns by consensus of two experienced radiologists, who compared the xenon attenuation of structural abnormalities with that of adjacent normal parenchyma in the WI and WO images; this consensus served as the reference. Two series of image datasets of structural abnormalities were randomly extracted for optimization and validation. For optimization, the proportion of agreement on a per-lesion basis and receiver operating characteristics on a per-pixel basis between CAC and the reference were analyzed. Thereafter, six readers independently categorized the regional ventilation in structural abnormalities in the validation set without and with a CAC map. Interobserver agreement was also compared between assessments without and with CAC maps using multirater κ statistics. Results: Computer-aided classification maps were successfully generated in 31 patients (81.5%). The proportion of agreement and the average area under the curve of optimized CAC maps were 94% (75/80) and 0.994, respectively. The multirater κ value improved from moderate (κ = 0.59; 95% confidence interval [CI], 0.56-0.62) at the initial assessment to excellent (κ = 0.82; 95% CI, 0.79-0.85) with the CAC map. Conclusion: Our proposed CAC system demonstrated potential for regional ventilation pattern analysis and enhanced interobserver agreement on visual classification of regional ventilation. PMID:24843245
Frémont, P.; Labrecque, M.; Légaré, F.; Baillargeon, L.; Misson, L.
2001-01-01
OBJECTIVE: To develop and test the reliability of a tool for rating websites that provide information on evidence-based medicine. DESIGN: For each site, 60% of the score was given for content (eight criteria) and 40% for organization and presentation (nine criteria). Five of 10 randomly selected sites met the inclusion criteria and were used by three observers to test the accuracy of the tool. Each site was rated twice by each observer, with a 3-week interval between ratings. SETTING: Laval University, Quebec City. PARTICIPANTS: Three observers. MAIN OUTCOME MEASURES: The intraclass correlation coefficient (ICC) was used to rate the reliability of the tool. RESULTS: Average overall scores for the five sites were 40%, 79%, 83%, 88%, and 89%. All three observers ranked the same two sites in fourth and fifth place and gave the top three ratings to the other three sites. The overall rating of the five sites by the three observers yielded an ICC of 0.93 to 0.97. An ICC of 0.87 was obtained for the two overall ratings conducted 3 weeks apart. CONCLUSION: This new tool offers excellent intraobserver and interobserver measurement reliability and is an excellent means of distinguishing between medical websites of varying quality. For best results, we recommend that the tool be used simultaneously by two observers and that differences be resolved by consensus. PMID:11768925
Cunningham, Devin P; Mostafa, Ayman A; Gordan-Evans, Wanda J; Boudrieau, Randy J; Griffon, Dominique J
2017-08-14
We recently reported that a conformation score derived from the tibial plateau angle (TPA) and the femoral anteversion angle (FAA) best discriminates limbs predisposed to, or affected by, cranial cruciate ligament disease (CCLD) from those at low risk for CCLD. The specificity and sensitivity of this score were high enough to support further investigation of its use for large-scale screening of dogs by veterinarians. The next step, and the objective of the current study, is to determine the inter-observer variability of that CCLD score in a large population of Labrador Retrievers. A total of 167 Labradors were enrolled in this cross-sectional study. Limbs of normal dogs over 6 years of age with no history of CCLD were considered at low risk for CCLD. Limbs of dogs with CCLD were considered at high risk for CCLD. Tibial plateau and femoral anteversion angles were measured independently by two investigators to calculate a CCLD score for each limb. Kappa statistics were used to determine the extent of agreement between investigators. Pearson's correlation and intraclass correlation coefficients were calculated to evaluate the correlation between investigators and the relative contribution of each measurement to the variability of the CCLD score. The correlation between CCLD scores calculated by the investigators was good (correlation coefficient = 0.68, p < 0.0001). However, interobserver agreement with regard to the predicted status of limbs was fair (kappa value = 0.28), with 37% of limbs assigned divergent classifications. Variations in CCLD scores correlated best with those of TPA, which was the least consistent parameter between investigators. Absolute interobserver differences were two times greater for FAAs (4.19° ± 3.15°) than for TPAs (2.23° ± 1.91°). The reproducibility of the CCLD score between investigators is fair, justifying caution when interpreting individual scores.
Future studies should focus on improving the reproducibility of TPA and FAA measurements as a strategy to improve agreement between CCLD scores.
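A good correlation (0.68) alongside only fair classification agreement (κ = 0.28) is not a contradiction. Correlation ignores systematic offsets between investigators, whereas agreement on a dichotomized score does not. The sketch below illustrates this with hypothetical scores and a hypothetical cut-off of 0.65. The abstract gives neither the CCLD score formula nor its threshold.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def percent_agreement(x, y, cutoff):
    """Share of limbs both investigators place on the same side of the cut-off."""
    return sum((a >= cutoff) == (b >= cutoff) for a, b in zip(x, y)) / len(x)

# Hypothetical scores: investigator 2 reads every limb 0.3 higher.
investigator_1 = [0.2, 0.4, 0.5, 0.6, 0.8, 1.0]
investigator_2 = [s + 0.3 for s in investigator_1]
r = pearson_r(investigator_1, investigator_2)
agreement = percent_agreement(investigator_1, investigator_2, cutoff=0.65)
print(f"r = {r:.2f}, agreement = {agreement:.0%}")  # r = 1.00, agreement = 50%
```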
The use of ultrasound for postoperative monitoring of cerebral bypass grafts: A technical report.
Morton, Ryan P; Abecassis, Isaac Joshua; Moore, Anne E; Kelly, Cory M; Levitt, Michael R; Kim, Louis J; Sekhar, Laligam N
2017-06-01
Duplex ultrasound and transcranial Doppler are valuable tools for postoperative monitoring of extracranial-intracranial cerebral bypass grafts. Here we describe our technique for the evaluation of both high-flow and low-flow cerebral bypass grafts over a nine-year period. A total of 186 bypass grafts were studied daily during the inpatient period after surgery for various cerebrovascular pathologies between January 2005 and December 2014. The technical success rate was 97%. Duplex ultrasonographic flow measurements had excellent interobserver reliability, with an intraclass correlation coefficient (ICC) of 0.89 (p = 0.009). Technical nuances are highlighted, and the relevant pathology is briefly discussed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Schellhaas, Barbara; Pfeifer, Lukas; Kielisch, Christian; Goertz, Ruediger Stephan; Neurath, Markus F; Strobel, Deike
2018-06-07
This pilot study aimed to assess interobserver agreement with two contrast-enhanced ultrasound (CEUS) algorithms for the diagnosis of hepatocellular carcinoma (HCC) in high-risk patients. Focal liver lesions in 55 high-risk patients were assessed independently by three blinded observers with two standardized CEUS algorithms: ESCULAP (Erlanger Synopsis of Contrast-Enhanced Ultrasound for Liver Lesion Assessment in Patients at risk) and ACR-CEUS-LI-RADSv.2016 (American College of Radiology CEUS-Liver Imaging Reporting and Data System). Lesions were categorized according to size and ultrasound contrast enhancement in the arterial, portal-venous and late phases. Interobserver agreement for assessment of enhancement pattern and categorization was compared between the two CEUS algorithms. Additionally, diagnostic accuracy for the definitive diagnosis of HCC was compared. Histology and/or CE-MRI and follow-up served as reference standards. Fifty-five patients were included in the study (male/female, 44/11; mean age, 65.9 years); 90.9% had cirrhosis. Histological findings were available for 39/55 lesions (70.9%). According to the reference standard, the 55 lesions comprised 48 HCCs, 2 intrahepatic cholangiocellular carcinomas (ICCs), and 5 non-HCC-non-ICC lesions. Interobserver agreement was moderate to substantial for arterial phase hyperenhancement (κ = 0.53-0.67) and fair to moderate for contrast washout in the portal-venous or late phase (κ = 0.33-0.53). For the CEUS-based algorithms, interreader agreement was substantial for the ESCULAP category (κ = 0.64-0.68) and fair for the CEUS-LI-RADS® category (κ = 0.30-0.39). Disagreement between observers was mostly due to different perception of washout. Interobserver agreement is better for ESCULAP than for CEUS-LI-RADS®, mostly because the perception of contrast washout varies between observers.
However, interobserver agreement is good for arterial phase hyperenhancement, which is the key diagnostic feature for the diagnosis of HCC with CEUS in the cirrhotic liver. © Georg Thieme Verlag KG Stuttgart · New York.
Dibble, Elizabeth H; Lourenco, Ana P; Baird, Grayson L; Ward, Robert C; Maynard, A Stanley; Mainiero, Martha B
2018-01-01
To compare interobserver variability (IOV), reader confidence, and sensitivity/specificity in detecting architectural distortion (AD) on digital mammography (DM) versus digital breast tomosynthesis (DBT). This IRB-approved, HIPAA-compliant reader study used a counterbalanced experimental design. We searched radiology reports for AD on screening mammograms from 5 March 2012 to 27 November 2013. Cases were consensus-reviewed. Controls were selected from demographically matched non-AD examinations. Two radiologists and two fellows blinded to outcomes independently reviewed images from two patient groups in two sessions. Readers recorded the presence/absence of AD and their confidence level. Agreement was examined using weighted kappa, and differences in confidence and sensitivity/specificity (DBT versus DM, attendings versus fellows) using generalised mixed modeling. There were 59 AD patients and 59 controls, for 1,888 observations (59 × 2 (cases and controls) × 2 breasts × 2 imaging techniques × 4 readers). Across all readers, agreement improved with DBT versus DM (0.61 vs. 0.37). Confidence was higher with DBT, p = .001. DBT achieved higher sensitivity (.59 vs. .32), p < .001; specificity remained high (>.90). DBT achieved higher positive likelihood ratio values, smaller negative likelihood ratio values, and larger ROC values. DBT decreases IOV, increases confidence, and improves sensitivity while maintaining high specificity in detecting AD.
• Digital breast tomosynthesis decreases interobserver variability in the detection of architectural distortion.
• Digital breast tomosynthesis increases reader confidence in the detection of architectural distortion.
• Digital breast tomosynthesis improves sensitivity in the detection of architectural distortion.
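Likelihood ratios combine sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch uses the sensitivities reported above. It assumes a specificity of 0.90 for both techniques, because the abstract only states that specificity stayed above .90. The sketch also checks the observation count arithmetic.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# Patients x (cases, controls) x breasts x imaging techniques x readers.
assert 59 * 2 * 2 * 2 * 4 == 1888

# Sensitivities from the abstract; specificity 0.90 is an assumed lower bound.
for label, sens in [("DM", 0.32), ("DBT", 0.59)]:
    lr_pos, lr_neg = likelihood_ratios(sens, 0.90)
    print(f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```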
Mokhles, Palwasha; van den Bosch, Annemien E; Vletter-McGhie, Jackie S; Van Domburg, Ron T; Ruys, Titia P E; Kauer, Floris; Geleijnse, Marcel L; Roos-Hesselink, Jolien W
2013-09-01
The twisting motion of the heart plays an important role in left ventricular function. Speckle tracking echocardiography can quantify left ventricular (LV) rotation and twist. So far, this new technique has not been used in patients with congenital heart disease. The aim of our study was to investigate the feasibility and the intra- and inter-observer reproducibility of LV rotation parameters in adult patients with congenital heart disease. The study population consisted of 66 consecutive outpatient clinic patients (67% male, mean age 31 ± 7.7 years, NYHA class 1 ± 0.3) with a variety of congenital heart diseases. First, feasibility was assessed in all patients. Intra- and inter-observer reproducibility was then assessed in the patients in whom speckle tracking echocardiography was feasible. Image quality adequate for speckle tracking echocardiography was found in 80% of patients. For intra-observer reproducibility of LV twist, the bias was 0.0°, with 95% limits of agreement of -2.5° and 2.5°; for inter-observer reproducibility, the bias was 0.0°, with 95% limits of agreement of -3.0° and 3.0°. Intra- and inter-observer measurements showed strong correlation (0.86 and 0.79, respectively). Repeatability was also good. The mean time to complete a full analysis per subject was 9 minutes for the first measurement and 5 minutes for the second. Speckle tracking echocardiography is feasible in 80% of adult patients with congenital heart disease and shows excellent intra- and inter-observer reproducibility. © 2013, Wiley Periodicals, Inc.
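The bias and 95% limits of agreement quoted above are Bland-Altman statistics. The bias is the mean of the paired differences, and the limits are that mean ± 1.96 times the differences' standard deviation, which assumes roughly normally distributed differences. A minimal sketch follows; the twist values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(measure_1, measure_2):
    """Bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diffs = [a - b for a, b in zip(measure_1, measure_2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical LV twist values (degrees) from two observers for five patients.
obs_a = [14.0, 11.5, 17.2, 9.8, 12.6]
obs_b = [13.1, 12.4, 16.0, 10.9, 12.1]
bias, lower, upper = bland_altman(obs_a, obs_b)
print(f"bias {bias:.2f} deg, 95% LoA {lower:.2f} to {upper:.2f} deg")
```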
Intra- and interobserver agreement for fetal cerebral measurements in 3D-ultrasonography.
Albers, Maria E W A; Buisman, Erato T I A; Kahn, René S; Franx, Arie; Onland-Moret, N Charlotte; de Heus, Roel
2018-04-10
The aim of this study is to evaluate intra- and interobserver agreement for measurement of intracranial, cerebellar, and thalamic volume with the Virtual Organ Computer-aided AnaLysis (VOCAL) technique in three-dimensional ultrasound images, in comparison with two-dimensional measurements of these brain structures. Three-dimensional ultrasound images of the brains of 80 fetuses at 20-24 weeks' gestational age were obtained from YOUth, a Dutch prospective cohort study. Two observers independently performed offline measurements of the occipitofrontal diameter, intracranial volume, transcerebellar diameter, cerebellar volume, and thalamic width, area, and volume. VOCAL was used to calculate the volumes. The two-way random, single-measures intraclass correlation coefficient (ICC) was used to analyze agreement, and Bland-Altman plots were constructed. Intra- and interobserver agreement was almost perfect for occipitofrontal diameter (intra ICC 0.88, 95% CI 0.82-0.92; inter ICC 0.91, 95% CI 0.85-0.94), intracranial volume (intra ICC 0.96, 95% CI 0.91-0.98; inter ICC 0.97, 95% CI 0.96-0.98) and transcerebellar diameter (intra ICC 0.91, 95% CI 0.86-0.94; inter ICC 0.86, 95% CI 0.78-0.91). For cerebellar volume, intraobserver agreement was almost perfect (0.85, 95% CI 0.76-0.90), whereas interobserver agreement was substantial (0.75, 95% CI 0.44-0.88). Agreement was only moderate for thalamic measurements. In the Bland-Altman plots for the volume measurements, the differences were normally distributed, with acceptable mean differences and 95% limits of agreement. The intra- and interobserver agreement for measurement of intracranial and cerebellar volume with VOCAL was almost perfect. These measurements are therefore reliable and can be used to investigate fetal brain development. Thalamic measurements are not sufficiently reliable. © 2018 Wiley Periodicals, Inc.
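The abstract specifies a two-way random, single-measures ICC. The sketch below computes the Shrout and Fleiss ICC(2,1) for absolute agreement from two-way ANOVA mean squares, assuming complete data. The abstract does not say whether the absolute-agreement or the consistency variant was used. The example matrix is the six-subject, four-judge illustration from Shrout and Fleiss (1979), not fetal data.

```python
def icc_2_1(ratings):
    """Two-way random, absolute-agreement, single-measure ICC(2,1).

    ratings[i][j] is the measurement of subject i by observer j (no missing data).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need a complete matrix with >= 2 subjects and observers")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, observers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

shrout_fleiss = [
    [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
]
print(round(icc_2_1(shrout_fleiss), 2))  # 0.29
```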
Tomizawa, Yutaka; Iyer, Prasad G; Wongkeesong, Louis M; Buttar, Navtej S; Lutzke, Lori S; Wu, Tsung-Teh; Wang, Kenneth K
2013-01-01
AIM: To investigate a classification of endocytoscopy (ECS) images in Barrett’s esophagus (BE) and evaluate its diagnostic performance and interobserver variability. METHODS: ECS was applied ex-vivo to surveillance endoscopic mucosal resection (EMR) specimens of BE. The mucosal surface of each specimen was stained with 1% methylene blue and surveyed with a catheter-type endocytoscope. We selected the still images most representative of the endoscopically suspect lesion and matched them with the final histopathological diagnosis to ensure accurate correlation. The diagnostic performance and inter-observer variability of the new classification scheme were assessed in a blinded fashion by physicians with expertise in both BE and ECS and by inexperienced physicians with no prior exposure to ECS. RESULTS: Three staff physicians and 22 gastroenterology fellows classified eight randomly assigned unknown still ECS images (two images per category) into one of four histopathologic categories: (1) BEC1, squamous epithelium; (2) BEC2, BE without dysplasia; (3) BEC3, BE with dysplasia; and (4) BEC4, esophageal adenocarcinoma (EAC) in BE. Diagnostic accuracy for staff physicians and clinical fellows was, respectively, 100% and 99.4% for BEC1, 95.8% and 83.0% for BEC2, 91.7% and 83.0% for BEC3, and 95.8% and 98.3% for BEC4. Interobserver agreement among the faculty physicians and among the fellows in classifying each category was 0.932 and 0.897, respectively. CONCLUSION: This is the first study to investigate an ECS classification system in BE. This ex-vivo pilot study demonstrated acceptable diagnostic accuracy and excellent interobserver agreement. PMID:24379583
McGivney, C L; Sweeney, J; David, F; O'Leary, J M; Hill, E W; Katz, L M
2017-07-01
Previous studies support good intra- and interobserver agreement for endoscopic evaluation of various upper respiratory tract (URT) diseases in horses. However, these studies mainly assessed resting endoscopic examination videos and/or focussed on a single URT abnormality. To estimate intra- and interobserver agreement for identification and grading of all URT abnormalities from resting and overground endoscopy (OGE) videos of Thoroughbreds. Blinded, fully crossed design. Resting and OGE URT videos from 43 Thoroughbreds were retrospectively chosen based on identification of common URT disorders. The videos were randomly evaluated in duplicate by 4 raters blinded to all information, including prior URT disorder diagnoses. Abnormalities were graded using well-described ordinal scales. Intra- and interobserver agreement were estimated using Cohen's weighted κ and Krippendorff's α, respectively. Intraobserver agreement was perfect/nearly perfect for arytenoid symmetry at exercise, epiglottic entrapment and epiglottic retroversion; substantial for arytenoid asymmetry at rest, palatal dysfunction (PD), medial deviation of the aryepiglottic folds (MDAF), pharyngeal mucus and epiglottic grade at exercise; and moderate for vocal fold collapse (VFC), ventromedial luxation of the apex of the corniculate process of the arytenoid (VLAC), nasopharyngeal collapse (NPC) and epiglottic grade at rest. Interobserver agreement was substantial for arytenoid symmetry at exercise and PD, and moderate for arytenoid asymmetry at rest, MDAF, VLAC and epiglottic entrapment. It was only fair for VFC, epiglottic grade at exercise, epiglottic retroversion, pharyngeal mucus and NPC, and poor for epiglottic grade at rest. Sample size was insufficient to allow assessment of the effect of one abnormality on the grading of another abnormality. Observers were consistent in grading URT disorders.
However, significant disparity in grading existed between observers for some conditions affecting reliability. © 2016 EVJ Ltd.
Detection of MET amplification in gastroesophageal tumor specimens using IQFISH.
Jørgensen, Jan Trøst; Nielsen, Karsten Bork; Mollerup, Jens; Jepsen, Anna; Go, Ning
2017-12-01
The gene mesenchymal epithelial transition factor (MET) is a proto-oncogene that encodes a transmembrane receptor with intrinsic tyrosine kinase activity known as Met or cMet. MET is amplified in several human cancers, including gastroesophageal cancer. Here we report MET amplification prevalence data from 159 consecutive tumor specimens from patients with gastric (G), gastroesophageal junction (GEJ) and esophageal (E) adenocarcinoma, obtained using a novel fluorescence in situ hybridization (FISH) assay, MET/CEN-7 IQFISH Probe Mix [an investigational use only (IUO) assay]. MET amplification was defined as a MET/CEN-7 ratio ≥2.0. Furthermore, the link between the MET signal distribution and amplification status was investigated. The prevalence of MET amplification was 6.9%. The FISH assay demonstrated high inter-observer reproducibility: the inter-observer results showed 100% overall agreement with respect to MET status (amplified/non-amplified), and the inter-observer CV was estimated at 11.8% (95% CI: 10.2-13.4). For the signal distribution, the inter-observer agreement was 98.7%. We also report an association between MET amplification and a distinct signal distribution pattern in the G/GEJ/E tumor specimens. The prevalence of MET amplification was markedly higher in tumor specimens with a heterogeneous (66.7%) versus homogeneous (2.0%) signal distribution. Furthermore, specimens with a heterogeneous signal distribution had a statistically significantly higher median MET/CEN-7 ratio (2.35 versus 1.04; P<0.0001). The novel FISH assay showed high inter-observer reproducibility with respect to both amplification status and signal distribution. Based on these findings, we suggest that MET amplification is mainly associated with tumor cells that show a heterogeneous growth pattern.
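The amplification call described above reduces to a ratio and a fixed cutoff (MET/CEN-7 ≥ 2.0). The sketch below assumes the ratio is pooled from per-nucleus MET and CEN-7 signal counts; the pooling rule, the `NucleusCount` type and all function names are illustrative assumptions, not the IQFISH scoring protocol.

```python
from dataclasses import dataclass

AMPLIFICATION_CUTOFF = 2.0  # MET/CEN-7 ratio threshold stated in the abstract


@dataclass
class NucleusCount:
    met: int   # MET signals counted in one nucleus (hypothetical input format)
    cen7: int  # CEN-7 signals counted in the same nucleus


def met_cen7_ratio(nuclei: list[NucleusCount]) -> float:
    """Pooled ratio: total MET signals / total CEN-7 signals (assumed scoring rule)."""
    total_met = sum(n.met for n in nuclei)
    total_cen7 = sum(n.cen7 for n in nuclei)
    if total_cen7 == 0:
        raise ValueError("no CEN-7 signals counted")
    return total_met / total_cen7


def is_amplified(ratio: float) -> bool:
    """Amplified when the ratio reaches the 2.0 cutoff."""
    return ratio >= AMPLIFICATION_CUTOFF


def prevalence(statuses: list[bool]) -> float:
    """Fraction of specimens classified as amplified."""
    return sum(statuses) / len(statuses)
```

For example, two nuclei with 5/2 and 4/2 signals pool to 9/4 = 2.25 and would be called amplified; a per-nucleus averaging rule could give a different ratio, which is why the pooling choice is flagged as an assumption.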
Relationship between Two Types of Coil Packing Densities Relative to Aneurysm Size.
Park, Keun Young; Kim, Byung Moon; Ihm, Eun Hyun; Baek, Jang Hyun; Kim, Dong Joon; Kim, Dong Ik; Huh, Seung Kon; Lee, Jae Whan
2015-01-01
Coil packing density (PD) can be calculated via a formula (PD_F) or software (PD_S), and the two values can differ for the same aneurysm. This study aimed to evaluate the interobserver agreement for, and the relationship between, the 2 types of PD relative to aneurysm size. A total of 420 consecutive saccular aneurysms were treated with coiling. PD_F ([coil volume]/[volume calculated by formula]) and PD_S ([coil volume]/[volume measured by software]) were calculated and prospectively recorded. Interobserver agreement was evaluated for PD_F and PD_S. Additionally, the relationship between PD_F and PD_S relative to aneurysm size was analyzed. Interobserver agreement was excellent for both PD_F and PD_S (intraclass correlation coefficient: PD_F, 0.967; PD_S, 0.998). The ratio of PD_F to PD_S was greater for smaller aneurysms and converged toward 1.0 as the maximum aneurysm dimension (D_M) increased. Compared with PD_S, PD_F was overestimated by a mean of 28% for D_M < 5 mm, by 17% for 5 mm ≤ D_M < 10 mm, and by 9% for D_M ≥ 10 mm (P < 0.01). Interobserver agreement for PD_F and PD_S was excellent. However, PD_F was overestimated in smaller aneurysms and converged to PD_S as aneurysm size increased. Copyright © 2014 by the American Society of Neuroimaging.
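Both packing densities share the same numerator (coil volume) and differ only in the aneurysm volume in the denominator, so PD_F / PD_S equals the software volume divided by the formula volume. The sketch below makes that arithmetic explicit. The abstract does not name the volume formula; treating each coil as a cylinder and the aneurysm as an ellipsoid (π/6 × three diameters) are assumptions, and the function names are hypothetical.

```python
import math


def coil_volume_mm3(wire_diameter_mm: float, length_cm: float) -> float:
    """Coil treated as a solid cylinder of the given wire diameter and length (assumption)."""
    radius = wire_diameter_mm / 2
    return math.pi * radius ** 2 * (length_cm * 10)  # convert cm to mm


def ellipsoid_volume_mm3(a_mm: float, b_mm: float, c_mm: float) -> float:
    """Hypothetical 'formula' volume: ellipsoid with three orthogonal aneurysm diameters."""
    return math.pi / 6 * a_mm * b_mm * c_mm


def packing_density(total_coil_volume_mm3: float, aneurysm_volume_mm3: float) -> float:
    """PD = coil volume / aneurysm volume (as defined in the abstract)."""
    return total_coil_volume_mm3 / aneurysm_volume_mm3


def overestimation_pct(pd_f: float, pd_s: float) -> float:
    """Percentage by which PD_F exceeds PD_S, e.g. 28 for the smallest size band."""
    return (pd_f / pd_s - 1) * 100
```

Because the coil volume cancels, an overestimation of 28% simply means the formula volume was about 1/1.28 of the software-measured volume for that aneurysm.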
Palm, Peter; Josephson, Malin; Mathiassen, Svend Erik; Kjellberg, Katarina
2016-06-01
We evaluated the intra- and inter-observer reliability and criterion validity of an observation protocol, developed in an iterative process involving practicing ergonomists, for assessing working technique during cash register work with the aim of preventing upper extremity symptoms. Two ergonomists independently assessed seventeen 15-min videos of cash register work on two occasions each, as a basis for examining reliability. Criterion validity was assessed by comparing these assessments with meticulous video-based analyses by researchers. Intra-observer reliability was acceptable (i.e. proportional agreement >0.7 and kappa >0.4) for 10/10 questions. Inter-observer reliability was acceptable for only 3/10 questions. Acceptable inter-observer reliability combined with acceptable criterion validity was obtained for only one working technique aspect, 'Quality of movements'. Thus, major elements of the cashiers' working technique could not be assessed with acceptable accuracy from short periods of observation by one observer, as practitioners often wish to do. Practitioner Summary: We examined an observation protocol for assessing working technique in cash register work. It was feasible to use, but inter-observer reliability and criterion validity were generally not acceptable when working technique aspects were assessed from short periods of work. We recommend that the protocol be used for educational purposes only.
Dwyer, Tim; Whelan, Daniel B; Khoshbin, Amir; Wasserstein, David; Dold, Andrew; Chahal, Jaskarndip; Nauth, Aaron; Murnaghan, M Lucas; Ogilvie-Harris, Darrell J; Theodoropoulos, John S
2015-04-01
The objective of this study was to establish the intra- and inter-observer reliability of hamstring graft measurement using cylindrical sizing tubes. Hamstring tendons (gracilis and semitendinosus) were harvested from ten cadavers by a single surgeon and whip-stitched together to create ten 4-strand hamstring grafts. Ten sports medicine surgeons and fellows sized each graft independently using either hollow cylindrical sizers or block sizers in 0.5-mm increments; whichever sizing technique was used was applied consistently to each graft. Surgeons moved sequentially from graft to graft and measured each hamstring graft twice. Surgeons were asked to state the measured proximal (femoral) and distal (tibial) diameter of each graft, as well as the diameter of the tibial and femoral tunnels they would drill if performing an anterior cruciate ligament (ACL) reconstruction with that graft. Reliability was established using intra-class correlation coefficients. Overall, both inter-observer and intra-observer agreement were >0.9, demonstrating excellent reliability. The inter-observer reliability for drill sizes was also excellent (>0.9). Excellent correlation was seen between cylindrical sizing and drill sizes (>0.9). Sizing of hamstring grafts by multiple surgeons demonstrated excellent intra-observer and inter-observer reliability, potentially validating clinical studies exploring ACL reconstruction outcomes by hamstring graft diameter when standard techniques are used. Level of evidence: III.
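The abstract reports intra-class correlation coefficients without naming the ICC form. As a sketch only, the code below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) from a subjects × raters table; the choice of this form, the table layout (rows = grafts, columns = surgeons) and the function name are assumptions.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of subject i (e.g. a graft) by rater j.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC undefined: no variance in the ratings")
    return (ms_rows - ms_error) / denominator
```

With the six-target, four-judge example table from Shrout and Fleiss, this returns approximately 0.29; identical ratings from every rater (with variation between subjects) return 1.0.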
Hyun, Yil Sik; Bae, Joong Ho; Park, Hye Sun; Eun, Chang Soo
2013-01-01
Accurate diagnosis of gastric intestinal metaplasia (IM) is important; however, conventional endoscopy is known to be an unreliable modality for diagnosing IM. The aims of the study were to evaluate the interobserver variation in diagnosing IM by high-definition (HD) endoscopy and the diagnostic accuracy of this modality for IM among experienced and inexperienced endoscopists. Fifty selected cases imaged with HD endoscopy were sent to five experienced and five inexperienced endoscopists for diagnosis of gastric IM by visual inspection. The interobserver agreement between endoscopists was evaluated to verify the diagnostic reliability of HD endoscopy in diagnosing IM, and the diagnostic accuracy, sensitivity, and specificity were evaluated to assess its validity. Interobserver agreement among the experienced endoscopists was "poor" (κ = 0.38), and it was also "poor" (κ = 0.33) among the inexperienced endoscopists. The diagnostic accuracy of the experienced endoscopists was superior to that of the inexperienced endoscopists (P = 0.003). Because visual inspection is unreliable for diagnosing IM, biopsy should be considered for all areas suspicious for gastric IM. Furthermore, endoscopic experience and education are needed to improve the diagnostic accuracy for gastric IM. PMID:23678267
Costantini, Massimo; Sciallero, Stefania; Giannini, Augusto; Gatteschi, Beatrice; Rinaldi, Paolo; Lanzanova, Giuseppe; Bonelli, Luigina; Casetti, Tino; Bertinelli, Elisabetta; Giuliani, Orietta; Castiglione, Guido; Mantellini, Paola; Naldoni, Carlo; Bruzzi, Paolo
2003-03-01
Current clinical practice guidelines for patients with colorectal polyps are mainly based on the histologic characteristics of their lesions. However, interobserver variability in the assessment of specific polyp characteristics has been evaluated in very few studies. The purpose of this study was to evaluate the interobserver agreement of four pathologists in the diagnosis of the histologic type of colorectal polyps, and of the degree of dysplasia and of infiltrating carcinoma in adenomas. A stratified random sample of 100 polyps was obtained from the 4,889 polyps resected within the Multicentre Adenoma Colorectal Study (SMAC), and the slides were blindly reviewed by the four pathologists. Agreement was analyzed using kappa statistics. The median kappa for interobserver agreement on hyperplastic polyp vs. adenoma was 0.89 (range 0.79-1.0). Median kappa values for the diagnosis of tubular, tubulovillous, and villous types were 0.50, 0.15, and 0.36, respectively. The median kappa for the diagnosis of infiltrating carcinoma was 0.78 (range 0.73-0.84). Agreement on the diagnosis of adenoma histologic subtypes, degrees of dysplasia, or infiltrating carcinoma in adenoma was moderate. A simpler classification might help to better identify patients at different risk of colorectal cancer.
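With four pathologists, a "median kappa (range)" is naturally read as a summary over the six rater pairs. The sketch below assumes exactly that: unweighted Cohen's kappa for each pair, then the median, minimum and maximum. The pairwise interpretation and the function names are assumptions; the abstract says only that kappa statistics were used.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


def pairwise_kappa_summary(ratings: list[list[str]]) -> tuple[float, float, float]:
    """Median, minimum and maximum Cohen's kappa over all rater pairs."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(ratings, 2)]
    return median(kappas), min(kappas), max(kappas)
```

For instance, two raters labelling four polyps as H, A, A, H and H, A, H, H agree on 3 of 4 items with a chance agreement of 0.5, giving κ = 0.5.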
Interobserver variability of sonography for prediction of placenta accreta.
Bowman, Zachary S; Eller, Alexandra G; Kennedy, Anne M; Richards, Douglas S; Winter, Thomas C; Woodward, Paula J; Silver, Robert M
2014-12-01
The sensitivity of sonography for predicting placenta accreta has been reported as higher than 90%. However, most studies are from single expert investigators. Our objective was to analyze the interobserver variability of sonography for prediction of placenta accreta. Patients with placenta previa, with and without accreta, were identified, and images with placental views were collected, deidentified, and placed in random sequence. Three radiologists and 3 maternal-fetal medicine specialists interpreted each study for the presence of accreta and for specific findings reported to be associated with its diagnosis. Investigator-specific sensitivity, specificity, and accuracy were calculated. κ statistics were used to assess variability between individuals and between types of investigators. A total of 229 sonographic studies from 55 patients with accreta and 56 control patients were examined. Accuracy ranged from 55.9% to 76.4%. Among imaging studies yielding a diagnosis, sensitivity ranged from 53.4% to 74.4%, and specificity ranged from 70.8% to 94.8%. Overall interobserver agreement was moderate (mean κ ± SD = 0.47 ± 0.12). κ values between pairs of investigators ranged from 0.32 (fair agreement) to 0.73 (substantial agreement). Average individual agreement ranged from fair (κ = 0.35) to moderate (κ = 0.53). When readers are blinded to clinical data, sonography has significant interobserver variability for the diagnosis of placenta accreta. © 2013 by the American Institute of Ultrasound in Medicine.
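Investigator-specific sensitivity, specificity and accuracy all come from one 2×2 table of a reader's calls against the reference diagnosis. A minimal sketch, assuming each read has already been reduced to a yes/no call (reads without a diagnosis would have to be filtered out first, which is an assumption about how the study handled them); the function names are hypothetical.

```python
def _safe_ratio(numerator: int, denominator: int) -> float:
    """Return NaN rather than raising when a cell total is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Sensitivity, specificity and accuracy of one reader against the reference."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # accreta called, accreta present
    tn = sum((not p) and (not a) for p, a in pairs)  # no accreta called, none present
    fp = sum(p and (not a) for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    return {
        "sensitivity": _safe_ratio(tp, tp + fn),
        "specificity": _safe_ratio(tn, tn + fp),
        "accuracy": _safe_ratio(tp + tn, len(pairs)),
    }
```

Running this once per investigator yields one row of the sensitivity/specificity/accuracy ranges reported above; agreement between investigators (κ) is a separate calculation on the calls themselves.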
Hyun, Yil Sik; Han, Dong Soo; Bae, Joong Ho; Park, Hye Sun; Eun, Chang Soo
2013-05-01
Accurate diagnosis of gastric intestinal metaplasia (IM) is important; however, conventional endoscopy is known to be an unreliable modality for diagnosing IM. The aims of the study were to evaluate the interobserver variation in diagnosing IM by high-definition (HD) endoscopy and the diagnostic accuracy of this modality for IM among experienced and inexperienced endoscopists. Fifty selected cases imaged with HD endoscopy were sent to five experienced and five inexperienced endoscopists for diagnosis of gastric IM by visual inspection. The interobserver agreement between endoscopists was evaluated to verify the diagnostic reliability of HD endoscopy in diagnosing IM, and the diagnostic accuracy, sensitivity, and specificity were evaluated to assess its validity. Interobserver agreement among the experienced endoscopists was "poor" (κ = 0.38), and it was also "poor" (κ = 0.33) among the inexperienced endoscopists. The diagnostic accuracy of the experienced endoscopists was superior to that of the inexperienced endoscopists (P = 0.003). Because visual inspection is unreliable for diagnosing IM, biopsy should be considered for all areas suspicious for gastric IM. Furthermore, endoscopic experience and education are needed to improve the diagnostic accuracy for gastric IM.
Mochizuki, Yuta; Kaneko, Takao; Kawahara, Keisuke; Toyoda, Shinya; Kono, Norihiko; Hada, Masaru; Ikegami, Hiroyasu; Musha, Yoshiro
2017-11-20
The quadrant method was described by Bernard et al. and has been widely used for postoperative evaluation of anterior cruciate ligament (ACL) reconstruction. The purpose of this research was to further develop the quadrant method by measuring four points, an approach we named the four-point quadrant method, and to compare it with the quadrant method. Three-dimensional computed tomography (3D-CT) analyses were performed in 25 patients who underwent double-bundle ACL reconstruction using the outside-in technique. The four points of the femoral tunnel position were defined as point1 (highest), point2 (deepest), point3 (lowest), and point4 (shallowest). Depth and height were measured at each point. In the four-point quadrant method, the antero-medial (AM) tunnel is given by (depth1, height2) and the postero-lateral (PL) tunnel by (depth3, height4). The 3D-CT images were evaluated independently by 2 orthopaedic surgeons. A second measurement was performed by both observers after a 4-week interval. Intra- and inter-observer reliability was calculated by means of the intra-class correlation coefficient (ICC). The accuracy of the method was also evaluated against the quadrant method. Intra-observer reliability was almost perfect for both AM and PL tunnels (ICC > 0.81). Inter-observer reliability was substantial for the AM tunnel (ICC > 0.61) and almost perfect for the PL tunnel (ICC > 0.81). Compared with the quadrant method, the AM tunnel position was 0.13% deeper and 0.58% higher, and the PL tunnel position was 0.01% shallower and 0.13% lower. The four-point quadrant method was found to have high intra- and inter-observer reliability and accuracy. This method can evaluate the tunnel position regardless of the shape and morphology of the bone tunnel aperture, and it provides measurements that can be compared across various reconstruction methods. 
The four-point quadrant method of this study is considered to have clinical relevance in that it is a detailed and accurate tool for evaluating femoral tunnel position after ACL reconstruction. Case series, Level IV.
Non-enhanced MR imaging of cerebral aneurysms: 7 Tesla versus 1.5 Tesla.
Wrede, Karsten H; Dammann, Philipp; Mönninghoff, Christoph; Johst, Sören; Maderwald, Stefan; Sandalcioglu, I Erol; Müller, Oliver; Özkan, Neriman; Ladd, Mark E; Forsting, Michael; Schlamann, Marc U; Sure, Ulrich; Umutlu, Lale
2014-01-01
To prospectively evaluate 7 Tesla time-of-flight (TOF) magnetic resonance angiography (MRA) in comparison to 1.5 Tesla TOF MRA and 7 Tesla non-contrast-enhanced magnetization-prepared rapid acquisition gradient-echo (MPRAGE) imaging for delineation of unruptured intracranial aneurysms (UIA). Sixteen neurosurgical patients (male n = 5, female n = 11) with single or multiple UIA were enrolled in this trial. All patients were examined at 7 Tesla and 1.5 Tesla MRI using dedicated head coils. The following sequences were obtained: 7 Tesla TOF MRA, 1.5 Tesla TOF MRA and 7 Tesla non-contrast-enhanced MPRAGE. Image analysis was performed by two radiologists with regard to delineation of aneurysm features (dome, neck, parent vessel), presence of artifacts, vessel-tissue contrast and overall image quality. Interobserver accordance and intermethod comparisons were calculated using the kappa coefficient and Lin's concordance correlation coefficient. A total of 20 intracranial aneurysms were detected in 16 patients, with two patients showing multiple aneurysms (n = 2, n = 4). Of the 20 intracranial aneurysms, 14 were located in the anterior circulation and 6 in the posterior circulation. 7 Tesla MPRAGE imaging was superior to 1.5 and 7 Tesla TOF MRA in the assessment of all considered aneurysm and image quality features (e.g. image quality: mean MPRAGE7T: 5.0; mean TOF7T: 4.3; mean TOF1.5T: 4.3). Ratings for 7 Tesla TOF MRA were equal to or higher than those for 1.5 Tesla TOF MRA for all assessed features except artifact delineation (mean TOF7T: 4.3; mean TOF1.5T: 4.4). Interobserver accordance was good to excellent for most ratings. 7 Tesla MPRAGE imaging demonstrated its superiority in the detection and assessment of UIA as well as overall imaging features, offering excellent interobserver accordance and the highest scores for all ratings. 
Hence, it may bear the potential to serve as a high-quality diagnostic tool for pretherapeutic assessment and follow-up of untreated UIA.
Non-Enhanced MR Imaging of Cerebral Aneurysms: 7 Tesla versus 1.5 Tesla
Wrede, Karsten H.; Dammann, Philipp; Mönninghoff, Christoph; Johst, Sören; Maderwald, Stefan; Sandalcioglu, I. Erol; Müller, Oliver; Özkan, Neriman; Ladd, Mark E.; Forsting, Michael; Schlamann, Marc U.; Sure, Ulrich; Umutlu, Lale
2014-01-01
Purpose To prospectively evaluate 7 Tesla time-of-flight (TOF) magnetic resonance angiography (MRA) in comparison to 1.5 Tesla TOF MRA and 7 Tesla non-contrast-enhanced magnetization-prepared rapid acquisition gradient-echo (MPRAGE) imaging for delineation of unruptured intracranial aneurysms (UIA). Material and Methods Sixteen neurosurgical patients (male n = 5, female n = 11) with single or multiple UIA were enrolled in this trial. All patients were examined at 7 Tesla and 1.5 Tesla MRI using dedicated head coils. The following sequences were obtained: 7 Tesla TOF MRA, 1.5 Tesla TOF MRA and 7 Tesla non-contrast-enhanced MPRAGE. Image analysis was performed by two radiologists with regard to delineation of aneurysm features (dome, neck, parent vessel), presence of artifacts, vessel-tissue contrast and overall image quality. Interobserver accordance and intermethod comparisons were calculated using the kappa coefficient and Lin's concordance correlation coefficient. Results A total of 20 intracranial aneurysms were detected in 16 patients, with two patients showing multiple aneurysms (n = 2, n = 4). Of the 20 intracranial aneurysms, 14 were located in the anterior circulation and 6 in the posterior circulation. 7 Tesla MPRAGE imaging was superior to 1.5 and 7 Tesla TOF MRA in the assessment of all considered aneurysm and image quality features (e.g. image quality: mean MPRAGE7T: 5.0; mean TOF7T: 4.3; mean TOF1.5T: 4.3). Ratings for 7 Tesla TOF MRA were equal to or higher than those for 1.5 Tesla TOF MRA for all assessed features except artifact delineation (mean TOF7T: 4.3; mean TOF1.5T: 4.4). Interobserver accordance was good to excellent for most ratings. Conclusion 7 Tesla MPRAGE imaging demonstrated its superiority in the detection and assessment of UIA as well as overall imaging features, offering excellent interobserver accordance and the highest scores for all ratings. 
Hence, it may bear the potential to serve as a high-quality diagnostic tool for pretherapeutic assessment and follow-up of untreated UIA. PMID:24400100
Taylor, Helena O; Morrison, Clinton S; Linden, Olivia; Phillips, Benjamin; Chang, Johnny; Byrne, Margaret E; Sullivan, Stephen R; Forrest, Christopher R
2014-01-01
Although symmetry is hailed as a fundamental goal of aesthetic and reconstructive surgery, our tools for measuring this outcome have been limited and subjective. With the advent of three-dimensional photogrammetry, surface geometry can be captured, manipulated, and measured quantitatively. Until now, few normative data existed with regard to facial surface symmetry. Here, we present a method for reproducibly calculating overall facial symmetry and present normative data on 100 subjects. We enrolled 100 volunteers who underwent three-dimensional photogrammetry of their faces in repose. We collected demographic data on age, sex, and race and subjectively scored facial symmetry. We calculated the root mean square deviation (RMSD) between the native and reflected faces, reflecting about a plane of maximum symmetry. We analyzed the interobserver reliability of the subjective assessment of facial asymmetry and of the quantitative measurements, and compared the subjective and objective values. We also classified areas of greatest asymmetry as localized to the upper, middle, or lower facial thirds. This cluster of normative data was compared with a group of patients with subtle but increasing amounts of facial asymmetry. We imaged 100 subjects by three-dimensional photogrammetry. There was a poor interobserver correlation between subjective assessments of asymmetry (r = 0.56). There was a high interobserver reliability for quantitative RMSD measurements of facial symmetry (r = 0.91-0.95). The mean RMSD for this normative population was found to be 0.80 ± 0.24 mm. Areas of greatest asymmetry were distributed as follows: 10% upper facial third, 49% central facial third, and 41% lower facial third. Precise measurement permitted discrimination of subtle facial asymmetry within this normative group and distinguished normal subjects from patients with subtle facial asymmetry, with placement of RMSDs along an asymmetry ruler. 
Facial surface symmetry, which is poorly assessed subjectively, can be easily and reproducibly measured using three-dimensional photogrammetry. The RMSD for facial asymmetry of healthy volunteers clusters at approximately 0.80 ± 0.24 mm. Patients with facial asymmetry due to a pathologic process can be differentiated from normative facial asymmetry based on their RMSDs.
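The RMSD described above compares a face with its own mirror image. The sketch below shows the core arithmetic on a point cloud: reflect every point across a given plane, match each mirrored point to its nearest original point, and take the root mean square of those distances. It assumes the symmetry plane is already known (the study used a plane of maximum symmetry, and that search is not shown), uses brute-force nearest neighbours, and the function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def reflect(p: Point, plane_point: Point, normal: Point) -> Point:
    """Mirror p across the plane through plane_point with the given normal."""
    length = math.sqrt(sum(c * c for c in normal))
    n = [c / length for c in normal]
    # Signed distance from p to the plane along the unit normal.
    d = sum((p[i] - plane_point[i]) * n[i] for i in range(3))
    return (p[0] - 2 * d * n[0], p[1] - 2 * d * n[1], p[2] - 2 * d * n[2])


def symmetry_rmsd(surface: list[Point], plane_point: Point, normal: Point) -> float:
    """RMSD (same units as the input, e.g. mm) between a surface and its mirror image."""
    mirrored = [reflect(p, plane_point, normal) for p in surface]
    squared = [min(math.dist(m, q) ** 2 for q in surface) for m in mirrored]
    return math.sqrt(sum(squared) / len(squared))
```

A perfectly symmetric point set gives an RMSD of 0; shifting one of two mirror-paired points by 0.5 mm gives 0.5 mm. Real photogrammetry meshes would need a spatial index rather than the O(n²) search used here.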
Ha, Richard; Mema, Eralda; Guo, Xiaotao; Mango, Victoria; Desperito, Elise; Ha, Jason; Wynn, Ralph; Zhao, Binsheng
2016-04-01
The amount of fibroglandular tissue (FGT) has been linked to breast cancer risk based on mammographic density studies. Currently, qualitative assessment of FGT on mammography (MG) and magnetic resonance imaging (MRI) is prone to intra- and inter-observer variability. The purpose of this study was to develop an objective, quantitative FGT measurement tool for breast MRI that could provide significant clinical value. An IRB-approved study was performed. Sixty breast MRI cases with qualitative assessment of mammographic breast density and MRI FGT were randomly selected for quantitative analysis from routine breast MRIs performed at our institution from 1/2013 to 12/2014. Blinded to the qualitative data, whole breast and FGT contours were delineated on T1-weighted precontrast sagittal images using an in-house, proprietary segmentation algorithm that combines region-based active contours with a level set approach. FGT (%) was calculated as: [segmented volume of FGT (mm³) / segmented volume of whole breast (mm³)] × 100. Statistical correlation analysis was performed between quantified FGT (%) on MRI and qualitative assessments of mammographic breast density and MRI FGT. There was a significant positive correlation between quantitative MRI FGT assessment and both qualitative MRI FGT (r=0.809, n=60, P<0.001) and mammographic density assessment (r=0.805, n=60, P<0.001). There was a significant correlation between qualitative MRI FGT assessment and mammographic density assessment (r=0.725, n=60, P<0.001). The four qualitative FGT assessment categories corresponded to calculated mean quantitative FGT (%) values of 4.61% (95% CI, 0-12.3%), 8.74% (7.3-10.2%), 18.1% (15.1-21.1%), and 37.4% (29.5-45.3%). Quantitative measures of FGT (%) were computed with data derived from breast MRI and correlated significantly with conventional qualitative assessments. 
This quantitative technique may prove to be a valuable clinical tool by providing computer-generated, standardized measurements with limited intra- or inter-observer variability.
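Once the two segmentations exist, the FGT (%) formula above is a single ratio of volumes. The sketch assumes each mask volume is obtained as voxel count × voxel volume from the image spacing; that step, the function names and the example spacing are assumptions, and the in-house segmentation algorithm itself is not reproduced.

```python
def mask_volume_mm3(voxel_count: int, spacing_mm: tuple[float, float, float]) -> float:
    """Volume of a segmentation mask: number of voxels x volume of one voxel (assumption)."""
    dx, dy, dz = spacing_mm
    return voxel_count * dx * dy * dz


def fgt_percent(fgt_volume_mm3: float, breast_volume_mm3: float) -> float:
    """FGT (%) = FGT volume / whole-breast volume x 100, as in the abstract."""
    if breast_volume_mm3 <= 0:
        raise ValueError("whole-breast volume must be positive")
    if not 0 <= fgt_volume_mm3 <= breast_volume_mm3:
        raise ValueError("FGT volume must lie between 0 and the whole-breast volume")
    return 100.0 * fgt_volume_mm3 / breast_volume_mm3
```

Because both volumes come from the same image, the voxel spacing cancels in the ratio; it matters only if absolute volumes are reported.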
Izatt, Maree T; Bateman, Gary R; Adam, Clayton J
2012-07-30
Vertebral rotation in structural scoliosis contributes to truncal asymmetry, which is commonly measured with a simple Scoliometer device on a patient's thorax in the forward flexed position. The new generation of mobile 'smartphones' has an integrated accelerometer that makes accurate angle measurement possible, providing a potentially useful clinical tool for assessing rib hump deformity. This study aimed to compare rib hump angle measurements performed using a smartphone and a traditional Scoliometer on a set of plaster torsos representing the range of torsional deformities seen in clinical practice. Nine observers measured the rib hump on eight plaster torsos moulded from scoliosis patients with both a Scoliometer and an Apple iPhone on separate occasions. Each observer repeated the measurements at least a week after the original measurements and was blinded to previous results. Intra-observer and inter-observer reliability were analysed using the method of Bland and Altman, and 95% confidence intervals were calculated. Intra-class correlation coefficients (ICC) were calculated for repeated measurements of each of the eight plaster torso moulds by the nine observers. The mean absolute difference between pairs of iPhone/Scoliometer measurements was 2.1 degrees, with a small (1 degree) bias toward higher rib hump angles with the iPhone. 95% confidence intervals for intra-observer variability were ±1.8 degrees (Scoliometer) and ±3.2 degrees (iPhone). 95% confidence intervals for inter-observer variability were ±4.9 degrees (iPhone) and ±3.8 degrees (Scoliometer). The measurement errors and confidence intervals found were similar to or better than the range of previously published thoracic rib hump measurement studies. The iPhone is clinically equivalent to the Scoliometer as a rib hump measurement tool in spinal deformity patients. 
The novel use of plaster torsos as rib hump models avoids the confounding variables of patient fatigue and discomfort, inconsistent positioning and deformity progression that arise when human subjects are measured in single or multiple sessions.
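The Bland and Altman method named above summarises paired measurements (iPhone vs Scoliometer, or a repeat vs the original reading) by the mean difference (bias) and 95% limits of agreement. The sketch assumes the conventional limits, bias ± 1.96 × SD of the paired differences; the abstract does not give its exact formulas, and the function names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first: list[float], second: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def mean_absolute_difference(first: list[float], second: list[float]) -> float:
    """Mean of |a - b| over the pairs, e.g. iPhone vs Scoliometer angles in degrees."""
    return mean(abs(a - b) for a, b in zip(first, second))
```

The bias captures systematic offsets such as the 1-degree tendency toward higher iPhone readings, while the half-width of the limits reflects the random scatter reported as ± values.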
2012-01-01
Background Vertebral rotation in structural scoliosis contributes to truncal asymmetry, which is commonly measured with a simple Scoliometer device on a patient's thorax in the forward flexed position. The new generation of mobile 'smartphones' has an integrated accelerometer that makes accurate angle measurement possible, providing a potentially useful clinical tool for assessing rib hump deformity. This study aimed to compare rib hump angle measurements performed using a smartphone and a traditional Scoliometer on a set of plaster torsos representing the range of torsional deformities seen in clinical practice. Methods Nine observers measured the rib hump on eight plaster torsos moulded from scoliosis patients with both a Scoliometer and an Apple iPhone on separate occasions. Each observer repeated the measurements at least a week after the original measurements and was blinded to previous results. Intra-observer and inter-observer reliability were analysed using the method of Bland and Altman, and 95% confidence intervals were calculated. Intra-class correlation coefficients (ICC) were calculated for repeated measurements of each of the eight plaster torso moulds by the nine observers. Results The mean absolute difference between pairs of iPhone/Scoliometer measurements was 2.1 degrees, with a small (1 degree) bias toward higher rib hump angles with the iPhone. 95% confidence intervals for intra-observer variability were ±1.8 degrees (Scoliometer) and ±3.2 degrees (iPhone). 95% confidence intervals for inter-observer variability were ±4.9 degrees (iPhone) and ±3.8 degrees (Scoliometer). The measurement errors and confidence intervals found were similar to or better than the range of previously published thoracic rib hump measurement studies. Conclusions The iPhone is clinically equivalent to the Scoliometer as a rib hump measurement tool in spinal deformity patients. 
The novel use of plaster torsos as rib hump models avoids the confounding variables of patient fatigue and discomfort, inconsistent positioning and deformity progression that arise when human subjects are measured in single or multiple sessions. PMID:22846346
Ay, Ali; Bulut, Hulya
2015-08-01
Many ostomy patients experience peristomal skin lesions. A descriptive study was conducted to assess the validity, usability, and reliability of the Peristomal Skin Lesions Assessment instrument (SACS instrument) adapted to Turkish from English. The SACS instrument consists of 2 main assessments: lesion type (using definitions and photographs) and lesion area by location around the ostomy. The study was performed in 2 stages: 1) the SACS instrument was translated into Turkish and its content validity established; and 2) the instrument's content validity and inter-observer agreement (consistency) were determined among pairs of nurses who used the tool to assess peristomal skin lesions. Patients (included if they were >18 years old and receiving treatment/observation at 1 of the 4 participating stomatherapy units) and 8 stomatherapy nurses also completed appropriate sociodemographic questionnaires. Of the 393 patients screened during the 7-month study, 100 (average age 56.74 ± 14.03 years, 55 men) participated; most (79) had a planned operation. A little more than half (59) of the patients had colorectal cancer, and 28 had their stoma site marked preoperatively by a stomatherapy nurse. The most common peristomal skin lesion risk factors were having an ileostomy and unplanned surgery. The content validity index of the entire Turkish SACS instrument was 1, and the inter-observer agreement kappa statistic was very good (κ = 0.90, 95% CI 0.80-0.99). Individual SACS item κ values ranged from 0.84 (95% CI 0.63-1) to 1 (95% CI 1). Most (62.5%) nurses found the terms and pictures used in the SACS classification adequate and suitable, and 50% believed the Turkish version of the SACS instrument was a valid and suitable assessment tool for use by Turkish stomatherapy nurses. Validity and reliability studies involving larger and more diverse patient and nurse samples are warranted.
Raffin, Delphine; Zaragoza, Julia; Georgescou, Gabriella; Mourtada, Youssef; Maruani, Annabel; Ossant, Frédéric; Patat, Frédéric; Vaillant, Loïc; Machet, Laurent
2017-06-01
Neurofibromas (NFs) are benign tumours arising from a nerve sheath, which are present in nearly all patients with neurofibromatosis type 1 (NF1). High-frequency ultrasound (HFU) systems, using frequencies over 20 MHz, were developed to improve visualization of skin tumours by means of increased resolution. This study aimed to describe NFs using HFU in patients with NF1. Anonymized HFU (25-MHz) images of NFs were randomized. Initially, two dermatologist investigators with experience in HFU imaging of the skin jointly described the ultrasound images and established eight criteria for NFs. The same task was then repeated independently by two other dermatologists, also experienced in HFU imaging of the skin, to establish inter-observer agreement. A total of 108 NFs in 29 patients were included. Superficial and subcutaneous NFs were hypoechoic with a round to spindle shape. Plexiform NFs were ill-defined, consisting of multiple hypoechoic linear zones. Good to excellent inter-observer agreement was found for six of the eight criteria (κ > 0.6). This is the first series describing HFU skin imaging of NFs in patients with NF1. Lateral extension, which may correspond to involvement of an adjacent nerve, seems to be specific to NFs.
Gili, Pablo; Flores-Rodríguez, Patricia; Yangüela, Julio; Orduña-Azcona, Javier; Martín-Ríos, María Dolores
2013-03-01
This study evaluated the efficacy of monochromatic photography of the ocular fundus in differentiating optic nerve head drusen (ONHD) from optic disc oedema (ODE). Sixty-six patients with ONHD, 31 patients with ODE and 70 healthy subjects were studied. Colour and monochromatic fundus photography with different filters (green, red and autofluorescence) were performed. The results were analysed blindly by two observers. The sensitivity, specificity and interobserver agreement (κ) of each test were assessed. Colour photography offers 65.5% sensitivity and 100% specificity for the diagnosis of ONHD. Monochromatic photography improves sensitivity and specificity and provides similar results: green filter (71.20% sensitivity, 96.70% specificity), red filter (80.30% sensitivity, 96.80% specificity), and autofluorescence technique (87.8% sensitivity, 100% specificity). The interobserver agreement was good with all techniques used: autofluorescence (κ = 0.957), green filter (κ = 0.897), red filter (κ = 0.818) and colour (κ = 0.809). Monochromatic fundus photography permits ONHD and ODE to be differentiated, with good sensitivity and very high specificity. The best results were obtained with autofluorescence and red filter study.
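Sensitivity and specificity, as reported above, are proportions from a 2 × 2 comparison against the reference diagnosis. A small sketch, assuming ONHD is the positive class and each eye has one reference label; the labels are hypothetical.

```python
def sensitivity_specificity(truth, predicted, positive):
    """Sensitivity and specificity of `predicted` labels against reference `truth` labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference diagnoses and photograph-based calls for 10 eyes
truth = ["ONHD"] * 5 + ["ODE"] * 5
predicted = ["ONHD", "ONHD", "ONHD", "ONHD", "ODE", "ODE", "ODE", "ODE", "ODE", "ODE"]
sens, spec = sensitivity_specificity(truth, predicted, positive="ONHD")
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 80%, specificity 100%
```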
The use of atlas registration and graph cuts for prostate segmentation in magnetic resonance images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Korsager, Anne Sofie, E-mail: asko@hst.aau.dk; Østergaard, Lasse Riis; Fortunati, Valerio
2015-04-15
Purpose: An automatic method for 3D prostate segmentation in magnetic resonance (MR) images is presented for planning image-guided radiotherapy treatment of prostate cancer. Methods: A spatial prior based on intersubject atlas registration is combined with organ-specific intensity information in a graph cut segmentation framework. The segmentation is tested on 67 axial T2-weighted MR images in a leave-one-out cross validation experiment and compared with both manual reference segmentations and with multiatlas-based segmentations using majority voting atlas fusion. The impact of atlas selection is investigated in both the traditional atlas-based segmentation and the new graph cut method that combines atlas and intensity information in order to improve the segmentation accuracy. Best results were achieved using the method that combines intensity information, shape information, and atlas selection in the graph cut framework. Results: A mean Dice similarity coefficient (DSC) of 0.88 and a mean surface distance (MSD) of 1.45 mm with respect to the manual delineation were achieved. Conclusions: This approaches the interobserver DSC of 0.90 and interobserver MSD of 1.15 mm and is comparable to other studies performing prostate segmentation in MR.
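The Dice similarity coefficient used above measures volume overlap as twice the intersection divided by the summed sizes of the two segmentations (1 = identical, 0 = disjoint). A minimal sketch that represents each segmentation as a set of voxel coordinates; the coordinates are hypothetical, and real pipelines would normally work on binary image arrays.

```python
def dice(seg_a, seg_b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # two empty segmentations are conventionally treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel coordinates (slice, row, column) labelled as prostate by two delineations
auto = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
manual = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
print(dice(auto, manual))  # 0.75
```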
McIver, Kerry L.; Brown, William H.; Pfeiffer, Karin A.; Dowda, Marsha; Pate, Russell R.
2016-01-01
Purpose This study describes the development and pilot testing of the Observational System for Recording Physical Activity-Elementary School (OSRAC-E) version. Methods This system was developed to observe and document the levels and types of physical activity and physical and social contexts of physical activity in elementary school students during the school day. Inter-observer agreement scores and summary data were calculated. Results All categories had Kappa statistics above 0.80, with the exception of the activity initiator category. Inter-observer agreement scores were 96% or greater. The OSRAC-E was shown to be a reliable observation system that allows researchers to assess physical activity behaviors, the contexts of those behaviors, and the effectiveness of physical activity interventions in the school environment. Conclusion The OSRAC-E can yield data with high interobserver reliability and provide relatively extensive contextual information about physical activity of students in elementary schools. PMID:26889587
Mulder, F J; Mosmuller, D G M; de Vet, H C W; Mouës, C M; Breugem, C C; van der Molen, A B Mink; Don Griot, J P W
2018-01-01
Objective To develop a reliable and easy-to-use method to assess the nasolabial appearance of 18-year-old patients with unilateral cleft lip and palate (CLP). Design Retrospective analysis of nasolabial aesthetics using a 5-point ordinal scale and a newly developed photographic reference scale: the Cleft Aesthetic Rating Scale (CARS). Three cleft surgeons and 20 medical students scored the nasolabial appearance on standardized frontal photographs. Setting VU University Medical Center, Amsterdam. Patients Inclusion criteria: 18-year-old patients, unilateral cleft lip and palate, available photograph of the frontal view. Exclusion criteria: history of facial trauma, congenital syndromes affecting facial appearance. Eighty photographs were available for scoring. Main Outcome Measures The interobserver and intraobserver reliability of the CARS for 18-year-old patients when used by cleft surgeons and medical students. Results The interobserver reliability for the nose and lip together was 0.64 for the cleft surgeons and 0.61 for the medical students. The intraobserver reliability for the nose and lip together was 0.75 for the surgeons and 0.78 for the students. No significant difference was found between the cleft surgeons and medical students in the way they scored the nose (P = 0.22) and lip (P = 0.72). Conclusions The Cleft Aesthetic Rating Scale for 18-year-old patients has a substantial overall estimated reliability when the average score is taken from three or more cleft surgeons or medical students assessing the nasolabial aesthetics of CLP patients.
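The conclusion that reliability becomes substantial when three or more raters are averaged is consistent with the Spearman-Brown formula, which predicts the reliability of a mean of k raters from the single-rater value. The abstract does not say this formula was used; the sketch below simply assumes that the reported 0.64 and 0.61 are single-rater coefficients.

```python
def spearman_brown(single_rater_reliability, k):
    """Predicted reliability of the mean of k raters, given the reliability of one rater."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


# Interobserver reliability for nose and lip together, treated here as single-rater values
for group, r in [("cleft surgeons", 0.64), ("medical students", 0.61)]:
    print(group, [round(spearman_brown(r, k), 2) for k in (1, 2, 3)])
# cleft surgeons [0.64, 0.78, 0.84]
# medical students [0.61, 0.76, 0.82]
```

Under that assumption, averaging three surgeons raises the expected reliability from 0.64 to about 0.84.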
A Standardized DNA Variant Scoring System for Pathogenicity Assessments in Mendelian Disorders
Karbassi, Izabela; Maston, Glenn A.; Love, Angela; DiVincenzo, Christina; Braastad, Corey D.; Elzinga, Christopher D.; Bright, Alison R.; Previte, Domenic; Zhang, Ke; Rowland, Charles M.; McCarthy, Michele; Lapierre, Jennifer L.; Dubois, Felicita; Medeiros, Katelyn A.; Batish, Sat Dev; Jones, Jeffrey; Liaquat, Khalida; Hoffman, Carol A.; Jaremko, Malgorzata; Wang, Zhenyuan; Sun, Weimin; Buller‐Burckle, Arlene; Strom, Charles M.; Keiles, Steven B.
2015-01-01
We developed a rules‐based scoring system to classify DNA variants into five categories including pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, and benign. Over 16,500 pathogenicity assessments on 11,894 variants from 338 genes were analyzed for pathogenicity based on prediction tools, population frequency, co‐occurrence, segregation, and functional studies collected from internal and external sources. Scores were calculated by trained scientists using a quantitative framework that assigned differential weighting to these five types of data. We performed descriptive and comparative statistics on the dataset and tested interobserver concordance among the trained scientists. Private variants, defined as variants found within single families (n = 5,182), were either VUS (80.5%; n = 4,169) or likely pathogenic (19.5%; n = 1,013). The remaining variants (n = 6,712) were VUS (38.4%; n = 2,577), likely benign/benign (34.7%; n = 2,327), or likely pathogenic/pathogenic (26.9%, n = 1,808). Exact agreement between the trained scientists on the final variant score was 98.5% [95% confidence interval (CI) (98.0, 98.9)] with an interobserver consistency of 97% [95% CI (91.5, 99.4)]. Variant scores were stable and showed increasing odds of being in agreement with new data when re‐evaluated periodically. This carefully curated, standardized variant pathogenicity scoring system provides reliable pathogenicity scores for DNA variants encountered in a clinical laboratory setting. PMID:26467025
A Standardized DNA Variant Scoring System for Pathogenicity Assessments in Mendelian Disorders.
Karbassi, Izabela; Maston, Glenn A; Love, Angela; DiVincenzo, Christina; Braastad, Corey D; Elzinga, Christopher D; Bright, Alison R; Previte, Domenic; Zhang, Ke; Rowland, Charles M; McCarthy, Michele; Lapierre, Jennifer L; Dubois, Felicita; Medeiros, Katelyn A; Batish, Sat Dev; Jones, Jeffrey; Liaquat, Khalida; Hoffman, Carol A; Jaremko, Malgorzata; Wang, Zhenyuan; Sun, Weimin; Buller-Burckle, Arlene; Strom, Charles M; Keiles, Steven B; Higgins, Joseph J
2016-01-01
We developed a rules-based scoring system to classify DNA variants into five categories including pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, and benign. Over 16,500 pathogenicity assessments on 11,894 variants from 338 genes were analyzed for pathogenicity based on prediction tools, population frequency, co-occurrence, segregation, and functional studies collected from internal and external sources. Scores were calculated by trained scientists using a quantitative framework that assigned differential weighting to these five types of data. We performed descriptive and comparative statistics on the dataset and tested interobserver concordance among the trained scientists. Private variants, defined as variants found within single families (n = 5,182), were either VUS (80.5%; n = 4,169) or likely pathogenic (19.5%; n = 1,013). The remaining variants (n = 6,712) were VUS (38.4%; n = 2,577), likely benign/benign (34.7%; n = 2,327), or likely pathogenic/pathogenic (26.9%, n = 1,808). Exact agreement between the trained scientists on the final variant score was 98.5% [95% confidence interval (CI) (98.0, 98.9)] with an interobserver consistency of 97% [95% CI (91.5, 99.4)]. Variant scores were stable and showed increasing odds of being in agreement with new data when re-evaluated periodically. This carefully curated, standardized variant pathogenicity scoring system provides reliable pathogenicity scores for DNA variants encountered in a clinical laboratory setting. © 2015 The Authors. Human Mutation published by Wiley Periodicals, Inc.
Tsai-Goodman, Beverly; Zhu, Meng Yuan; Al-Rujaib, Mashael; Seed, Mike; Macgowan, Christopher K
2015-04-18
Phase contrast cardiovascular magnetic resonance (PC CMR) has emerged as a clinical tool for blood flow quantification, but its use in the foetus has been hampered by the need for gating with the foetal heart beat. The previously described metric optimized gating (MOG) technique has been successfully used to measure foetal blood flow in late gestation foetuses on a 1.5 T CMR magnet. However, there is increasing interest in performing foetal cardiac imaging using 3.0 T CMR. We describe our pilot investigation of foetal blood flow measured using 3.0 T CMR. Foetal blood flows were quantified in 5 subjects at late gestational age (35-38 weeks). Three were normal pregnancies and two were pregnancies with ventricular size discrepancy. Data were obtained at 1.5 T and 3.0 T using a previously described PC CMR protocol. After reconstruction using MOG, blood flow was quantified independently by two observers. Intra- and inter-observer reproducibility of flow measurements at the two field strengths was assessed by the squared Pearson correlation coefficient (R²), linear regression and Bland-Altman analysis. PC CMR flow measurements were obtained in 36 of 40 target vessels. Strong intra-observer agreement was obtained between measurements at each field strength (R² = 0.78, slope = 0.83 ± 0.11), with a mean bias of -1 ml/min/kg and 95% confidence limits of ±71 ml/min/kg. Inter-observer agreement was similarly high for measurements at both 1.5 T (R² = 0.86, slope = 0.95 ± 0.13, bias = 6 ± 52 ml/min/kg) and 3.0 T (R² = 0.88, slope = 0.94 ± 0.13, bias = 4 ± 47 ml/min/kg). Across all PC CMR measurements, SNR per pixel was, as expected, higher at 3.0 T than at 1.5 T (165 ± 50%). The relative differences in flow measurements between observers were low (range: 4-16%) except for pulmonary blood flow, which showed much higher variability at 1.5 T (34%) than at 3.0 T (11%).
This was attributed to the poorly visualized small pulmonary vessels at 1.5 T, which made delineation inconsistent between observers. This is the first pilot study to measure foetal blood flow using PC CMR at 3.0 T. The flow data obtained correlated well with those measured at 1.5 T, both within and between observers. With increased SNR at 3.0 T, smaller pulmonary vessels were better visualized, which improved inter-observer agreement of the associated flows.
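Bland-Altman analysis, as used above, summarizes paired measurements by their mean difference (bias) and 95% limits of agreement, bias ± 1.96 × SD of the differences. A minimal sketch with hypothetical flow values, not the study's measurements:

```python
from statistics import mean, stdev


def bland_altman(obs_a, obs_b):
    """Bias (mean difference) and 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(obs_a, obs_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical flows (ml/min/kg) in five vessels, measured by two observers
observer_1 = [150, 200, 180, 120, 90]
observer_2 = [140, 210, 170, 125, 85]
bias, lower, upper = bland_altman(observer_1, observer_2)
print(f"bias {bias:.1f}, 95% limits {lower:.1f} to {upper:.1f} ml/min/kg")
# bias 2.0, 95% limits -15.8 to 19.8 ml/min/kg
```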
Statistical strategy for anisotropic adventitia modelling in IVUS.
Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia
2006-06-01
Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of quantifying lumen narrowing (plaque). Automated adventitia segmentation is hampered by the difficulty of defining vessel border descriptors, as well as by shadows, artifacts, and blurred signal response arising from the physical properties of ultrasound. To approach such a complex problem efficiently, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy within the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.
Sainz, José A; Fernández-Palacín, Ana; Borrero, Carlota; Aquise, Adriana; Ramos, Zenaida; García-Mejido, José A
2018-04-01
The aim of this study was to evaluate the inter- and intraobserver correlation of the different intrapartum transperineal ultrasound (ITU) parameters (angle of progression (AoP), progression distance (PD), head direction (HD), midline angle (MLA) and head-perineum distance (HPD)) with contraction and pushing. We evaluated 28 nulliparous women at full dilatation under epidural analgesia. We performed a transperineal ultrasound evaluating AoP and PD in the longitudinal plane, and MLA and HPD in the transverse plane. Intraclass correlation coefficients (ICC) with 95% CIs and Bland-Altman analysis were used to assess intra- and interobserver measurement repeatability. The intraobserver ICC was adequate for all parameters (p < .005): AoP 0.98 (95% CI, 0.96-0.99), PD 0.98 (95% CI, 0.97-0.99), MLA 0.99 (95% CI, 0.97-0.99), HPD 0.96 (95% CI, 0.88-0.99). The interobserver ICCs were: AoP 0.93 (95% CI, 0.79-0.98), PD 0.92 (95% CI, 0.76-0.97), MLA 0.77 (95% CI, 0.42-0.92), HPD 0.47 (95% CI, -0.12 to 0.8). HD had an interobserver correlation of 0.53 (95% CI, 0.1-0.9) (Kappa C). The mean differences were 2.42° for AoP, 1 mm for PD and 0.28° for MLA (Bland-Altman test). ITU has adequate intra- and interobserver correlation for use with contraction and pushing under epidural analgesia. Impact statement What is already known on this subject: The intrapartum transperineal ultrasound parameters can be used with contraction and pushing under epidural analgesia. What the results of this study add to what we know: ITU may be used to evaluate the difficulty of instrumentation in operative vaginal deliveries, and this study concludes that ITU is reproducible during uterine contraction with pushing.
What the implications are of these findings for clinical practice and/or further research: Therefore, ITU could be used without difficulty with an adequate intra- and interobserver correlation for the prediction of instrumentation difficulty in operative vaginal deliveries.
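The abstract reports ICCs without stating the model. One common choice for agreement between observers is the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1) in Shrout and Fleiss notation; the sketch below computes it from ANOVA mean squares. Which form the authors used is an assumption, and the data matrix is illustrative.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings[i][j] is the measurement of subject i by rater j (no missing values)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative matrix: 6 subjects (rows) measured by 4 raters (columns)
data = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(data), 2))  # 0.29
```

The low value here comes from the large systematic differences between raters (column means), which the absolute-agreement form penalizes; a consistency form such as ICC(3,1) drops that term from the denominator and would be higher for the same data.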
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verhaart, René F., E-mail: r.f.verhaart@erasmusmc.nl; Paulides, Margarethus M.; Fortunati, Valerio
Purpose: In current clinical practice, head and neck (H and N) hyperthermia treatment planning (HTP) is based solely on computed tomography (CT) images. Magnetic resonance imaging (MRI) provides superior soft-tissue contrast over CT. The purpose of the authors' study is to investigate the relevance of using MRI in addition to CT for patient modeling in H and N HTP. Methods: CT and MRI scans were acquired for 11 patients in an immobilization mask. Three observers manually segmented the following thermo-sensitive tissues on CT, T1-weighted MRI (MRI-T1w), and T2-weighted MRI (MRI-T2w) images: cerebrum, cerebellum, brainstem, myelum, sclera, lens, vitreous humor, and the optic nerve. For these tissues, which are used for patient modeling in H and N HTP, the interobserver variation of manual tissue segmentation in CT and MRI was quantified with the mean surface distance (MSD). Next, the authors compared the impact of CT-based and combined CT and MRI-based patient models on the predicted temperatures. For each tissue, the modality that led to the lowest observer variation was selected and inserted into the combined CT and MRI based patient model (CT and MRI) after deformable image registration. In addition, a patient model with a detailed segmentation of brain tissues (including white matter, gray matter, and cerebrospinal fluid) was created (CT and MRI_db). To quantify the relevance of MRI-based segmentation for H and N HTP, the authors compared the predicted maximum temperatures in the segmented tissues (T_max) and the corresponding specific absorption rate (SAR) of the patient models based on (1) CT, (2) CT and MRI, and (3) CT and MRI_db. Results: In MRI, a similar or reduced interobserver variation was found compared to CT (maximum of median MSD in CT: 0.93 mm, MRI-T1w: 0.72 mm, MRI-T2w: 0.66 mm).
Only for the optic nerve was the interobserver variation significantly lower in CT than in MRI (median MSD in CT: 0.58 mm, MRI-T1w: 1.27 mm, MRI-T2w: 1.40 mm). Patient models based on CT (T_max: 38.0 °C) and CT and MRI (T_max: 38.1 °C) resulted in similar simulated temperatures, while CT and MRI_db (T_max: 38.5 °C) resulted in significantly higher temperatures. The SAR corresponding to these temperatures did not differ significantly. Conclusions: Although MR imaging reduces the interobserver variation in most tissues, it does not affect simulated local tissue temperatures. However, the improved soft-tissue contrast provided by MRI allows generating a detailed brain segmentation, which has a strong impact on the predicted local temperatures and hence may improve simulation-guided hyperthermia.
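Mean surface distance (MSD) quantifies how far apart two traced boundaries lie. Definitions vary between studies; the sketch below assumes the symmetric version, averaging the nearest-neighbour distance from every point on each surface to the other, with a brute-force search that is adequate only for small point sets. Point coordinates are hypothetical.

```python
from math import dist


def mean_surface_distance(surface_a, surface_b):
    """Symmetric mean surface distance between two contours given as point lists (mm).
    Every point on each surface contributes its distance to the nearest point on the other."""
    def nearest_distances(src, dst):
        return [min(dist(p, q) for q in dst) for p in src]

    d = nearest_distances(surface_a, surface_b) + nearest_distances(surface_b, surface_a)
    return sum(d) / len(d)


# Hypothetical 2D contour points (mm) of the same structure traced by two observers
observer_1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
observer_2 = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5), (3.0, 1.0)]
print(mean_surface_distance(observer_1, observer_2))  # 0.625
```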
Vang, Russell; Gupta, Mamta; Wu, Lee-Shu-Fune; Yemelyanova, Anna V; Kurman, Robert J; Murphy, Kathleen M; Descipio, Cheryl; Ronnett, Brigitte M
2012-03-01
Distinction of hydatidiform moles (HMs) from nonmolar specimens (NMs) and subclassification of HMs as complete hydatidiform moles (CHMs) and partial hydatidiform moles (PHMs) are important for clinical practice and investigational studies; yet, diagnosis based solely on morphology is affected by interobserver variability. Molecular genotyping can distinguish these entities by discerning androgenetic diploidy, diandric triploidy, and biparental diploidy to diagnose CHMs, PHMs, and NMs, respectively. Eighty genotyped cases (27 CHMs, 27 PHMs, and 26 NMs) were selected from a series of 200 potentially molar specimens previously diagnosed using p57 immunostaining and genotyping. Cases were classified by 3 gynecologic pathologists on the basis of H&E slides (masked to p57 immunostaining and genotyping results) into 1 of 3 categories (CHM, PHM, or NM) during 2 diagnostic rounds; a third round incorporating p57 immunostaining results was also conducted. Consensus diagnoses (those rendered by 2 of 3 pathologists) were determined. Genotyping results were used as the gold standard for assessing diagnostic performance. Sensitivity of a diagnosis of CHM ranged from 59% to 100% for individual pathologists and from 70% to 81% by consensus; specificity ranged from 91% to 96% for individuals and from 94% to 98% by consensus. Sensitivity of a diagnosis of PHM ranged from 56% to 93% for individual pathologists and from 70% to 78% by consensus; specificity ranged from 58% to 92% for individuals and from 74% to 85% by consensus. The percentage of correct classification of all cases by morphology ranged from 55% to 75% for individual pathologists and from 70% to 75% by consensus. The κ values for interobserver agreement ranged from 0.59 to 0.73 (moderate to good) for a diagnosis of CHM, from 0.15 to 0.43 (poor to moderate) for PHM, and from 0.13 to 0.42 (poor to moderate) for NM. The κ values for intraobserver agreement ranged from 0.44 to 0.67 (moderate to good). 
Addition of the p57 immunostain improved the sensitivity of a diagnosis of CHM to a range of 93% to 96% for individual pathologists and 96% by consensus; specificity improved to a range of 96% to 98% for individual pathologists and 96% by consensus; there was no substantial impact on the diagnosis of PHMs and NMs. Interobserver agreement for interpretation of the p57 immunostain was 0.96 (almost perfect). Even with morphologic assessment by gynecologic pathologists and p57 immunohistochemistry, 20% to 30% of cases will be misclassified, and, in particular, distinction of PHMs and NMs will remain problematic.
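The consensus rule above (a diagnosis rendered by 2 of 3 pathologists) is a simple majority vote. The sketch below applies it and scores the result against a reference diagnosis; counting a three-way split (no consensus) as incorrect is an assumption, since the abstract does not say how such cases were handled. All cases are hypothetical.

```python
from collections import Counter


def consensus(diagnoses, quorum=2):
    """Label given by at least `quorum` raters, or None when no label reaches the quorum."""
    label, count = Counter(diagnoses).most_common(1)[0]
    return label if count >= quorum else None


def percent_correct(calls, gold):
    """Share of cases whose call matches the reference (here, genotyping) diagnosis."""
    return sum(c == g for c, g in zip(calls, gold)) / len(gold)


# Hypothetical H&E diagnoses from three pathologists for four cases, plus genotype results
reads = [("CHM", "CHM", "PHM"), ("PHM", "NM", "PHM"), ("NM", "NM", "NM"), ("CHM", "PHM", "NM")]
genotype = ["CHM", "NM", "NM", "PHM"]
calls = [consensus(r) for r in reads]
print(calls)  # ['CHM', 'PHM', 'NM', None]
print(percent_correct(calls, genotype))  # 0.5
```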
Capillary refill time: a study of interobserver reliability among nurses and nurse assistants.
Brabrand, Mikkel; Hosbond, Susanne; Folkestad, Lars
2011-02-01
The interobserver variability of capillary refill time (CRT) has been questioned. Earlier studies of interobserver variability of CRT have included large numbers of patients but few observers. The objective of our study was to investigate how a large group of nurses and nurse assistants would grade CRT. We recorded a video of the index finger of six medical patients and showed the videos to nurses and nurse assistants, who were asked to record the CRT and whether they found this value to be normal. The data were analyzed using Fleiss' kappa coefficient and graded according to the Landis and Koch classification. Agreement on the exact values of CRT was evaluated using the intraclass correlation. Nine nurse assistants and 37 nurses participated. The patients were aged between 44 and 87 years. All but one patient had a systolic blood pressure reading above 130 mmHg. All had arterial blood oxygen saturation above 92% and all but one had normal body temperature. The κ value for normality was 0.56. The intraclass correlation of CRT measurements was 0.62. This is the largest interobserver study of CRT in terms of the number of observers. We found only moderate agreement for the exact value of CRT and moderate agreement for normality. We believe that CRT should be used with caution in clinical practice.
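Fleiss' kappa extends chance-corrected agreement to many raters, which suits a design in which 46 observers rate the same six videos. A minimal sketch, with the Landis and Koch (1977) bands used to label the result; the counts are hypothetical, not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters placing subject i in category j;
    every subject must be rated by the same number of raters."""
    n_subjects, n_raters = len(counts), sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number of ratings")
    # Per-subject agreement: proportion of rater pairs that agree on that subject
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall share of ratings in each category
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label for a kappa value on the Landis and Koch benchmark scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical counts of "normal" / "abnormal" CRT judgements by 46 observers for 6 videos
counts = [[40, 6], [10, 36], [44, 2], [23, 23], [30, 16], [5, 41]]
k = fleiss_kappa(counts)
print(round(k, 2), landis_koch(k))
print(landis_koch(0.56))  # moderate, the grade reported above for normality
```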
Inter-observer variability within BI-RADS and RANZCR mammographic density assessment schemes
NASA Astrophysics Data System (ADS)
Damases, Christine N.; Mello-Thoms, Claudia; McEntee, Mark F.
2016-03-01
This study compares the variability associated with two visual mammographic density (MD) assessment methods using two separate samples of radiologists. The image test set comprised images obtained from 20 women (age 42-89 years). The images were assessed for MD by twenty American Board of Radiology (ABR) examiners and twenty-six radiologists registered with the Royal Australian and New Zealand College of Radiologists (RANZCR). Images were assessed using the same technology and conditions; however, the ABR radiologists used BI-RADS and the RANZCR radiologists used the RANZCR breast density synoptic. Both scales use a 4-point assessment. The images were then grouped as low and high density, with low including BI-RADS 1 and 2 or RANZCR 1 and 2, and high including BI-RADS 3 and 4 or RANZCR 3 and 4. Four-point BI-RADS and RANZCR ratings showed no or negligible correlation (ρ = -0.029, p < 0.859). The average inter-observer agreement on the BI-RADS scale had a kappa of 0.565 (95% CI 0.519-0.610) and ranged between 0.328 and 0.669, while the inter-observer agreement using the RANZCR scale had a kappa of 0.360 (95% CI 0.308-0.412) and a range of 0.078-0.499. Our findings show a wider range of inter-observer variability among RANZCR-registered radiologists than among the ABR examiners.
Nguyen, Donna; Minnal, Vandana R.
2016-01-01
Purpose. To evaluate interobserver, intervisit, and interinstrument agreement for gonioscopy and Fourier domain anterior segment optical coherence tomography (FD ASOCT) in classifying open and narrow angle eyes. Methods. Eighty-six eyes with open or narrow anterior chamber angles were included. The superior angle was classified as open or narrow by 2 of 5 glaucoma specialists using gonioscopy and imaged by FD ASOCT in the dark. The superior angle of each FD ASOCT image was graded as open or narrow by 2 masked readers. The same procedures were repeated within 6 months. Kappas for interobserver and intervisit agreement for each instrument, and for interinstrument agreement, were calculated. Results. The mean age was 50.9 (±18.4) years. Interobserver agreement was moderate to good for both gonioscopy (0.57 and 0.69) and FD ASOCT (0.58 and 0.75). Intervisit agreement was moderate to excellent for both gonioscopy (0.53 to 0.86) and FD ASOCT (0.57 and 0.85). Interinstrument agreement was fair to good (0.34 to 0.63), with FD ASOCT classifying more angles as narrow than gonioscopy. Conclusions. Both gonioscopy and FD ASOCT examiners were internally consistent, with similar interobserver and intervisit agreement for angle classification. Agreement between instruments was fair to good, with FD ASOCT classifying more angles as narrow than gonioscopy. PMID:27990300
Brunner, J; Krummenauer, F; Lehr, H A
2000-04-01
Study end-points in microcirculation research are usually video-taped images rather than numeric computer print-outs. Analysis of these video-taped images for the quantification of microcirculatory parameters usually requires computer-based image analysis systems. Most software programs for image analysis are custom-made, expensive, and limited in their applicability to selected parameters and study end-points. We demonstrate herein that inexpensive, commercially available software (Adobe Photoshop), run on a Macintosh G3 computer with a built-in graphics capture board, provides versatile, easy-to-use tools for the quantification of digitized video images. Using images obtained by intravital fluorescence microscopy from the pre- and postischemic muscle microcirculation in the skinfold chamber model in hamsters, Photoshop allows simple and rapid quantification (i) of microvessel diameters, (ii) of the functional capillary density and (iii) of postischemic leakage of FITC-labeled high molecular weight dextran from postcapillary venules. We present evidence of the technical accuracy of the software tools and of a high degree of interobserver reliability. Inexpensive commercially available imaging programs (e.g., Adobe Photoshop) provide versatile tools for image analysis with a wide range of potential applications in microcirculation research.
Buczinski, S; Faure, C; Jolivet, S; Abdallah, A
2016-07-01
The aim was to determine inter-observer agreement for a clinical scoring system for the detection of bovine respiratory disease complex in calves, and the impact of classifying calves as sick or healthy based on different cut-off values. Two third-year veterinary students (Observers 1 and 2) and one post-graduate student (Observer 3) received 4 hours of training on scoring dairy calves for signs of respiratory disease, including rectal temperature, cough, eye and nasal discharge, and ear position. Observers 1 and 2 scored 40 pre-weaning dairy calves 24 hours apart (80 observations) over three visits to a calf-rearing facility, and Observers 1, 2 and 3 scored 20 calves on one visit. Inter-observer agreement for individual clinical signs was assessed between Observers 1 and 2 using percentage of agreement (PA) and Kappa statistics. Agreement between the three observers for total clinical score was assessed using cut-off values of ≥4, ≥5 and ≥6 to indicate unhealthy calves. Inter-observer PA was 0.68 for rectal temperature, 0.78 for cough, 0.62 for nasal discharge, 0.63 for eye discharge, and 0.85 for ear position. Kappa values for all clinical signs indicated slight to fair agreement (<0.4), except for temperature, which had moderate agreement (0.6). The Fleiss' Kappa for total score, using cut-offs of ≥4, ≥5 and ≥6 to indicate unhealthy calves, was 0.35, 0.06 and 0.13, respectively, indicating slight to fair agreement. There were important inter-observer discrepancies in scoring clinical signs of respiratory disease among these relatively inexperienced observers. These disagreements may ultimately mean increased false negative or false positive diagnoses and incorrect treatment of cases. Visual assessment of clinical signs associated with bovine respiratory disease needs to be thoroughly validated when disease monitoring is based on the use of a clinical scoring system.
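The effect of the cut-off can be seen with a small sketch: the same two sets of total scores are dichotomized at ≥4, ≥5 and ≥6, and the percentage agreement on sick/healthy status is recomputed each time. The scores and function names are hypothetical.

```python
def classify(total_scores, cutoff):
    """Label each calf 'sick' when its total clinical score is at or above the cut-off."""
    return ["sick" if s >= cutoff else "healthy" for s in total_scores]


def percent_agreement(labels_a, labels_b):
    """Proportion of subjects given the same label by both observers."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Hypothetical total respiratory scores given to the same 7 calves by two observers
observer_1 = [3, 5, 6, 4, 2, 7, 5]
observer_2 = [4, 5, 5, 3, 2, 6, 6]
for cutoff in (4, 5, 6):
    pa = percent_agreement(classify(observer_1, cutoff), classify(observer_2, cutoff))
    print(f"cut-off >= {cutoff}: agreement {pa:.2f}")
# agreement 0.71, 1.00 and 0.71 for cut-offs 4, 5 and 6
```

Agreement changes with the cut-off because it depends on whether the scores on which the observers differ straddle the threshold.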
High inter-observer agreement of observer-perceived pain assessment in the emergency department.
Hangaard, Martin Høhrmann; Malling, Brian; Mogensen, Christian Backer
2018-02-21
Triage is used to prioritize patients in the emergency department. Most triage systems include the patient's pain score to assess their level of acuity, using a combination of patient-reported pain and observer-perceived pain; the latter therefore requires a certain degree of inter-observer agreement. The aim of the present study was to assess the inter-observer agreement of perceived pain among emergency department nurses and to evaluate whether it was influenced by predetermined factors such as age and gender. A project assistant randomly recruited two nurses, who were not allowed to interact with each other, to assess patient pain intensity on the numeric rating scale (NRS). The project assistant afterwards entered the pain scores in a predesigned electronic questionnaire. We used weighted Fleiss-Cohen (quadratic) kappa statistics, Bland-Altman statistics and logistic regression analysis to assess the inter-observer agreement. One hundred and sixty-two patients were included. They had a median age of 38 years and 45% were female. 30% of the patients were acute surgical patients and 70% acute orthopedic patients. The average time between the pain assessments was 1.7 min. The Bland-Altman analysis found a mean difference in pain score of 0.2 and 95% limits of agreement of ±3 points. When the NRS scores were translated to commonly used pain categories (no, mild, moderate or severe pain), we found 70% agreement, with a mean difference in categories of 0.05 and 95% limits of agreement of ±1 category. Patient age, gender, localization of pain, examination room or presence of a significant other did not affect the inter-observer agreement. We found 70% agreement on pain category between the nurses, and the use of nurse-perceived pain assessment for triage in the emergency department is therefore justified.
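Quadratic (Fleiss-Cohen) weights penalize a disagreement by the squared distance between the two scores, so a 1-point NRS difference costs far less than a 5-point one. A minimal sketch of quadratically weighted kappa with hypothetical scores; it is a generic implementation, not the software the authors used.

```python
def quadratic_weighted_kappa(a, b, categories):
    """Cohen's kappa with quadratic (Fleiss-Cohen) disagreement weights for ordinal scores."""
    k, n = len(categories), len(a)
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint distribution of the two raters' scores
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted disagreement: observed versus expected under independent marginals
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j]
    if den == 0:
        return float("nan")  # both raters used a single category; kappa is undefined
    return 1 - num / den


# Hypothetical 0-10 NRS pain scores assigned to the same 8 patients by two nurses
nurse_1 = [2, 5, 7, 8, 3, 6, 4, 9]
nurse_2 = [3, 5, 6, 8, 2, 7, 4, 7]
print(round(quadratic_weighted_kappa(nurse_1, nurse_2, list(range(11))), 3))
```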
Ueno, Yoshiko; Maeda, Tetsuo; Tanaka, Utaru; Tanimura, Kenji; Kitajima, Kazuhiro; Suenaga, Yuko; Takahashi, Satoru; Yamada, Hideto; Sugimura, Kazuro
2016-09-01
To evaluate the interobserver variability and diagnostic performance of a newly developed magnetic resonance imaging (MRI)-based scoring system for invasive placenta previa. Prenatal MR images of 70 women were retrospectively evaluated, 18 of whom were diagnosed with invasive placenta. Six MR features (dark band on T2-weighted images, intraplacental abnormal vascularity, placental bulge, heterogeneous placenta, myometrial thinning, and placental protrusion sign) were each scored separately on a 5-point Likert scale, and the cumulative radiological score (CRS) was defined as the sum of these scores. Two more experienced radiologists (readers A and B) and two less experienced residents (readers C and D) calculated the CRS. Interobserver variability was assessed by measuring the intraclass correlation coefficient. Diagnostic performance was evaluated by means of receiver operating characteristic (ROC) analysis. Interobserver agreement for CRS was excellent for the more experienced radiologists (0.85), and good for all readers (0.72) and for the less experienced residents (0.66). The area under the ROC curve (Az) and accuracy (Acc) for CRS were significantly higher than or equivalent to those of other MR features for all readers (Az and Acc for reader A; CRS, 0.92, 91.4%; intraplacental T2 dark band, 0.83, P = 0.009, 81.4%, P = 0.03; intraplacental abnormal vascularity, 0.9, P = 0.3, 90.0%, P = 1.00; placental bulge, 0.81, P = 0.0008, 80.0%, P = 0.02; heterogeneous placenta, 0.85, P = 0.11, 74.3%, P = 0.002; myometrial thinning, 0.84, P = 0.06, 60.0%, P < 0.0001; placental protrusion sign, 0.81, P = 0.01, 81.4%, P = 0.26). This MRI-based scoring system demonstrated excellent or good interobserver agreement and good diagnostic performance for invasive placenta previa. J. Magn. Reson. Imaging 2016;44:573-583. © 2016 International Society for Magnetic Resonance in Medicine.
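The CRS is a sum of the six feature scores, and the area under the ROC curve (Az) can be read as the probability that a randomly chosen patient with invasive placenta has a higher CRS than one without, with ties counting one half (the Mann-Whitney formulation). A small sketch with hypothetical scores; the authors' ROC software and any curve smoothing are not described in the abstract.

```python
def cumulative_score(feature_scores):
    """CRS: sum of the per-feature Likert scores for one patient."""
    return sum(feature_scores)


def roc_auc(scores_pos, scores_neg):
    """ROC area as P(random positive scores higher than random negative), ties counting 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical six-feature Likert scores (1-5) for patients with and without invasive placenta
invasive = [[5, 4, 4, 3, 5, 4], [4, 5, 3, 4, 4, 5], [3, 3, 4, 2, 3, 3]]
non_invasive = [[1, 2, 1, 2, 1, 1], [2, 2, 3, 1, 2, 2], [3, 3, 4, 2, 3, 3]]
pos = [cumulative_score(s) for s in invasive]
neg = [cumulative_score(s) for s in non_invasive]
print(pos, neg, round(roc_auc(pos, neg), 3))  # [25, 25, 18] [8, 12, 18] 0.944
```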
Detection of MET amplification in gastroesophageal tumor specimens using IQFISH
Nielsen, Karsten Bork; Mollerup, Jens; Jepsen, Anna; Go, Ning
2017-01-01
Background The mesenchymal epithelial transition factor gene (MET) is a proto-oncogene that encodes a transmembrane receptor with intrinsic tyrosine kinase activity, known as Met or cMet. MET is amplified in several human cancers, including gastroesophageal cancer. Methods Here we report MET amplification prevalence data from 159 consecutive tumor specimens from patients with gastric (G), gastroesophageal junction (GEJ), and esophageal (E) adenocarcinoma, obtained using a novel fluorescence in situ hybridization (FISH) assay, the MET/CEN-7 IQFISH Probe Mix [an investigational use only (IUO) assay]. MET amplification was defined as a MET/CEN-7 ratio ≥2.0. We also investigated the link between MET signal distribution and amplification status. Results The prevalence of MET amplification was 6.9%. The FISH assay demonstrated high inter-observer reproducibility: the inter-observer results showed 100% overall agreement on MET status (amplified/non-amplified), and the inter-observer CV was estimated at 11.8% (95% CI: 10.2–13.4). For signal distribution, inter-observer agreement was 98.7%. We also report an association between MET amplification and a distinct signal distribution pattern in the G/GEJ/E tumor specimens: the prevalence of MET amplification was markedly higher in tumor specimens with a heterogeneous (66.7%) than with a homogeneous (2.0%) signal distribution. Furthermore, specimens with a heterogeneous signal distribution had a statistically significantly higher median MET/CEN-7 ratio (2.35 versus 1.04; P<0.0001). Conclusions The novel FISH assay showed high inter-observer reproducibility for both amplification status and signal distribution. These findings suggest that MET amplification is associated mainly with tumor cells showing a heterogeneous growth pattern. PMID:29285491
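The amplification rule stated above (MET/CEN-7 ratio ≥2.0) can be sketched as follows. Pooling signal counts over all scored nuclei before dividing is an assumption based on common FISH scoring practice; the abstract does not describe how many nuclei were counted or how the ratio was aggregated, and the function names are hypothetical.

```python
from typing import Sequence

AMPLIFICATION_CUTOFF = 2.0  # MET/CEN-7 ratio threshold stated in the abstract


def met_cen7_ratio(met_signals: Sequence[int], cen7_signals: Sequence[int]) -> float:
    """Assumed pooled ratio: total MET signals / total CEN-7 signals over scored nuclei."""
    if len(met_signals) != len(cen7_signals) or not met_signals:
        raise ValueError("need one MET and one CEN-7 count per scored nucleus")
    total_cen7 = sum(cen7_signals)
    if total_cen7 == 0:
        raise ValueError("no CEN-7 signals counted")
    return sum(met_signals) / total_cen7


def met_status(ratio: float) -> str:
    """Classify a specimen using the >= 2.0 cutoff."""
    return "amplified" if ratio >= AMPLIFICATION_CUTOFF else "non-amplified"
```

For three nuclei with 4, 6, and 5 MET signals and 2, 2, and 3 CEN-7 signals, the pooled ratio is 15/7 ≈ 2.14, so the specimen would be called amplified.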
Kent, Michael N; Olsen, Thomas G; Feeser, Theresa A; Tesno, Katherine C; Moad, John C; Conroy, Michael P; Kendrick, Mary Jo; Stephenson, Sean R; Murchland, Michael R; Khan, Ayesha U; Peacock, Elizabeth A; Brumfiel, Alexa; Bottomley, Michael A
2017-12-01
Digital pathology is a transformative technology that affects dermatologists and dermatopathologists from residency through academic and private practice. Two concerns are the accuracy of interpretation from whole-slide images (WSI) and the effect on workflow. Studies of large series involving single-organ systems are lacking. To evaluate whether diagnosis from WSI on a digital microscope is inferior to diagnosis of glass slides by traditional microscopy (TM) in a large cohort of dermatopathology cases, with attention to image resolution (specifically eosinophils in inflammatory cases and mitotic figures in melanomas), and to measure the workflow efficiency of WSI compared with TM. Three dermatopathologists established an interobserver ground truth consensus (GTC) diagnosis for 499 previously diagnosed cases proportionally representing the spectrum of diagnoses seen in the laboratory. Cases were distributed to 3 different dermatopathologists, who diagnosed them by WSI and by TM with a minimum 30-day washout between methods. Intraobserver WSI/TM diagnoses were compared, followed by interobserver comparison with the GTC. Concordance, major discrepancies, and minor discrepancies were calculated and analyzed by paired noninferiority testing. We also measured pathologists' read rates to evaluate workflow efficiency between WSI and TM. This retrospective study was carried out in an independent, national, university-affiliated dermatopathology laboratory. The main outcomes were intraobserver concordance of diagnoses between the WSI and TM methods and interobserver variance from the GTC, following College of American Pathologists guidelines. Mean intraobserver concordance between WSI and TM was 94%. Mean interobserver concordance was 94% for WSI and GTC and 94% for TM and GTC. Mean interobserver concordance between WSI, TM, and GTC was 91%. Diagnoses from WSI were noninferior to those from TM. 
Whole-slide image read rates were commensurate with WSI experience, and the most experienced user achieved parity with TM. Diagnosis from WSI was equivalent to diagnosis from glass slides using TM in this statistically powerful study of 499 dermatopathology cases. This study supports the viability of WSI for primary diagnosis in the clinical setting.
Serological markers in inflammatory bowel disease: the pros and cons.
Lerner, Aaron; Shoenfeld, Yehuda
2002-02-01
Accurate serological assays are desirable for the diagnosis of inflammatory bowel disease. Among several serological markers, anti-Saccharomyces cerevisiae mannan antibodies and perinuclear antineutrophil cytoplasmic autoantibodies are highly specific for Crohn's disease and ulcerative colitis, respectively. Combining the two improves their specificity, although sensitivity remains low. Because of a lack of standardization and wide interobserver variability, they cannot be used as the sole diagnostic criteria, but they can assist clinicians in diagnosing and categorizing patients with inflammatory bowel disease and in making therapeutic decisions.
A Study on the Reliability of Sasang Constitutional Body Trunk Measurement
Jang, Eunsu; Kim, Jong Yeol; Lee, Haejung; Kim, Honggie; Baek, Younghwa; Lee, Siwoo
2012-01-01
Objective. Body trunk measurement plays an important diagnostic role not only in conventional medicine but also in Sasang constitutional medicine (SCM). The Sasang constitutional body trunk measurement (SCBTM) consists of 5 widths and 8 circumferences taken at standard locations currently used in the SCM community. This study examines to what extent comprehensive training can improve the reliability of the SCBTM. Methods. We recruited 10 male subjects and 5 male observers with no experience of anthropometric measurement. Measurements were taken twice, both before and after comprehensive training. The relative technical error of measurement (%TEM) was calculated to assess intra- and interobserver reliability. Results. Post-training intraobserver %TEMs of the SCBTM ranged from 0.27% to 1.85%, down from 0.27% to 6.26% before training. Post-training interobserver %TEMs ranged from 0.56% to 1.66%, down from 1.00% to 9.60% before training. Post-training total %TEMs, which represent overall reliability, ranged from 0.68% to 2.18%, down from a maximum of 10.18% before training. Conclusion. Comprehensive training makes the SCBTM more reliable, yielding a diagnostic tool that can be used with sufficient confidence. Comprehensive training is strongly recommended before the SCBTM is performed. PMID:21822442
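A sketch of the %TEM calculation in Python, assuming the multi-observer TEM formula commonly used in anthropometry (it reduces to the familiar √(Σd²/2N) for two measurements per subject) and the convention total TEM = √(intra² + inter²). The abstract does not state which formulas were used, so treat both as assumptions; the function names are illustrative.

```python
import math
from typing import Sequence


def technical_error(measurements: Sequence[Sequence[float]]) -> float:
    """Absolute TEM for N subjects each measured K >= 2 times (by K observers or K repeats).

    TEM = sqrt( sum_i [ sum_j x_ij^2 - (sum_j x_ij)^2 / K ] / (N * (K - 1)) ),
    which equals sqrt(sum d^2 / 2N) when K == 2.
    """
    n = len(measurements)
    if n == 0:
        raise ValueError("no subjects")
    k = len(measurements[0])
    if k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need a rectangular N x K table with K >= 2")
    ss = sum(sum(x * x for x in row) - sum(row) ** 2 / k for row in measurements)
    return math.sqrt(ss / (n * (k - 1)))


def relative_tem(measurements: Sequence[Sequence[float]]) -> float:
    """%TEM: the TEM expressed as a percentage of the grand mean of all measurements."""
    values = [x for row in measurements for x in row]
    return 100.0 * technical_error(measurements) / (sum(values) / len(values))


def total_tem(intra: float, inter: float) -> float:
    """Assumed convention: total TEM = sqrt(intra^2 + inter^2)."""
    return math.sqrt(intra ** 2 + inter ** 2)
```

For two observers measuring two subjects as [10, 12] and [20, 20] cm, the TEM is 1 cm and the %TEM is 100/15.5 ≈ 6.45%.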
Computer-automated ABCD versus dermatologists with different degrees of experience in dermoscopy.
Piccolo, Domenico; Crisman, Giuliana; Schoinas, Spyridon; Altamura, Davide; Peris, Ketty
2014-01-01
Dermoscopy is a very useful, non-invasive technique for in vivo observation and preoperative diagnosis of pigmented skin lesions (PSLs), because it enables analysis of surface and subsurface structures that are not discernible to the naked eye. The authors used the ABCD rule of dermoscopy to test the accuracy of melanoma diagnosis on a panel of 165 PSLs, and the intra- and inter-observer diagnostic agreement among three dermatologists with different degrees of experience, one general practitioner, and a DDA for computer-assisted diagnosis (Nevuscreen(®), Arkè s.a.s., Avezzano, Italy). A total of 165 PSLs from 165 patients were selected. Histopathological examination revealed 132 benign melanocytic skin lesions and 33 melanomas. The kappa statistic, sensitivity, specificity, and positive and negative predictive values were calculated to measure agreement among all the human observers and in comparison with the automated DDA. Our results revealed poor reproducibility of the semi-quantitative algorithm devised by Stolz et al., independently of the observers' experience in dermoscopy. Nevuscreen(®) (Arkè s.a.s., Avezzano, Italy) proved to be 'user friendly' for all observers, enabling a more critical evaluation of each lesion, and it represents a helpful tool for clinicians without significant experience in dermoscopy in achieving more accurate diagnosis of PSLs.
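For reference, the semi-quantitative ABCD rule of Stolz et al. combines four sub-scores into a total dermoscopy score (TDS). The weights, sub-score ranges, and cutoffs below are the commonly cited ones, not values reported in this abstract, and the sketch says nothing about how Nevuscreen computes its output.

```python
def total_dermoscopy_score(a: int, b: int, c: int, d: int) -> float:
    """Stolz ABCD rule: TDS = 1.3*A + 0.1*B + 0.5*C + 0.5*D.

    A = asymmetry (0-2 axes), B = border (0-8 segments),
    C = colours (1-6), D = differential structures (1-5).
    """
    if not (0 <= a <= 2 and 0 <= b <= 8 and 1 <= c <= 6 and 1 <= d <= 5):
        raise ValueError("score out of range: A 0-2, B 0-8, C 1-6, D 1-5")
    return 1.3 * a + 0.1 * b + 0.5 * c + 0.5 * d


def classify_tds(tds: float) -> str:
    """Commonly cited cutoffs: < 4.75 benign, 4.75-5.45 suspicious, > 5.45 likely melanoma."""
    if tds < 4.75:
        return "benign"
    if tds <= 5.45:
        return "suspicious"
    return "likely melanoma"
```

A lesion scored A = 2, B = 3, C = 2, D = 2 gives TDS ≈ 4.9, which falls in the suspicious band; small disagreements between observers on any sub-score can move a lesion across a cutoff, which is one way poor reproducibility arises.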
Dessauvagie, Benjamin F; Lee, Andrew H S; Meehan, Katie; Nijhawan, Anju; Tan, Puay Hoon; Thomas, Jeremy; Tie, Bibiana; Treanor, Darren; Umar, Seemeen; Hanby, Andrew M; Millican-Slater, Rebecca
2018-02-13
Fibroepithelial lesions (FELs) of the breast span a morphological continuum, including lesions in which the distinction between cellular fibroadenoma (FA) and benign phyllodes tumour (PT) is difficult. The distinction is clinically important: FAs are managed conservatively, whereas equivocal lesions and PTs are managed with surgery. We sought to audit core biopsy diagnoses of equivocal FELs by digital pathology and to investigate whether digital point counting is useful in clarifying FEL diagnoses. Scanned slide images from cores and subsequent excisions of 69 equivocal FELs were examined in a multicentre audit by eight pathologists, to determine the agreement and accuracy of core needle biopsy (CNB) diagnoses, and by digital point counting of stromal cellularity and expansion, to determine whether classification could be improved. Interobserver variation on CNB was high: all pathologists gave a unanimous diagnosis in only eight cases (all FA), both FA and PT were diagnosed on the same CNB in 15 cases, and the mean kappa agreement between pathologists was 'weak' (k=0.36). 'Moderate' agreement was observed on CNBs among breast specialists (k=0.44) and on excision samples (k=0.49). Up to 23% of lesions confidently diagnosed as FA on CNB were PT on excision, and up to 30% of lesions confidently diagnosed as PT on CNB were FA on excision. Digital point counting did not aid in the classification of FELs. Accurate and reproducible diagnosis of equivocal FELs is difficult, particularly on CNB, resulting in poor interobserver agreement and suboptimal accuracy. Given the diagnostic difficulty and the surgical implications, equivocal FELs should be reported in consultation with experienced breast pathologists, as a small number of benign FAs can be selected out from equivocal lesions. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
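The abstract reports a mean kappa between pathologists without giving the formula. One plausible reading, assumed here, is the average of Cohen's kappa over every pair of the eight raters; the helper names and the FA/PT labels in the example are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who classified the same cases."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must classify the same non-empty set of cases")
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category for every case
    return (p_observed - p_expected) / (1.0 - p_expected)


def mean_pairwise_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Average Cohen's kappa over every pair of raters (ratings[i] = rater i's diagnoses)."""
    if len(ratings) < 2:
        raise ValueError("need at least two raters")
    pairs = list(combinations(ratings, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)
```

For instance, raters giving ["FA", "FA", "PT", "PT"] and ["FA", "PT", "PT", "PT"] agree on 75% of cases, but with 50% agreement expected by chance, kappa is only 0.5.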
Peregrina, Alejandro; Azer, Shereen S; Tao, Erin E; Johnston, William M
2016-12-01
Curvature of the posterior border of the mandibular ramus at the occlusal plane has been described as a male morphological trait. Controversy over the accuracy of this method remains among researchers; studies employing similar methods report accuracy rates for successful gender identification ranging from 59% to 99%. This blind study assessed evaluators' ability to determine gender from panoramic radiographs based on the presence or absence of curvature of the posterior margin of the mandibular ramus. Randomly selected panoramic radiographs of 413 adult male (M) and female (F) subjects were obtained from The Ohio State University College of Dentistry. Two evaluators separately rated the right and left mandibular rami of each subject using a method similar to that of Loth and Henneberg. Ratings were based on three criteria: (1) presence of curvature at the occlusal plane (M), (2) presence of curvature but not at the occlusal plane (F), and (3) lack of curvature (F). The Pearson exact chi-squared test was used to evaluate the statistical strength of the ratings. When there was no excessive tooth loss (ETL), the evaluators agreed on both the right and left rami in only about two-thirds (66.8%) of cases; however, inter-observer agreement improved to 82.1% for rami associated with ETL. Inter-observer agreement occurred in 72.9% of female rami but in only 64.4% of male rami. The results of this study indicate that assessment of posterior border curvature of the mandibular rami on panoramic radiographs is not a reliable indicator of gender and is further compromised by unacceptably high levels of inter-observer disagreement. © 2016 by the American College of Prosthodontists.
Diagnosing Femoroacetabular Impingement From Plain Radiographs
Ayeni, Olufemi R.; Chan, Kevin; Whelan, Daniel B.; Gandhi, Rajiv; Williams, Dale; Harish, Srinivasan; Choudur, Hema; Chiavaras, Mary M.; Karlsson, Jon; Bhandari, Mohit
2014-01-01
Background: A diagnosis of femoroacetabular impingement (FAI) requires a careful history and physical examination, as well as an accurate and reliable radiologic evaluation using plain radiographs as a screening modality. Radiographic markers for the diagnosis of FAI are numerous and not fully validated. In particular, the reliability of their assessment across health care providers is unclear. Purpose: To determine the inter- and intraobserver reliability of plain-radiograph assessment of FAI among orthopaedic surgeons and musculoskeletal radiologists. Study Design: Cohort study (diagnosis); Level of evidence, 3. Methods: Six physicians (3 orthopaedic surgeons, 3 musculoskeletal radiologists) independently evaluated a broad spectrum of FAI pathologies across 51 hip radiographs on 2 occasions separated by at least 4 weeks. Reviewers used 8 common criteria to diagnose FAI: (1) pistol-grip deformity, (2) size of alpha angle, (3) femoral head-neck offset, (4) posterior wall sign abnormality, (5) ischial spine sign abnormality, (6) coxa profunda abnormality, (7) crossover sign abnormality, and (8) acetabular protrusion. Agreement was calculated using the intraclass correlation coefficient (ICC). Results: When establishing an FAI diagnosis, interobserver reliability between the surgeons and radiologists was poor (ICC batch 1 = 0.33; ICC batch 2 = 0.15). In contrast, interobserver reliability within each specialty was higher, ranging from fair to good (surgeons: ICC batch 1 = 0.72; ICC batch 2 = 0.70 vs radiologists: ICC batch 1 = 0.59; ICC batch 2 = 0.74). Orthopaedic surgeons had the highest interobserver reliability when identifying pistol-grip deformities (ICC = 0.81) or abnormal alpha angles (ICC = 0.81). Similarly, radiologists had the highest agreement for detecting pistol-grip deformities (ICC = 0.75). 
Conclusion: These results suggest that surgeons and radiologists agree among themselves, but that the reliability of radiographic interpretation of FAI between the 2 specialties needs to improve. The low reliability observed may ultimately lead to missed, delayed, or inappropriate treatment for patients with symptomatic FAI. PMID:26535344
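The abstract does not say which ICC form was used. As an assumption, this sketch computes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater) from the ANOVA mean squares of a subjects × raters table; the function name is illustrative.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """Shrout-Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater.

    data[i][j] is the rating of subject i by rater j (n subjects x k raters).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(data[0])
    if k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a rectangular table with at least two raters")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    # Partition the total sum of squares into subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because ICC(2,1) measures absolute agreement, a constant offset between raters lowers it: `icc_2_1([[1, 2], [2, 3], [3, 4]])` is 2/3, whereas a consistency-type ICC would be 1 for the same table.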
Grados, F; Roux, C; de Vernejoul, M C; Utard, G; Sebert, J L; Fardellone, P
2001-01-01
The assessment of vertebral fracture in patients with osteoporosis by conventional radiography has improved over the past 10 years with the use of either the semiquantitative (SQ) method devised by Genant et al. or quantitative morphometry. However, there is still no internationally agreed definition of vertebral fracture, and there have been few comparative studies of these different approaches. Our study assessed the reproducibility of the SQ method and of four commonly used morphometric algorithms (Melton's, Eastell's, Minne's and McCloskey's methods) for assessing prevalent vertebral fractures, and examined the agreement of each morphometric algorithm with an SQ consensus reading performed by three experts. Using this consensus reading in place of a gold standard, we determined relative measures of sensitivity, specificity and the optimal cutoff threshold for each morphometric algorithm. The study was conducted in 39 postmenopausal women who had at least one osteoporotic vertebral fracture. Normal values were derived from 84 healthy postmenopausal women with apparently normal vertebral bodies. Our results indicate that the concordance of the SQ method was excellent (intraobserver agreement on serial radiographs = 96.4%, kappa = 0.91; agreement between individual readings and the consensus reading = 98%, kappa = 0.95). Three morphometric approaches demonstrated good intra- and interobserver concordance (Melton: intraobserver agreement on serial radiographs = 92.7%, kappa = 0.82, interobserver agreement = 91.1%, kappa = 0.79; Eastell: intraobserver agreement on serial radiographs = 87.6%, kappa = 0.66, interobserver agreement = 88.6%, kappa = 0.68; McCloskey: intraobserver agreement on serial radiographs = 91.5%, kappa = 0.72, interobserver agreement = 93.9%, kappa = 0.78). 
Except for McCloskey's method, the optimal cutoff thresholds defined in our study by the highest kappa score or Youden index relative to the SQ consensus reading were close to the arbitrarily fixed cutoff thresholds. The four morphometric algorithms agreed well with the SQ consensus reading, but the more complex algorithm did not provide better results, and even with adjusted cutoff thresholds, no morphometric algorithm agreed perfectly with the SQ consensus reading. We conclude that the morphometric approaches currently used should not be employed alone to detect prevalent vertebral fractures in osteoporosis studies, but rather in combination with a visual assessment. The SQ approach, which allows differential diagnosis of vertebral deformities and showed better reproducibility, can be employed alone when performed by experienced, well-trained readers.
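Choosing a cutoff by the Youden index, as described above, amounts to scanning candidate thresholds and keeping the one that maximizes sensitivity + specificity − 1 against the reference reading. In this sketch the boolean labels stand in for the SQ consensus reading, and the direction flag encodes an assumption that lower morphometric values (for example, vertebral height ratios) indicate fracture; names are illustrative.

```python
from typing import Sequence, Tuple


def youden_optimal_cutoff(
    values: Sequence[float],
    is_fracture: Sequence[bool],
    positive_if_below: bool = True,
) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    With positive_if_below=True a vertebra is called fractured when value <= cutoff.
    """
    if len(values) != len(is_fracture):
        raise ValueError("need one reference label per value")
    positives = sum(is_fracture)
    negatives = len(is_fracture) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both fractured and non-fractured reference cases")
    best_cutoff, best_j = values[0], -1.0
    for cutoff in sorted(set(values)):
        calls = [(v <= cutoff) if positive_if_below else (v >= cutoff) for v in values]
        tp = sum(c and y for c, y in zip(calls, is_fracture))
        tn = sum((not c) and (not y) for c, y in zip(calls, is_fracture))
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # keep the first (lowest) cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With height ratios [0.6, 0.7, 0.9, 0.95] where only the first two vertebrae are fractured by consensus, the search returns a cutoff of 0.7 with J = 1.0.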
Bonin, Glen; Lauer, Susanne K; Guzman, David Sanchez-Migallon; Nevarez, Javier; Tully, Thomas N; Hosgood, Giselle; Gaschen, Lorrie
2009-06-01
Information on perching-joint angles in birds is limited. Immobilizing a joint at a physiologic perching angle may more often result in complete restoration of limb function. We evaluated perching-joint angles in 10 healthy cockatiels (Nymphicus hollandicus), 10 Hispaniolan Amazons (Amazona ventralis), and 9 barred owls (Strix varia) and determined intra- and interobserver variability of goniometric measurements in 2 different radiographic projections. Intra- and interobserver variation was less than 7% for all stifle and intertarsal joint measurements but frequently exceeded 10% for hip-joint measurements. Hip, stifle, and intertarsal perching angles differed significantly among cockatiels, Hispaniolan Amazon parrots, and barred owls. The accuracy of measurements performed on straight lateral radiographic projections with superimposed limbs was not consistently superior to that of measurements on oblique projections with a slightly rotated pelvis. Stifle and intertarsal joint angles can be measured on radiographs by different observers with acceptable variability, but intra- and interobserver variability for hip-joint-angle measurements is higher.
Wiland, Homer O; Procop, Gary W; Goldblum, John R; Tuohy, Marion; Rybicki, Lisa; Patil, Deepa T
2013-06-01
Polymerase chain reaction (PCR)-based assays of stool samples are currently the most effective method of detecting Clostridium difficile. This study examines the feasibility of this assay on mucosal biopsy samples and evaluates interobserver reproducibility in diagnosing and distinguishing ischemic colitis from C difficile colitis. Thirty-eight biopsy specimens were reviewed and classified by 3 observers as C difficile colitis or ischemic colitis. The findings were correlated with clinical data. PCR was performed on 34 cases using the BD GeneOhm C difficile assay. Histologic interobserver agreement was excellent (κ = 0.86), and agreement between histologic and clinical diagnosis was good (κ = 0.84). All 19 ischemic colitis cases tested negative (100% specificity), and 3 of 15 cases of C difficile colitis tested positive (20% sensitivity). C difficile colitis can be reliably distinguished from ischemic colitis using histologic criteria. The C difficile PCR test on endoscopic biopsy specimens has excellent specificity but limited sensitivity.
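The sensitivity and specificity quoted above follow directly from the 2×2 counts, treating the clinical diagnosis as the reference. The PPV and NPV in this sketch are derived arithmetic, not values reported by the authors, and the function name is illustrative.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 table (NaN when a denominator is 0)."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Counts from the abstract: 3 of 15 C difficile colitis cases PCR-positive,
# all 19 ischemic colitis cases PCR-negative.
metrics = diagnostic_metrics(tp=3, fp=0, tn=19, fn=12)
```

With these counts, sensitivity is 3/15 = 0.20, specificity 19/19 = 1.00, PPV 3/3 = 1.00, and NPV 19/31 ≈ 0.61.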
Feasibility of four-dimensional preoperative simulation for elbow debridement arthroplasty.
Yamamoto, Michiro; Murakami, Yukimi; Iwatsuki, Katsuyuki; Kurimoto, Shigeru; Hirata, Hitoshi
2016-04-02
Recent advances in imaging modalities have enabled three-dimensional preoperative simulation. A four-dimensional preoperative simulation system would be useful for debridement arthroplasty of primary degenerative elbow osteoarthritis because it could detect impingement lesions. We developed a four-dimensional simulation system by adding the anatomical axis to the three-dimensional computed tomography scan data of the affected arm in one position. Eleven patients with primary degenerative elbow osteoarthritis were included. A "two rings" method was used to calculate the flexion-extension axis of the elbow by converting the surfaces of the trochlea and capitellum into two rings. A four-dimensional simulation movie was created, showing the optimal range of motion and the impingement area requiring excision. To evaluate the reliability of the flexion-extension axis, two authors each assessed bony overlap volumes twice for each patient, and interobserver and intraobserver reliabilities were calculated. Patients were treated by open or arthroscopic debridement arthroplasty. Pre- and postoperative examinations included measurement of elbow range of motion, completion of the patient-rated Hand20 questionnaire, and assessment with the Japanese Orthopaedic Association-Japan Elbow Society Elbow Function Score and the Mayo Elbow Performance Score. Measurement of the bony overlap volume showed intraobserver intraclass correlation coefficients of 0.93 and 0.90 and an interobserver intraclass correlation coefficient of 0.94. The mean elbow flexion-extension arc significantly improved from 101° to 125°. The mean Hand20 score significantly improved from 52 to 22. The mean Japanese Orthopaedic Association-Japan Elbow Society Elbow Function Score significantly improved from 67 to 88. The mean Mayo Elbow Performance Score significantly improved from 71 to 91 at the final follow-up evaluation. 
We showed that a four-dimensional preoperative simulation can be generated by adding the rotation axis to the one-position three-dimensional computed tomography image of the affected arm. This method is feasible for elbow debridement arthroplasty.
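The abstract does not describe how the simulation movie is generated. Assuming the flexion-extension axis is the line through the centres of the two fitted rings and that the forearm bones move rigidly about it, each frame could be produced by rotating the CT surface points with Rodrigues' rotation formula, as in this hypothetical sketch.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def rotate_about_axis(point: Vec, axis_a: Vec, axis_b: Vec, angle_deg: float) -> Vec:
    """Rotate `point` by angle_deg about the line through axis_a and axis_b (Rodrigues' formula)."""
    kx, ky, kz = (axis_b[i] - axis_a[i] for i in range(3))
    norm = math.sqrt(kx * kx + ky * ky + kz * kz)
    if norm == 0:
        raise ValueError("axis points must be distinct")
    kx, ky, kz = kx / norm, ky / norm, kz / norm  # unit axis direction
    vx, vy, vz = (point[i] - axis_a[i] for i in range(3))  # offset from a point on the axis
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    rotated = (
        vx * c + cross[0] * s + kx * dot * (1 - c),
        vy * c + cross[1] * s + ky * dot * (1 - c),
        vz * c + cross[2] * s + kz * dot * (1 - c),
    )
    return tuple(axis_a[i] + rotated[i] for i in range(3))
```

For instance, `[rotate_about_axis(p, centre_trochlea, centre_capitellum, a) for a in range(0, 130, 10)]` would sweep one landmark `p` through a 0° to 120° arc; `centre_trochlea` and `centre_capitellum` are placeholder names for the two ring centres.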
Reliability and concurrent validity of the Infant Motor Profile.
Heineman, Kirsten R; Middelburg, Karin J; Bos, Arend F; Eidhof, Lieke; La Bastide-Van Gemert, Sacha; Van Den Heuvel, Edwin R; Hadders-Algra, Mijna
2013-06-01
The Infant Motor Profile (IMP) is a qualitative assessment of motor behaviour in infancy. It consists of five domains: movement variation, variability, fluency, symmetry, and performance. The aim of this study was to assess interobserver reliability and concurrent validity of the IMP with the Alberta Infant Motor Scale (AIMS) and an age-specific neurological examination. Fifty-nine preterm infants (25 females, 34 males; median gestational age 29.7 wk, median birthweight 1285 g) and 146 term infants (74 females, 72 males; median gestational age 40.1 wk, birthweight 3500 g) were included. Assessments were performed at corrected ages of 4, 6, 10, 12, and 18 months and consisted of the IMP, AIMS, and an age-specific neurological examination. Interobserver reliability was investigated on a sample of 25 video recordings. Non-parametric statistics were used to analyse the data. Interobserver reliability was high (intraclass correlation coefficient 0.95). At all ages, AIMS scores correlated weakly to fairly with total IMP scores (Spearman's ρ 0.36-0.55), but moderately to strongly with scores on the performance domain of the IMP (Spearman's ρ 0.47-0.84). A clear relation was found between total IMP score and outcome of the neurological examination (Kruskal-Wallis p<0.001 at all ages). Interobserver reliability of the IMP is good. Concurrent validity with the AIMS is best for the IMP performance domain. Concurrent validity with age-specific neurological examination is very good. © The Authors. Developmental Medicine & Child Neurology © 2013 Mac Keith Press.
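Spearman's ρ, used above to relate AIMS and IMP scores, is the Pearson correlation of the ranks, with tied values receiving their average rank. A minimal sketch follows; it is an illustration of the statistic, not the authors' statistical software.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j (0-based) share one rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

Because only ranks enter the calculation, ρ captures monotonic association between ordinal scores such as IMP and AIMS totals without assuming a linear relation.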
Reliability of classification for post-traumatic ankle osteoarthritis.
Claessen, Femke M A P; Meijer, Diederik T; van den Bekerom, Michel P J; Gevers Deynoot, Barend D J; Mallee, Wouter H; Doornberg, Job N; van Dijk, C Niek
2016-04-01
The purpose of this study was to identify the most reliable classification system for categorizing post-traumatic osteoarthritis after ankle fracture in clinical outcome studies. A total of 118 orthopaedic surgeons and residents, gathered in the Ankle Platform Study Collaborative Science of Variation Group, evaluated 128 anteroposterior and lateral radiographs of patients after a bi- or trimalleolar ankle fracture on a Web-based platform, rating post-traumatic osteoarthritis according to the classification systems devised by (1) van Dijk, (2) Kellgren, and (3) Takakura. Reliability was evaluated using Siegel and Castellan's multirater kappa. Differences between classification systems were compared using the two-sample Z-test. Interobserver agreement among participating surgeons was fair for the van Dijk osteoarthritis scale (k = 0.24) and poor for the Takakura (k = 0.19) and Kellgren (k = 0.18) systems according to the categorical rating of Landis and Koch. This one-category difference was statistically significant (p < 0.001, CI 0.046-0.053), given the large numbers of observers and cases. This study documents fair interobserver agreement for the van Dijk osteoarthritis scale and poor interobserver agreement for the Takakura and Kellgren osteoarthritis classification systems. Because of the low interobserver agreement for the van Dijk, Kellgren, and Takakura classification systems, these systems cannot be used for clinical decision-making. Level of evidence: development of diagnostic criteria on the basis of consecutive patients, Level II.
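The Landis and Koch benchmarks used above map kappa values to verbal labels; a small lookup is sketched below. Note that on the original bands (below 0 poor, 0.00 to 0.20 slight), the reported values of 0.18 and 0.19 fall in the "slight" band, which this abstract describes as poor.

```python
def landis_koch_label(kappa: float) -> str:
    """Landis and Koch (1977) benchmark bands for kappa."""
    if kappa < 0:
        return "poor"
    for upper, label in (
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, `landis_koch_label(0.24)` returns "fair", matching the rating reported for the van Dijk scale.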
Segmentation precision of abdominal anatomy for MRI-based radiotherapy
Noel, Camille E.; Zhu, Fan; Lee, Andrew Y.; Hu, Yanle; Parikh, Parag J.
2014-01-01
The limited soft tissue visualization provided by computed tomography, the standard imaging modality for radiotherapy treatment planning and daily localization, has motivated studies on the use of magnetic resonance imaging (MRI) for better characterization of treatment sites such as the prostate and head and neck. However, no studies have examined MRI-based segmentation of the abdomen, a site that could benefit greatly from enhanced soft tissue targeting. We investigated the interobserver and intraobserver precision of segmenting abdominal organs on MR images for treatment planning and localization. Three independent observers manually segmented 8 abdominal organs on MR images acquired from 14 healthy subjects, and each observer repeated the segmentation of each image set on 4 separate occasions. Interobserver and intraobserver contouring precision was assessed by computing the 3-dimensional overlap (Dice coefficient [DC]) and distance to agreement (Hausdorff distance [HD]) of the segmented organs. The means and standard deviations were: intraobserver DC = 0.89 ± 0.12, intraobserver HD = 3.6 ± 1.5 mm, interobserver DC = 0.89 ± 0.15, and interobserver HD = 3.2 ± 1.4 mm. Overall, the metrics indicated good interobserver and intraobserver precision (mean DC > 0.7, mean HD < 4 mm). These results suggest that MRI offers good segmentation precision for abdominal sites. The findings support the utility of MRI for abdominal planning and localization, as emerging MRI technologies, techniques, and onboard imaging devices begin to enable MRI-based radiotherapy. PMID:24726701
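The two precision metrics can be sketched in plain Python as follows, assuming each segmentation is available as a set of voxel indices (for DC) or a list of surface points (for HD). Coordinates must be scaled by the voxel spacing to obtain HD in millimetres, and the brute-force search is practical only for small point sets; the abstract does not say which implementation the authors used.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(a: Set[Voxel], b: Set[Voxel]) -> float:
    """DC = 2|A ∩ B| / (|A| + |B|) for two binary masks given as voxel-index sets."""
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a: Iterable[Tuple[float, ...]], b: Iterable[Tuple[float, ...]]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets (brute force, O(|A||B|))."""
    a, b = list(a), list(b)
    if not a or not b:
        raise ValueError("point sets must be non-empty")

    def directed(p_set, q_set):
        # Largest distance from a point in p_set to its nearest neighbour in q_set.
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)

    return max(directed(a, b), directed(b, a))
```

Two four-voxel masks that share three voxels have DC = 2·3/8 = 0.75; if the one unshared voxel of each lies one voxel away from the other mask, HD is one voxel spacing.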
Fiorella, David; Arthur, Adam; Byrne, James; Pierot, Laurent; Molyneux, Andy; Duckwiler, Gary; McCarthy, Thomas; Strother, Charles
2015-08-01
The WEB (WEB aneurysm embolization system, Sequent Medical, Aliso Viejo, California, USA) is a self-expanding nitinol mesh device designed to achieve aneurysm occlusion after endosaccular deployment. The WEB Occlusion Scale (WOS) is a standardized angiographic assessment scale for reporting aneurysm occlusion achieved with intrasaccular mesh implants. This study was performed to assess the interobserver variability of the WOS. Seven experienced neurovascular specialists were trained to apply the WOS. These physicians independently reviewed angiographic image sets from 30 patients treated with the WEB under blinded conditions, with no additional clinical information. Raters graded each image according to the WOS (complete occlusion, residual neck, or residual aneurysm). Final statistics were calculated using the dichotomous outcomes of complete or incomplete occlusion. Interobserver agreement was measured with the generalized κ statistic. In this series of 30 test-case aneurysms, observers rated 12-17 as completely occluded, 3-9 as nearly completely occluded, and 9-11 as demonstrating residual aneurysm filling. Agreement was perfect across all seven observers for the presence or absence of complete occlusion in 22 of 30 cases. Overall, interobserver agreement was substantial (κ = 0.779; 95% CI 0.700 to 0.857). The WOS provides a consistent means of reporting angiographic occlusion for aneurysms treated with the WEB device. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
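The abstract cites a "generalized κ" for seven raters and a dichotomized outcome. Fleiss' kappa is one common generalization of kappa to more than two raters and is assumed here; the category labels in the example are illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' multirater kappa; ratings[i] holds every rater's category for subject i."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0]) if ratings else 0
    if n_subjects == 0 or n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need the same number (>= 2) of ratings for every subject")
    categories = {c for r in ratings for c in r}
    counts: List[Counter] = [Counter(r) for r in ratings]
    # Per-subject agreement: fraction of rater pairs that agree on that subject.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall proportion of ratings in each category.
    total = n_subjects * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_e == 1.0:
        return 1.0  # every rating fell in a single category
    return (p_bar - p_e) / (1 - p_e)
```

Each inner list would hold the seven observers' dichotomized WOS grades (for example "complete" or "incomplete") for one aneurysm.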
Yang, Ping-Liang; Wong, David T; Dai, Shuang-Bo; Song, Hai-Bo; Ye, Ling; Liu, Jin; Liu, Bin
2009-05-01
There is no reliable method for monitoring renal blood flow intraoperatively. In this study, we evaluated the feasibility and reproducibility of left renal blood flow measurement using transesophageal echocardiography during cardiac surgery. In this prospective noninterventional study, left renal blood flow was measured with transesophageal echocardiography at three time points (pre-, intra-, and postcardiopulmonary bypass) in 60 patients undergoing cardiac surgery. Sonograms from 6 subjects were interpreted by 2 blinded, independent assessors at the time of acquisition and 6 mo later. Interobserver and intraobserver reproducibility were quantified by calculating variability and intraclass correlation coefficients. Patients with Doppler angles >30 degrees (20 of 60 subjects) were excluded from renal blood flow measurement. Left renal blood flow was successfully measured and analyzed in 36 of 60 (60%) subjects. Both interobserver and intraobserver variability were <10%. Interobserver and intraobserver reproducibility of left renal blood flow measurements were good to excellent (intraclass correlation coefficients 0.604-0.999). Across the pre-, intra-, and postcardiopulmonary bypass phases, left renal arterial luminal diameter ranged from 3.8 to 4.1 mm, renal arterial velocity from 25 to 35 cm/s, and left renal blood flow from 192 to 299 mL/min. In patients undergoing cardiac surgery, left renal blood flow could be measured with intraoperative transesophageal echocardiography in 60% of subjects. The interobserver and intraobserver reproducibility of renal blood flow measurements was good to excellent.
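A hedged sketch of how a Doppler-based flow estimate is typically formed: the measured velocity is angle-corrected (v = v_measured / cos θ), and flow is the lumen cross-sectional area times the mean velocity, assuming a circular lumen. The 30° exclusion limit comes from the abstract; the formula and function names are assumptions, since the abstract does not give the authors' exact calculation (for example, whether a velocity-time integral and heart rate were used).

```python
import math


def angle_corrected_velocity(measured_cm_s: float, insonation_angle_deg: float,
                             max_angle_deg: float = 30.0) -> float:
    """Correct a Doppler velocity for the beam-to-flow angle: v = v_measured / cos(theta)."""
    if not 0 <= insonation_angle_deg <= max_angle_deg:
        raise ValueError("angle outside the accepted range; measurement excluded")
    return measured_cm_s / math.cos(math.radians(insonation_angle_deg))


def renal_blood_flow_ml_min(diameter_mm: float, mean_velocity_cm_s: float) -> float:
    """Volumetric flow = cross-sectional area x mean velocity, assuming a circular lumen."""
    radius_cm = diameter_mm / 20.0  # convert mm to cm and halve the diameter
    area_cm2 = math.pi * radius_cm ** 2
    return area_cm2 * mean_velocity_cm_s * 60.0  # cm^3/s (= mL/s) to mL/min
```

A 4.0 mm artery with a mean velocity of 30 cm/s gives about 226 mL/min, within the 192 to 299 mL/min range reported above. The cosine correction also shows why large angles are excluded: at 60° a measured velocity must be doubled, so small angle errors inflate the estimate sharply.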
Variability in Cobb angle measurements using reformatted computerized tomography scans.
Adam, Clayton J; Izatt, Maree T; Harvey, Jason R; Askin, Geoffrey N
2005-07-15
Survey of intraobserver and interobserver measurement variability. To assess the use of reformatted computerized tomography (CT) images for manual measurement of coronal Cobb angles in idiopathic scoliosis. Cobb angle measurements in idiopathic scoliosis are traditionally made from standing radiographs, whereas CT is often used to assess vertebral rotation. Correlating Cobb angles from standing radiographs with vertebral rotations from supine CT is problematic because the geometry of the spine changes significantly between the standing and supine positions and 2 different imaging methods are involved. We assessed the use of reformatted thoracolumbar CT images for Cobb angle measurement. Preoperative CT scans of 12 patients with idiopathic scoliosis were used to generate reformatted coronal images. Five observers measured the coronal Cobb angles on each image on 3 occasions. Intraobserver and interobserver variability of Cobb measurement from reformatted CT scans was assessed and compared with previous studies of measurement variability using plain radiographs. For major curves, the 95% confidence intervals for intraobserver and interobserver variability were ±6.6° and ±7.7°, respectively. For minor curves, the intervals were ±7.5° and ±8.2°, respectively. Intraobserver and interobserver technical errors of measurement were 2.4° and 2.7°, with reliability coefficients of 88% and 84%, respectively. There was no correlation between measurement variability and curve severity. Reformatted CT images may be used for manual measurement of coronal Cobb angles in idiopathic scoliosis with variability similar to that of manual measurement on plain radiographs.
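The Cobb angle is the angle between the superior endplate of the upper end vertebra and the inferior endplate of the lower end vertebra. Given two landmark points on each endplate line, it can be computed as sketched below; representing the endplates by landmark pairs is an assumption, since the study measured angles manually on the reformatted images.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def line_angle_deg(p: Point, q: Point) -> float:
    """Orientation of the line through p and q, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def cobb_angle(upper_endplate: Tuple[Point, Point], lower_endplate: Tuple[Point, Point]) -> float:
    """Acute angle (0-90 degrees) between the two endplate lines."""
    diff = abs(line_angle_deg(*upper_endplate) - line_angle_deg(*lower_endplate)) % 180.0
    return min(diff, 180.0 - diff)  # lines have no direction, so fold into 0-90
```

Because the result depends only on the two line orientations, a small shift in where an observer places an endplate landmark translates directly into angular error, which is the source of the few-degree measurement variability reported above.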
Yip, Eugene; Yun, Jihyun; Gabos, Zsolt; Baker, Sarah; Yee, Don; Wachowicz, Keith; Rathee, Satyapal; Fallone, B Gino
2018-01-01
Real-time tracking of lung tumors using magnetic resonance imaging (MRI) has been proposed as a potential strategy to mitigate the ill effects of breathing motion in radiation therapy. Several autocontouring methods have been evaluated against a "gold standard" of a single human expert user. However, contours drawn by experts have inherent intra- and interobserver variations. In this study, we aim to evaluate our user-trained autocontouring algorithm against manually drawn contours from multiple expert users, and to contextualize the accuracy of these autocontours within intra- and interobserver variations. Six non-small cell lung cancer patients were recruited with institutional ethics approval. Patients were imaged with a clinical 3 T Philips MR scanner using a dynamic 2D balanced SSFP sequence under free breathing. Three radiation oncology experts, each in two separate sessions, contoured 130 dynamic images for each patient. For autocontouring, the first 30 images were used for algorithm training, and the remaining 100 images were autocontoured and evaluated. Autocontours were compared against manual contours in terms of Dice's coefficient (DC) and Hausdorff distance (dH). Intra- and interobserver variations of the manual contours were also evaluated. When compared with the manual contours of the expert user who trained it, the algorithm generates autocontours whose evaluation metrics (same session: DC = 0.90 (0.03), dH = 3.8 (1.6) mm; different session: DC = 0.88 (0.04), dH = 4.3 (1.5) mm) are similar to or better than intraobserver variations between two sessions (DC = 0.88 (0.04), dH = 4.3 (1.7) mm). When compared with the manual contours of different expert users, the algorithm's autocontours yield evaluation metrics (DC = 0.87 (0.04), dH = 4.8 (1.7) mm) similar to interobserver variations (DC = 0.87 (0.04), dH = 4.7 (1.6) mm).
Our autocontouring algorithm delineates tumor contours in dynamic lung MRI that are comparable to those of multiple human experts, at a much faster speed (<20 ms versus several seconds per contour). At the same time, the agreement between autocontours and manual contours is comparable to the intra- and interobserver variations. This algorithm may become a key component of the real-time tumor tracking workflow for our hybrid Linac-MR device. © 2017 American Association of Physicists in Medicine.
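The two comparison metrics named above can be sketched on contours represented as sets of pixel coordinates. This is a brute-force illustration, not the authors' implementation; the pixel spacing `pixel_mm` is a hypothetical parameter, and in practice the Hausdorff distance is usually computed on contour boundary points rather than filled masks:

```python
import math

Pixel = tuple[int, int]


def dice_coefficient(a: set[Pixel], b: set[Pixel]) -> float:
    """DC = 2|A ∩ B| / (|A| + |B|); 1.0 means identical masks."""
    if not a and not b:
        return 1.0  # two empty masks are treated as perfect agreement
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a: set[Pixel], b: set[Pixel], pixel_mm: float = 1.0) -> float:
    """Symmetric Hausdorff distance between two point sets, scaled to mm.

    Brute force O(|A|*|B|); fine for small contours, slow for large masks.
    """
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty set")

    def directed(src: set[Pixel], dst: set[Pixel]) -> float:
        # Largest distance from a point in src to its nearest neighbour in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return pixel_mm * max(directed(a, b), directed(b, a))
```

DC rewards overall overlap, whereas dH is driven by the single worst boundary mismatch, which is why the abstract reports both.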
New methodology to reconstruct in 2-D the cuspal enamel of modern human lower molars.
Modesto-Mata, Mario; García-Campos, Cecilia; Martín-Francés, Laura; Martínez de Pinillos, Marina; García-González, Rebeca; Quintino, Yuliet; Canals, Antoni; Lozano, Marina; Dean, M Christopher; Martinón-Torres, María; Bermúdez de Castro, José María
2017-08-01
In recent years, different methodologies have been developed to reconstruct worn teeth. In this article, we propose a new 2-D methodology to reconstruct the worn enamel of lower molars. Our main goals are to reconstruct molars with a high level of accuracy when measuring relevant histological variables and to validate the methodology by calculating the errors associated with the measurements. This methodology is based on polynomial regression equations and has been validated using two different dental variables: cuspal enamel thickness and crown height of the protoconid. To perform the validation, simulated worn modern human molars were employed. The errors associated with the measurements were also estimated by applying methodologies previously proposed by other authors. Compared with the real values, the mean percentage error in the reconstructed molars is -2.17% for the cuspal enamel thickness of the protoconid and -3.18% for the crown height of the protoconid. This error is a significant improvement over the results of other methodologies, both in interobserver error and in measurement accuracy. The new methodology based on polynomial regressions can be confidently applied to the reconstruction of cuspal enamel of lower molars, as it improves measurement accuracy and reduces interobserver error. The present study shows that it is important to validate all methodologies in order to know the associated errors. This new methodology can easily be exported to other modern human populations, the human fossil record, and forensic sciences. © 2017 Wiley Periodicals, Inc.
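The signed mean percentage errors quoted above (-2.17% and -3.18%) can be read with the usual definition sketched below. We assume this signed form matches the authors' calculation; a negative value indicates that the reconstruction underestimates the real value. The data in the test are hypothetical:

```python
def mean_percentage_error(reconstructed: list[float], real: list[float]) -> float:
    """Mean of 100 * (reconstructed - real) / real; negative values mean underestimation."""
    if len(reconstructed) != len(real) or not real:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [100.0 * (est - true) / true for est, true in zip(reconstructed, real)]
    return sum(errors) / len(errors)
```

Because signed errors can cancel, a mean near zero does not by itself imply that individual reconstructions are accurate; the mean absolute percentage error is a common complement.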
Genders, Stijn W; Mourits, Daphne L; Jasem, Mohammad; Kloos, Roel J H M; Saeed, Peerooz; Mourits, Maarten Ph
2015-02-01
To present the first parallax-free exophthalmometer design. Exophthalmometry is an important clinical tool. We provide a historical overview of clinical exophthalmometer designs and review current problems encountered in exophthalmometry. We present a new, parallax-free exophthalmometer design that we evaluated in 49 patients visiting our orbital clinic. The mean age of the patients was 49.8 years, and 72% were female. The interobserver Pearson correlation coefficient was 0.97, and 94% of the Hertel values measured by the two observers were within the limits of agreement (1.6 mm). This meter appears to be a reliable instrument for exophthalmometry. It is the first instrument that allows completely parallax-free measurement.
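The two summary figures above (a correlation between observers and the share of paired readings within 1.6 mm) answer different questions: correlation measures whether the observers rank patients alike, not whether their readings coincide. A minimal sketch, assuming the 1.6 mm limit is applied as a fixed band on the absolute difference between paired readings (the abstract does not say exactly how the limits were applied):

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two observers' readings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_within_limit(x: list[float], y: list[float], limit_mm: float = 1.6) -> float:
    """Fraction of paired readings whose absolute difference is <= limit_mm."""
    within = sum(1 for a, b in zip(x, y) if abs(a - b) <= limit_mm)
    return within / len(x)
```

For example, an observer who always reads exactly twice the other's value gives r = 1.0, yet most paired readings would fall outside any clinically useful limit.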
iPhone in the Management of the Berlin Heart EXCOR Ventricular Assist Device.
Badheka, Aditya; Allareddy, Veerajalandhar
Berlin Heart Inc.'s EXCOR is an extracorporeal, pneumatically driven, pulsatile ventricular assist device that has been approved for use in the pediatric age group in the United States since 2011. It is a well-established life-saving therapy as a bridge to heart transplant or to provide circulatory support in a transplanted patient. In postmarketing major device reporting, the most commonly reported problem was "membrane defect." In general, the filling and emptying of the pump can be easily visualized, but interobserver variability exists. In this first report, we used iPhone slow-motion video to quantify and compare differences in filling and emptying, which positively impacted the management of the Berlin Heart. This is an initial exploratory concept that will need further studies to validate this bedside tool.
Razavi, Asma; Newth, Christopher J L; Khemani, Robinder G; Beltramo, Fernando; Ross, Patrick A
2017-06-01
To evaluate physician assessment of cardiac output and systemic vascular resistance in patients with shock compared with an ultrasonic cardiac output monitor (USCOM), and to explore, using physicians' answers about intended interventions, how therapy decisions might change if USCOM data were available. Double-blinded, prospective, observational study in a tertiary hospital pediatric intensive care unit. Forty children (<18 years) admitted with shock and requiring ongoing volume resuscitation or inotropic support. Two to three physicians clinically assessed cardiac output and systemic vascular resistance, categorizing each as high, normal, or low. An investigator simultaneously measured cardiac index (CI) and systemic vascular resistance index (SVRI) with USCOM, likewise categorized as high, normal, or low. Overall agreement between physician and USCOM was poor for CI (48.5% [κ = 0.18]) and SVRI (45.9% [κ = 0.16]). Interobserver agreement was also poor for CI (58.7% [κ = 0.33]) and SVRI (52.3% [κ = 0.28]). When theoretical physician interventions were compared with "acceptable" or "unacceptable" clinical interventions based on USCOM measurements, 56 (21%) physician interventions were found to be "unacceptable." There is poor agreement between physician-assessed CI and SVRI and USCOM, with significant interobserver variability among physicians. Objective measurement of CI and SVRI may reduce variability and improve diagnostic accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
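Each result above pairs a raw percent agreement with Cohen's κ, which corrects that agreement for what two raters would reach by chance given how often each uses each category. A minimal sketch for two raters and three categories (high, normal, low); the labels in the test are hypothetical, and how the study pooled pairwise values across two to three physicians is not stated in the abstract:

```python
from collections import Counter


def percent_agreement_and_kappa(rater_a: list[str], rater_b: list[str]) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters' categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty label lists")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1.0 - expected)
```

This explains why roughly 50% raw agreement can correspond to κ below 0.2: with three categories, a substantial share of that agreement is expected by chance alone.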
Non-invasive diagnosis of liver fibrosis in chronic hepatitis C
Schiavon, Leonardo de Lucca; Narciso-Schiavon, Janaína Luz; de Carvalho-Filho, Roberto José
2014-01-01
Assessment of liver fibrosis in chronic hepatitis C virus (HCV) infection is considered a relevant part of patient care and key for decision making. Although liver biopsy has been considered the gold standard for staging liver fibrosis, it is an invasive technique and subject to sampling errors and significant intra- and inter-observer variability. Over the last decade, several noninvasive markers have been proposed for the diagnosis of liver fibrosis in chronic HCV infection, with variable performance. Besides the clear advantage of being noninvasive, a more objective interpretation of test results may overcome the intra- and inter-observer variability of liver biopsy mentioned above. In addition, these tests can theoretically offer a more accurate view of fibrogenic events occurring throughout the liver, with the advantage of allowing frequent fibrosis evaluation without additional risk. However, in general, these tests show low accuracy in discriminating between intermediate stages of fibrosis and may be influenced by several hepatic and extra-hepatic conditions. These methods are either serum markers (usually combined in a mathematical model) or imaging modalities, which can be used separately or combined in algorithms to improve accuracy. In this review, we discuss the different noninvasive methods currently available for the evaluation of liver fibrosis in chronic hepatitis C, and their advantages, limitations, and application in clinical practice. PMID:24659877
de Jong, Rianne; Lutkenhaus, Lotte; van Wieringen, Niek; Visser, Jorrit; Wiersma, Jan; Crama, Koen; Geijsen, Debby; Bel, Arjan
2016-08-01
In radiotherapy for rectal cancer, the target volume is highly deformable. An adaptive plan selection strategy can mitigate the effect of these variations. The purpose of this study was to evaluate the feasibility of an adaptive strategy by assessing the interobserver variation in CBCT-based plan selection. Eleven patients with rectal cancer, treated with a non-adaptive strategy, were selected. Five CBCT scans were available per patient. To simulate the plan selection strategy, three PTVs were created per patient by varying the anterior upper mesorectum margin. For each CBCT scan, twenty observers selected the smallest PTV that encompassed the target volume. After this initial baseline measurement, the gold standard was determined during a consensus meeting, followed by a second measurement one month later. Differences between the two measurements were assessed using the Wilcoxon signed-rank test. In the baseline measurement, concordance with the gold standard was 69% (range: 60-82%), which improved to 75% (range: 60-87%) in the second measurement (p = 0.01). In the second measurement, 10% of plan selections were smaller than the gold standard. With a plan selection consistency between observers of 75%, a plan selection strategy for rectal cancer patients is feasible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Quantifying facial paralysis using the Kinect v2.
Gaber, Amira; Taher, Mona F; Wahed, Manal Abdel
2015-01-01
Assessment of facial paralysis (FP) and quantitative grading of facial asymmetry are essential to quantify the extent of the condition and to follow its improvement or progression. There is therefore a need for an accurate quantitative grading system that is easy to use, inexpensive, and has minimal inter-observer variability. The main objective of this work is a comprehensive automated system to quantify and grade FP. An initial prototype was previously presented by the authors. The present research aims to enhance the accuracy and robustness of one of this system's modules: the resting symmetry module. This is achieved by several modifications to the computation of the symmetry index (SI) for the eyebrows, eyes, and mouth: a gamma correction technique, the use of eye area, and the slope of the mouth. The system was tested on normal subjects and showed promising results. With the modified method, the mean SI of the eyebrows decreased slightly from 98.42% to 98.04%, while the mean SI for the eyes and mouth increased from 96.93% to 99.63% and from 95.6% to 98.11%, respectively. The system is easy to use, inexpensive, automated, and fast, has no inter-observer variability, and is thus well suited for clinical use.
NASA Astrophysics Data System (ADS)
Ramakrishna, Bharath; Saiprasad, Ganesh; Safdar, Nabile; Siddiqui, Khan; Chang, Chein-I.; Siegel, Eliot
2008-03-01
Osteoarthritis (OA) is the most common form of arthritis and a major cause of morbidity, affecting millions of adults in the US and worldwide. In the knee, OA begins with degeneration of the articular cartilage, eventually resulting in contact between the femur and tibia and leading to severe pain and stiffness. There has been extensive research on 3D MR imaging sequences and automatic/semi-automatic techniques for 2D/3D articular cartilage extraction. However, in routine clinical practice the most popular technique is still radiographic examination with qualitative assessment of the joint space. This may be largely due to a lack of tools that can provide clinically relevant diagnoses in near real time as an adjunct to the radiologist, meeting radiologists' needs and reducing inter-observer variation. Our work aims to fill this void by developing a CAD application that can generate clinically relevant diagnoses of articular cartilage damage in near real time. The algorithm uses a 2D Active Shape Model (ASM) to model the bone-cartilage interface on all slices of a Double Echo Steady State (DESS) MR sequence, then measures cartilage thickness from the bone surface, and finally identifies regions of abnormal thinness and focal/degenerative lesions. A preliminary evaluation of the CAD tool was carried out on 10 cases taken from the Osteoarthritis Initiative (OAI) database. When compared with 2 board-certified musculoskeletal radiologists, the automatic CAD application was able to produce segmentation/thickness maps in a little over 60 seconds for all of the cases. This observation raises interesting possibilities for increasing radiologist productivity and confidence, improving patient outcomes, and applying more sophisticated CAD algorithms to routine orthopedic imaging tasks.
Lee, Byung Hoon; Choi, Kyung-Hwa; Seo, Dong Yeon; Choi, Sang Min; Kim, Gab Lae
2016-04-01
To incorporate a diagnostic technique for measuring subtalar motion, namely "talar rotation," into manual supination-anterior drawer stress radiographs for evaluating the severity of rotational instability, and to determine its clinical relevance. Sixty-six patients with combined injuries of the anterior talofibular ligament (ATFL) and calcaneofibular ligament (CFL) underwent three bilateral manual stress radiographs, and mean increments of anterior talar translation (mm), talar tilt (°), and talar rotation (%) in the injured ankle compared with the normal contralateral side were measured with the technique. Intraobserver and interobserver reliability of each measure was assessed, and the degree of increments was compared according to the presence of additional cervical ligament insufficiency. Intraobserver and interobserver agreement on ankle stress radiographs was ICC = 0.91 and 0.82 for talar rotation (%), ICC = 0.64 and 0.51 for anterior talar translation, and ICC = 0.78 and 0.71 for talar tilt angle, respectively. In group 2, patients with combined injuries of the ATFL and CFL plus additional cervical ligament insufficiency, the increment of talar rotation was significantly higher (mean 6.4%, SD 3.4%) than in group 1, patients with an intact cervical ligament (mean 4.1%, SD 2.7%; p < 0.001). The new comprehensive stress radiographic technique for diagnosing chronic lateral ankle instability presented in this study might be a reliable and representative measurement tool for assessing additional injury or instability of the subtalar joint. Prospective cohort study, Level II.
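The abstract reports ICC values without naming the ICC form. As one common choice, here is a sketch of ICC(2,1) in the Shrout and Fleiss sense (two-way random effects, absolute agreement, single rater) computed from a subjects × raters table. Treat the choice of form as an assumption; ICC(3,1) or average-measure forms would give different numbers on the same data:

```python
def icc_2_1(data: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data[i][j] is the value for subject i from rater (or session) j.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the rater mean square `msc` enters the denominator, a rater who reads systematically higher lowers this ICC even when the rank order of patients is preserved. That property is what "absolute agreement" means here.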
2013-01-01
Background Transplant recipients are expected to adhere to a lifelong immunosuppressant therapeutic regimen. However, nonadherence to treatment is an underestimated problem for which no properly validated measurement tool is available for Portuguese-speaking patients. We aimed to perform an initial validation of the Basel Assessment of Adherence to Immunosuppressive Medications Scale (BAASIS®) to accurately estimate immunosuppressant nonadherence in Brazilian transplant patients. Methods The BAASIS® (English version) was transculturally adapted and its psychometric properties were assessed. The transcultural adaptation was performed using the Guillemin protocol. Psychometric testing included reliability (intraobserver and interobserver reproducibility, agreement, Kappa coefficient, and Cronbach’s alpha) and validity (content, criterion, and construct validity). Results The final version of the transculturally adapted BAASIS® was pretested, and no difficulties in understanding its content were found. The intraobserver and interobserver reproducibility variances (0.007 and 0.003, respectively), Cronbach’s alpha (0.7), Kappa coefficient (0.88), and agreement (95.2%) suggest accuracy, precision, and reliability. For construct validity, exploratory factor analysis demonstrated unidimensionality of the first three questions (r = 0.76, r = 0.80, and r = 0.68). For criterion validity, the adapted BAASIS® was correlated with another self-report instrument, the Measure of Adherence to Treatment, and showed good congruence (r = 0.65). Conclusions The BAASIS® has adequate psychometric properties and may be used to measure adherence to posttransplant immunosuppressant treatment. This instrument will be the first one validated for use in this specific transplant population and in the Portuguese language. PMID:23692889
Evaluation of a clinical dehydration scale in children requiring intravenous rehydration.
Kinlin, Laura M; Freedman, Stephen B
2012-05-01
To evaluate the reliability and validity of a previously derived clinical dehydration scale (CDS) in a cohort of children with gastroenteritis and evidence of dehydration. Participants were 226 children older than 3 months who presented to a tertiary care emergency department and required intravenous rehydration. Reliability was assessed at treatment initiation by comparing the scores assigned independently by a trained research nurse and a physician. Validity was assessed using parameters reflective of disease severity: weight gain, baseline laboratory results, willingness of the physician to discharge the patient, hospitalization, and length of stay. Interobserver reliability was moderate, with a weighted κ of 0.52 (95% confidence interval [CI] 0.41, 0.63). There was no correlation between CDS score and percent weight gain, a proxy measure of fluid deficit (Spearman correlation coefficient = -0.03; 95% CI -0.18, 0.12). There were, however, modest and statistically significant correlations between CDS score and several other parameters, including serum bicarbonate (Pearson correlation coefficient = -0.35; 95% CI -0.46, -0.22) and length of stay (Pearson correlation coefficient = 0.24; 95% CI 0.11, 0.36). The scale's discriminative ability was assessed for the outcome of hospitalization, yielding an area under the receiver operating characteristic curve of 0.65 (95% CI 0.57, 0.73). In children administered intravenous rehydration, the CDS was characterized by moderate interobserver reliability and weak associations with objective measures of disease severity. These data do not support its use as a tool to dictate the need for intravenous rehydration or to predict clinical course.
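Because the CDS yields ordinal scores, the interobserver result above is a weighted κ, which gives partial credit when the nurse and physician differ by one point rather than several. The abstract does not say whether linear or quadratic weights were used, so both are offered in this sketch; scores are assumed to be coded as integers 0..K-1:

```python
def weighted_kappa(a: list[int], b: list[int], n_categories: int, weighting: str = "linear") -> float:
    """Weighted kappa for ordinal scores coded 0..n_categories-1.

    Uses disagreement weights |i - j| / (K - 1) ("linear") or its square ("quadratic").
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k, n = n_categories, len(a)
    if k < 2 or n == 0 or n != len(b):
        raise ValueError("need >= 2 categories and two equal-length, non-empty lists")
    if not all(0 <= s < k for s in a + b):
        raise ValueError("every score must lie in 0..n_categories-1")
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1  # cross-tabulate rater A (rows) against rater B (columns)
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    observed_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k)) / n
    expected_dis = sum(weight(i, j) * row_totals[i] * col_totals[j]
                       for i in range(k) for j in range(k)) / n**2
    if expected_dis == 0:
        raise ValueError("weighted kappa is undefined when expected disagreement is 0")
    return 1.0 - observed_dis / expected_dis
```

With only two categories, both weightings reduce to the ordinary (unweighted) Cohen's κ.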
Terslev, Lene; Gutierrez, Marwin; Schmidt, Wolfgang A; Keen, Helen I; Filippucci, Emilio; Kane, David; Thiele, Ralf; Kaeley, Gurjit; Balint, Peter; Mandl, Peter; Delle Sedie, Andrea; Hammer, Hilde Berner; Christensen, Robin; Möller, Ingrid; Pineda, Carlos; Kissin, Eugene; Bruyn, George A; Iagnocco, Annamaria; Naredo, Esperanza; D'Agostino, Maria Antonietta
2015-11-01
To summarize the work performed by the Outcome Measures in Rheumatology (OMERACT) Ultrasound (US) Working Group on the validation of US as a potential outcome measure in gout. Because a recent literature review on US as an outcome tool in gout highlighted a lack of definitions, a series of iterative exercises was carried out to obtain consensus-based definitions of the US elementary components in gout using a Delphi exercise, and these definitions were subsequently tested in static images and in patients with proven gout. Cohen's κ was used to test agreement, and values of 0-0.20 were considered poor, 0.20-0.40 fair, 0.40-0.60 moderate, 0.60-0.80 good, and 0.80-1 excellent. With an agreement of > 80%, consensus-based definitions were obtained for the 4 elementary lesions highlighted in the literature review: tophi, aggregates, erosions, and double contour (DC). In static images, interobserver reliability ranged from moderate to almost perfect, and similar results were found for intrareader reliability. In patients, intraobserver agreement was good for all lesions except DC (moderate). Interobserver agreement was poor for aggregates and DC but moderate for the other components. These first steps in evaluating the validity of US as an outcome measure for gout show that the reliability of the definitions ranged from moderate to excellent in static images and was somewhat lower in patients, indicating that a standardized scanning technique may be needed before testing the responsiveness of those definitions in a composite US score.
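The κ interpretation bands used above can be encoded as a small lookup. The published bands share their endpoints (0.20 appears in both "poor" and "fair"), so this sketch assumes that a boundary value belongs to the lower band. It also assumes that negative κ, which the 0 to 1 scale does not cover, is reported as "poor":

```python
# Bands as listed in the abstract; boundary values are assigned to the lower band (assumption).
KAPPA_BANDS = [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good"), (1.00, "excellent")]


def interpret_kappa(kappa: float) -> str:
    """Map a Cohen's kappa value to the qualitative label used in the abstract."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"  # assumption: the published scale starts at 0, so negatives are lumped with "poor"
    for upper, label in KAPPA_BANDS:
        if kappa <= upper:
            return label
    return KAPPA_BANDS[-1][1]  # not reached for kappa <= 1.0
```

Making the boundary rule explicit matters when comparing studies, because a κ of exactly 0.60 would be "moderate" under this convention but "good" under a lower-exclusive one.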
Forensic postmortem computed tomography: volumetric measurement of the heart and liver.
Jakobsen, Lykke Schrøder; Lundemose, Sissel; Banner, Jytte; Lynnerup, Niels; Jacobsen, Christina
2016-12-01
The purpose of this study was to investigate the utility of postmortem computed tomography (PMCT) images in estimating organ sizes and to examine the use of the cardiothoracic ratio (CTR). We included 45 individuals (19 females) who underwent a medico-legal autopsy. Using the computer software program Mimics®, we determined in situ heart and liver volumes derived from linear measurements (width, height, and depth) on a whole-body PMCT scan, and compared these volumes with ex vivo volumes derived from CT scans of the eviscerated heart and liver. The ex vivo volumes were also compared with the organ weights. Further, we compared the CTR with the ex vivo heart volume and a heart weight ratio (HWR). Intra- and inter-observer analyses were performed. We found no correlation between the in situ and ex vivo volumes of the heart and liver. However, a highly significant correlation was found between the ex vivo volumes and weights of the heart and liver. No correlation was found between the CTR and either the ex vivo heart volume or the HWR. Concerning cardiomegaly, we found no agreement between the CTR and HWR. The intra- and inter-observer analyses showed no significant differences. Noninvasive in situ PMCT methods for organ measurement, as performed in this study, are not useful tools in forensic pathology. The best method to estimate organ volume is a CT scan of the eviscerated organ. PMCT-determined CTR seems to be useless for ascertaining cardiomegaly, as it correlated neither with the ex vivo heart volume nor with the HWR.
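The abstract says in situ volumes were "derived from linear measurements (width, height and depth)" but gives no formula. A common choice is the ellipsoid approximation V = π/6 × W × H × D, used in the sketch below purely as an assumption. The CTR is the maximal transverse cardiac width divided by the maximal internal thoracic width; the conventional cut-off for cardiomegaly on chest radiographs is 0.5. Function names are illustrative:

```python
import math


def cardiothoracic_ratio(max_cardiac_width_mm: float, max_thoracic_width_mm: float) -> float:
    """CTR = maximal transverse cardiac width / maximal internal thoracic width."""
    if max_thoracic_width_mm <= 0:
        raise ValueError("thoracic width must be positive")
    return max_cardiac_width_mm / max_thoracic_width_mm


def ellipsoid_volume_ml(width_mm: float, height_mm: float, depth_mm: float) -> float:
    """Organ volume approximated as an ellipsoid: pi/6 * W * H * D (mm^3 -> mL)."""
    return math.pi / 6.0 * width_mm * height_mm * depth_mm / 1000.0
```

An ellipsoid ignores organ shape irregularities and postmortem deformation, which is one plausible reason why linear-measurement volumes may fail to track ex vivo CT volumes.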
Macchi, Claudio; Biricolti, Claudia; Cappelli, Lorenza; Galli, Francesca; Molino-Lova, Raffaele; Cecchi, Francesca; Corigliano, Alvaro; Miniati, Benedetta; Conti, Andrea A; Gulisano, Massimo; Catini, Claudio; Gensini, Gian Franco
2002-01-01
A key feature in the physiotherapeutic treatment of patients with motion disturbances is the appropriate ranging of trunk and pelvis motility. Eighty randomly selected subjects, free from known pathology of the musculoskeletal and/or neurological system and classed into four groups according to age and sex, were assessed using a new, simple, and easily administered tool. Our results demonstrate that the new measurement tool showed very low intra- and inter-observer variability. Healthy subjects showed a more adducted and elevated right scapula compared with the contralateral one. As regards pelvic motion, joint excursion was broader in passive motion than in active motion in the overall group, broader in young subjects than in elderly ones, and broader in female subjects than in male subjects. In conclusion, our study identified a range of physiological asymmetry and pelvis motility, which might be useful as a reference for physiotherapists.
[Development and validation of a breastfeeding knowledge and skills questionnaire].
Gómez Fernández-Vegue, M; Menéndez Orenga, M
2015-12-01
Pediatricians play a key role in the initiation and duration of breastfeeding. Although they are known to lack formal education on this subject, there are currently no validated tools to assess pediatricians' knowledge of breastfeeding. To develop and validate a Breastfeeding Knowledge and Skills Questionnaire for Pediatricians. Once the knowledge areas were defined, a representative sample of pediatricians was chosen to carry out the survey. After pilot testing, non-discriminating questions were removed. Content validity was assessed by 14 breastfeeding experts, who examined the test, yielding 22 scorable items (maximum score: 26 points). To approach criterion validity, it was hypothesized that a group of pediatricians with a special interest in breastfeeding (1) would obtain better results than pediatricians from a hospital without a maternity ward (2), and that the latter would obtain higher scores than pediatric residents training in the same hospital (3). The questionnaire was also evaluated before and after a basic course in breastfeeding. The breastfeeding experts had an index of agreement of >0.90 for each item. The 3 groups (n = 82) were compared, and significant differences were found between group (1) and the rest. Moreover, an improvement was observed in participants who attended the breastfeeding course (n = 31), especially among those with less initial knowledge. Regarding reliability, internal consistency (KR-20 = 0.87), interobserver agreement, and temporal stability were examined, with satisfactory results. A practical, self-administered tool is presented to assess pediatricians' knowledge of breastfeeding, with documented validity and reliability. Copyright © 2014 Asociación Española de Pediatría. Published by Elsevier España, S.L.U. All rights reserved.
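The internal-consistency figure above (KR-20 = 0.87) comes from the Kuder-Richardson formula 20, the special case of Cronbach's α for items scored right/wrong. A minimal sketch follows, using population variances throughout, which is one of several conventions. KR-20 assumes dichotomous items, and the abstract does not say how the multi-point items (26 points over 22 items) were handled:

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var(total scores)).

    responses[i][j] is 1 if respondent i answered item j correctly, else 0.
    """
    n, k = len(responses), len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion answering item j correctly
        pq_sum += p * (1 - p)
    return k / (k - 1) * (1 - pq_sum / var_total)
```

Values near 1 mean that respondents who do well on one item tend to do well on the others; with 22 items, a value of 0.87 is consistent with a fairly homogeneous knowledge scale.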
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadeghi, P; Smith, W; Tom Baker Cancer Centre, Calgary, AB
2015-06-15
Purpose This study quantifies errors associated with MR-guided High Dose Rate (HDR) gynecological brachytherapy. Uncertainties in this treatment result from contouring, organ motion between imaging and treatment delivery, dose calculation, and dose delivery. We focus on interobserver and inter-modality variability in contouring and on the motion of organs at risk (OARs) in the time span between the MR and CT scans (∼1 hour). We report the change in organ volume and position of the center of mass (CM) between the two imaging modalities. Methods A total of 8 patients treated with MR-guided HDR brachytherapy were included in this study. Two observers contoured the bladder and rectum on both MR and CT scans. The change in OAR volume and CM position between the MR and CT imaging sessions was calculated on both image sets. Results The absolute mean bladder volume change between the two imaging modalities is 67.1 cc. The absolute mean inter-observer difference in bladder volume is much lower, at 15.5 cc (MR) and 11.0 cc (CT). This higher inter-modality volume difference suggests a real change in bladder filling between the two imaging sessions. For rectum volume, the inter-observer standard error of the mean (SEM) is 3.18 cc (MR) and 3.09 cc (CT), while the inter-modality SEM is 3.65 cc (observer 1) and 2.75 cc (observer 2). The SEM for rectum CM position in the superior-inferior direction was approximately three times higher than in other directions for both inter-observer (0.77 cm, 0.92 cm for observers 1 and 2, respectively) and inter-modality (0.91 cm, 0.95 cm for MR and CT, respectively) variability. Conclusion Bladder contours display good consistency between different observers on both CT and MR images. For rectum contouring, the greatest inconsistency stems from the observers' choice of the superior-inferior borders. A complete analysis of a larger patient cohort will enable us to separate true organ motion from inter-observer variability.
Trachsel, D S; Bitschnau, C; Waldern, N; Weishaupt, M A; Schwarzwald, C C
2010-11-01
Frequent supraventricular or ventricular arrhythmias during and after exercise are considered pathological in horses. The prevalence of arrhythmias in apparently healthy horses is still a matter of debate and may depend on breed, athletic condition, and exercise intensity. To determine intra- and interobserver agreement for the detection of arrhythmias at rest, during exercise, and after exercise using a telemetric electrocardiography device. The electrocardiogram (ECG) recordings of 10 healthy Warmblood horses (5 of which had an intracardiac catheter in place) undergoing a standardised treadmill exercise test were analysed at rest (R), during warm-up (W), during exercise (E), and during 0-5 min (PE(0-5)) and 6-45 min (PE(6-45)) of recovery after exercise. The number and time of occurrence of physiological and pathological 'rhythm events' were recorded. Events were classified according to origin and mode of conduction. The agreement of 3 independent, blinded observers with differing levels of experience in ECG reading was estimated, considering time of occurrence and classification of events. For correct timing and classification, intraobserver agreement for observer 1 was 97% (R), 100% (W), 20% (E), 82% (PE(0-5)), and 100% (PE(6-45)). Interobserver agreement between observers 1 and 2 and between observers 1 and 3, respectively, was 96% and 92.6% (R), 83% and 31% (W), 0% and 13% (E), 23% and 18% (PE(0-5)), and 67% and 55% (PE(6-45)). When events with correct timing but disagreement on classification were included, intraobserver agreement increased to 94% during PE(0-5), and interobserver agreement reached 83% and 50% (W), 20% and 50% (E), 41% and 47% (PE(0-5)), and 83.5% and 65% (PE(6-45)). Interobserver agreement increased with observer experience. Intra- and interobserver agreement for recognition and classification of events was good at R, but poor during E and poor to moderate during the recovery periods.
These results highlight the limitations of stress ECG in horses and the need for high-quality recordings and adequate observer training. © 2010 EVJ Ltd.
Ali, Sam; Byanyima, Rosemary Kusaba; Ononge, Sam; Ictho, Jerry; Nyamwiza, Jean; Loro, Emmanuel Lako Ernesto; Mukisa, John; Musewa, Angella; Nalutaaya, Annet; Ssenyonga, Ronald; Kawooya, Ismael; Temper, Benjamin; Katamba, Achilles; Kalyango, Joan; Karamagi, Charles
2018-05-04
Ultrasonography is essential in prenatal diagnosis and in the care of pregnant women. However, the measurements obtained often contain a small percentage of unavoidable error that may have serious clinical implications if substantial. We therefore evaluated the level of intra- and inter-observer error in measuring mean sac diameter (MSD) and crown-rump length (CRL) in women between 6 and 10 weeks' gestation at Mulago hospital. This was a cross-sectional study conducted from January to March 2016. We enrolled 56 women with an intrauterine single viable embryo. The women were scanned using a transvaginal (TVS) technique by two observers who were blinded to each other's measurements. Each observer measured the CRL twice and the MSD once for each woman. Intraclass correlation coefficients (ICCs), 95% limits of agreement (LOA), and technical error of measurement (TEM) were used for analysis. Intra-observer ICCs for CRL measurements were 0.995 and 0.993, while inter-observer ICCs were 0.988 for CRL and 0.955 for MSD measurements. Intra-observer 95% LOA for CRL were ±2.04 mm and ±1.66 mm. Inter-observer LOA were ±2.35 mm for CRL and ±4.87 mm for MSD. The intra-observer relative TEMs for CRL were 4.62% and 3.70%, whereas the inter-observer relative TEMs were 5.88% and 5.93% for CRL and MSD, respectively. Intra- and inter-observer errors of CRL and MSD measurements among pregnant women at Mulago hospital were acceptable. This implies that at Mulago hospital, the error in pregnancy dating is within the acceptable margin of ±3 days in the first trimester, and the CRL and MSD cut-offs of ≥7 mm and ≥25 mm, respectively, are fit for the diagnosis of miscarriage on TVS. These findings should be extrapolated to the whole country with caution. Sonographers can achieve acceptable and comparable diagnostic accuracy for MSD and CRL measurements with proper training and adherence to practice guidelines.
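The 95% limits of agreement and relative TEM reported above can be sketched for a pair of measurement series, for example two CRL readings per woman. The sketch assumes the usual Bland-Altman form (mean difference ± 1.96 × SD of the differences) and relative TEM = 100 × TEM / overall mean, with TEM = √(Σd² / 2N) for two measurements. The abstract's "±" figures correspond to the half-width, and the function names are illustrative:

```python
import math
from statistics import mean, stdev


def bland_altman_limits(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - half_width, bias + half_width


def relative_tem_percent(x: list[float], y: list[float]) -> float:
    """Relative TEM (%) for two measurements per subject: 100 * TEM / overall mean."""
    if len(x) != len(y) or not x:
        raise ValueError("need two equal-length, non-empty sequences")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    tem = math.sqrt(sum_sq_diff / (2 * len(x)))
    return 100.0 * tem / mean(list(x) + list(y))
```

Because relative TEM divides by the mean, the same absolute error weighs more on the small early-gestation CRL values than on the larger MSD values, which is one reason the two are reported separately.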
A tool for computer-aided diagnosis of retinopathy of prematurity
NASA Astrophysics Data System (ADS)
Zhao, Zheen; Wallace, David K.; Freedman, Sharon F.; Aylward, Stephen R.
2008-03-01
In this paper we present improvements to a software application, named ROPtool, that aids in the timely and accurate detection and diagnosis of retinopathy of prematurity (ROP). ROP occurs in 68% of infants weighing less than 1251 grams at birth, and it is a leading cause of blindness in prematurely born infants. The standard of care for its diagnosis is the subjective assessment of retinal vessel dilation and tortuosity. There is significant inter-observer variation in those assessments. ROPtool analyzes retinal images, extracts user-selected blood vessels from those images, and quantifies the tortuosity of those vessels. The presence of ROP is then gauged by comparing the tortuosity of an infant's retinal vessels with measures made from a clinical-standard image of severely tortuous retinal vessels. The presence of such tortuous retinal vessels is referred to as 'plus disease'. In this paper, a novel metric of tortuosity is proposed. From the ophthalmologist's point of view, the new metric is an improvement over our previously published algorithm, since it uses smooth curves instead of straight lines to simulate 'normal vessels'. Another advantage of the new ROPtool is that minimal user interaction is required. ROPtool uses a ridge traversal algorithm to extract retinal vessels. The algorithm reconstructs connectivity along a vessel automatically. This paper supports its claims by reporting ROC curves from a pilot study involving 20 retinal images. The areas under the two ROC curves, obtained from two ROP experts using the new metric to diagnose 'tortuosity sufficient for plus disease', ranged from 0.86 to 0.91.
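The abstract does not give enough detail to reproduce ROPtool's new smooth-curve metric. For contrast, here is a minimal sketch of one common straight-line tortuosity measure, the distance metric, which divides a vessel's centreline length by the straight chord between its endpoints. We assume only that it resembles the straight-line approach the abstract contrasts with; it is not ROPtool's published metric, and the point lists are hypothetical:

```python
import math


def arc_length(points):
    """Sum of Euclidean distances between consecutive centreline points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def distance_metric(points):
    """Path length / endpoint chord: 1.0 for a straight vessel, larger if tortuous."""
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("endpoints coincide; metric undefined")
    return arc_length(points) / chord


# Hypothetical vessel centrelines (pixel coordinates)
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]
print(distance_metric(straight))  # 1.0
print(round(distance_metric(zigzag), 3))  # 1.414
```

A straight chord treats any curvature as abnormal, which is the limitation a smooth-curve model of a "normal vessel" is meant to address.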
Lingam, Ravi Kumar; Mundada, Pravin; Lee, Vickie
2018-01-10
To examine the novel use of non-echo-planar diffusion-weighted MRI (DWI) in depicting activity and treatment response in active Graves' orbitopathy (GO) by assessing, with inter-observer agreement, whether its apparent diffusion coefficients (ADCs) correlate with conventional short tau inversion recovery (STIR) MRI signal-intensity ratios (SIRs). A total of 23 actively inflamed muscles and 30 muscle response episodes were analysed in patients with active GO who underwent medical treatment. The MRI orbit scans, which included STIR sequences and non-echo-planar DWI, were evaluated. Two observers independently assessed the images qualitatively for the presence of activity in the extraocular muscles (EOMs) and recorded the STIR signal intensity (SI), the SIR (SI ratio of EOM to temporalis muscle), and the ADC values of any actively inflamed muscle on the pre-treatment scans, together with the corresponding values on the subsequent post-treatment scans. Inter-observer agreement was examined. There was a significant positive correlation (0.57, p < 0.001) between ADC and both SIR and STIR SI of the actively inflamed EOMs. There was also a significant positive correlation (0.75, p < 0.001) between SIR and ADC values depicting the change in muscle activity associated with treatment response. Inter-observer agreement was good. Our preliminary results indicate that quantitative evaluation with non-echo-planar DWI ADC values correlates well with conventional STIR SIR in detecting active GO and monitoring its treatment response, with good inter-observer agreement.
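The SIR defined above is a simple ratio, and the reported associations are correlation coefficients. Here is a minimal sketch that assumes a Pearson correlation, since the abstract does not name the coefficient. The function names, signal intensities and ADC values are all hypothetical:

```python
import math


def signal_intensity_ratio(si_muscle, si_temporalis):
    """SIR as defined in the abstract: extraocular-muscle SI / temporalis SI."""
    return si_muscle / si_temporalis


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical STIR signal intensities and ADC values (x 10^-3 mm^2/s)
eom_si = [300.0, 420.0, 510.0, 640.0]
temporalis_si = [200.0, 210.0, 200.0, 200.0]
adc = [1.4, 1.9, 2.4, 3.0]

sir = [signal_intensity_ratio(e, t) for e, t in zip(eom_si, temporalis_si)]
print([round(s, 2) for s in sir])  # [1.5, 2.0, 2.55, 3.2]
print(round(pearson_r(sir, adc), 3))
```

Normalising to the temporalis muscle is meant to cancel scanner- and coil-dependent intensity differences between examinations, which is why the ratio rather than raw SI is compared across scans.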
Singh, Ranjodh; Zhou, Zhiping; Tisnado, Jamie; Haque, Sofia; Peck, Kyung K; Young, Robert J; Tsiouris, Apostolos John; Thakur, Sunitha B; Souweidane, Mark M
2016-11-01
OBJECTIVE Accurately determining diffuse intrinsic pontine glioma (DIPG) tumor volume is clinically important. The aims of the current study were to 1) measure DIPG volumes using methods that require different degrees of subjective judgment; and 2) evaluate interobserver agreement of measurements made using these methods. METHODS Eight patients from a Phase I clinical trial testing convection-enhanced delivery (CED) of a therapeutic antibody were included in the study. Pre-CED, post-radiation therapy axial T2-weighted images were analyzed using 2 methods requiring high degrees of subjective judgment (picture archiving and communication system [PACS] polygon and Volume Viewer auto-contour methods) and 1 method requiring a low degree of subjective judgment (k-means clustering segmentation) to determine tumor volumes. Lin's concordance correlation coefficients (CCCs) were calculated to assess interobserver agreement. RESULTS The CCCs of measurements made by 2 observers with the PACS polygon and the Volume Viewer auto-contour methods were 0.9465 (lower 1-sided 95% confidence limit 0.8472) and 0.7514 (lower 1-sided 95% confidence limit 0.3143), respectively. Both were considered to indicate poor agreement. The CCC of measurements made using k-means clustering segmentation was 0.9938 (lower 1-sided 95% confidence limit 0.9772), which was considered to indicate substantial agreement. CONCLUSIONS The poor interobserver agreement of the PACS polygon and Volume Viewer auto-contour methods highlights the difficulty of consistently measuring DIPG tumor volumes using methods that require high degrees of subjective judgment. k-means clustering segmentation, which requires a low degree of subjective judgment, showed better interobserver agreement and produced tumor volumes with delineated borders.
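Lin's CCC differs from Pearson's r in that it also penalises systematic offsets between observers. A minimal sketch using population moments, as in Lin's original definition. The function name and volumes are hypothetical:

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalises both scatter (low covariance) and systematic offset (mean shift)
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


# Hypothetical tumor volumes (cm^3): observer B reads every tumor 1 cm^3 larger
observer_a = [10.0, 20.0, 30.0]
observer_b = [11.0, 21.0, 31.0]
print(round(lins_ccc(observer_a, observer_b), 4))  # 0.9926
```

Pearson's r for these data is exactly 1. The CCC falls below 1 only because of the constant 1 cm³ offset.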
Solomon, Nadia; Fields, Paul J.; Tamarozzi, Francesca; Brunetti, Enrico; Macpherson, Calum N. L.
2017-01-01
Cystic echinococcosis (CE), a parasitic zoonosis, results in cyst formation in the viscera. Cyst morphology depends on developmental stage. In 2003, the World Health Organization (WHO) published a standardized ultrasound (US) classification for CE, for use among experts as a standard of comparison. This study examined the reliability of this classification. Eleven international CE and US experts completed an assessment of eight WHO classification images and 88 test images representing cyst stages. Inter- and intraobserver reliability and observer performance were assessed using Fleiss' and Cohen's kappa. Interobserver reliability was moderate for WHO images (κ = 0.600, P < 0.0001) and substantial for test images (κ = 0.644, P < 0.0001), with substantial to almost perfect interobserver reliability for stages with pathognomonic signs (CE1, CE2, and CE3) for WHO (0.618 < κ < 0.904) and test images (0.642 < κ < 0.768). Comparisons of expert performances against the majority classification for each image were significant for WHO (0.413 < κ < 1.000, P < 0.005) and test images (0.718 < κ < 0.905, P < 0.0001); and intraobserver reliability was significant for WHO (0.520 < κ < 1.000, P < 0.005) and test images (0.690 < κ < 0.896, P < 0.0001). Findings demonstrate moderate to substantial interobserver and substantial to almost perfect intraobserver reliability for the WHO classification, with substantial to almost perfect interobserver reliability for pathognomonic stages. This confirms experts' abilities to reliably identify WHO-defined pathognomonic signs of CE, demonstrating that the WHO classification provides a reproducible way of staging CE. PMID:28070008
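Fleiss' kappa extends chance-corrected agreement to more than two raters, such as the eleven experts above. A minimal sketch in which each row counts how many raters placed one image in each category. The function name and counts are hypothetical:

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning subject i to category j.
    Every row must sum to the same number of raters n (n >= 2)."""
    N = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    k = len(counts[0])
    # Observed agreement: share of agreeing rater pairs per subject, averaged
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the overall category proportions
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 3 experts stage 4 images into CE1, CE2 or CE3
counts = [
    [2, 1, 0],
    [0, 3, 0],
    [1, 1, 1],
    [3, 0, 0],
]
print(round(fleiss_kappa(counts), 3))  # 0.268
```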
Ghobrial, Fady Emil Ibrahim; Eldin, Manal Salah; Razek, Ahmed Abdel Khalek Abdel; Atwan, Nadia Ibrahim; Shamaa, Sameh Sayed Ahmed
2017-01-01
To assess inter-observer agreement of the revised RECIST criteria (version 1.1) for computed tomography assessment of hepatic metastases of breast cancer. A prospective study was conducted in 28 female patients with breast cancer and at least one measurable metastatic lesion in the liver, treated with 3 cycles of anthracycline-based chemotherapy. All patients underwent computed tomography of the abdomen with 64-row multidetector CT at baseline and after 3 cycles of chemotherapy for response assessment. Image analysis was performed by 2 observers based on the RECIST criteria (version 1.1). Computed tomography revealed partial response of hepatic metastases in 7 patients (25%) according to one observer and in 10 patients (35.7%) according to the other, with good inter-observer agreement (κ = 0.75, percent agreement 89.29%). Stable disease was detected in 19 patients (67.8%) by one observer and in 16 patients (57.1%) by the other, with good agreement (κ = 0.774, percent agreement 89.29%). Progressive disease was detected in 2 patients (7.2%) by both observers, with perfect agreement (κ = 1, percent agreement 100%). The overall inter-observer agreement in the CT-based response assessment of hepatic metastases between the two observers was good (κ = 0.793, percent agreement 89.29%). We conclude that computed tomography is a reliable and reproducible imaging modality for response assessment of hepatic metastases of breast cancer according to the RECIST criteria (version 1.1).
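The reported κ values can be checked against the counts. The reported numbers (7 vs 10 partial responses, 19 vs 16 stable, perfect agreement on 2 progressive, 89.29% = 25/28 overall agreement) leave only one possible cross-tabulation: the three disagreements are patients the first observer called stable disease and the second called partial response. A minimal Cohen's kappa sketch on that implied table reproduces the overall κ of 0.793. The helper names are hypothetical:

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of subjects given the same label by both observers, in percent."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca.keys() | cb.keys()) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Cross-tabulation implied by the reported counts: 7 PR/PR, 3 SD/PR, 16 SD/SD, 2 PD/PD
obs1 = ["PR"] * 7 + ["SD"] * 3 + ["SD"] * 16 + ["PD"] * 2
obs2 = ["PR"] * 7 + ["PR"] * 3 + ["SD"] * 16 + ["PD"] * 2
print(round(percent_agreement(obs1, obs2), 2))  # 89.29
print(round(cohen_kappa(obs1, obs2), 3))  # 0.793
```

Collapsing the labels to partial response versus any other response (`[x == "PR" for x in obs1]`) gives 0.75, and stable disease versus any other response gives 0.774. Both match the category-specific values above.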
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wernicke, A. Gabriella, E-mail: gaw9008@med.cornell.ed; Parashar, Bhupesh; Kulidzhanov, Fridon
2011-05-01
Purpose: Accurate detection of radiation-induced fibrosis (RIF) is crucial in the management of breast cancer survivors. The tissue compliance meter (TCM) has been validated in musculature. We validate TCM in healthy breast tissue with respect to interobserver and intraobserver variability before applying it in RIF. Methods and Materials: Three medical professionals obtained three consecutive TCM measurements in each of the four quadrants of the right and left breasts of 40 women with no breast disease or surgical intervention. The intraclass correlation coefficient (ICC) assessed interobserver variability. The paired t test and Pearson correlation coefficient (r) were used to assess intraobserver variability within each rater. Results: The median age was 45 years (range, 24-68 years). The median bra size was 35C (range, 32A-40DD). Of the participants, 27 were white (67%), 4 black (10%), 5 Asian (13%), and 4 Hispanic (10%). ICCs indicated excellent interrater reliability (low interobserver variability) among the three raters, by breast and quadrant (all ICC ≥ 0.99). The paired t test and Pearson correlation coefficient both indicated low intraobserver variability within each rater (right vs. left breast), stratified by quadrant (all r ≥ 0.94, p < 0.0001). Conclusions: Interobserver and intraobserver variability are small when TCM is used in healthy mammary tissue. We are now embarking on a prospective study using TCM in women with breast cancer at risk of developing RIF that may guide early detection, timely therapeutic intervention, and assessment of the success of therapy for RIF.
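The abstract does not say which ICC form was used, so we assume one here. This minimal sketch uses the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation. It is checked against the small six-target, four-judge example from their 1979 paper, for which ICC(2,1) is about 0.29. The function name is hypothetical:

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    data[i][j] is rater j's measurement of subject i (no missing values)."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Six targets rated by four judges (Shrout and Fleiss, 1979, illustrative data)
judges = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(judges), 2))  # 0.29
```

The low value here comes mostly from large systematic differences between judges. An absolute-agreement ICC penalises these, whereas a consistency form such as ICC(3,1) would not.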
Validation of the Italian version of the Coma Recovery Scale-Revised (CRS-R).
Sacco, Simona; Altobelli, Emma; Pistarini, Caterina; Cerone, Davide; Cazzulani, Benedetta; Carolei, Antonio
2011-01-01
To validate the Italian version of the Coma Recovery Scale-Revised (CRS-R). Two observers applied the Italian version of the CRS-R to selected patients. On day 1, observers A and B independently scored each patient; comparison of their observations was used to evaluate inter-observer agreement. On day 2, observer A completed a second evaluation, and comparison of this observation with the one obtained on day 1 by the same observer was used to evaluate test-retest agreement. For each evaluation, the diagnostic impression (vegetative state/minimally conscious state) was also recorded. Thirty-eight patients were evaluated (mean age ± SD, 58.9 ± 13.8 years). Inter-observer (ρ = 0.81; p < 0.001) and test-retest agreement (ρ = 0.97; p < 0.001) for the total score were high. Inter-observer agreement was excellent for the communication subscale, good for the auditory, visual, and motor subscales, and moderate for the oromotor/verbal and arousal subscales. Test-retest agreement was excellent for the visual, motor, oromotor/verbal, and communication subscales, good for the auditory subscale, and moderate for the arousal subscale. For the diagnostic impression, inter-observer agreement was good (κ = 0.75; p < 0.001) and test-retest agreement was excellent (κ = 0.92; p < 0.001). The Italian version of the CRS-R can be administered reliably and can also be used to discriminate between patients in a vegetative state and those in a minimally conscious state.
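The total-score agreement above is reported as ρ. We take this to be Spearman's rank correlation, which is an assumption because the abstract does not expand the symbol. This minimal sketch ranks each observer's scores, averaging ranks for ties, and correlates the ranks. The function names and CRS-R totals are hypothetical:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical CRS-R total scores for five patients
observer_a = [5, 8, 12, 15, 20]
observer_b = [8, 6, 11, 16, 19]
print(round(spearman_rho(observer_a, observer_b), 3))  # 0.9
```

A rank correlation suits an ordinal total score, but it measures consistency of ordering rather than identical scores. Two observers who differ by a constant would still obtain ρ = 1.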
Syed, Mushabbar A; Oshinski, John N; Kitchen, Charles; Ali, Arshad; Charnigo, Richard J; Quyyumi, Arshed A
2009-08-01
Carotid MRI measurements are increasingly being employed in research studies for atherosclerosis imaging. The majority of carotid imaging studies use 1.5 T MRI. Our objective was to investigate intra-observer and inter-observer variability in carotid measurements using high-resolution 3 T MRI. We performed 3 T carotid MRI on 10 patients (age 56 ± 8 years, 7 male) with atherosclerosis risk factors and ultrasound intima-media thickness ≥0.6 mm. A total of 20 transverse images of both the right and left carotid arteries were acquired using a T2-weighted black-blood sequence. The lumen and outer wall of the common carotid and internal carotid arteries were manually traced; vessel wall area, vessel wall volume, and average wall thickness measurements were then assessed for intra-observer and inter-observer variability. Pearson and intraclass correlations were used in these assessments, along with Bland-Altman plots. For inter-observer variability, Pearson correlations ranged from 0.936 to 0.996 and intraclass correlations from 0.927 to 0.991. For intra-observer variability, Pearson correlations ranged from 0.934 to 0.954 and intraclass correlations from 0.831 to 0.948. Calculations showed that inter-observer variability and other sources of error would inflate sample size requirements for a clinical trial by no more than 7.9%, indicating that 3 T MRI is nearly optimal in this respect. In patients with subclinical atherosclerosis, 3 T carotid MRI measurements are highly reproducible, which has important implications for clinical trial design.
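The abstract does not show how the 7.9% bound was derived. One standard line of reasoning assumes that measurement error adds variance independent of the true between-subject variance. Under that assumption, the required sample size scales as 1/reliability, so the inflation is 1/ICC - 1. With the lowest inter-observer ICC reported (0.927), this gives about 7.9%, which is consistent with the stated bound. We cannot confirm that this is the authors' exact calculation. The function names and the n = 100 example are hypothetical:

```python
import math


def sample_size_inflation(reliability):
    """Fractional increase in n when independent measurement error is added:
    n_observed / n_error_free = 1 / reliability (classical test theory assumption)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return 1 / reliability - 1


def inflated_sample_size(n_error_free, reliability):
    """Round the inflated sample size up to a whole number of participants."""
    return math.ceil(n_error_free / reliability)


print(f"{sample_size_inflation(0.927):.1%}")  # 7.9%
print(inflated_sample_size(100, 0.927))  # 108
```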
A study of lip prints and its reliability as a forensic tool
Verma, Yogendra; Einstein, Arouquiaswamy; Gondhalekar, Rajesh; Verma, Anoop K.; George, Jiji; Chandra, Shaleen; Gupta, Shalini; Samadi, Fahad M.
2015-01-01
Introduction: Lip prints, like fingerprints, are unique to an individual and can be easily recorded. We therefore compared direct and indirect lip print patterns in males and females of different age groups, studied the inter- and intraobserver bias in recording the data, and observed any changes in lip print patterns over time, thereby assessing the reliability of lip prints as a forensic tool. Materials and Methods: Fifty females and 50 males aged 15 to 35 years were selected for the study. Lips with any deformity or scars were excluded. Lip prints were registered by direct and indirect methods and transferred to a preformed registration sheet. The direct method of lip print registration was repeated after a six-month interval. All recorded data were analyzed statistically. Results: The predominant patterns were vertical and branched. More females showed the branched pattern, whereas males showed an equal prevalence of vertical and reticular patterns. Interobserver agreement was 95%, and there was no change in the lip prints over time. Indirect registration of lip prints correlated with direct method prints. Conclusion: Lip prints can be used as a reliable forensic tool, given the consistency of lip prints over time and the accurate correlation of indirect prints with direct prints. PMID:26668449
The reliability of four widely used patellar height ratios.
van Duijvenbode, Dennis; Stavenuiter, Michel; Burger, Bart; van Dijke, Cees; Spermon, Jacco; Hoozemans, Marco
2016-03-01
The objective of this study was to evaluate the inter-observer and intra-observer reliability of four patellar height ratios: Insall-Salvati (IS), modified Insall-Salvati (MIS), Blackburne-Peel (BP), and Caton-Deschamps (CD). The patellar height ratios were assessed by four independent examiners using weight-bearing lateral knee radiographs in 30° flexion. Intra-class correlation coefficients and Fleiss' kappas were determined. The inter-observer reliability was excellent for the IS and moderate for the other ratios. When the ratio values were categorized, the inter-observer reliability was strong for the IS, moderate for the MIS and BP, and poor for the CD. The intra-observer reliability was excellent for the IS, MIS, and CD, and strong for the BP. When the ratio values were categorized, the intra-observer reliability was strong for the IS and MIS and moderate for the other ratios. Although the IS showed the best reliability, we advise using the MIS, as it showed the second-best reliability and is, according to the literature, associated with better validity.
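Categorised reliability depends on where each ratio falls relative to fixed cut-offs. A small measurement disagreement near a threshold can flip the category even when the continuous values agree closely. This is one plausible reason categorised and continuous reliability can differ. This minimal sketch computes the Insall-Salvati ratio and bins it. The commonly cited IS cut-offs of 0.8 and 1.2 are an assumption here, because the abstract does not state the thresholds used. The function names and measurements are hypothetical:

```python
def insall_salvati(tendon_length_mm, patellar_length_mm):
    """IS ratio: patellar tendon length / greatest diagonal length of the patella."""
    return tendon_length_mm / patellar_length_mm


def categorize_patellar_height(ratio, low=0.8, high=1.2):
    """Bin a ratio using assumed cut-offs (commonly cited 0.8 and 1.2 for IS)."""
    if ratio < low:
        return "baja"
    if ratio > high:
        return "alta"
    return "normal"


# Hypothetical radiographic measurements (mm): tendon length, patellar length
for tendon, patella in [(52.0, 40.0), (40.0, 44.0), (30.0, 45.0)]:
    ratio = insall_salvati(tendon, patella)
    print(round(ratio, 2), categorize_patellar_height(ratio))
```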
Mannath, J; Subramanian, V; Telakis, E; Lau, K; Ramappa, V; Wireko, M; Kaye, P V; Ragunath, K
2013-02-01
Autofluorescence imaging (AFI), a "red flag" technique used during Barrett's surveillance, is associated with significant false-positive results. The aim of this study was to assess the inter-observer agreement (IOA) in identifying AFI-positive lesions and to assess the overall accuracy of AFI. Anonymized AFI and high-resolution white light (HRE) images were prospectively collected. The AFI images were presented in random order, followed by the corresponding AFI + HRE images. Three AFI experts and 3 AFI non-experts scored the images after a training presentation. The IOA was calculated using kappa, and accuracy was calculated with histology as the gold standard. Seventy-four sets of images were prospectively collected from 63 patients (48 males, mean age 69 years). The IOA for the number of AF-positive lesions was fair when AFI images were presented. This improved to moderate with corresponding AFI and HRE images [experts 0.57 (0.44-0.70), non-experts 0.47 (0.35-0.62)]. The IOA for the site of the AF lesion was moderate for experts and fair for non-experts using AFI images; with AFI + HRE, it improved to substantial for experts [κ = 0.62 (0.50-0.72)] but remained fair for non-experts [κ = 0.28 (0.18-0.37)]. Among experts, the accuracy of identifying dysplasia was 0.76 (0.7-0.81) using AFI images and 0.85 (0.79-0.89) using AFI + HRE images. Among non-experts, the accuracy was 0.69 (0.62-0.74) with AFI images alone and 0.75 (0.70-0.80) using AFI + HRE. The IOA for AF-positive lesions is fair to moderate using AFI images and improves with the addition of HRE. The overall accuracy of identifying dysplasia was modest and was better when AFI and HRE images were combined.
Lee, Lik Hang; Yantiss, Rhonda K; Sadot, Eran; Ren, Bing; Calvacanti, Marcela Santos; Hechtman, Jaclyn F; Ivelja, Sinisa; Huynh, Be; Xue, Yue; Shitilbans, Tatiana; Guend, Hamza; Stadler, Zsofia K; Weiser, Martin R; Vakiani, Efsevia; Gönen, Mithat; Klimstra, David S; Shia, Jinru
2017-04-01
Colorectal medullary carcinoma, recognized by the World Health Organization as a distinct histologic subtype, is commonly regarded as a specific entity with an improved prognosis and unique molecular pathogenesis. A fundamental but as yet unaddressed question, however, is whether it can be diagnosed reproducibly. In this study, by analyzing 80 colorectal adenocarcinomas whose dominant growth pattern was solid (thus encompassing medullary carcinoma and its mimics), we provided a detailed description of the morphological spectrum from "classic medullary histology" to nonmedullary poorly differentiated histologies and demonstrated significant overlap between categories. By assessing a selected subset (n=30) that represented the spectrum of histologies, we showed that the interobserver agreement for diagnosing medullary carcinoma by using 2010 World Health Organization criteria was poor; the κ value among 5 gastrointestinal pathologists was only 0.157 (95% confidence interval, 0.127-0.263; P=.001). When we arbitrarily classified the entire cohort into "classic" and "indeterminate" medullary tumors (group 1, n=19; group 2, n=26, respectively) and nonmedullary poorly differentiated tumors (group 3, n=35), groups 1 and 2 were more likely to exhibit mismatch repair protein deficiency than group 3 (P<.001); however, improved survival could not be detected in either group compared with group 3. Our findings suggest that the diagnosis of medullary carcinoma, as currently applied, may only serve as a morphological descriptor indicating an increased likelihood of mismatch-repair deficiency. Additional evidence, including a more objective classification system, is needed before medullary carcinoma can be regarded as a distinct entity with prognostic relevance. Until such evidence becomes available, caution should be exercised when making this diagnosis, as well as when comparing results across different studies. Copyright © 2016 Elsevier Inc. All rights reserved.
Galea, Angela; Adlan, Tarig; Gay, David; Roobottom, Carl; Dubbins, Paul; Riordan, Richard
2015-09-01
The aim of this study was to compare the sensitivity and specificity of chest digital tomosynthesis (DTS) with chest radiography (CXR) for the detection of noncalcified pulmonary nodules and hilar lesions, using computed tomography (CT) as the reference standard. A total of 78 patients with suspected noncalcified pulmonary lesions on CXR were included in the study. Two radiologists, blinded to the history and CT, analyzed the CXR and the DTS images (separately), whereas a third radiologist analyzed the CXR and DTS images together. Noncalcified intrapulmonary nodules and hilar lesions were recorded for analysis. The interobserver agreement for CXR and DTS was assessed, and the time taken to report the images was recorded. A total of 202 lesions were recorded in 78 patients. There were 111 true lesions confirmed on CT in 53 patients; in 25 patients, subsequent CT excluded a lesion. The overall sensitivity was 32% for CXR and 49% for DTS. This improved to 54% when the posteroanterior CXR and DTS were reviewed together (CXR-DTS). The overall specificities for CXR, DTS, and CXR-DTS were 49%, 96%, and 98%, respectively. There were 56 suspected hilar lesions, with subgroup sensitivities of 76% for CXR, 65% for DTS, and 76% for CXR-DTS. The specificities for hilar lesions were 59%, 92%, and 97% for CXR, DTS, and CXR-DTS, respectively. DTS, used alone or in combination with CXR, significantly improves the detectability of noncalcified nodules compared with CXR. The specificity and interobserver agreement of DTS in the diagnosis of suspected noncalcified pulmonary nodules and hilar lesions are significantly better than those of CXR and approach those of CT.
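Sensitivity and specificity against the CT reference reduce to counts of true and false calls. This minimal sketch uses hypothetical per-site labels, where True means a lesion is present on CT (reference) or is reported by the reader. The function name is also hypothetical:

```python
def sensitivity_specificity(reference, reader):
    """Sensitivity and specificity of reader calls against a reference standard.
    Both arguments are equal-length sequences of booleans."""
    tp = sum(1 for r, c in zip(reference, reader) if r and c)
    fn = sum(1 for r, c in zip(reference, reader) if r and not c)
    tn = sum(1 for r, c in zip(reference, reader) if not r and not c)
    fp = sum(1 for r, c in zip(reference, reader) if not r and c)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical: 4 CT-confirmed lesions followed by 6 CT-negative sites
ct_reference = [True] * 4 + [False] * 6
reader_calls = [True, True, False, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(ct_reference, reader_calls)
print(round(sens, 2), round(spec, 2))  # 0.5 0.83
```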
Maltez de Almeida, João Ricardo; Gomes, André Boechat; Barros, Thomas Pitangueira; Fahel, Paulo Eduardo; de Seixas Rocha, Mário
2015-07-01
The purposes of this study were to investigate whether dynamic contrast-enhanced MRI (DCE-MRI) is adequate for subcategorization of suspicious lesions (BI-RADS category 4) and to evaluate whether the use of diffusion-weighted imaging (DWI) improves diagnostic performance. The study group was composed of 103 suspicious lesions found in 83 subjects. Patient ages and lesion sizes were compiled, and two radiologists reanalyzed the images, subcategorized the findings as BI-RADS 4A, 4B, or 4C, and calculated apparent diffusion coefficient (ADC) values. The stratified variables were tested by univariate analysis and entered into two multivariate predictive models, which were used to generate ROC curves and compare AUCs. Positive predictive values (PPVs) for each subcategory and ADC level were calculated, and interobserver agreement was tested. Forty-four (42.7%) suspicious findings proved malignant. Except for age (p = 0.08), all stratified predictor variables were significant in univariate analyses (p < 0.01). The logistic regression models did not differ substantially after comparison of the ROC curves (p = 0.09), but the one including ADC values was slightly better: AUC of 0.89 (95% CI, 0.82-0.95) versus AUC of 0.85 (95% CI, 0.78-0.93). PPV increased progressively across the BI-RADS 4 subcategories (4A, 0.15; 4B, 0.37; 4C, 0.84). ADC values of 1.10 × 10⁻³ mm²/s or less had the second-highest PPV (0.77). Interobserver agreement was substantial, with a kappa value of 0.80 (95% CI, 0.70-0.90; p < 0.01). Risk stratification of suspicious lesions (BI-RADS category 4) can be satisfactorily performed with DCE-MRI and is slightly improved when DWI is introduced.
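The per-subcategory and ADC-threshold PPVs above are proportions of malignant lesions within each group. This minimal sketch assumes lesions are called suspicious when their ADC is at or below the 1.10 × 10⁻³ mm²/s cut-off. The function names and lesion data are hypothetical:

```python
from collections import defaultdict


def ppv_by_category(categories, malignant):
    """Share of malignant lesions (PPV) within each assigned subcategory."""
    totals, positives = defaultdict(int), defaultdict(int)
    for category, is_malignant in zip(categories, malignant):
        totals[category] += 1
        positives[category] += int(is_malignant)
    return {c: positives[c] / totals[c] for c in sorted(totals)}


def ppv_at_or_below(adc_values, malignant, cutoff=1.10):
    """PPV when lesions with ADC <= cutoff (x 10^-3 mm^2/s) are called suspicious."""
    flagged = [m for a, m in zip(adc_values, malignant) if a <= cutoff]
    return sum(flagged) / len(flagged) if flagged else float("nan")


# Hypothetical lesions: 10 in 4A (1 malignant), 5 in 4B (2), 4 in 4C (3)
cats = ["4A"] * 10 + ["4B"] * 5 + ["4C"] * 4
malig = [True] + [False] * 9 + [True] * 2 + [False] * 3 + [True] * 3 + [False]
print(ppv_by_category(cats, malig))  # {'4A': 0.1, '4B': 0.4, '4C': 0.75}
adc = [0.9, 1.0, 1.2, 1.5, 1.05]
print(round(ppv_at_or_below(adc, [True, True, False, False, False]), 3))  # 0.667
```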
Noda, Wataru; Tanaka-Matsumi, Junko
2009-03-01
The present study evaluates the effect of a classroom-based behavioral intervention package to improve Japanese elementary school children's sitting posture in regular classrooms (N=68). This study uses a multiple-baseline design across two classrooms with a modified repeated reversal within each class. The article defines appropriate sitting posture as behavior composed of four components (feet, buttocks, back, and whole body). The intervention package includes modeling, correspondence training, prompts, and reinforcement, among other components. The authors counted the number of children with appropriate sitting posture in each classroom across all 28 sessions of the study. Interobserver agreement on appropriate sitting posture ranged from 80% to 100%. As a result of the intervention, the mean proportion of children with appropriate posture increased from approximately 20% to 90%. In addition, the children's academic writing productivity increased with the improved sitting posture. Teachers' acceptance of the intervention program proved to be excellent.
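The abstract does not state which agreement algorithm produced the 80% to 100% range. Because the observers counted children per session, one plausible choice is total count IOA (smaller count divided by larger count). Item-by-item agreement (here, child by child) is a stricter alternative. Both are standard in applied behavior analysis. This minimal sketch uses hypothetical function names and counts:

```python
def total_count_ioa(count_a, count_b):
    """Smaller count / larger count x 100; two zero counts count as full agreement."""
    if count_a == 0 and count_b == 0:
        return 100.0
    return 100 * min(count_a, count_b) / max(count_a, count_b)


def item_by_item_ioa(scores_a, scores_b):
    """Percentage of items (children or intervals) scored identically by both observers."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    agreements = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100 * agreements / len(scores_a)


# Hypothetical session: observer A counts 20 children sitting appropriately, B counts 25
print(total_count_ioa(20, 25))  # 80.0
# Hypothetical child-by-child scoring (1 = appropriate posture)
print(item_by_item_ioa([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]))  # 80.0
```

Total count IOA can reach 100% even when the two observers scored different children, which is why the item-by-item version is the more conservative check.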
Robot-based tele-echography: clinical evaluation of the TER system in abdominal aortic exploration.
Martinelli, Thomas; Bosson, Jean-Luc; Bressollette, Luc; Pelissier, Franck; Boidard, Eric; Troccaz, Jocelyne; Cinquin, Philippe
2007-11-01
The TER system is a robot-based tele-echography system that allows remote ultrasound examination. The specialist moves a mock-up of the ultrasound probe at the master site, and the robot reproduces the movements with the real probe, which sends back ultrasound images and force feedback. This tool could be used to perform ultrasound examinations in small health care centers or from isolated sites. The objective of this study was to demonstrate, under real conditions, the feasibility and reliability of the TER system in detecting abdominal aortic and iliac aneurysms. Fifty-eight patients were included at 2 centers in Brest and Grenoble, France. The remote examination was compared with the reference standard, the bedside examination, for aorta and iliac artery diameter measurement, detection and description of aneurysms, detection of atheromatosis, duration of the examination, and acceptability. All 8 aneurysms were detected by both techniques, as were intramural thrombosis and extension to the iliac arteries. The interobserver correlation coefficient was 0.982 (P < .0001) for aortic diameters. The rate of concordance between the 2 operators in evaluating atheromatosis was 84% ± 11% (95% confidence interval). Our study on 58 patients suggests that the TER system could be a reliable, acceptable, and effective robot-based system for performing remote abdominal aortic ultrasound examinations. Research is continuing to improve the equipment for general abdominal use.
Reliability testing of the Larsen and Sharp classifications for rheumatoid arthritis of the elbow.
Jew, Nicholas B; Hollins, Anthony M; Mauck, Benjamin M; Smith, Richard A; Azar, Frederick M; Miller, Robert H; Throckmorton, Thomas W
2017-01-01
Two popular systems for classifying rheumatoid arthritis affecting the elbow are the Larsen and Sharp schemes. To our knowledge, no study has investigated the reliability of these 2 systems. We compared the intraobserver and interobserver agreement of the 2 systems to determine whether one is more reliable than the other. The radiographs of 45 patients diagnosed with rheumatoid arthritis affecting the elbow were evaluated. Anteroposterior and lateral radiographs were deidentified and distributed to 6 evaluators (4 fellowship-trained upper extremity surgeons and 2 orthopedic trainees). Each evaluator graded all 45 radiographs according to the Larsen and Sharp scoring methods on 2 occasions, at least 2 weeks apart. Overall intraobserver reliability was 0.93 (95% confidence interval [CI], 0.90-0.95) for the Larsen system and 0.92 (95% CI, 0.86-0.96) for the Sharp classification, both indicating substantial agreement. Overall interobserver reliability was 0.70 (95% CI, 0.60-0.80) for the Larsen classification and 0.68 (95% CI, 0.54-0.81) for the Sharp system, both indicating good agreement. There were no significant differences in the intraobserver or interobserver reliability of the systems overall, and no significant differences in reliability between attending surgeons and trainees for either classification system. The Larsen and Sharp systems both show substantial intraobserver reliability and good interobserver agreement for the radiographic classification of rheumatoid arthritis affecting the elbow. Differences in training level did not result in substantial differences in reliability for either system. We conclude that both systems can be reliably used by observers of varying training levels to evaluate rheumatoid arthritis of the elbow. Copyright © 2017 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
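The abstract does not name the agreement statistic. For ordinal grades such as the Larsen stages (0 to 5), a weighted kappa that gives partial credit to near-misses is a common choice. This minimal sketch assumes quadratic (or linear) disagreement weights. The function name and ratings are hypothetical; for Larsen grades one would pass `list(range(6))` as the categories:

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="quadratic"):
    """Cohen's weighted kappa for ordinal categories listed in ascending order."""
    k, n = len(categories), len(rater_a)
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row_p = [sum(observed[i]) for i in range(k)]
    col_p = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant categories
        d = abs(i - j) / (k - 1)
        return d * d if weighting == "quadratic" else d

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * row_p[i] * col_p[j] for i, j in cells)
    return 1 - disagree_obs / disagree_exp


# Hypothetical: two evaluators grade three elbows on a 3-point ordinal scale
print(round(weighted_kappa([0, 1, 2], [0, 2, 2], [0, 1, 2]), 3))  # 0.8
print(round(weighted_kappa([0, 1, 2], [0, 2, 2], [0, 1, 2], "linear"), 3))  # 0.667
```

Quadratic weights forgive one-grade disagreements more than linear weights do, so the choice of weighting can change the reported value noticeably.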
Kakinuma, Ryutaro; Ashizawa, Kazuto; Kuriyama, Keiko; Fukushima, Aya; Ishikawa, Hiroyuki; Kamiya, Hisashi; Koizumi, Naoya; Maruyama, Yuichiro; Minami, Kazunori; Nitta, Norihisa; Oda, Seitaro; Oshiro, Yasuji; Kusumoto, Masahiko; Murayama, Sadayuki; Murata, Kiyoshi; Muramatsu, Yukio; Moriyama, Noriyuki
2012-04-01
To evaluate interobserver agreement in measurements of focal ground-glass opacity (GGO) diameters on computed tomography (CT) images, in order to identify increases in GGO size. Institutional review board approval and informed consent from the patients were obtained. Ten GGOs (mean size, 10.4 mm; range, 6.5-15 mm), one each in 10 patients (mean age, 65.9 years; range, 58-78 years), were used to make the diameter measurements. Eleven radiologists independently measured the diameters of the GGOs on a total of 40 thin-section CT images (the first [n = 10], second [n = 10], and third [n = 10] follow-up CT examinations and a remeasurement of the first [n = 10] follow-up CT examinations) without comparing time-lapse CT images. Interobserver agreement was assessed by means of Bland-Altman plots. Among the 55 possible pairs of the 11 radiologists, the smallest range of the 95% limits of interobserver agreement for maximal diameter was -1.14 to 1.72 mm, and the largest range was -7.7 to 1.7 mm. The mean lower limit of the 95% limits of agreement was -3.1 ± 1.4 mm, and the mean upper limit was 2.5 ± 1.1 mm. When measurements are made by any two radiologists, an increase in maximal diameter of more than 1.72 mm would be necessary to state that the maximal diameter of a particular GGO had actually increased. Copyright © 2012 AUR. Published by Elsevier Inc. All rights reserved.
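Bland-Altman 95% limits of agreement are the mean between-reader difference ± 1.96 standard deviations of the differences. In the abstract's logic, an apparent growth smaller than a reader pair's upper limit cannot be distinguished from measurement disagreement. This minimal sketch uses a hypothetical function name and hypothetical diameters (mm):

```python
import statistics


def limits_of_agreement(reader_a, reader_b, z=1.96):
    """Return (lower limit, mean difference, upper limit) for paired readings."""
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences (needs >= 2 pairs)
    return bias - z * sd, bias, bias + z * sd


# Hypothetical maximal GGO diameters (mm) measured by two radiologists
reader_1 = [10.0, 8.5, 12.0, 6.5]
reader_2 = [9.5, 9.0, 11.0, 7.0]
lo, bias, hi = limits_of_agreement(reader_1, reader_2)
print(round(lo, 3), round(bias, 3), round(hi, 3))  # -1.345 0.125 1.595
```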
Choi, M H; Oh, S N; Park, G E; Yeo, D-M; Jung, S E
2018-05-10
To evaluate the interobserver and intermethod correlations of histogram metrics of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) parameters acquired by multiple readers using the single-section and whole-tumor volume methods. Four DCE parameters (Ktrans, Kep, Ve, Vp) were evaluated in 45 patients (31 men and 14 women; mean age, 61 ± 11 years [range, 29-83 years]) with locally advanced rectal cancer using pre-chemoradiotherapy (CRT) MRI. Ten histogram metrics were extracted by three radiologists using two lesion-selection methods: the whole-tumor volume method, covering the whole tumor on axial section-by-section images, and the single-section method, covering the entire tumor area on one axial image. The interobserver and intermethod correlations were evaluated using intraclass correlation coefficients (ICCs). The ICCs showed excellent interobserver and intermethod correlations for most of the histogram metrics of the DCE parameters. The ICCs among the three readers were > 0.7 (P < 0.001) for all histogram metrics except the minimum and maximum. The intermethod correlations for most of the histogram metrics were excellent for each radiologist, regardless of differences in the radiologists' experience. The interobserver and intermethod correlations for most of the histogram metrics of the DCE parameters are excellent in rectal cancer. Therefore, the single-section method may be a potential alternative to the whole-tumor volume method using pre-CRT MRI, although the high agreement between the two methods cannot be extrapolated to post-CRT MRI. Copyright © 2018 Société française de radiologie. Published by Elsevier Masson SAS. All rights reserved.
Kim, Sun Mi; Han, Heon; Park, Jeong Mi; Choi, Yoon Jung; Yoon, Hoi Soo; Sohn, Jung Hee; Baek, Moon Hee; Kim, Yoon Nam; Chae, Young Moon; June, Jeon Jong; Lee, Jiwon; Jeon, Yong Hwan
2012-10-01
To determine which Breast Imaging Reporting and Data System (BI-RADS) ultrasound descriptors are predictors of breast cancer, using logistic regression (LR) analysis in conjunction with interobserver variability between breast radiologists, and to compare the performance of artificial neural network (ANN) and LR models in differentiating benign from malignant breast masses. Five breast radiologists retrospectively reviewed 140 breast masses, described each lesion using the BI-RADS lexicon, and assigned final assessment categories. Interobserver agreement was measured with kappa statistics. The radiologists' BI-RADS responses were pooled. The data were divided randomly into training (n = 70) and test (n = 70) sets. Using the training set, optimal independent variables were determined by LR analysis with forward stepwise selection. The LR and ANN models were constructed with the optimal independent variables as predictors and the biopsy results as the dependent variable. The performance of the models and the radiologists was evaluated on the test set using receiver operating characteristic (ROC) analysis. Among the BI-RADS descriptors, margin and boundary were identified as predictors by stepwise LR and showed moderate interobserver agreement. The area under the ROC curve (AUC) was 0.87 (95% CI, 0.77-0.94) for both LR and ANN. AUCs for the five radiologists ranged from 0.79 to 0.91. There was no significant difference in AUC among the LR model, the ANN model, and the radiologists (p > 0.05). Margin and boundary were found to be statistically significant predictors with good interobserver agreement. The LR and ANN models performed similarly to the radiologists in differentiating benign from malignant breast masses.
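Cohen's kappa for one pair of readers is κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each reader's label frequencies. The abstract does not say whether pairwise or multi-rater kappa was used for its five readers; this Python sketch shows only the two-reader case, with a hypothetical function name and hypothetical margin descriptors.

```python
from collections import Counter


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n              # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / n**2  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical BI-RADS margin descriptors from two readers for six masses.
reader_1 = ["circumscribed", "not circumscribed", "circumscribed",
            "not circumscribed", "circumscribed", "circumscribed"]
reader_2 = ["circumscribed", "not circumscribed", "not circumscribed",
            "not circumscribed", "circumscribed", "circumscribed"]
print(f"kappa = {cohens_kappa(reader_1, reader_2):.2f}")
```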
Wiig, Ola; Terjesen, Terje; Svenningsen, Svein
2002-10-01
We evaluated the inter-observer agreement of radiographic methods in patients with Perthes' disease. The radiographs were assessed at the time of diagnosis and at 1-year follow-up by local orthopaedic surgeons (O) and 2 experienced pediatric orthopedic surgeons (TT and SS). The Catterall, Salter-Thompson, and Herring lateral pillar classifications were compared, and femoral head coverage (FHC), center-edge angle (CE-angle), and articulo-trochanteric distance (ATD) were measured in the affected and normal hips. On the primary evaluation, the lateral pillar and Salter-Thompson classifications had higher agreement among the observers than the Catterall classification, but none of the classifications showed good agreement (weighted kappa values between O and SS of 0.56, 0.54, and 0.49, respectively). Combining Catterall groups 1 and 2 into one group and groups 3 and 4 into another resulted in better agreement (kappa 0.55) than the original 4-group system. Agreement was also better (kappa 0.62-0.70) between experienced than between less experienced examiners for all classifications. FHC was a more reliable and accurate measure than the CE-angle for quantifying acetabular coverage of the femoral head, as indicated by higher intraclass correlation coefficients (ICC) and smaller inter-observer differences. The ATD showed good agreement in all comparisons and had low inter-observer differences. We conclude that all classifications of femoral head involvement are adequate in clinical work if the radiographic assessment is done by experienced examiners. When the examiners are less experienced, a 2-group classification or the lateral pillar classification is more reliable. For evaluation of containment of the femoral head, FHC is more appropriate than the CE-angle.
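Weighted kappa credits partial agreement on ordinal scales such as the Catterall groups: κ_w = 1 - Σ w·observed / Σ w·expected, with disagreement weights that grow with the distance between categories. The abstract does not say whether linear or quadratic weights were used, so both are offered here as an assumption. The function names and ratings are hypothetical; the last lines mirror the study's collapse of groups 1-2 and 3-4.

```python
def weighted_kappa(r1, r2, k, weights="linear"):
    """Weighted kappa for two raters using ordinal categories coded 0..k-1.

    Disagreement weights are |i - j| / (k - 1) ("linear") or their square
    ("quadratic"); kappa_w = 1 - sum(w * observed) / sum(w * expected).
    """
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(r1)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1.0 / n                                     # observed proportions
    row = [sum(obs[i]) for i in range(k)]                        # rater 1 marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]   # rater 2 marginals

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - observed / expected


def two_groups(group):
    """Map Catterall groups coded 0-3 to 0 (groups 1-2) or 1 (groups 3-4)."""
    return 0 if group < 2 else 1


# Hypothetical Catterall groups (1-4, coded 0-3) from two observers for eight hips.
obs_a = [0, 1, 2, 3, 1, 2, 3, 0]
obs_b = [0, 2, 2, 3, 1, 1, 3, 1]
print(f"4-group weighted kappa = {weighted_kappa(obs_a, obs_b, 4):.2f}")
collapsed_a = [two_groups(g) for g in obs_a]
collapsed_b = [two_groups(g) for g in obs_b]
print(f"2-group kappa = {weighted_kappa(collapsed_a, collapsed_b, 2):.2f}")
```

With only two categories the weights reduce to 0 on agreement and 1 on disagreement, so the 2-group value equals ordinary Cohen's kappa.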
Singh, Ranjodh; Zhou, Zhiping; Tisnado, Jamie; Haque, Sofia; Peck, Kyung K.; Young, Robert J.; Tsiouris, Apostolos John; Thakur, Sunitha B.; Souweidane, Mark M.
2017-01-01
OBJECTIVE Accurately determining diffuse intrinsic pontine glioma (DIPG) tumor volume is clinically important. The aims of the current study were to 1) measure DIPG volumes using methods that require different degrees of subjective judgment; and 2) evaluate interobserver agreement of measurements made using these methods. METHODS Eight patients from a Phase I clinical trial testing convection-enhanced delivery (CED) of a therapeutic antibody were included in the study. Pre-CED, post–radiation therapy axial T2-weighted images were analyzed to determine tumor volumes using 2 methods requiring high degrees of subjective judgment (picture archiving and communication system [PACS] polygon and Volume Viewer auto-contour methods) and 1 method requiring a low degree of subjective judgment (k-means clustering segmentation). Lin’s concordance correlation coefficients (CCCs) were calculated to assess interobserver agreement. RESULTS The CCCs of measurements made by 2 observers with the PACS polygon and the Volume Viewer auto-contour methods were 0.9465 (lower 1-sided 95% confidence limit 0.8472) and 0.7514 (lower 1-sided 95% confidence limit 0.3143), respectively. Both were considered to indicate poor agreement. The CCC of measurements made using k-means clustering segmentation was 0.9938 (lower 1-sided 95% confidence limit 0.9772), which was considered to indicate substantial agreement. CONCLUSIONS The poor interobserver agreement of the PACS polygon and Volume Viewer auto-contour methods highlighted the difficulty of consistently measuring DIPG tumor volumes using methods requiring high degrees of subjective judgment. k-means clustering segmentation, which requires a low degree of subjective judgment, showed better interobserver agreement and produced tumor volumes with delineated borders. PMID:27391980
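Lin's concordance correlation coefficient measures how closely paired readings fall on the 45° identity line: ρ_c = 2·s_xy / (s_x² + s_y² + (x̄ - ȳ)²), using 1/n moments. Unlike Pearson's r, it penalises systematic offsets between observers. A minimal Python sketch with a hypothetical function name and hypothetical volumes:

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Observer 2 reads every (hypothetical) tumour volume exactly 1 cm^3 larger:
# Pearson's r would be 1, but the CCC penalises the systematic offset.
vol_1 = [1.0, 2.0, 3.0]
vol_2 = [2.0, 3.0, 4.0]
print(f"CCC = {lins_ccc(vol_1, vol_2):.3f}")  # prints CCC = 0.571 (exactly 4/7)
```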
Braileanu, Maria; Yang, Wuyang; Caplan, Justin M; Lin, Li-Mei; Radvany, Martin G; Tamargo, Rafael J; Huang, Judy
2016-11-01
Arteriovenous malformation (AVM) diffuseness has been shown to be prognostic of treatment outcomes. We assessed interobserver agreement on AVM diffuseness among physicians of different specialties and training backgrounds using digital subtraction angiography (DSA). All research protocols for this retrospective chart review were approved by the institutional review board. In a single-blinded setting, 2 attending neurosurgeons, 1 attending interventional neuroradiologist, and 1 senior neurosurgical resident rated 80 DSA views of 36 AVMs as either compact or diffuse. Individual interobserver agreement and subgroup agreement were analyzed using κ statistics and the intraclass correlation coefficient. Disagreement regarding AVM diffuseness occurred in 43.8% of all DSA views (n = 80). Interobserver κ agreement on AVM diffuseness using DSA views among the 4 physicians ranged from fair (κ = 0.40 [95% confidence interval (CI) = 0.22-0.58]) to substantial (κ = 0.65 [95% CI = 0.48-0.81]), whereas the total intraclass correlation coefficient was 0.81 (95% CI = 0.73-0.87). For the 36 AVMs, κ agreement ranged from fair (κ = 0.36 [95% CI = 0.13-0.60]) to moderate (κ = 0.57 [95% CI = 0.35-0.79]), whereas the intraclass correlation coefficient among all 4 physicians was 0.68 (95% CI = 0.47-0.82). Moderate agreement on AVM diffuseness (n = 80) was found between attending and resident assessments (κ = 0.57 [95% CI = 0.39-0.75]) and between neurosurgeon and interventional neuroradiologist assessments (κ = 0.55 [95% CI = 0.37-0.73]). Agreement between individual physicians on AVM diffuseness varies from fair to substantial. Objective and three-dimensional measures of AVM diffuseness should be developed for consistent clinical application. Copyright © 2016 Elsevier Inc. All rights reserved.
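The verbal labels in this abstract (fair for κ of 0.36 and 0.40, moderate for 0.55 and 0.57, substantial for 0.65) are consistent with the widely used Landis and Koch (1977) bands, although the abstract does not name its scale and other abstracts in this list use different cut-offs. The Python sketch below encodes that mapping as an assumption (hypothetical function name) and checks the arithmetic behind the 43.8% disagreement figure.

```python
def landis_koch_label(kappa):
    """Verbal band for a kappa value under the Landis and Koch (1977) convention."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Kappa values reported in the AVM diffuseness abstract.
for k in (0.40, 0.65, 0.36, 0.57, 0.55):
    print(k, landis_koch_label(k))

# 43.8% disagreement across 80 DSA views corresponds to 35 views (35/80 = 43.75%).
print(round(0.438 * 80))  # prints 35
```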
Cohen, Julien G; Kim, Hyungjin; Park, Su Bin; van Ginneken, Bram; Ferretti, Gilbert R; Lee, Chang Hyun; Goo, Jin Mo; Park, Chang Min
2017-08-01
To evaluate the differences between filtered back projection (FBP) and model-based iterative reconstruction (MBIR) algorithms in semi-automatic measurements of subsolid nodules (SSNs). Unenhanced CT scans of 73 SSNs, obtained using the same protocol and reconstructed with both FBP and MBIR algorithms, were evaluated by two radiologists. Diameter, mean attenuation, mass, and volume of whole nodules and of their solid components were measured. Intra- and interobserver variability and differences between FBP and MBIR were then evaluated using the Bland-Altman method and Wilcoxon tests. The longest diameter, volume, and mass of nodules and of their solid components were significantly higher with MBIR (p < 0.05), with mean differences of 1.1% (limits of agreement, -6.4 to 8.5%), 3.2% (-20.9 to 27.3%), and 2.9% (-16.9 to 22.7%) for whole nodules and 3.2% (-20.5 to 27%), 6.3% (-51.9 to 64.6%), and 6.6% (-50.1 to 63.3%) for solid components, respectively. The limits of agreement between FBP and MBIR were within the range of intra- and interobserver variability for both algorithms with respect to the diameter, volume, and mass of nodules and their solid components. There were no significant differences in intra- or interobserver variability between FBP and MBIR (p > 0.05). Semi-automatic measurements of SSNs differed significantly between FBP and MBIR; however, the differences were within the range of measurement variability.
• Intra- and interobserver reproducibility of measurements did not differ between FBP and MBIR.
• Differences in semi-automatic SSN measurements induced by reconstruction algorithms were not clinically significant.
• Semi-automatic measurement may be conducted regardless of reconstruction algorithm.
• Agreement on semi-automated SSN classification (pure vs. part-solid) did not differ significantly between algorithms.
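The results above are given as percentage differences with limits of agreement. One common Bland-Altman variant divides each paired difference by the pair mean before computing bias ± 1.96 SD; the abstract does not state its exact normalisation, so that choice is an assumption here. The function name and nodule volumes are hypothetical.

```python
from statistics import mean, stdev


def percent_limits_of_agreement(a, b):
    """Bland-Altman bias and 95% limits with differences expressed in percent.

    Each difference (a - b) is divided by the pair mean (a + b) / 2.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    pct = [100.0 * (x - y) / ((x + y) / 2.0) for x, y in zip(a, b)]
    bias, sd = mean(pct), stdev(pct)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical nodule volumes (mm^3) from MBIR and FBP reconstructions.
mbir = [520.0, 310.0, 1045.0, 88.0, 760.0]
fbp = [505.0, 300.0, 1010.0, 86.0, 742.0]
bias, lower, upper = percent_limits_of_agreement(mbir, fbp)
print(f"mean difference {bias:.1f}% (limits {lower:.1f}% to {upper:.1f}%)")
```

Expressing differences as percentages suits measurements such as volume, where disagreement tends to grow with nodule size.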
Krogh, T P; Fredberg, U; Christensen, R; Stengaard-Pedersen, K; Ellingsen, T
2013-10-01
Tennis elbow, also known as lateral epicondylitis (LE), is a common disorder often assessed by ultrasound. The aim of this study was to evaluate the ultrasonographic outcomes and methods used in LE research and clinical practice. This study was designed as an intra- and interobserver reliability and agreement study. Ultrasonographic examination of the common extensor tendon of the elbow was performed. The intraobserver study examined tendon thickness twice in 20 right elbows from 20 healthy individuals at an interval of 7 to 12 days. The interobserver study examined tendon thickness, color Doppler activity, and bony spurs in 18 right elbows from 9 healthy individuals and 9 patients with LE. Two trained rheumatologists performed the interobserver examinations with the same scanner on the same day. The main outcomes were intraclass correlation coefficients (ICCs) and agreement. In the intraobserver study, the ICC for tendon thickness ranged from 0.76 to 0.81, depending on the measurement technique used, and agreement ranged from 0.06 to 0.13 mm. In the interobserver study, the ICC for tendon thickness ranged from 0.45 to 0.65, and agreement ranged from -0.17 to 0.13 mm. The ICC for color Doppler activity was 0.93, with agreement in 14/18 (78%) of the cases. Perfect reliability was demonstrated for bony spurs, with an ICC of 1 and exact agreement in 18/18 (100%) of the cases. Good to excellent reliability was obtained for all measurements. The ultrasonographic techniques evaluated in this trial can be recommended for use in both research and clinical practice. © Georg Thieme Verlag KG Stuttgart · New York.
Sternby, Hanna; Verdonk, Robert C; Aguilar, Guadalupe; Dimova, Alexandra; Ignatavicius, Povilas; Ilzarbe, Lucas; Koiva, Peeter; Lantto, Eila; Loigom, Tonis; Penttilä, Anne; Regnér, Sara; Rosendahl, Jonas; Strahinova, Vanya; Zackrisson, Sophia; Zviniene, Kristina; Bollen, Thomas L
2016-01-01
For consistent reporting and better comparison of research data, the revised Atlanta classification (RAC) proposes new computed tomography (CT) criteria to describe the morphology of acute pancreatitis (AP). The aim of this study was to analyse interobserver agreement among radiologists in evaluating CT morphology using the new RAC criteria in patients with AP. Patients with a first episode of AP who underwent CT were identified and consecutively enrolled at six European centres, counting backwards from January 2013 to January 2012. A local radiologist at each center and a central expert radiologist scored the CTs separately using the RAC criteria. Center-dependent and center-independent interobserver agreement was determined using kappa statistics. In total, 285 patients with 388 CTs were included. For most CT criteria, interobserver agreement was moderate to substantial. In four categories, the center-independent kappa values were fair: extrapancreatic necrosis (EXPN) (0.326), type of pancreatitis (0.370), characteristics of collections (0.408), and appropriate term for collections (0.356). The fair kappa values relate to discrepancies in the identification of extrapancreatic necrotic material. The local radiologists diagnosed EXPN (33% versus 59%, P < 0.0001) and non-homogeneous collections (35% versus 66%, P < 0.0001) significantly less frequently than the central expert. Cases read by the central expert showed superior correlation with clinical outcome. Diagnosis of EXPN and recognition of non-homogeneous collections show only fair agreement, potentially resulting in inconsistent reporting of morphologic findings. Copyright © 2016 IAP and EPC. Published by Elsevier B.V. All rights reserved.
High resolution pituitary gland MRI at 7.0 tesla: a clinical evaluation in Cushing's disease.
de Rotte, Alexandra A J; Groenewegen, Amy; Rutgers, Dik R; Witkamp, Theo; Zelissen, Pierre M J; Meijer, F J Anton; van Lindert, Erik J; Hermus, Ad; Luijten, Peter R; Hendrikse, Jeroen
2016-01-01
To evaluate the detection of pituitary lesions at 7.0 T compared with 1.5 T MRI in 16 patients with clinically and biochemically proven Cushing's disease. In seven patients, no lesion was detected on the initial 1.5 T MRI, and in nine patients it was uncertain whether a lesion was present. First, two readers assessed the 1.5 T and 7.0 T MRI examinations unpaired and in random order for the presence of lesions. Consensus reading with a third neuroradiologist was used to define final lesions in all MRIs. Second, surgical outcome was evaluated: the lesions visualized with MRI were compared with the lesions found during surgery in 9/16 patients. Interobserver agreement for lesion detection was good at 1.5 T MRI (κ = 0.69) and at 7.0 T MRI (κ = 0.62). In five patients, both 1.5 T and 7.0 T MRI enabled visualization of a lesion on the correct side of the pituitary gland. In three patients, 7.0 T MRI detected a lesion on the correct side of the pituitary gland, while no lesion was visible at 1.5 T MRI. The interobserver agreement of image assessment for 7.0 T MRI in patients with Cushing's disease was good, and lesions were detected more accurately with 7.0 T MRI.
• Interobserver agreement for lesion detection on 1.5 T MRI was good.
• Interobserver agreement for lesion detection on 7.0 T MRI was good.
• 7.0 T enabled confirmation of unclear lesions at 1.5 T.
• 7.0 T enabled visualization of lesions not visible at 1.5 T.
Salavati, M; Krijnen, W P; Rameckers, E A A; Looijestijn, P L; Maathuis, C G B; van der Schans, C P; Steenbergen, B
2015-01-01
The aims of this study were to adapt the Gross Motor Function Measure-88 (GMFM-88) for children with Cerebral Palsy (CP) and Cerebral Visual Impairment (CVI) and to determine the test-retest and interobserver reliability of the adapted version. Sixteen paediatric physical therapists familiar with CVI participated in the adaptation process. The Delphi method was used to reach consensus among a panel of experts. Seventy-seven children with CP and CVI (44 boys and 33 girls, aged between 50 and 144 months) participated in this study. To assess test-retest and interobserver reliability, the GMFM-88 was administered twice within three weeks (mean = 9 days, SD = 6 days) by trained paediatric physical therapists, one who was familiar with the child and one who was not. Percentages of identical scores, Cronbach's alphas, and intraclass correlation coefficients (ICCs) were computed at the level of each dimension. All experts agreed on the proposed adaptations of the GMFM-88 for children with CP and CVI. Test-retest reliability ICCs for dimension scores were between 0.94 and 1.00, with mean percentages of identical scores between 29 and 71, and interobserver reliability ICCs of the adapted GMFM-88 were 0.99-1.00 for dimension scores. Mean percentages of identical scores varied between 53 and 91. Test-retest and interobserver reliability of the GMFM-88-CVI for children with CP and CVI were excellent. Internal consistency of dimension scores lay between 0.97 and 1.00. The psychometric properties of the adapted GMFM-88 for children with CP and CVI are reliable and comparable to those of the original GMFM-88. Copyright © 2015 Elsevier Ltd. All rights reserved.
Application of age estimation methods based on teeth eruption: how easy is Olze method to use?
De Angelis, D; Gibelli, D; Merelli, V; Botto, M; Ventura, F; Cattaneo, C
2014-09-01
With increasing immigration, the development of new age estimation methods has become an urgent issue, because the age of subjects who lack valid identity documents must be estimated accurately. Age estimation methods are divided into skeletal and dental methods; among the latter, Olze's method is one of the most recent, having been introduced in 2010 with the aim of identifying the legal ages of 18 and 21 years by evaluating the stages of development of the periodontal ligament of third molars with closed root apices. The present study aims to verify the applicability of the method in daily forensic practice, with special focus on interobserver repeatability. Olze's method was applied by three different observers (two physicians and one dentist without specific training in Olze's method) to 61 orthopantomograms from subjects of mixed ethnicity aged between 16 and 51 years. The analysis considered the lower third molars. The results of the different observers were then compared to determine the interobserver error. Interobserver error varied between 43 and 57% for the right lower third molar (M48) and between 23 and 49% for the left lower third molar (M38). The chi-square test showed no significant differences according to tooth side or type of professional. The results show that Olze's method is not easy to apply for personnel without adequate training, because of an intrinsic interobserver error. Since it is nevertheless a crucial method in age determination, it should be used only by experienced observers after intensive and specific training.
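The abstract reports interobserver error as a percentage range without defining it. Assuming it is the percentage of teeth on which a given pair of observers assigned different stages, three observers give three pairwise figures, which is one way a range such as 43 to 57% could arise. The Python sketch below computes that quantity; the function name, observer names, and stage values are hypothetical.

```python
from itertools import combinations


def pairwise_disagreement(ratings):
    """Percentage of cases on which each pair of observers assigns different stages.

    ratings maps an observer name to that observer's stages for the same
    cases, listed in the same order.
    """
    result = {}
    for a, b in combinations(sorted(ratings), 2):
        ra, rb = ratings[a], ratings[b]
        if len(ra) != len(rb):
            raise ValueError("observers must rate the same cases")
        result[(a, b)] = 100.0 * sum(x != y for x, y in zip(ra, rb)) / len(ra)
    return result


# Hypothetical stages assigned to five lower third molars by three observers.
stages = {
    "physician_1": [0, 1, 2, 3, 1],
    "physician_2": [0, 2, 2, 3, 1],
    "dentist": [1, 1, 2, 2, 1],
}
for pair, pct in pairwise_disagreement(stages).items():
    print(pair, f"{pct:.0f}%")
```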
Prospective assessment of interobserver agreement for defecography in fecal incontinence.
Dobben, Annette C; Wiersma, Tjeerd G; Janssen, Lucas W M; de Vos, Rien; Terra, Maaike P; Baeten, Cor G; Stoker, Jaap
2005-11-01
The primary aim of our study was to determine the interobserver agreement of defecography in diagnosing enterocele, anterior rectocele, intussusception, and anismus in patients with fecal incontinence. The secondary aim was to evaluate the influence of experience level on the interpretation of defecography. Defecography was performed in 105 consecutive patients with fecal incontinence. Observers were classified by level of experience, and their findings were compared with those of an expert radiologist. The quality of the expert radiologist's findings was evaluated by an intraobserver agreement procedure. Intraobserver agreement was good to very good except for anismus: incomplete evacuation after 30 sec (kappa, 0.55) and puborectalis impression (kappa, 0.54). Interobserver agreement was good for enterocele and rectocele (kappa, 0.66 for both) and fair for intussusception (kappa, 0.29). Interobserver agreement was moderate for anismus: incomplete evacuation after 30 sec (kappa, 0.47) and fair for anismus: puborectalis impression (kappa, 0.24). Agreement in grading was good for enterocele and rectocele (kappa, 0.64 and 0.72, respectively) and fair for intussusception (kappa, 0.39). At the most experienced level, agreement was very good for rectocele (kappa, 0.83) and rectocele grading (kappa, 0.83) and moderate for intussusception (kappa, 0.44). For enterocele and its grading, experience level did not influence reproducibility. Reproducibility for enterocele, anterior rectocele, and severity grading is good, but for intussusception it is fair to moderate. For anismus, the diagnosis of incomplete evacuation after 30 sec is more reproducible than puborectalis impression. The level of experience seems to play a role in diagnosing anterior rectocele and its grading and in diagnosing intussusception.
Fox, M R; Pandolfino, J E; Sweis, R; Sauter, M; Abreu Y Abreu, A T; Anggiansah, A; Bogte, A; Bredenoord, A J; Dengler, W; Elvevi, A; Fruehauf, H; Gellersen, S; Ghosh, S; Gyawali, C P; Heinrich, H; Hemmink, M; Jafari, J; Kaufman, E; Kessing, K; Kwiatek, M; Lubomyr, B; Banasiuk, M; Mion, F; Pérez-de-la-Serna, J; Remes-Troche, J M; Rohof, W; Roman, S; Ruiz-de-León, A; Tutuian, R; Uscinowicz, M; Valdovinos, M A; Vardar, R; Velosa, M; Waśko-Czopnik, D; Weijenborg, P; Wilshire, C; Wright, J; Zerbib, F; Menne, D
2015-01-01
High-resolution esophageal manometry (HRM) is a recent development used in the evaluation of esophageal function. Our aim was to assess inter-observer agreement for the diagnosis of esophageal motility disorders using this technology. Practitioners registered on the HRM Working Group website were invited to review and classify (i) 147 individual water swallows and (ii) 40 diagnostic studies, each comprising 10 swallows, using a drop-down menu that followed the Chicago Classification system. Data were presented in a standardized format with pressure contours but without a summary of HRM metrics. The sequence of swallows was fixed for each user but randomized between users to avoid sequence bias. Participants were blinded to other entries. (i) Individual swallows were assessed by 18 practitioners (13 institutions). Consensus agreement (≤ 2/18 dissenters) was present for most cases of normal peristalsis and achalasia but not for cases of peristaltic dysmotility. (ii) Diagnostic studies were assessed by 36 practitioners (28 institutions). Overall inter-observer agreement was 'moderate' (kappa 0.51), being 'substantial' (kappa > 0.7) for achalasia type I/II and no lower than 'fair-moderate' (kappa > 0.34) for any diagnosis. Overall agreement was somewhat higher among those who had performed >400 studies (n = 9; kappa 0.55) and 'substantial' among experts involved in the development of the Chicago Classification system (n = 4; kappa 0.66). This prospective, randomized, and blinded study reports an acceptable level of inter-observer agreement for HRM diagnoses across the full spectrum of esophageal motility disorders for a large group of clinicians working in a range of medical institutions. Suboptimal agreement for the diagnosis of peristaltic motility disorders highlights the contribution of objective HRM metrics. © 2014 International Society for Diseases of the Esophagus.
Chapman, Cary B; Herrera, Mauricio F; Binenbaum, Gil; Schweppe, Michael; Staron, Ronald B; Feldman, Frieda; Rosenwasser, Melvin P
2003-09-01
The purpose of this prospective study was to determine the level of interobserver and intraobserver agreement among orthopedic surgeons and radiologists when computed tomography (CT) scans are used with plain radiographs to evaluate intertrochanteric fractures. In addition, the prognostic value of current classification systems with respect to quality of life was evaluated. Sixty-one patients who presented with intertrochanteric fractures underwent open reduction and internal fixation with a compression hip screw. Three orthopedic surgeons and 2 radiologists independently classified the fractures according to 2 systems: Evans-Jensen and AO (Arbeitsgemeinschaft für Osteosynthesefragen). Fractures were initially graded with plain radiographs and then again in conjunction with CT. Results were analyzed using the kappa (κ) coefficient. The 36-item Short-Form Health Survey was administered at baseline, 3 months, and 1 year, and results were correlated with fracture grade. Mean kappa coefficients comparing radiography alone with radiography plus CT were 0.63 for the AO system and 0.59 for the Evans-Jensen system; both represent "fair" agreement. Mean overall interobserver kappa coefficients were 0.67 for radiologists and 0.57 for orthopedic surgeons. Radiologists also had higher intraobserver kappa coefficients. No significant relationships were found between follow-up Short-Form Health Survey results and intraoperative grading of fractures. When these classification schemes are compared, interobserver agreement does not appear to change dramatically when information from CT scans is added. This may suggest that (1) CT provides more data, with greater possibilities for misinterpretation, and (2) these classification schemes may not comprehensively describe fracture pattern and displacement. Finally, neither system provided any prognostic value.
Youk, Ji Hyun; Jung, Inkyung; Yoon, Jung Hyun; Kim, Sung Hun; Kim, You Me; Lee, Eun Hye; Jeong, Sun Hye; Kim, Min Jung
2016-09-01
Our aim was to compare the inter-observer variability and diagnostic performance of the Breast Imaging Reporting and Data System (BI-RADS) lexicon for breast ultrasound between static and video images. Ninety-nine breast masses visible on ultrasound examination in 95 women aged 19-81 y at five institutions were enrolled in this study. The masses were scheduled for biopsy or surgery or had been stable for at least 2 y of ultrasound follow-up after benign biopsy results or typically benign findings. For each mass, representative long- and short-axis static ultrasound images were acquired; real-time long- and short-axis B-mode video images through the mass area were separately saved as cine clips. Each image was reviewed independently by five radiologists, who were asked to classify ultrasound features according to the fifth edition of the BI-RADS lexicon. Inter-observer variability was assessed using kappa (κ) statistics. Diagnostic performance on static and video images was compared using the area under the receiver operating characteristic curve. No significant difference was found in κ values between static and video images for any descriptor, although κ values for video images were higher than those for static images for shape, orientation, margin, and calcifications. In receiver operating characteristic curve analysis, the area under the curve was higher for video images (0.83, range: 0.77-0.87) than for static images (0.80, range: 0.75-0.83), although the difference was not statistically significant (p = 0.08). Inter-observer variability and diagnostic performance of video images were similar to those of static images on breast ultrasonography according to the new edition of BI-RADS. Copyright © 2016 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Gupta, Tejpal; Nair, Vimoj; Epari, Sridhar; Pietsch, Torsten; Jalali, Rakesh
2012-01-01
There is significant inter-observer variation among neuropathologists in the typing, subtyping, and grading of glial neoplasms. Centralized pathology review has been proposed to minimize this variation and is now almost mandatory for accrual into multicentric trials. We sought to assess concordance between neuropathologists in the histopathological diagnosis of glioblastoma. We compared local, institutional, and central neuro-oncopathology reporting in a cohort of 34 patients with newly diagnosed supratentorial glioblastoma, accrued consecutively at a tertiary-care institution on a prospective trial testing the addition of a new agent to a standard chemoradiation regimen. With respect to histological typing/subtyping, concordance was sub-optimal between local histological diagnosis and central review, fair between local diagnosis and institutional review, and good between institutional and central review. Twelve (39%) of 31 patients with a local histological diagnosis had an identical tumor type, subtype, and grade on central review. Overall agreement was modestly better (52%) between local diagnosis and institutional review. In contrast, 28 (83%) of 34 patients had a completely concordant histopathologic diagnosis between institutional and central review. Inter-observer reliability testing showed poor agreement between local and central review (kappa statistic = 0.12, 95% confidence interval (CI): -0.03 to 0.32, P = 0.043) but moderate agreement between institutional and central review (kappa statistic = 0.51, 95% CI: 0.17 to 0.84, P = 0.00003). Agreement between local diagnosis and institutional review was fair. Significant inter-observer variation exists in the histopathological diagnosis of glioblastoma, with significant implications for clinical research and practice. More objective, quantitative, robust, and reproducible criteria are needed for better subtyping and accurate diagnosis.
Kassam, A M; Tillotson, L; Schranz, P J; Mandalia, V I
2015-01-01
The aim of the study was to show, on MRI, that the posterior border of the anterior horn of the lateral meniscus (AHLM) could guide tibial tunnel position in the sagittal plane and provide anatomical graft position. One hundred MRI scans with normal cruciate ligaments and no evidence of meniscal injury were analysed. We measured the distance between the posterior border of the AHLM and the midpoint of the ACL by superimposing sagittal images. The mean distance between the posterior border of the AHLM and the ACL midpoint was -0.1 mm (i.e. 0.1 mm posterior to the ACL midpoint). The range was -4.6 mm to 5 mm, and the median was 0.0 mm. The 95% confidence interval was -0.5 to 0.3 mm. The data followed a normal distribution, and intra- and inter-observer measurements showed significant correlation (p < 0.05) using Pearson's correlation test (intra-observer) and the intraclass correlation coefficient (inter-observer). The posterior border of the AHLM is a reproducible anatomical marker for the midpoint of the ACL footprint in the majority of cases. It can be used intra-operatively as a guide for tibial tunnel insertion and graft placement, allowing anatomical reconstruction. There will inevitably be some anatomical variation, so pre-operative MRI assessment of the relationship between the AHLM and the ACL footprint is advised to improve surgical planning. Level of evidence: 4.
[Identification of adverse events in hospitalised influenza patients].
Aranaz-Andrés, J M; Gea-Velázquez de Castro, M T; Jiménez-Pericás, F; Balbuena-Segura, A I; Meyer-García, M C; López-Fresneña, N; Miralles-Bueno, J J; Obón-Azuara, B; Moliner-Lahoz, J; Aibar-Remón, C
2015-01-01
To test inter-observer agreement in identifying adverse events (AEs) in patients hospitalized for influenza and placed under precautionary isolation measures. Historical cohort study of 50 patients under isolation measures because of influenza and 50 patients without isolation measures. The AE incidence ranged from 10% to 26% depending on the observer (26% [95% CI: 17.4%-34.60%], 10% [95% CI: 4.12%-15.88%], and 23% [95% CI: 14.75%-31.25%]). It was always lower in the cohort under isolation measures; this difference was statistically significant when the precise case definition was applied. Agreement on screening was good (above 76%; kappa index between 0.29 and 0.81). Agreement on the precise identification of care-related AEs was lower (50% to 93.3%; kappa index from 0.20 to 0.70). Before performing an epidemiological study on AEs, interobserver concordance must be analyzed to improve the accuracy of the results and the validity of the study. Studies have different levels of reliability: the kappa index shows high levels for the screening guide, but not for the identification of AEs. Without a sound methodology, the results obtained, and thus the decisions based on them, cannot be guaranteed. Researchers must be confident in the method used, which should be as close as possible to the best achievable. Copyright © 2014 SECA. Published by Elsevier España. All rights reserved.
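Percentage agreement above 76% alongside kappa values as low as 0.29 can occur when one answer dominates, because kappa discounts the chance agreement implied by each rater's marginal rates. The Python sketch below contrasts two hypothetical 2x2 screening tables with the same 90% raw agreement; the function name and counts are illustrative only, not study data.

```python
def agreement_2x2(both_yes, only_a_yes, only_b_yes, both_no):
    """Observed agreement and Cohen's kappa for two raters' yes/no decisions.

    Arguments are the four cell counts of the 2x2 table of rater A vs rater B.
    """
    n = both_yes + only_a_yes + only_b_yes + both_no
    p_o = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n
    b_yes = (both_yes + only_b_yes) / n
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)   # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)


# Two hypothetical screening tables of 100 records with 90% raw agreement.
for label, table in (("rare positives", (5, 5, 5, 85)),
                     ("balanced", (45, 5, 5, 45))):
    p_o, kappa = agreement_2x2(*table)
    print(f"{label}: agreement {p_o:.0%}, kappa {kappa:.2f}")
```

With rare positives, chance agreement is already 82%, so the same 90% raw agreement yields a kappa of about 0.44 rather than 0.80.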
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, M; Woo, B; Kim, J
Purpose: Objective and reliable quantification of imaging phenotype is an essential part of radiogenomic studies. We compared the reproducibility of two semi-automatic segmentation methods for quantitative image phenotyping in magnetic resonance imaging (MRI) of glioblastoma multiforme (GBM). Methods: MRI examinations with T1 post-gadolinium and FLAIR sequences of 10 GBM patients were downloaded from The Cancer Imaging Archive. Two semi-automatic segmentation tools with different algorithms (deformable model and grow cut method) were used by two independent observers to segment contrast-enhancing, necrotic and edematous regions. A total of 21 imaging features, consisting of area and edge groups, were extracted automatically from the segmented tumor. Inter-observer variability and the coefficient of variation (COV) were calculated to evaluate reproducibility. Results: With the deformable model, inter-observer correlations and COVs of the imaging features ranged from 0.953 to 0.999 and from 2.1% to 9.2%, respectively; with the grow cut method, they ranged from 0.799 to 0.976 and from 3.5% to 26.6%, respectively. COVs for key features previously reported to predict patient survival were 3.4% (deformable model) and 7.4% (grow cut) for the proportion of contrast-enhancing tumor region; 5.5% and 25.7% for the proportion of necrosis; and 2.1% and 4.4% for tumor edge sharpness on CE-T1WI. Conclusion: Comparison of two semi-automated tumor segmentation techniques shows reliable image feature extraction for radiogenomic analysis of GBM patients with multiparametric brain MRI.
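A minimal Python sketch of an inter-observer coefficient of variation for one imaging feature measured by two observers. The abstract does not define its COV; the per-case SD/mean averaged over cases used here is one common definition and an assumption. The feature values are hypothetical.

```python
import statistics


def interobserver_cov(obs_1, obs_2):
    """Mean per-case coefficient of variation (%) between two observers.

    For each case, CoV = SD of the two measurements / their mean; per-case
    values are then averaged. This is one common definition; others (e.g.
    root-mean-square CoV) also appear in the literature.
    """
    if len(obs_1) != len(obs_2) or not obs_1:
        raise ValueError("need paired, non-empty measurements")
    per_case = [
        statistics.stdev([a, b]) / statistics.fmean([a, b])
        for a, b in zip(obs_1, obs_2)
    ]
    return 100 * statistics.fmean(per_case)


# Hypothetical feature values (e.g. necrosis proportion) from two observers.
cov_percent = interobserver_cov([0.30, 0.12, 0.45], [0.32, 0.12, 0.41])
```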
Duncan, James R; Kline, Benjamin; Glaiberman, Craig B
2007-04-01
To create and test methods of extracting efficiency data from recordings of simulated renal stent procedures. Task analysis was performed and used to design a standardized testing protocol. Five experienced angiographers then performed 16 renal stent simulations using the Simbionix AngioMentor angiographic simulator. Audio and video recordings of these simulations were captured from multiple vantage points. The recordings were synchronized and compiled. A series of efficiency metrics (procedure time, contrast volume, and tool use) were then extracted from the recordings. The intraobserver and interobserver variability of these individual metrics was also assessed. The metrics were converted to costs and aggregated to determine the fixed and variable costs of a procedure segment or the entire procedure. Task analysis and pilot testing led to a standardized testing protocol suitable for performance assessment. Task analysis also identified seven checkpoints that divided the renal stent simulations into six segments. Efficiency metrics for these different segments were extracted from the recordings and showed excellent intra- and interobserver correlations. Analysis of the individual and aggregated efficiency metrics demonstrated large differences between segments as well as between different angiographers. These differences persisted when efficiency was expressed as either total or variable costs. Task analysis facilitated both protocol development and data analysis. Efficiency metrics were readily extracted from recordings of simulated procedures. Aggregating the metrics and dividing the procedure into segments revealed potential insights that could be easily overlooked because the simulator currently does not attempt to aggregate the metrics and only provides data derived from the entire procedure. The data indicate that analysis of simulated angiographic procedures will be a powerful method of assessing performance in interventional radiology.
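The abstract converts extracted metrics (procedure time, contrast volume and tool use) into costs and aggregates them by procedure segment. A minimal Python sketch of that bookkeeping follows; the unit costs, tool names and segment names are hypothetical placeholders rather than values from the study, and the split into fixed and variable costs is omitted.

```python
from dataclasses import dataclass, field

# Hypothetical unit costs (arbitrary currency units), for illustration only.
COST_PER_MINUTE = 20.0
COST_PER_ML_CONTRAST = 0.5
TOOL_COSTS = {"guidewire": 100.0, "catheter": 60.0, "stent": 1500.0}


@dataclass
class Segment:
    """Efficiency metrics extracted for one procedure segment."""
    name: str
    minutes: float
    contrast_ml: float
    tools: list = field(default_factory=list)


def segment_cost(seg):
    """Convert one segment's time, contrast and tool use into a cost."""
    return (
        seg.minutes * COST_PER_MINUTE
        + seg.contrast_ml * COST_PER_ML_CONTRAST
        + sum(TOOL_COSTS[tool] for tool in seg.tools)
    )


def procedure_cost(segments):
    """Return per-segment costs and their total for the whole procedure."""
    per_segment = {seg.name: segment_cost(seg) for seg in segments}
    return per_segment, sum(per_segment.values())


costs, total = procedure_cost([
    Segment("access", minutes=5, contrast_ml=10, tools=["guidewire"]),
    Segment("deployment", minutes=10, contrast_ml=20, tools=["catheter", "stent"]),
])
```

Keeping per-segment costs rather than only the total mirrors the abstract's point that segment-level aggregation reveals differences hidden in whole-procedure figures.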
Holland-Letz, Tim; Endres, Heinz G; Biedermann, Stefanie; Mahn, Matthias; Kunert, Joachim; Groh, Sabine; Pittrow, David; von Bilderling, Peter; Sternitzky, Reinhardt; Diehm, Curt
2007-05-01
The reliability of ankle-brachial index (ABI) measurements performed by different observer groups in primary care has not yet been determined. The aims of the study were to provide precise estimates of all effects influencing the variability of the ABI (patients' individual variability, intra- and inter-observer variability), with particular focus on the performance of different observer groups. Using a partially balanced incomplete block design, 144 unselected individuals aged ≥ 65 years underwent duplicate ABI measurements by one vascular surgeon or vascular physician, one family physician and one nurse with training in Doppler sonography. Three groups comprising a total of 108 individuals were analyzed (only two with ABI < 0.90). The error between two repeated measurements did not differ among the three observer groups (experts 8.5%, family physicians 7.7%, and nurses 7.5%, p = 0.39). There was no relevant bias among observer groups. Intra-observer variability, expressed as the standard deviation divided by the mean, was 8%, and inter-observer variability was 9%. In conclusion, reproducibility of the ABI measurement was good in this cohort of elderly patients, almost all of whom had values in the normal range. The mean error of 8-9% within or between observers is smaller than with established screening measures. Since there were no differences among observers with different training backgrounds, our study confirms the appropriateness of ABI assessment for screening for peripheral arterial disease (PAD) and generalized atherosclerosis in the primary care setting. Given the importance of the early detection and management of PAD, this diagnostic tool should be used routinely as a standard for PAD screening. Additional studies will be required to confirm our observations in patients with PAD of various severities.
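A minimal Python sketch of the ABI (ankle systolic pressure divided by the higher brachial systolic pressure, the usual convention) and of a within-subject coefficient of variation from duplicate measurements. The duplicate-based SD formula is one common approach and an assumption here; the abstract states only that variability was expressed as SD divided by the mean. The readings are hypothetical.

```python
from math import sqrt


def abi(ankle_systolic, left_brachial, right_brachial):
    """Ankle-brachial index: ankle systolic pressure divided by the higher of
    the two brachial systolic pressures (all in mmHg)."""
    return ankle_systolic / max(left_brachial, right_brachial)


def duplicate_cv(first, second):
    """Within-subject coefficient of variation (%) from duplicate measurements.

    Within-subject SD = sqrt(sum(d**2) / (2n)) for paired differences d,
    divided by the grand mean of all measurements.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need paired, non-empty measurements")
    n = len(first)
    within_sd = sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return 100 * within_sd / grand_mean


# Hypothetical duplicate ABI readings for two patients by one observer.
cv_percent = duplicate_cv([1.0, 1.2], [1.1, 1.0])
```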
Deferm, Julie T; Schreurs, Ruud; Baan, Frank; Bruggink, Robin; Merkx, Matthijs A W; Xi, Tong; Bergé, Stefaan J; Maal, Thomas J J
2018-04-01
The purpose of this study was to assess the feasibility of 3D intraoral scanning for documentation of palatal soft tissue by evaluating the accuracy of shape, color, and curvature. Intraoral scans of ten participants' upper dentition and palate were acquired with the TRIOS® 3D intraoral scanner by two observers. Conventional impressions were taken and digitized as a gold standard. The resulting surface models were aligned using an Iterative Closest Point approach. The absolute distances between the intraoral models and the digitized impression were used to quantify the trueness and precision of intraoral scanning. The mean color of the palatal soft tissue was extracted in HSV (hue, saturation, value) format to establish color precision. Finally, the mean curvature of the surface models was calculated and used to quantify surface irregularity. The mean average distance error between the conventional impression models and the intraoral models was 0.02 ± 0.07 mm (p = 0.30). The mean interobserver color difference was -0.08 ± 1.49° (p = 0.864), 0.28 ± 0.78% (p = 0.286), and 0.30 ± 1.14% (p = 0.426) for hue, saturation, and value, respectively. The interobserver differences for overall and maximum surface irregularity were 0.01 ± 0.03 and 0.00 ± 0.05 mm. This study supports the hypothesis that intraoral scanning can provide 3D documentation of palatal soft tissue in terms of shape, color, and curvature. An intraoral scanner can be an objective tool, adjunctive to the clinical examination of the palatal tissue.
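Because hue is an angle (0° and 360° are the same red), a mean palatal colour in HSV needs a circular mean for hue. A minimal Python sketch using the standard-library `colorsys` module; how the study itself averaged hue is not stated, so this is an assumption, and the pixels are hypothetical.

```python
import colorsys
from math import atan2, cos, degrees, pi, sin


def mean_hsv(rgb_pixels):
    """Mean colour of RGB pixels (components 0-255) expressed in HSV.

    Hue is an angle, so it is averaged on the circle (mean of unit vectors);
    saturation and value are averaged arithmetically.
    Returns (hue in degrees, saturation in %, value in %).
    """
    if not rgb_pixels:
        raise ValueError("no pixels")
    sum_x = sum_y = sum_s = sum_v = 0.0
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sum_x += cos(2 * pi * h)
        sum_y += sin(2 * pi * h)
        sum_s += s
        sum_v += v
    n = len(rgb_pixels)
    hue_deg = degrees(atan2(sum_y, sum_x)) % 360
    return hue_deg, 100 * sum_s / n, 100 * sum_v / n


# Two hypothetical pixels with hues of 350 and 10 degrees: the circular mean
# is about 0 degrees, whereas a naive arithmetic mean would give 180.
pixels = [
    tuple(255 * c for c in colorsys.hsv_to_rgb(h / 360, 0.5, 0.8))
    for h in (350, 10)
]
hue, sat, val = mean_hsv(pixels)
```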
Morilla-Herrera, J C; Morales-Asencio, J M; Fernández-Gallego, M C; Cobos, E Berrobianco; Romero, A Delgado
2011-01-01
Self-care and management of the therapeutic regime (drug adherence, preventive behaviours and development of healthy lifestyles) are key components in managing chronic diseases. Nursing has standardized languages that describe many of these situations, such as the diagnosis "Ineffective Self Health Management" (ISHM) or many of the Nursing Outcomes Classification (NOC) indicators. The aims of this study were to determine the interobserver reliability of a NOC-based instrument for assessment and aid in diagnosis of ISHM in patients with chronic conditions in Primary Health Care, to determine its diagnostic validity, and to describe the prevalence of patients with this problem. A cross-sectional validation study was conducted in the provinces of Málaga, Cádiz and Almería from 2006 to 2009. Each patient was assessed by 3 independent observers: the first two observers scored the NOC indicators and the third acted as the "gold standard". Two hundred and twenty-eight patients were included, 37.7% of them with more than one chronic condition. NOC indicators showed high interobserver reliability (ICC > 0.70) and internal consistency (Cronbach's alpha: 0.81). With a cut-point of 10.5, sensitivity was 61% and specificity 85%, and the area under the curve was 0.81 (95% CI: 0.77 to 0.85). The prevalence of patients with ISHM was 36% (95% CI: 34 to 40). The use of NOC indicators allows evaluation of the management of the therapeutic regime in people with chronic conditions with satisfactory validity and provides new approaches for dealing with this problem.
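Cronbach's alpha, reported above as a measure of internal consistency, compares the variance of individual items with the variance of the summed score: alpha = k/(k - 1) × (1 - Σ item variances / total-score variance). A minimal Python sketch with hypothetical indicator scores (three items, four patients), not the study data.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k items.

    item_scores: one list per item, each holding the scores of the same
    respondents in the same order.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*item_scores)]  # per respondent
    item_var = sum(statistics.variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))


# Hypothetical scores for three indicators across four patients.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```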
Azzam, Michael G; Lenarz, Christopher J; Farrow, Lutul D; Israel, Heidi A; Kieffer, David A; Kaar, Scott G
2011-08-01
To validate the use of the clock face reference as a reliable means of communicating femoral intercondylar notch position. A single red mark was made at a variable location in the intercondylar notch of each of ten identical left Sawbones femurs. Ten surgeons who routinely perform ACL reconstructions were presented with the femurs in random order and asked to state the position of the mark to the nearest 30-min interval. Responses were recorded, and the test was repeated 3 weeks later. The same 10 surgeons were presented with 30 actual arthroscopic photographs of the intercondylar notch, taken at 90° of knee flexion, with a probe pointing at various locations (10 knees; 3 photographs/knee) along the lateral aspect of the notch. The results were analyzed with the ICC, Cronbach's alpha, and descriptive statistics. For the Sawbones, the ICC was 0.996, while individual physicians' Cronbach's alpha values ranged from 0.954 to 0.999, indicating very high interobserver and intraobserver reliability. The mean range of responses among the 10 surgeons was 1.6 h (SD 0.6). For the photographs, the ICC was also high at 0.997, with a mean range of 1.1 h (SD 0.4) among surgeons. The clock face method is commonly used both for placing the femoral tunnel during ACL reconstruction and for describing the location of the ACL femoral tunnel between communicating surgeons. Despite a high statistical interobserver correlation, there is a substantial range among different surgeons' responses. The present study questions the reliability of the clock face method as a stand-alone tool for use between surgeons. Other methods that also use anatomic landmarks may be more accurate for describing intercondylar notch anatomy. Level of evidence: III.
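The "mean range of responses" can be read as the spread between the earliest and latest clock-face positions given by the surgeons for each specimen, averaged over specimens. A minimal Python sketch under that reading of the abstract; the readings are hypothetical, and the code assumes no set of readings straddles 12 o'clock.

```python
import statistics


def clock_to_hours(label):
    """Convert a clock-face reading such as '10:30' to decimal hours."""
    hours, minutes = label.split(":")
    return int(hours) + int(minutes) / 60


def mean_response_range(readings_by_specimen):
    """Mean and SD, across specimens, of the spread (max - min, in hours) of
    the surgeons' readings. Assumes no set of readings straddles 12 o'clock."""
    spreads = []
    for labels in readings_by_specimen:
        hours = [clock_to_hours(label) for label in labels]
        spreads.append(max(hours) - min(hours))
    return statistics.fmean(spreads), statistics.stdev(spreads)


# Hypothetical readings from three surgeons for two specimens.
mean_spread, sd_spread = mean_response_range([
    ["10:00", "10:30", "11:30"],
    ["9:00", "9:30", "9:30"],
])
```

This shows how a high ICC can coexist with a wide spread: the ICC rewards consistent ranking of specimens across surgeons, while the range measures absolute disagreement on any single specimen.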
Benatti, Lucia; Corvi, Federico; Tomasso, Livia; Mercuri, Stefano; Querques, Lea; Ricceri, Fulvio; Bandello, Francesco; Querques, Giuseppe
2017-06-01
To analyze the inter-method agreement in arteriovenous ratio (AVR) evaluation between spectral-domain optical coherence tomography (SD-OCT) and the Dynamic Vessel Analyzer (DVA). Healthy volunteers underwent DVA and SD-OCT examination. AVR was measured by SD-OCT using the four external lines of the optic nerve head-centered 7-line cube and by DVA using automated AVR estimation. The mean AVR was calculated twice, separately, by two independent readers for each tool. Twenty-two eyes of 11 healthy subjects (five women and six men; mean age 35 years) were included. AVR analysis by DVA showed high inter-observer agreement between readers 1 and 2, and high intra-observer agreement for both readers. For AVR analysis on SD-OCT, we found high inter-observer agreement between readers 1 and 2, high intra-observer agreement for reader 1, but low intra-observer agreement for reader 2. Overall, the mean AVR measured on SD-OCT was significantly higher than that measured with DVA (reader 1, 0.9023 ± 0.06 vs 0.8036 ± 0.08; p < 0.001; reader 2, 0.9067 ± 0.06 vs 0.8083 ± 0.05; p = 0.003). No inter-method agreement in AVR could be detected in the present study because of a bias in measurements (a shift between DVA and SD-OCT). We found a significant difference between the two noninvasive methods for AVR measurement, with a tendency for SD-OCT to overestimate retinal vascular caliber in comparison with DVA. This may be useful for achieving greater accuracy in the evaluation of retinal vessels in ocular as well as systemic diseases.
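A systematic shift between two methods, as described above, is usually quantified with a Bland-Altman analysis: the mean difference (bias) and 95% limits of agreement (bias ± 1.96 SD of the differences). The abstract does not say this analysis was used; the sketch is illustrative, treats each eye as independent (the study's 22 eyes came from 11 subjects), and uses hypothetical AVR values.

```python
import statistics


def bland_altman(method_a, method_b):
    """Bias (mean difference, A - B) and 95% limits of agreement."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical AVR values for four eyes measured with two devices.
oct_avr = [0.90, 0.92, 0.88, 0.91]
dva_avr = [0.80, 0.81, 0.79, 0.82]
bias, (lower, upper) = bland_altman(oct_avr, dva_avr)
```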
Simplified Radiographic Damage Index for Affected Joints in Chronic Gouty Arthritis
2016-01-01
The aim of this study was to develop and validate a new radiographic damage scoring method (DAmagE index of GoUt; DAEGU) for chronic gout using plain radiography. Two independent observers scored foot x-rays from 15 patients with chronic gout according to the DAEGU method and the modified Sharp/van der Heijde (SvdH) method. The 10 metatarsophalangeal (MTP) and 2 interphalangeal (IP) joints of the first toes of both feet were scored to assess the degrees of erosion and joint space narrowing (JSN). Intraobserver and interobserver reliabilities were analyzed by calculating the intraclass correlation coefficient (ICC) and minimal detectable change (MDC). The correlation between the DAEGU and SvdH methods was analyzed by calculating Spearman's rho correlation coefficients and kappa coefficients. The DAEGU method was found to be highly reproducible (ICC values of 0.945–0.987 intraobserver and 0.993–0.996 interobserver). The erosion, JSN, and total scores exhibited strong positive correlations between the DAEGU and SvdH methods and also within each method (r = 0.860–0.969, P < 0.001 for all parameters). The DAEGU and SvdH methods were in very good agreement as determined by kappa coefficient analysis [0.732 (0.387–1.000) for erosion and 1.000 (1.000–1.000) for JSN]. In conclusion, this study showed that the DAEGU method is a reliable and feasible tool for assessing radiographic damage in chronic gout. The DAEGU method may allow easier assessment of structural damage in chronic gout in real clinical practice. PMID:26955246
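The abstract reports the MDC alongside the ICC without giving a formula. One widely used formulation derives the standard error of measurement from the ICC, SEM = SD × √(1 - ICC), and sets MDC95 = 1.96 × √2 × SEM. A minimal Python sketch under that assumption; the SD and ICC inputs are hypothetical.

```python
from math import sqrt


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC from the SD of the observed scores and a reliability ICC.

    SEM = SD * sqrt(1 - ICC);  MDC = z * sqrt(2) * SEM  (z = 1.96 for ~95%).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    sem = sd * sqrt(1.0 - icc)
    return z * sqrt(2.0) * sem


# Hypothetical inputs: score SD of 10 points and an ICC of 0.96.
mdc = minimal_detectable_change(10.0, 0.96)  # about 5.5 points
```

The √2 factor accounts for measurement error in both the baseline and the follow-up score when a change between two scorings is assessed.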
Pomerleau, J; Knai, C; Foster, C; Rutter, H; Darmon, N; Derflerova Brazdova, Z; Hadziomeragic, A F; Pekcan, G; Pudule, I; Robertson, A; Brunner, E; Suhrcke, M; Gabrijelcic Blenkus, M; Lhotska, L; Maiani, G; Mistura, L; Lobstein, T; Martin, B W; Elinder, L S; Logstrup, S; Racioppi, F; McKee, M
2013-03-01
The authors designed an instrument to objectively measure aspects of the built and food environments in urban areas, the EURO-PREVOB Community Questionnaire, within the EU-funded project 'Tackling the social and economic determinants of nutrition and physical activity for the prevention of obesity across Europe' (EURO-PREVOB). This paper describes its development, reliability, validity, feasibility and relevance to public health and obesity research. The Community Questionnaire is designed to measure key aspects of the food and built environments in urban areas of varying levels of affluence or deprivation, within different countries. The questionnaire assesses (1) the food environment and (2) the built environment. Pilot tests of the EURO-PREVOB Community Questionnaire were conducted in five to 10 purposively sampled urban areas of different socio-economic status in each of Ankara, Brno, Marseille, Riga, and Sarajevo. Inter-rater reliability was compared between two pairs of fieldworkers in each city centre using three methods: inter-observer agreement (IOA), kappa statistics, and intraclass correlation coefficients (ICCs). Data were collected successfully in all five cities. Overall reliability of the EURO-PREVOB Community Questionnaire was excellent (IOA > 0.87; ICCs > 0.91; kappa statistics > 0.7). However, assessment of certain aspects of the quality of the built environment yielded slightly lower IOA coefficients than the quantitative aspects. The EURO-PREVOB Community Questionnaire was found to be a reliable and practical observational tool for measuring differences in community-level data on environmental factors that can affect dietary intake and physical activity. The next step is to evaluate its predictive power by collecting behavioural and anthropometric data relevant to obesity and its determinants. Copyright © 2013 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
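IOA in its simplest item-by-item form is the proportion of items on which two fieldworkers recorded the same value, agreements / (agreements + disagreements); unlike kappa or the ICC, it does not correct for chance agreement. A minimal Python sketch, with hypothetical audit entries standing in for questionnaire items.

```python
def interobserver_agreement(obs_1, obs_2):
    """Item-by-item IOA: agreements / (agreements + disagreements)."""
    if len(obs_1) != len(obs_2) or not obs_1:
        raise ValueError("need paired, non-empty observations")
    agreements = sum(a == b for a, b in zip(obs_1, obs_2))
    disagreements = len(obs_1) - agreements
    return agreements / (agreements + disagreements)


# Hypothetical audit items recorded by two fieldworkers for one street segment.
fieldworker_1 = ["supermarket", "none", "takeaway", "park", "none"]
fieldworker_2 = ["supermarket", "none", "takeaway", "none", "none"]
ioa = interobserver_agreement(fieldworker_1, fieldworker_2)  # 0.8
```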
Spherical subjective refraction with a novel 3D virtual reality based system.
Pujol, Jaume; Ondategui-Parra, Juan Carlos; Badiella, Llorenç; Otero, Carles; Vilaseca, Meritxell; Aldaba, Mikel
To conduct a clinical validation of a virtual reality-based experimental system that assesses spherical subjective refraction while simplifying the methodology of ocular refraction. For the agreement assessment, spherical refraction measurements were obtained from 104 eyes of 52 subjects using three different methods: subjectively with the experimental prototype (Subj.E) and with classical subjective refraction (Subj.C), and objectively with the WAM-5500 autorefractor (WAM). To evaluate the precision (intra- and inter-observer variability) of each refractive tool independently, 26 eyes were measured on four occasions. With regard to agreement, the mean difference (±SD) for the spherical equivalent (M) between the new experimental subjective method (Subj.E) and classical subjective refraction (Subj.C) was -0.034D (±0.454D). The corresponding 95% limits of agreement (LoA) were (-0.856D, 0.924D). With regard to precision, the intra-observer mean difference for the M component was 0.034±0.195D for Subj.C, 0.015±0.177D for the WAM and 0.072±0.197D for Subj.E. Inter-observer variability showed worse precision values, although these were still clinically valid (below 0.25D) for all instruments. The spherical equivalent obtained with the new experimental system was precise and in good agreement with the classical subjective routine. The algorithm implemented in this new system and its optical configuration have been shown to be a valid first step towards semi-automated spherical error correction. Copyright © 2016 Spanish General Council of Optometry. Published by Elsevier España, S.L.U. All rights reserved.
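The M component above is the spherical equivalent of power-vector notation, M = S + C/2 (sphere plus half the cylinder). A minimal Python sketch computing M per eye and the mean ± SD of the between-method differences; the refractions are hypothetical and the function names are illustrative.

```python
import statistics


def spherical_equivalent(sphere_d, cylinder_d):
    """Spherical equivalent M = S + C/2, in dioptres."""
    return sphere_d + cylinder_d / 2


def compare_methods(rx_a, rx_b):
    """Mean and SD of per-eye differences in M (method A minus method B).

    rx_a, rx_b: lists of (sphere, cylinder) tuples for the same eyes.
    """
    diffs = [spherical_equivalent(*a) - spherical_equivalent(*b)
             for a, b in zip(rx_a, rx_b)]
    return statistics.fmean(diffs), statistics.stdev(diffs)


# Hypothetical refractions (sphere, cylinder) for three eyes.
mean_diff, sd_diff = compare_methods(
    [(-2.00, -1.00), (1.00, -0.50), (0.00, 0.00)],
    [(-2.25, -0.50), (1.00, -0.50), (-0.25, 0.00)],
)
```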
Construct Validity and Reliability of the SARA Gait and Posture Sub-scale in Early Onset Ataxia
Lawerman, Tjitske F.; Brandsma, Rick; Verbeek, Renate J.; van der Hoeven, Johannes H.; Lunsing, Roelineke J.; Kremer, Hubertus P. H.; Sival, Deborah A.
2017-01-01
Aim: In children, gait and posture assessment provides a crucial marker for the early characterization, surveillance and treatment evaluation of early onset ataxia (EOA). For reliable data entry in studies targeting gait and posture improvement, uniform quantitative biomarkers are necessary. To date, the pediatric test construct of the gait and posture scores of the Scale for Assessment and Rating of Ataxia (SARA) sub-scale has remained unclear. In the present study, we aimed to determine the construct validity and reliability of the pediatric SARAGAIT/POSTURE sub-scale. Methods: We included 28 EOA patients [15.5 (6–34) years; median (range)]. For inter-observer reliability, we determined the ICC on EOA SARAGAIT/POSTURE sub-scores by three independent pediatric neurologists. For convergent validity, we associated SARAGAIT/POSTURE sub-scores with: (1) Ataxic gait Severity Measurement by Klockgether (ASMK; dynamic balance), (2) Pediatric Balance Scale (PBS; static balance), (3) Gross Motor Function Classification System, Expanded and Revised (GMFCS-E&R), (4) SARA-kinetic scores (SARAKINETIC; kinetic function of the upper and lower limbs), (5) Archimedes Spiral (AS; kinetic function of the upper limbs), and (6) total SARA scores (SARATOTAL; i.e., summed SARAGAIT/POSTURE, SARAKINETIC, and SARASPEECH sub-scores). For discriminant validity, we investigated whether EOA comorbidity factors (myopathy and myoclonus) could influence SARAGAIT/POSTURE sub-scores. Results: The inter-observer agreement (ICC) on EOA SARAGAIT/POSTURE sub-scores was high (0.97). SARAGAIT/POSTURE was strongly correlated with the other ataxia and functional scales [ASMK (rs = -0.819; p < 0.001); PBS (rs = -0.943; p < 0.001); GMFCS-E&R (rs = -0.862; p < 0.001); SARAKINETIC (rs = 0.726; p < 0.001); AS (rs = 0.609; p = 0.002); and SARATOTAL (rs = 0.935; p < 0.001)].
Comorbid myopathy influenced SARAGAIT/POSTURE scores through concurrent muscle weakness, whereas comorbid myoclonus predominantly influenced SARAKINETIC scores. Conclusion: In young EOA patients, separate SARAGAIT/POSTURE parameters show good inter-observer agreement and convergent validity, supporting the reliability of the scale. Given the incomplete discriminant validity, SARAGAIT/POSTURE scores should be interpreted with comorbid muscle weakness in mind. PMID:29326569
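The correlations above are Spearman rank coefficients (rs); a negative value, as with the PBS, means that higher (more ataxic) SARA gait/posture sub-scores accompany lower balance scores. A minimal Python sketch of Spearman's rho computed as the Pearson correlation of average ranks, which handles tied scores; the data are hypothetical.

```python
import statistics
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores: higher ataxia sub-scores with lower balance scores.
rho = spearman_rho([2, 5, 9, 12, 20], [52, 50, 41, 41, 30])
```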
Differentiation of periapical granulomas and cysts by using dental MRI: a pilot study.
Juerchott, Alexander; Pfefferle, Thorsten; Flechtenmacher, Christa; Mente, Johannes; Bendszus, Martin; Heiland, Sabine; Hilgenfeld, Tim
2018-05-17
The purpose of this pilot study was to evaluate whether periapical granulomas can be differentiated from periapical cysts in vivo by using dental magnetic resonance imaging (MRI). Prior to apicoectomy, 11 patients with radiographically confirmed periapical lesions underwent dental MRI, including fat-saturated T2-weighted (T2wFS) images, non-contrast-enhanced T1-weighted images with and without fat saturation (T1w/T1wFS), and contrast-enhanced fat-saturated T1-weighted (T1wFS+C) images. Two independent observers performed structured image analysis of the MRI datasets twice. A total of 15 diagnostic MRI criteria were evaluated, and histopathological results (6 granulomas and 5 cysts) were compared with MRI characteristics. Statistical analysis was performed using the intraclass correlation coefficient (ICC), Cohen's kappa (κ), the Mann-Whitney U-test and Fisher's exact test. Lesion identification and subsequent structured image analysis were possible on T2wFS and T1wFS+C MRI images. High reproducibility was shown for MRI measurements of the maximum lesion diameter (intraobserver ICC = 0.996/0.998; interobserver ICC = 0.997), for the "peripheral rim" thickness (intraobserver ICC = 0.988/0.984; interobserver ICC = 0.970), and for all non-quantitative MRI criteria (intraobserver κ = 0.990/0.995; interobserver κ = 0.988). Consistent with the histopathological results, six MRI criteria allowed clear differentiation between cysts and granulomas: (1) outer margin of lesion, (2) texture of "peripheral rim" in T1wFS+C, (3) texture of "lesion center" in T2wFS, (4) surrounding tissue involvement in T2wFS, (5) surrounding tissue involvement in T1wFS+C and (6) maximum "peripheral rim" thickness (all: P < 0.05). In conclusion, this pilot study indicates that radiation-free dental MRI enables reliable differentiation between periapical cysts and granulomas in vivo. Thus, MRI may substantially improve treatment strategies and help to avoid unnecessary surgery in apical periodontitis.
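The abstract reports ICCs without stating which form was used. As an illustration only, a minimal Python sketch of one common form, the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss, computed from ANOVA mean squares. The example matrix is the classic six-subject, four-rater illustration from Shrout and Fleiss (1979), not data from this study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per subject, each row holding k raters' scores.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, each rated by the same >= 2 raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Classic six-subject, four-rater example from Shrout and Fleiss (1979).
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(example)  # about 0.29
```

Because this form penalizes systematic differences between raters (the `ms_cols` term), it is lower than a consistency ICC on data where raters differ by a constant offset.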
Histopathological Identification of Colon Cancer with Microsatellite Instability
Alexander, Julian; Watanabe, Toshiaki; Wu, Tsung-Teh; Rashid, Asif; Li, Shuan; Hamilton, Stanley R.
2001-01-01
Cancer with high levels of microsatellite instability (MSI-H) is the hallmark of hereditary nonpolyposis colorectal cancer syndrome, and MSI-H occurs in ∼15% of sporadic colorectal carcinomas that have improved prognosis. We examined the utility of histopathology for the identification of MSI-H cancers by evaluating the features of 323 sporadic carcinomas using specified criteria and comparing the results to MSI-H status. Coded hematoxylin and eosin sections were evaluated for tumor features (signet ring cells; mucinous histology; cribriforming, poor differentiation, and medullary-type pattern; sponge-like mucinous growth; pushing invasive margin) and features of host immune response (Crohn’s-like lymphoid reaction, intratumoral lymphocytic infiltrate, and intraepithelial T cells by immunohistochemistry for CD3 with morphometry). Interobserver variation among five pathologists was determined. Subjective interpretation of histopathology as an indication for MSI testing was recorded. We found that medullary carcinoma, intraepithelial lymphocytosis, and poor differentiation were the best discriminators between MSI-H and microsatellite-stable cancers (odds ratio: 37.8, 9.8, and 4.0, respectively; P = 0.000003 to <0.000001) with high specificity (99 to 87%). The sensitivities, however, were very low (14 to 38%), and interobserver agreement was good only for evaluation of poor differentiation (kappa, 0.69). Mucinous histopathological type and presence of signet ring cells had low odds ratios of 3.3 and 2.7 (P = 0.005 and P = 0.02) with specificities of 95% but sensitivities of only 15 and 13%. Subjective interpretation of the overall histopathology as suggesting MSI-H performed better than any individual feature; the odds ratio was 7.5 (P < 0.000001) with sensitivity of 49%, specificity of 89%, and moderate interobserver agreement (kappa, 0.52). 
A threshold of 40 intraepithelial CD3-positive lymphocytes/0.94 mm², as established by receiver operating characteristic curve analysis, resulted in an odds ratio of 6.0 (P < 0.000001) with sensitivity of 75% and specificity of 67%. Our findings indicate that histopathological evaluation can be used to prioritize sporadic colon cancers for MSI studies, but morphological prediction of MSI-H has low sensitivity, so molecular analysis is required for therapeutic decisions. PMID:11159189
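The sensitivities, specificities and odds ratios above all derive from a 2 × 2 table of a histological feature against MSI-H status. A minimal Python sketch with hypothetical per-tumour booleans; the odds ratio is left undefined (an error is raised) when an off-diagonal cell is zero rather than applying a continuity correction.

```python
def diagnostic_summary(feature_present, msi_high):
    """Sensitivity, specificity and odds ratio of a binary feature against a
    binary reference standard (both given as lists of booleans)."""
    pairs = list(zip(feature_present, msi_high))
    tp = sum(f and m for f, m in pairs)
    fp = sum(f and not m for f, m in pairs)
    fn = sum(not f and m for f, m in pairs)
    tn = sum(not f and not m for f, m in pairs)
    if 0 in (tp + fn, tn + fp, fp * fn):
        raise ValueError("a margin or an off-diagonal cell is zero")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical data: 4 MSI-H and 10 microsatellite-stable tumours.
feature = [True] * 3 + [False] * 1 + [True] * 2 + [False] * 8
msi = [True] * 4 + [False] * 10
result = diagnostic_summary(feature, msi)
```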
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pogson, EM; University of Wollongong, Wollongong, NSW; Liverpool and Macarthur Cancer Therapy Centres, Liverpool, NSW
2016-06-15
Purpose: Breast cancers predominantly arise from glandular breast tissue (GBT). If the GBT can be treated effectively with post-operative radiotherapy, this may provide adequate volumetric coverage for adjuvant breast radiotherapy. This requires adequate imaging of the GBT, which is assessed here for MRI and CT. GBT visualisation is acknowledged to be qualitatively superior on magnetic resonance imaging (MRI) compared with computed tomography (CT), the current radiotherapy imaging standard; however, this has not been quantitatively assessed. For radiotherapy purposes, it is important that any treatment volume can be defined consistently between observers. This study investigates the consistency of CT and MRI GBT contours for potential radiotherapy planning. Methods: Ten experts (9 breast radiation oncologists and 1 radiologist) contoured the extent of the visible GBT for 33 patients on MRI and CT (both without contrast), following a contouring guideline, in supine and prone patient positions. The GBT volume was not a conventional whole-breast radiotherapy planning volume, but rather the extent of GBT indicated on the CT or MR images. Volumes were compared using the Dice similarity coefficient (DSC), the kappa statistic, and Hausdorff distances (HDs) to determine which modality was contoured most consistently. Results: Inter-observer concordance showed substantial agreement (kappa above 0.6) for the CT supine, CT prone, MRI supine and MRI prone datasets. The MRI GBT volumes were larger than the CT GBT volumes (p<0.001). Inter-observer conformity was higher for CT than for MRI, although the magnitude of this difference was small (VOI<0.04). Conformity between modalities (CT and MRI) was in agreement for both prone and supine positions, DSC=0.75. Prone GBT volumes were larger than supine volumes for both MRI and CT. Conclusion: MRI improves the extent of GBT delineation.
The role of MRI-guided, GBT-targeted radiotherapy requires investigation in a clinical trial. This work was supported by grant number APP1033237 from Cancer Australia and the National Breast Cancer Foundation.
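The DSC and Hausdorff distance used above can be computed directly from two observers' contours. A minimal Python sketch treating each contour as a set of voxel coordinates (hypothetical 2D example); the brute-force Hausdorff search is adequate for small point sets but scales as O(n·m).

```python
from math import dist


def dice_coefficient(voxels_a, voxels_b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two contours given as voxel sets."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets (brute force)."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest neighbour in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Hypothetical 2D voxel coordinates from two observers' contours.
obs_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
obs_2 = {(0, 0), (0, 1), (1, 0), (2, 0)}
dsc = dice_coefficient(obs_1, obs_2)   # 0.75
hd = hausdorff_distance(obs_1, obs_2)  # 1.0
```

The two metrics are complementary: DSC summarizes overall volume overlap, while the Hausdorff distance captures the single worst boundary disagreement.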
Soukup, Viktor; Čapoun, Otakar; Cohen, Daniel; Hernández, Virginia; Babjuk, Marek; Burger, Max; Compérat, Eva; Gontero, Paolo; Lam, Thomas; MacLennan, Steven; Mostafid, A Hugh; Palou, Joan; van Rhijn, Bas W G; Rouprêt, Morgan; Shariat, Shahrokh F; Sylvester, Richard; Yuan, Yuhong; Zigeuner, Richard
2017-11-01
Tumour grade is an important prognostic indicator in non-muscle-invasive bladder cancer (NMIBC). Histopathological classifications are limited by interobserver variability (reproducibility), which may have prognostic implications. European Association of Urology NMIBC guidelines suggest concurrent use of both the 1973 and the 2004/2016 World Health Organization (WHO) classifications. To compare the prognostic performance and reproducibility of the 1973 and 2004/2016 WHO grading systems for NMIBC. A systematic literature search was undertaken incorporating Medline, Embase, and the Cochrane Library. Studies were critically appraised for risk of bias (QUIPS). For prognosis, the primary outcome was progression to muscle-invasive or metastatic disease. Secondary outcomes were disease recurrence, and overall and cancer-specific survival. For reproducibility, the primary outcome was interobserver variability between pathologists. The secondary outcome was intraobserver variability (repeatability) by the same pathologist. Of 3593 articles identified, 20 were included in the prognostic review; three were eligible for the reproducibility review. Increasing tumour grade in both classifications was associated with higher disease progression and recurrence rates. Progression rates in grade 1 patients were similar to those in low-grade patients; progression rates in grade 3 patients were higher than those in high-grade patients. Survival data were limited. Reproducibility of the 2004/2016 system was marginally better than that of the 1973 system. Two studies on repeatability showed conflicting results. Most studies had a moderate to high risk of bias. Current grading classifications in NMIBC are suboptimal. The 1973 system identifies more aggressive tumours. Intra- and interobserver variability was slightly lower with the 2004/2016 classification. We could not confirm that the 2004/2016 classification outperforms the 1973 classification in predicting recurrence and progression.
This article summarises the utility of two different grading systems for non-muscle-invasive bladder cancer. Both systems predict progression and recurrence, although pathologists vary in their reporting; suggestions for further improvements are made. Copyright © 2017 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Mudford, Oliver C; Taylor, Sarah Ann; Martin, Neil T
2009-01-01
We reviewed all research articles in 10 recent volumes of the Journal of Applied Behavior Analysis (JABA): Vol. 28(3), 1995, through Vol. 38(2), 2005. Continuous recording was used in the majority (55%) of the 168 articles reporting data on free-operant human behaviors. Three methods for reporting interobserver agreement (exact agreement, block-by-block agreement, and time-window analysis) were employed in more than 10 of the articles that reported continuous recording. Having identified these currently popular agreement computation algorithms, we explain them to assist researchers, software writers, and other consumers of JABA articles.
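Of the three algorithms named above, block-by-block and exact agreement both work on response counts per time block. A minimal Python sketch of both as they are commonly described: block-by-block agreement averages, over blocks, the smaller count divided by the larger (identical counts, including two zeros, score 1); exact agreement is the percentage of blocks with identical counts. The 10-s block length and the response times are hypothetical.

```python
import math


def counts_per_block(event_times_s, session_s, block_s=10):
    """Count one observer's recorded responses in consecutive time blocks."""
    n_blocks = math.ceil(session_s / block_s)
    counts = [0] * n_blocks
    for t in event_times_s:
        counts[min(int(t // block_s), n_blocks - 1)] += 1
    return counts


def block_by_block_agreement(counts_1, counts_2):
    """Mean per-block ratio of smaller to larger count, as a percentage.

    A block in which both observers recorded the same count (including
    zero) scores 1.
    """
    scores = [1.0 if a == b else min(a, b) / max(a, b)
              for a, b in zip(counts_1, counts_2)]
    return 100 * sum(scores) / len(scores)


def exact_agreement(counts_1, counts_2):
    """Percentage of blocks in which both observers recorded identical counts."""
    return 100 * sum(a == b for a, b in zip(counts_1, counts_2)) / len(counts_1)


# Hypothetical response times (s) in a 30-s session, 10-s blocks.
c1 = counts_per_block([1, 2, 12, 25], session_s=30)
c2 = counts_per_block([1.5, 13, 14, 26], session_s=30)
```

Here the two observers recorded the same total (four responses), yet block-by-block agreement is about 67% and exact agreement about 33%, which is why total-count agreement alone can overstate reliability.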
Labrecque, M; Dostaler, L P; Dumont, H; Huard, G; Laflamme, L
1993-01-01
OBJECTIVE: To determine the interobserver reliability of tympanograms obtained with the MicroTymp, a portable tympanometer. SETTING: Family medicine teaching unit in a tertiary care hospital. PATIENTS: Thirty-three patients who presented to the ear, nose and throat clinic in August 1990 with an ear problem. INTERVENTION: Three residents in family medicine independently attempted to record one tympanogram with the MicroTymp for each of the 66 ears. We excluded the results for seven ears for which tympanograms could not be obtained. MAIN OUTCOME MEASURE: Using objective criteria, two family physicians and two residents in family medicine independently classified the 177 tympanograms into five categories (normal, possible effusion, possible perforation, possible tympano-ossicular dysfunction and unclassifiable). Reliability was estimated by means of the kappa (κ) coefficient on 161 tympanograms from 59 ears for which the interpretation of the three tympanograms agreed. MAIN RESULTS: The interpretation of the three tympanograms agreed for 34 of the 59 ears (0.58) (kappa = 0.52, 95% confidence limits 0.45 and 0.59). There was no significant difference in interobserver reliability between pairs of observers or between symptomatic and asymptomatic ears. CONCLUSIONS: The interobserver reliability of the MicroTymp is moderate. The tympanograms obtained with the instrument should be interpreted in the context of the clinical findings. PMID:8431817
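The abstract compares reliability between pairs of observers, which suggests pairwise kappa. When more than two observers sort items into several categories, Fleiss' kappa is one alternative single summary; the sketch below is offered as that alternative, not as the study's method, and the ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for several raters assigning items to nominal categories.

    ratings: one list per item, holding each rater's category for that item;
    every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    category_totals = Counter()
    mean_item_agreement = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreeing = sum(c * (c - 1) for c in counts.values())
        mean_item_agreement += agreeing / (n_raters * (n_raters - 1))
    mean_item_agreement /= n_items
    # Chance agreement from the overall category proportions.
    p_chance = sum((c / (n_items * n_raters)) ** 2 for c in category_totals.values())
    if p_chance == 1.0:
        return float("nan")  # undefined: every rating is the same category
    return (mean_item_agreement - p_chance) / (1 - p_chance)


# Hypothetical classifications of four tympanograms by three observers.
kappa = fleiss_kappa([
    ["normal", "normal", "effusion"],
    ["normal", "effusion", "effusion"],
    ["normal", "normal", "normal"],
    ["effusion", "effusion", "perforation"],
])
```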
The validity and reliability of a simple semantic classification of foot posture.
Cross, Hugh A; Lehman, Linda
2008-12-01
The Simple Semantic Classification (SSC) is described as a pragmatic method to assist in the assessment of the weight-bearing foot. It was designed for use by therapists and technicians working in underdeveloped settings after they have had basic orientation in foot function. The aim was to present evidence of the validity and inter-observer reliability of the SSC. Thirteen physiotherapists from LEPRA India projects and 12 physical therapists working within the National Programme for the Elimination of Hansen's Disease (PNEH), Brazil, participated in an inter-observer exercise. Inter-observer agreement was gauged using the kappa statistic. The results of the inter-observer exercise depended on observations of foot posture made from photographs; this was necessary to ensure that the procedure was standardised for participants in different countries. The method had limitations, which were partly reflected in the results. The level of agreement between the principal investigator and the Indian physiotherapists was kappa = 0.58, and between the principal investigator and the Brazilian physical therapists kappa = 0.70. The authors opine that the results were sufficiently compelling to suggest that the SSC can be used as a field method to identify people at increased risk of foot pathologies.
Haggerty, Christopher M; Kramer, Sage P; Binkley, Cassi M; Powell, David K; Mattingly, Andrea C; Charnigo, Richard; Epstein, Frederick H; Fornwalt, Brandon K
2013-08-27
Advanced measures of cardiac function are increasingly important to clinical assessment due to their superior diagnostic and predictive capabilities. Cine DENSE cardiovascular magnetic resonance (CMR) is ideal for quantifying advanced measures of cardiac function based on its high spatial resolution and streamlined post-processing. While many studies have utilized cine DENSE in both humans and small-animal models, the inter-test and inter-observer reproducibility for quantification of advanced cardiac function in mice has not been evaluated. This represents a critical knowledge gap for both understanding the capabilities of this technique and for the design of future experiments. We hypothesized that cine DENSE CMR would show excellent inter-test and inter-observer reproducibility for advanced measures of left ventricular (LV) function in mice. Five normal mice (C57BL/6) and four mice with depressed cardiac function (diet-induced obesity) were imaged twice, two days apart, on a 7T ClinScan MR system. Images were acquired with 15-20 frames per cardiac cycle in three short-axis (basal, mid, apical) and two long-axis orientations (4-chamber and 2-chamber). LV strain, twist, torsion, and measures of synchrony were quantified. Images from both days were analyzed by one observer to quantify inter-test reproducibility, while inter-observer reproducibility was assessed by a second observer's analysis of day-1 images. The coefficient of variation (CoV) was used to quantify reproducibility. LV strains and torsion were highly reproducible on both inter-observer and inter-test bases with CoVs ≤ 15%, and inter-observer reproducibility was generally better than inter-test reproducibility. However, end-systolic twist angles showed much higher variance, likely due to the sensitivity of slice location within the sharp longitudinal gradient in twist angle. 
Measures of synchrony, including the circumferential (CURE) and radial (RURE) uniformity of strain indices, showed excellent reproducibility with CoVs of 1% and 3%, respectively. Finally, peak measures (e.g., strains) were generally more reproducible than the corresponding rates of change (e.g., strain rate). Cine DENSE CMR is a highly reproducible technique for quantification of advanced measures of left ventricular cardiac function in mice, including strains, torsion and measures of synchrony. However, myocardial twist angles are not reproducible, and future studies should instead report torsion.
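The sketch below illustrates how a coefficient of variation can summarize reproducibility. It computes a within-subject CoV from paired measurements, such as day-1 and day-2 strain values or two observers' values for the same image. The root-mean-square pooling used here is one common convention and is an assumption on our part. The abstract does not say exactly how its CoVs were computed, and the function name and data are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def within_subject_cov(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root-mean-square of per-subject CoVs (SD / |mean|) from paired readings, in percent.

    Pooling by root-mean-square is an assumed convention, not taken from the study.
    """
    if not pairs:
        raise ValueError("need at least one pair of measurements")
    covs = []
    for first, second in pairs:
        m = mean([first, second])
        if m == 0:
            raise ValueError("CoV is undefined when the mean is zero")
        # abs() because circumferential strain is usually reported as a negative number.
        covs.append(stdev([first, second]) / abs(m))
    return 100.0 * math.sqrt(mean([c * c for c in covs]))


# Hypothetical peak circumferential strains (%) for two mice, measured on two days.
print(within_subject_cov([(-20.0, -20.0), (-18.0, -22.0)]))  # about 10.0
```

Because the SD is divided by the mean, quantities whose mean is close to zero (for example, small twist angles) can produce very large CoVs even when the absolute differences are small.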
Tewes, S; Rodt, T; Marquardt, S; Evangelidou, E; Wacker, F K; von Falck, C
2013-11-01
Evaluation of the potential usability of an iPad 3 with a high-resolution display in CT emergency diagnosis compared to a 3D PACS workstation. Three readers used a 5-point Likert scale to evaluate 40 CCT scans and 40 CTPA scans to determine the detectability of early signs of infarction in CCT or of segmental and subsegmental pulmonary embolisms in CT angiography of the pulmonary arteries (CTPA) on the iPad 3 (Apple Inc., USA) using an application for image viewing (Visage Ease, Visage Imaging GmbH, Berlin) and on a 3D PACS workstation (Visage 7.1, Visage Imaging, Berlin) using a certified monitor for image viewing. The results were compared using the Wilcoxon rank sum test, Spearman's correlation coefficient, and a kappa statistic. There was no significant difference in the median evaluations for the readings of both the CCT scans and the CTPA scans on the iPad 3 and on the workstation (p > 0.05) for all three readers. The mean Spearman's correlation coefficient for CCT and CTPA was 0.46 (± 0.2) and 0.69 (± 0.16), respectively, for the comparison iPad/PACS, 0.41 (± 0.16) and 0.68 (± 0.06), respectively, for the interobserver agreement on the iPad, and 0.35 (± 0.05) and 0.68 (± 0.10), respectively, for the interobserver agreement on the PACS. For CCT, mean kappa values of 0.52 (± 0.17) were achieved for the comparison iPad/PACS, and of 0.33 (± 0.16) and 0.32 (± 0.16) for the interobserver agreement on the iPad and the PACS, respectively. For CTPA, average kappa values of 0.67 (± 0.19) were calculated for the comparison iPad/PACS, and of 0.69 (± 0.08) and 0.60 (± 0.14) for the interobserver concordance on the iPad 3 and the PACS, respectively. None of the differences was statistically significant (p > 0.05). The variability of the interpretation of typical emergency scans on an iPad 3 with a high-resolution display and on a 3D PACS workstation does not differ from the interobserver variability. © Georg Thieme Verlag KG Stuttgart · New York.
Razek, Ahmed Abdel Khalek Abdel; Gaballa, Gada; Megahed, Abdel Salam; Elmogy, Ebrahiem
2013-11-01
To evaluate the vasculature of arteriovenous malformations (AVMs) of the head and neck with time-resolved imaging of contrast kinetics (TRICKS) MR angiography (MRA). A prospective study was conducted in 19 patients (age range, 12-29 years; mean age, 18 years; 10 males and 9 females) with AVM of the head and neck. TRICKS-MRA of the head and neck was performed during injection of contrast medium, and the images were reconstructed in post-processing. Two independent readers assessed the overall TRICKS-MRA image quality using a 5-point scale and the depiction of the main arterial feeders, nidus, and venous drainage using a 3-point scale. Interobserver agreement was assessed with the kappa test. The AVMs were evaluated morphologically in terms of the number and origin of the main arterial feeders, the location and size of the nidus, either small (<2 cm) or large (>2 cm), and drainage of the veins into the superficial or deep venous system. The average TRICKS-MRA image quality score was 3.89 ± 1.15 for reader 1 and 3.89 ± 0.10 for reader 2, which yielded excellent interobserver agreement (k=0.77, 95% CI=0.53-0.98, r=0.78, P=0.001). The interobserver agreement of both readers was excellent for the arterial feeders (k=0.81, 95% CI=0.57-1.00, r=0.83, P=0.001), excellent for the nidus (k=0.91, 95% CI=0.75-1.00, r=0.92, P=0.001), and good for the venous drainage (k=0.77, 95% CI=0.53-0.98, r=0.78, P=0.001). The arterial feeders were single (n=14) or multiple (n=5), the nidus was large (n=16) or small (n=3), and the venous drainage was into the internal jugular (n=17) or the external jugular (n=2) veins. Three patients with a small nidus and a single arterial feeder were treated with sclerotherapy. Eleven patients with a large nidus and a single arterial feeder were referred for embolization. Combined embolization and surgery were performed in five patients with a large nidus and multiple arterial feeders.
We concluded that TRICKS-MRA is a reliable noninvasive tool for evaluation of the feeding arteries, the nidus and the draining veins of AVMs of the head and neck. TRICKS-MRA can be used for evaluation and treatment planning of AVMs of the head and neck. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Sanyal, Parikshit; Ganguli, Prosenjit; Barui, Sanghita; Deb, Prabal
2018-01-01
The Pap-stained cervical smear is a screening tool for cervical cancer. Commercial systems are used for automated screening of liquid-based cervical smears; however, no image analysis software is used for conventional cervical smears. The aim of this study was to develop software for the analysis of conventional smears and to test its diagnostic accuracy. The software was developed using the Python programming language and open-source libraries. It was standardized with images from the Bethesda Interobserver Reproducibility Project. One hundred and thirty images from smears reported as Negative for Intraepithelial Lesion or Malignancy (NILM), and 45 images from smears in which an abnormality had been reported, were collected from the archives of the hospital. The software was then tested on these images. The software was able to segregate images based on the overall nuclear-to-cytoplasmic ratio, the coefficient of variation (CV) in nuclear size, nuclear membrane irregularity, and clustering. The software flagged 68.88% of abnormal images, as well as 19.23% of NILM images. The major difficulties faced were segmentation of overlapping cell clusters and separation of neutrophils. The software shows potential as a screening tool for conventional cervical smears; however, further refinement in technique is required.
Brain tumor classification using AFM in combination with data mining techniques.
Huml, Marlene; Silye, René; Zauner, Gerald; Hutterer, Stephan; Schilcher, Kurt
2013-01-01
Although classification of astrocytic tumors is standardized by the WHO grading system, which is mainly based on microscopy-derived, histomorphological features, there is great interobserver variability. The main causes are thought to be the complexity of morphological details, which vary from tumor to tumor and from patient to patient; variations in technical histopathological procedures such as staining protocols; and the individual experience of the diagnosing pathologist. Thus, to raise astrocytoma grading to a more objective standard, this paper proposes a methodology based on atomic force microscopy (AFM) images of histopathological samples in combination with data mining techniques. By comparing AFM images with corresponding light microscopy images of the same area, the progressive formation of cavities due to cell necrosis was identified as a typical morphological marker for computer-assisted analysis. Using genetic programming as a tool for feature analysis, a best model was created that achieved 94.74% classification accuracy in distinguishing grade II tumors from grade IV tumors. Combined with modern image analysis techniques, AFM may become an important tool in astrocytic tumor diagnosis. In this way, patients with grade II tumors, who have a lower risk of malignant transformation, could be identified unambiguously and would benefit from early adjuvant therapies.
Rosen, Robert; Marmur, Ellen; Anderson, Lawrence; Welburn, Peter; Katsamas, Janelle
2014-12-01
Local skin responses (LSRs) are the most common adverse effects of topical actinic keratosis (AK) therapy. There is currently no method available that allows objective characterization of LSRs. Here, the authors describe a new scale developed to quantitatively and objectively assess the six most common LSRs resulting from topical AK therapy with ingenol mebutate. The LSR grading scale was developed using a 0-4 numerical rating, with clinical descriptors and representative photographic images for each rating. Good inter-observer grading concordance was demonstrated in peer review during development of the tool. Data on the use of the scale are described from four phase III double-blind studies of ingenol mebutate (n = 1,005). LSRs peaked on days 4 (face/scalp) or 8 (trunk/extremities), with mean maximum composite LSR scores of 9.1 and 6.8, respectively, and a rapid return toward baseline by day 15 in most cases. Mean composite LSR score at day 57 was generally lower than at baseline. The LSR grading scale is an objective tool allowing practicing dermatologists to characterize and compare LSRs to existing and, potentially, future AK therapies.
Zweerink, Alwin; Allaart, Cornelis P; Kuijer, Joost P A; Wu, LiNa; Beek, Aernout M; van de Ven, Peter M; Meine, Mathias; Croisille, Pierre; Clarysse, Patrick; van Rossum, Albert C; Nijveldt, Robin
2017-12-01
Although myocardial strain analysis is a potential tool to improve patient selection for cardiac resynchronization therapy (CRT), there is currently no validated clinical approach to derive segmental strains. We evaluated the novel segment length in cine (SLICE) technique to derive segmental strains from standard cardiovascular MR (CMR) cine images in CRT candidates. Twenty-seven patients with left bundle branch block underwent CMR examination including cine imaging and myocardial tagging (CMR-TAG). SLICE was performed by measuring segment length between anatomical landmarks throughout all phases on short-axis cines. This measure of frame-to-frame segment length change was compared to CMR-TAG circumferential strain measurements. Subsequently, conventional markers of CRT response were calculated. Segmental strains showed good to excellent agreement between SLICE and CMR-TAG (septum strain, intraclass correlation coefficient (ICC) 0.76; lateral wall strain, ICC 0.66). Conventional markers of CRT response also showed close agreement between both methods (ICC 0.61-0.78). Reproducibility of SLICE was excellent for intra-observer testing (all ICC ≥0.76) and good for interobserver testing (all ICC ≥0.61). The novel SLICE post-processing technique on standard CMR cine images offers both accurate and robust segmental strain measures compared to the 'gold standard' CMR-TAG technique, and has the advantage of being widely available. • Myocardial strain analysis could potentially improve patient selection for CRT. • Currently a well validated clinical approach to derive segmental strains is lacking. • The novel SLICE technique derives segmental strains from standard CMR cine images. • SLICE-derived strain markers of CRT response showed close agreement with CMR-TAG. • Future studies will focus on the prognostic value of SLICE in CRT candidates.
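Intraclass correlation coefficients like those reported above come in several forms. The sketch below implements one of them: the two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), computed from two-way ANOVA mean squares. The abstract does not state which ICC form the SLICE study used, so treat this choice as an assumption. The function name is hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings[i][j] is the value for subject i from rater (or method) j.
    Which ICC form a given study used is an assumption here.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete subjects x raters table of at least 2 x 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical segmental strains (%) for three segments: column 0 from one method, column 1 from another.
print(icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # 1.0 (identical values)
```

The absolute-agreement form penalizes a systematic offset between methods, through the `ms_cols` term. A consistency-type ICC would ignore such an offset.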
Salido-Vallejo, R; Ruano, J; Garnacho-Saucedo, G; Godoy-Gijón, E; Llorca, D; Gómez-Fernández, C; Moreno-Giménez, J C
2014-12-01
Tuberous sclerosis complex (TSC) is an autosomal dominant neurocutaneous disorder characterized by the development of multisystem hamartomatous tumours. Topical sirolimus has recently been suggested as a potential treatment for TSC-associated facial angiofibroma (FA). The aim was to validate a reproducible scale created for the assessment of clinical severity and treatment response in these patients. We developed a new tool, the Facial Angiofibroma Severity Index (FASI), to evaluate the grade of erythema and the size and extent of FAs. In total, 30 different photographs of patients with TSC were shown to 56 dermatologists at each evaluation. Three evaluations using the same photographs, but in a different random order, were performed 1 week apart. Test-retest reliability and interobserver reproducibility were determined. There was good agreement between the investigators. Inter-rater reliability showed strong correlations (> 0.98; range 0.97-0.99) with intraclass correlation coefficients (ICCs) for the FASI. The global estimated kappa coefficient for the degree of intra-rater agreement (test-retest) was 0.94 (range 0.91-0.97). The FASI is a valid and reliable tool for measuring the clinical severity of TSC-associated FAs, which can be applied in clinical practice to evaluate the response to treatment in these patients. © 2014 British Association of Dermatologists.
The reliability of the Hendrich Fall Risk Model in a geriatric hospital.
Heinze, Cornelia; Halfens, Ruud; Dassen, Theo
2008-12-01
Aims and objectives. The purpose of this study was to test the interrater reliability of the Hendrich Fall Risk Model, an instrument to identify patients in a hospital setting with a high risk of falling. Background. Falls are a serious problem in older patients. Valid and reliable fall risk assessment tools are required to identify high-risk patients and to take adequate preventive measures. Methods. Seventy older patients were independently and simultaneously assessed by six pairs of raters made up of nursing staff members. Consensus estimates were calculated using simple percentage agreement, and consistency estimates using Spearman's rho and the intraclass correlation coefficient. Results. Percentage agreement ranged from 0.70 to 0.92 between the six pairs of raters. Spearman's rho coefficients were between 0.54 and 0.80, and the intraclass correlation coefficients were between 0.46 and 0.92. Conclusions. Whereas some pairs of raters obtained considerable interobserver agreement and internal consistency, the others did not. Therefore, it is concluded that the Hendrich Fall Risk Model is not a reliable instrument. Items that are operationalized more unambiguously are preferred. Relevance to clinical practice. In practice, well-operationalized fall risk assessment tools are necessary. Observer agreement should always be investigated after introducing a standardized measurement tool. © 2008 The Authors. Journal compilation © 2008 Blackwell Publishing Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dance, M; Chera, B; Falchook, A
2015-06-15
Purpose: Validate the consistency of a gradient-based segmentation tool to facilitate accurate delineation of PET/CT-based GTVs in head and neck cancers by comparing against hybrid PET/MR-derived GTV contours. Materials and Methods: A total of 18 head and neck target volumes (10 primary and 8 nodal) were retrospectively contoured by two observers using a gradient-based segmentation tool. Each observer independently contoured each target five times. Inter-observer variability was evaluated via absolute percent differences. Intra-observer variability was examined by percentage uncertainty. All target volumes were also contoured using the SUV percent threshold method. The thresholds were explored case by case so that the derived volume matched the gradient-based volume. Dice similarity coefficients (DSC) were calculated to determine the overlap of PET/CT GTVs and PET/MR GTVs. Results: Levene's test showed no statistically significant difference between the variances of the two observers' gradient-derived contours. However, the absolute difference between the observers' volumes was 10.83%, with a range from 0.39% up to 42.89%. PET-avid regions with qualitatively non-uniform shapes and intensity levels had a higher absolute percent difference, near 25%, while regions with uniform shapes and intensity levels had an absolute percent difference of 2% between observers. The average percentage uncertainty was 4.83% and 7% for the two observers. As the volume of the gradient-derived contours increased, the SUV threshold percent needed to match the volume decreased. Dice coefficients showed good agreement of the PET/CT and PET/MR GTVs, with an average DSC value across all volumes of 0.69. Conclusion: Gradient-based segmentation of PET volume showed good consistency in general but can vary considerably for non-uniform target shapes and intensity levels.
PET/CT-derived GTV contours stemming from the gradient-based tool show good agreement with the anatomically and metabolically more accurate PET/MR-derived GTV contours, but tumor delineation accuracy can be further improved with the use of PET/MR.
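The Dice similarity coefficient used to compare PET/CT and PET/MR contours is twice the overlap divided by the summed sizes of the two regions. Below is a minimal sketch. It assumes each contour has already been rasterized into a set of voxel indices, and the voxel sets are hypothetical.

```python
from typing import Hashable, Set


def dice_coefficient(a: Set[Hashable], b: Set[Hashable]) -> float:
    """DSC = 2|A & B| / (|A| + |B|); 1.0 means identical regions, 0.0 means no overlap."""
    if not a and not b:
        raise ValueError("DSC is undefined for two empty contours")
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel indices (x, y, z) inside two GTV contours.
pet_ct = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
pet_mr = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
print(dice_coefficient(pet_ct, pet_mr))  # 4/7, about 0.571
```

Two contours can have nearly equal volumes and still have a low DSC if they sit in different places. This is why volume differences and overlap measures are reported separately.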
Serafin, Zbigniew; Strześniewski, Piotr; Lasek, Władysław; Beuth, Wojciech
2012-07-01
The use of contrast media and the time-resolved imaging of contrast kinetics (TRICKS) technique have some theoretical advantages over time-of-flight magnetic resonance angiography (TOF-MRA) in the follow-up of intracranial aneurysms after endovascular treatment. We prospectively compared the diagnostic performance of TRICKS and TOF-MRA with digital subtraction angiography (DSA) in the assessment of occlusion of embolized aneurysms. Seventy-two consecutive patients with 72 aneurysms were examined 3 months after embolization. Test characteristics of TOF-MRA and TRICKS were calculated for the detection of residual flow. The results of quantification of flow were compared with weighted kappa. Intraobserver and interobserver reproducibility was determined. The sensitivity of TOF-MRA was 85% (95% CI, 65-96%) and that of TRICKS 89% (95% CI, 70-97%). The specificity of both methods was 91% (95% CI, 79-98%). The accuracy of the flow quantification ranged from 0.76 (TOF-MRA) to 0.83 (TRICKS). There was no significant difference between the methods in the area under the ROC curve regarding both the detection and the quantification of flow. Intraobserver reproducibility was very good with both techniques (kappa, 0.86-0.89). The interobserver reproducibility was moderate for TOF-MRA and very good for TRICKS (kappa, 0.74-0.80). In this study, TOF-MRA and TRICKS presented similar diagnostic performance; therefore, the use of time-resolved contrast-enhanced MRA is not justified in the follow-up of embolized aneurysms.
Interobserver reproducibility of The Paris System for Reporting Urinary Cytology.
Long, Theresa; Layfield, Lester J; Esebua, Magda; Frazier, Shellaine R; Giorgadze, D Tamar; Schmidt, Robert L
2017-01-01
The Paris System for Reporting Urinary Cytology represents a significant improvement in classification of urinary specimens. The system acknowledges the difficulty in cytologically diagnosing low-grade urothelial carcinomas and has developed categories to deal with this issue. The system uses six categories: unsatisfactory, negative for high-grade urothelial carcinoma (NHGUC), atypical urothelial cells, suspicious for high-grade urothelial carcinoma, high-grade urothelial carcinoma, other malignancies and a seventh subcategory (low-grade urothelial neoplasm). Three hundred and fifty-seven urine specimens were independently reviewed by four cytopathologists unaware of the previous diagnoses. Each cytopathologist rendered a diagnosis according to the Paris System categories. Agreement was assessed using absolute agreement and weighted chance-corrected agreement (kappa). Disagreements were classified as low impact and high impact based on the potential impact of a misclassification on clinical management. The average absolute agreement was 65% with an average expected agreement of 44%. The average chance-corrected agreement (kappa) was 0.32. Nine hundred and ninety-nine of 1902 comparisons between rater pairs were in agreement, but 12% of comparisons differed by two or more categories for the category NHGUC. Approximately 15% of the disagreements were classified as high clinical impact. Our findings indicated that the scheme recommended by the Paris System shows adequate precision for the category NHGUC, but the other categories demonstrated unacceptable interobserver variability. This low level of diagnostic precision may negatively impact the applicability of the Paris System for widespread clinical application.
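For a single pair of raters, the link between absolute agreement, expected (chance) agreement, and kappa can be made concrete. The sketch below computes unweighted Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). The study itself reported weighted kappa averaged over rater pairs, which this simplified example does not reproduce. The category labels in the example are hypothetical ratings, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (observed agreement p_o, expected agreement p_e, unweighted kappa) for two raters."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("need two equal-length, non-empty rating lists")
    # Absolute agreement: share of specimens given the same category by both raters.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical Paris System categories from two cytopathologists for four specimens.
rater_a = ["NHGUC", "NHGUC", "AUC", "HGUC"]
rater_b = ["NHGUC", "AUC", "AUC", "HGUC"]
print(cohen_kappa(rater_a, rater_b))  # p_o = 0.75, p_e = 0.3125, kappa = 7/11 (about 0.636)
```

Kappa falls well below raw agreement whenever one category (here, as in the study, NHGUC) dominates both raters' marginals, because much of the observed agreement is then expected by chance.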
Abdominal auscultation does not provide clear clinical diagnoses.
Durup-Dickenson, Maja; Christensen, Marie Kirk; Gade, John
2013-05-01
Abdominal auscultation is a part of the clinical examination of patients, but the determining factors in bowel sound evaluation are poorly described. The aim of this study was to assess inter- and intra-observer agreement in physicians' evaluation of pitch, intensity and quantity in abdominal auscultation. A total of 100 physicians were presented with 20 bowel sound recordings in a blinded set-up. Recordings had been made in a mix of healthy volunteers and emergency patients. The physicians evaluated the pitch, intensity and quantity of bowel sounds in a questionnaire with three, three and four answer categories, respectively. Fleiss' multi-rater kappa (κ) coefficients were calculated for inter-observer agreement; for intra-observer agreement, the probability of agreement was calculated. Inter-observer agreement regarding pitch, intensity and quantity yielded κ-values of 0.19 (p < 0.0001), 0.30 (p < 0.0001) and 0.24 (p < 0.0001), respectively, corresponding to slight, fair and fair agreement. Regarding intra-observer agreement, the probability of agreement was 0.55 (95% confidence interval (CI): 0.51-0.59), 0.45 (95% CI: 0.42-0.49) and 0.41 (95% CI: 0.38-0.45) for pitch, intensity and quantity, respectively. Although relatively poor, observer agreement was slight to fair and thus better than expected by chance. Since the diagnostic value of auscultation increases with the addition of patient history and clinical findings, and may be further improved by systematic training, it should still be used in the examination of patients with acute abdominal pain.
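Fleiss' multi-rater kappa, used above for the 100 physicians, extends chance-corrected agreement to more than two raters. Below is a minimal sketch. It assumes every recording was rated by the same number of raters and that the ratings have been tallied into a recordings-by-categories count table. The function name and toy table are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of raters who put recording i into category j.

    Every row must sum to the same number of raters n (n >= 2).
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_items, k = len(counts), len(counts[0])
    # Overall proportion of all ratings that fall into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n) for j in range(k)]
    # Per-item agreement: share of rater pairs that agree on that item.
    p_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 3 recordings, 4 raters, pitch categories (low, normal, high).
table = [[4, 0, 0], [2, 2, 0], [0, 1, 3]]
print(fleiss_kappa(table))  # 17/45, about 0.378
```

Unlike Cohen's kappa, this form does not track which rater gave which answer. It only needs how many raters chose each category per item, which suits a large, anonymous panel.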
Validation of scores of use of inhalation devices: valoration of errors
Zambelli-Simões, Letícia; Martins, Maria Cleusa; Possari, Juliana Carneiro da Cunha; Carvalho, Greice Borges; Coelho, Ana Carla Carvalho; Cipriano, Sonia Lucena; de Carvalho-Pinto, Regina Maria; Cukier, Alberto; Stelmach, Rafael
2015-01-01
Objective: To validate two scores quantifying the ability of patients to use metered dose inhalers (MDIs) or dry powder inhalers (DPIs); to identify the most common errors made during their use; and to identify the patients in need of an educational program for the use of these devices. Methods: This study was conducted in three phases: validation of the reliability of the inhaler technique scores; validation of the contents of the two scores using a convenience sample; and testing for criterion validation and discriminant validation of these instruments in patients who met the inclusion criteria. Results: The convenience sample comprised 16 patients. Interobserver disagreement was found in 19% and 25% of the DPI and MDI scores, respectively. After expert review, the scores were modified and applied to 72 patients. The most relevant difficulty encountered during the use of both types of devices was maintaining total lung capacity after a deep inhalation. The correlation between the observers' scores was 0.97 (p < 0.0001). There was good interobserver agreement in the classification of patients as able/not able to use a DPI (50%/50% and 52%/58%; p < 0.01) and an MDI (49%/51% and 54%/46%; p < 0.05). Conclusions: The validated scores allow the identification and correction of inhaler technique errors during consultations and, as a result, improvement in the management of inhalation devices. PMID:26398751
Schmidt, Brian M; McHugh, Jonathan B; Patel, Rajiv M; Wrobel, James S
2018-04-01
Osteomyelitis is common in diabetic foot infections, and medical management can lead to poor outcomes. Surgical management involves sending histopathologic and microbiologic specimens, which guide future intervention. We examined the effect of obtaining surgical margins in patients undergoing forefoot amputations to identify patient characteristics associated with outcomes. Secondary aims included evaluating the interobserver reliability of histopathologic data at both the distal-to and proximal-to surgical bone margins. Data were prospectively collected on 72 individuals and were pooled for analysis. A standardized method for retrieving intraoperative bone margins was established. A univariate analysis was performed. Negative outcomes, including major lower extremity amputation, wound dehiscence, reulceration, reamputation, or death, were recorded. Viable proximal margins were obtained in 63 of 72 cases (87.5%). Strong interobserver reliability of histopathology was recorded. Univariate analysis demonstrated that preoperative platelets, albumin, probe-to-bone testing, absolute toe pressures, and smaller wound surface area were associated with obtaining viable margins. Residual osteomyelitis resulted in readmission 2.6 times more often and in more postoperative complications. Certain patient characteristics differed significantly between the viable margin group and the dirty margin group. High interobserver reliability was demonstrated. Obtaining viable margins resulted in reduced rates of readmission and negative outcomes. Prognostic, Level I: Prospective.
Suojärvi, Nora; Sillat, T; Lindfors, N; Koskinen, S K
2015-12-01
Operative treatment of an intra-articular distal radius fracture is one of the most common procedures in orthopedic and hand surgery. The intra- and interobserver agreement of common radiographic measurements of these fractures using cone beam computed tomography (CBCT) and plain radiographs was evaluated. Thirty-seven patients undergoing open reduction and volar fixation for a distal radius fracture were studied. Two radiologists analyzed the preoperative radiographs and CBCT images. Agreement of the measurements was assessed with intraclass correlation coefficient (ICC) and Bland-Altman analyses. Plain radiographs provided a slightly poorer level of agreement. For fracture diastasis, excellent intraobserver agreement was achieved for radiographs and good or excellent agreement for CBCT, compared to poor interobserver agreement (ICC 0.334) for radiographs and good interobserver agreement (ICC 0.621) for CBCT images. The Bland-Altman analyses indicated a small mean difference between the measurements but rather large variation with both imaging methods, especially in angular measurements. For most of the measurements, radiographs perform well and may be used in clinical practice. Two measurements by the same reader, or by two different readers, can lead to different decisions; therefore, standardization of the measurements is imperative. More detailed analysis of the articular surface requires cross-sectional imaging modalities.
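A Bland-Altman analysis reduces paired measurements to a mean difference (bias) and 95% limits of agreement, bias ± 1.96 × SD of the differences. Below is a minimal sketch with hypothetical angle measurements from two readers. It assumes the differences are roughly normally distributed and omits the usual plot.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(m1: Sequence[float], m2: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    if len(m1) != len(m2) or len(m1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(m1, m2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical radial inclination angles (degrees) read by two radiologists.
reader_1 = [10.0, 12.0, 14.0]
reader_2 = [9.0, 12.0, 13.0]
print(bland_altman(reader_1, reader_2))  # bias about 0.67, limits about -0.46 and 1.80
```

A small bias with wide limits, the pattern described in the abstract, means the readers agree on average but can still disagree substantially on an individual patient.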
Geckili, Onur; Bilhan, Hakan; Cilingir, Altug; Bilmenoglu, Caglar; Ates, Gokcen; Urgun, Aliye Ceren; Bural, Canan
2014-12-01
A comparative ex vivo study was performed to determine electronic percussive test values (PTVs) measured by cabled and wireless electronic percussive testing (EPT) devices and to evaluate the intra- and interobserver reliability of the wireless EPT device. Forty implants were inserted into the vertebrae and forty into the pelvis of a steer, a safe distance apart. The implants were all 4.3 mm wide and 13 mm long and came from the same manufacturer. The PTV of each implant was measured by four different examiners using both EPT devices, and the values were compared. Additionally, the intra- and interobserver reliability of the wireless EPT device was evaluated. Statistically significant differences (P <0.05) were observed between PTVs made by the two EPT devices. PTVs measured by the wireless EPT device were significantly higher than those measured by the cabled EPT device (P <0.05), indicating lower implant stability. The intraobserver reliability of the wireless EPT device was evaluated as excellent for measurements in type II bone and good-to-excellent in type IV bone; interobserver reliability was evaluated as fair-to-good in both bone types. The wireless EPT device gives higher PTVs than the cabled EPT device, indicating lower implant stability, and its inter- and intraobserver reliability is good and acceptable.
Shih, Joanna H; Greer, Matthew D; Turkbey, Baris
2018-03-16
To point out the problems with the Cohen kappa statistic and to explore alternative metrics for determining interobserver agreement on lesion detection when locations are not prespecified. The use of kappa and of two alternative methods, namely the index of specific agreement (ISA) and modified kappa, for measuring interobserver agreement on the location of detected lesions is presented. These indices of agreement are illustrated by application to a retrospective multireader study in which nine readers detected and scored prostate cancer lesions in 163 consecutive patients (n = 110 cases, n = 53 controls) using the Prostate Imaging Reporting and Data System version 2 guideline on multiparametric magnetic resonance imaging. The proposed modified kappa, which properly corrects for the amount of agreement by chance, is shown to be approximately equivalent to the ISA. In the prostate cancer data, average kappa, modified kappa, and ISA equaled 30%, 55%, and 57%, respectively, for all lesions and 20%, 87%, and 87%, respectively, for index lesions. The application of kappa could result in a substantial downward bias in reader agreement on lesion detection when locations are not prespecified. ISA is recommended for assessment of reader agreement on lesion detection. Published by Elsevier Inc.
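The index of specific agreement avoids the problem the authors describe because it uses only lesions that at least one reader detected. It needs no count of "both readers negative" locations, whereas kappa requires one. Below is a minimal sketch of the positive specific agreement form, 2a / (2a + b + c). It assumes the two readers' lesions have already been matched by location. The matching rule, function name, and counts are hypothetical, and the study's exact definition and its modified kappa may differ.

```python
def index_of_specific_agreement(n_both: int, n_reader1: int, n_reader2: int) -> float:
    """Positive specific agreement 2a / (2a + b + c) for one pair of readers.

    n_both: lesions detected by both readers (a), after location matching.
    n_reader1, n_reader2: total lesions each reader detected, so that
    b = n_reader1 - n_both and c = n_reader2 - n_both.
    """
    if not 0 <= n_both <= min(n_reader1, n_reader2):
        raise ValueError("matched lesions cannot exceed either reader's total")
    if n_reader1 + n_reader2 == 0:
        raise ValueError("ISA is undefined when neither reader detected a lesion")
    # 2a + b + c simplifies to n_reader1 + n_reader2.
    return 2 * n_both / (n_reader1 + n_reader2)


# Hypothetical: reader 1 marks 4 lesions, reader 2 marks 5, and 3 are matched.
print(index_of_specific_agreement(3, 4, 5))  # 6/9, about 0.667
```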
Assessment of four midcarpal radiologic determinations.
Cho, Mickey S; Battista, Vincent; Dubin, Norman H; Pirela-Cruz, Miguel
2006-03-01
Several radiologic measurement methods have been described for determining static carpal alignment of the wrist. These include the scapholunate, radiolunate, and capitolunate angles. The triangulation method is an alternative radiologic measurement that we believe is easier to use and more reproducible and reliable than the above-mentioned methods. The purpose of this study is to assess the intraobserver reproducibility and interobserver reliability of the triangulation method and the scapholunate, radiolunate, and capitolunate angles. Twenty orthopaedic residents and staff at varying levels of training made four radiologic measurements, namely the scapholunate, radiolunate, and capitolunate angles and the triangulation method, on five different lateral digitized radiographs of the wrist and forearm in neutral radioulnar deviation. Thirty days after the initial measurements, the participants repeated the four radiologic measurements using the same radiographs. The triangulation method had the best intra- and interobserver agreement of the four methods tested. This agreement was significantly better than that of the capitolunate and radiolunate angles. The scapholunate angle had the next best intraobserver reproducibility and interobserver reliability. The triangulation method has the best overall observer agreement when compared to the scapholunate, radiolunate, and capitolunate angles in determining static midcarpal alignment. No comment can be made on the validity of the measurements since there is no radiographic gold standard for determining static carpal alignment.
Konishi, Tsuyoshi; Shimada, Yoshifumi; Lee, Lik Hang; Cavalcanti, Marcela S; Hsu, Meier; Smith, Jesse Joshua; Nash, Garrett M; Temple, Larissa K; Guillem, José G; Paty, Philip B; Garcia-Aguilar, Julio; Vakiani, Efsevia; Gonen, Mithat; Shia, Jinru; Weiser, Martin R
2018-06-01
This study aimed to compare common histologic markers at the invasive front of colon adenocarcinoma in terms of prognostic accuracy and interobserver agreement. Consecutive patients who underwent curative resection for stages I to III colon adenocarcinoma at a single institution from 2007 to 2014 were identified. Poorly differentiated clusters (PDCs), tumor budding, perineural invasion, desmoplastic reaction, and Crohn-like lymphoid reaction at the invasive front, as well as the World Health Organization (WHO) grade of the entire tumor, were analyzed. Prognostic accuracies for recurrence-free survival (RFS) were compared, and interobserver agreement among 3 pathologists was assessed. The study cohort consisted of 851 patients. Although all the histologic markers except WHO grade were significantly associated with RFS (PDCs, tumor budding, perineural invasion, and desmoplastic reaction: P<0.001; Crohn-like lymphoid reaction: P=0.021), PDCs (grade 1 [G1]: n=581; G2: n=145; G3: n=125) showed the largest separation of 3-year RFS in the full cohort (G1: 94.1%; G3: 63.7%; hazard ratio [HR], 6.39; 95% confidence interval [CI], 4.11-9.95; P<0.001), in stage II patients (G1: 94.0%; G3: 67.3%; HR, 4.15; 95% CI, 1.96-8.82; P<0.001), and in stage III patients (G1: 89.0%; G3: 59.4%; HR, 4.50; 95% CI, 2.41-8.41; P<0.001). PDCs had the highest prognostic accuracy for RFS, with a concordance probability estimate of 0.642, whereas WHO grade had the lowest. Interobserver agreement was highest for PDCs, with a weighted kappa of 0.824. The risk of recurrence over time peaked earlier for worse PDC grades. Our findings indicate that PDCs are the best invasive-front histologic marker in terms of prognostic accuracy and interobserver agreement. PDCs may replace WHO grade as a prognostic indicator.
Doubilet, Peter M; Benson, Carol B
2013-07-01
To assess the interobserver agreement, frequency of occurrence, and prognostic importance of the double sac sign (DSS), intradecidual sign (IDS), and other sonographic findings in early intrauterine pregnancies. We retrospectively identified all sonograms obtained between January 1, 2006, and December 31, 2011, in which: (1) the scan demonstrated an intrauterine fluid collection without a yolk sac or embryo; (2) a follow-up scan confirmed an intrauterine pregnancy; and (3) the first-trimester outcome was known. Each coinvestigator characterized the 199 study sonograms as demonstrating or not demonstrating a DSS or an IDS, based on judgment about whether the scan met published criteria defining these signs. Interobserver agreement was poor for the DSS (κ = 0.24) and IDS (κ = 0.23). Scans frequently demonstrated neither sign: 150 cases (75.4%) if we considered a sign to be present when both investigators graded it as present and 69 cases (34.7%) using the looser criterion that either graded it as present. The presence of a DSS or an IDS was unrelated to the β-human chorionic gonadotropin (β-hCG) value (P > .05, t test, all comparisons). An inner echogenic ring was present in 158 cases (79.4%), and the decidua was brighter peripherally than centrally in 102 (51.3%). The first-trimester outcome was unrelated to the presence of a DSS or an IDS, presence of an inner echogenic ring, or decidual appearance (P > .05, χ² test, all comparisons). The sonographic appearance of early gestational sacs, before visualization of a yolk sac or embryo, is highly variable. The DSS and IDS are often absent; there is poor interobserver agreement regarding these signs; and the prognosis is unrelated to their presence or absence. A round or oval intrauterine fluid collection in a woman with positive β-hCG should be treated as a gestational sac until proven otherwise, regardless of whether it demonstrates a DSS or an IDS.
Schneider, M M; Balke, M; Koenen, P; Fröhlich, M; Wafaisade, A; Bouillon, B; Banerjee, M
2016-07-01
The reliability of the Rockwood classification, the gold standard for acute acromioclavicular (AC) joint separations, has not yet been tested. The purpose of this study was to investigate the reliability of visual and measured AC joint lesion grades according to the Rockwood classification. Four investigators (two shoulder specialists and two second-year residents) examined radiographs (bilateral panoramic stress and axial views) in 58 patients and graded the injury according to the Rockwood classification using the following sequence: (1) visual classification of the AC joint lesion, (2) digital measurement of the coracoclavicular distance (CCD) and the horizontal dislocation (HD) with Osirix Dicom Viewer (Pixmeo, Switzerland), (3) classification of the AC joint lesion according to the measurements, and (4) repetition of (1) and (2) after repeated anonymization by an independent physician. Visual and measured Rockwood grades as well as the CCD and HD of every patient were documented, and a CC index was calculated (CCD injured/CCD healthy). All records were then used to evaluate intra- and interobserver reliability. The disagreement between visual and measured diagnosis ranged from 6.9% to 27.6%. Interobserver reliability was good for visual diagnosis (0.72-0.74) and excellent for measured Rockwood grades (0.85-0.93). Intraobserver reliability was good to excellent (0.67-0.93) for visual diagnosis and excellent for measured diagnosis (0.90-0.97). The correlations between measurements of the axial view ranged from 0.68 to 0.98 (good to excellent) for interobserver reliability and from 0.90 to 0.97 (excellent) for intraobserver reliability. Bilateral panoramic stress and axial radiographs are reliable examinations for grading AC joint injuries according to Rockwood's classification. Clinicians of all experience levels can precisely classify AC joint lesions according to the Rockwood classification.
We recommend grading acute AC joint lesions by digital measurement rather than by visual diagnosis alone because of the higher intra- and interobserver reliability. Case series, Level IV.
Medina-Mirapeix, Francesc; Vivo-Fernández, Iván; López-Cañizares, Juan; García-Vidal, José A; Benítez-Martínez, Josep Carles; Del Baño-Aledo, María Elena
2018-01-01
The objective was to determine the inter-observer and test-retest reliability of the "Five-repetition sit-to-stand" (5STS) test in patients with total knee replacement (TKR) and to explore the correlation between the 5STS and two mobility tests. A reliability study was conducted among 24 outpatients with TKR (mean age 72.13 years, SD 10.67; 50% women). They were recruited from a traumatology unit of a public hospital via convenience sampling. A physiotherapist and a trauma physician assessed each patient at the same time. The same physiotherapist performed a second 5STS measurement 45-60 min after the first one. Reliability was assessed with intraclass correlation coefficients (ICCs) and Bland-Altman plots. Pearson correlation coefficients were calculated to assess the correlation between the 5STS, the timed up and go test (TUG), and four-meter gait speed (4MGS). The ICCs for inter-observer and test-retest reliability of the 5STS were 0.998 (95% confidence interval [CI], 0.995-0.999) and 0.982 (95% CI, 0.959-0.992), respectively. The inter-observer Bland-Altman plot showed limits of agreement between -0.82 and 1.06, with a mean of 0.11 and no heteroscedasticity within the data. The test-retest Bland-Altman plot showed limits of agreement between -1.76 and 4.16, with a mean of 1.20 and heteroscedasticity within the data. Pearson correlation coefficients revealed significant correlations between the 5STS and the TUG (r=0.7, p<0.001) and the 4MGS (r=-0.583, p=0.003). This study demonstrates that the 5STS has excellent inter-observer and test-retest reliability when used in people with TKR, as well as significant correlations with other functional mobility tests. These findings support the use of the 5STS as an outcome measure in the TKR population. Copyright © 2017 Elsevier B.V. All rights reserved.
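Bland-Altman limits of agreement are conventionally computed as the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The minimal Python sketch below assumes that convention. The function name is hypothetical, and the 5STS data are not reproduced.

```python
from statistics import mean, stdev


def bland_altman(x: list[float], y: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    Differences are taken as x - y; the limits are mean +/- z * SD of the differences.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical 5STS times (seconds) from two observers, for illustration only.
observer_1 = [10.2, 12.5, 9.8, 11.0]
observer_2 = [10.0, 12.9, 9.5, 11.1]
print(bland_altman(observer_1, observer_2))
```

The limits are symmetric around the mean difference, so the midpoint of a reported pair of limits should reproduce the reported mean. For the inter-observer limits, (-0.82 + 1.06) / 2 = 0.12, which matches the reported 0.11 within rounding.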
De Silvestro, A; Martini, K; Becker, A S; Kim-Nguyen, T D L; Guggenberger, R; Calcagni, M; Frauenfelder, T
2018-02-01
To prospectively investigate digital tomosynthesis (DTS) as an alternative to digital radiography (DR) for postoperative imaging of orthopaedic hardware after trauma or arthrodesis in the hand and wrist. Thirty-six consecutive patients (12 female, median age 36 years, range 19-86 years) were included in this institutional review board-approved clinical trial. Imaging was performed with DTS in dorso-palmar projection, and DR was performed in dorso-palmar, lateral, and oblique views. Images were evaluated by two independent radiologists for qualitative and diagnosis-related imaging parameters using a four-point Likert scale (1 = excellent, 4 = not diagnostic) and a nominal scale. Interobserver agreement between the two readers was assessed with Cohen's kappa (κ). Differences between DTS and DR were tested with Wilcoxon's signed-rank test. A p-value <0.05 was considered statistically significant. Regarding image quality, interobserver agreement was higher for DTS than for DR, especially for fracture-related parameters (delineation of osteosynthesis material [OSM]: κ DTS 0.96 versus κ DR 0.45; delineation of fracture margins: κ DTS 0.78 versus κ DR 0.35). Delineation of fracture margins and delineation of adjacent joint spaces scored significantly better for DTS than for DR (delineation of fracture margins: DTS 1.54, DR 2.28, p = 0.001; delineation of adjacent joint spaces: DTS 1.31, DR 2.24, p = 0.001). Regarding diagnosis-related findings, interobserver agreement was almost equal. DTS showed significantly higher sharpness of fracture margins (DTS 1.94, DR 2.33, p = 0.04). The mean dose area product (DAP) was significantly higher for DTS than for DR (mean DR 0.219 Gy·cm², mean DTS 0.903 Gy·cm², p = 0.001). Fracture healing is more visible and interobserver agreement is higher for DTS than for DR in the postoperative assessment of orthopaedic hardware in the hand and wrist. Copyright © 2017 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
Schelhorn, Juliane; Neudorf, Ulrich; Schemuth, Haemi; Nensa, Felix; Nassenstein, Kai; Schlosser, Thomas W
2015-11-01
Patients with corrected tetralogy of Fallot (cToF) are prone to develop pulmonary regurgitation and right ventricular enlargement, resulting in long-term complications; thus, accurate monitoring of right ventricular volume is crucial. However, it remains controversial which cardiovascular magnetic resonance imaging (CMRI) slice orientation is most appropriate in cToF for the analysis of right ventricular volume. To investigate which slice orientation is best suited for right ventricular volumetry in cToF, we compared short-axis and axial slices; we also compared right ventricular data between CMRI and echocardiography. Thirty CMRI examinations of 27 patients with cToF were included retrospectively. Right ventricular end-diastolic volume (EDV) and end-systolic volume (ESV) were derived from short-axis and axial cine CMRI planes. Furthermore, pulmonary trunk forward flow in phase-contrast CMRI and right ventricular inner diastolic diameter in echocardiography (RVIDdiast) were measured. Intra- and inter-observer agreement for cine CMRI data were assessed by Bland-Altman and variance analysis. Pearson correlation was used to compare CMRI cine with phase-contrast data and CMRI cine with echocardiographic data. Intra- and inter-observer variability for right ventricular EDV were significantly lower in axial slices (P = 0.016, P = 0.010). For right ventricular ESV, a trend towards lower intra- and inter-observer variability in axial slices was found (P = 0.063, P = 0.138). Right ventricular stroke volume in the short-axis (r = 0.872, P < 0.001) and axial (r = 0.914, P < 0.001) planes correlated highly and very highly, respectively, with pulmonary trunk forward flow in phase-contrast CMRI. RVIDdiast correlated highly with right ventricular EDV assessed by short-axis and axial CMRI (P < 0.001, P < 0.001). Because of lower intra- and inter-observer variability, axial slices are recommended for right ventricular volumetry in cToF. © The Foundation Acta Radiologica 2014.
Vorselaars, V M M; Velthuis, S; Huitema, M P; Hosman, A E; Westermann, C J J; Snijder, R J; Mager, J J; Post, M C
2018-04-01
Transthoracic contrast echocardiography (TTCE) is recommended for screening of pulmonary arteriovenous malformations (PAVMs) in hereditary haemorrhagic telangiectasia. Shunt quantification is used to find treatable PAVMs. So far, no study has investigated the reproducibility of this diagnostic test. Therefore, this study aimed to describe the inter-observer and inter-injection variability of TTCE. We conducted a prospective single-centre study. We included all consecutive persons screened for the presence of PAVMs in association with hereditary haemorrhagic telangiectasia in 2015. The videos of two contrast injections per patient were divided and reviewed by two cardiologists blinded to patient data. Pulmonary right-to-left shunts were graded using a three-grade scale. Inter-observer and inter-injection agreement was calculated with κ statistics for the presence and grade of pulmonary right-to-left shunts. We included 107 persons (accounting for 214 injections) (49.5% male, mean age 45.0 ± 16.6 years). A pulmonary right-to-left shunt was present in 136 (63.6%) and 131 (61.2%) injections for observers 1 and 2, respectively. Inter-injection agreement for the presence of pulmonary right-to-left shunts was 0.96 (95% confidence interval (CI) 0.9-1.0) and 0.98 (95% CI 0.94-1.00) for observers 1 and 2, respectively. Inter-injection agreement for pulmonary right-to-left shunt grade was 0.96 (95% CI 0.93-0.99) and 0.95 (95% CI 0.92-0.98), respectively. There was disagreement in right-to-left shunt grade between the contrast injections in 11 patients (10.3%). Inter-observer agreement for the presence and grade of the pulmonary right-to-left shunt was 0.95 (95% CI 0.91-0.99) and 0.97 (95% CI 0.95-0.99), respectively. TTCE has excellent inter-injection and inter-observer agreement for both the presence and grade of pulmonary right-to-left shunts.
Margossian, Renee; Schwartz, Marcy L; Prakash, Ashwin; Wruck, Lisa; Colan, Steven D; Atz, Andrew M; Bradley, Timothy J; Fogel, Mark A; Hurwitz, Lynne M; Marcus, Edward; Powell, Andrew J; Printz, Beth F; Puchalski, Michael D; Rychik, Jack; Shirali, Girish; Williams, Richard; Yoo, Shi-Joon; Geva, Tal
2009-08-01
Assessment of the size and function of a functional single ventricle (FSV) is a key element in the management of patients after the Fontan procedure. The measurement variability among observers of ventricular mass, volume, and ejection fraction (EF) by echocardiography and cardiac magnetic resonance imaging (CMR), and the reproducibility of these measurements among readers, have not been described in these patients. From the 546 patients enrolled in the Pediatric Heart Network Fontan Cross-Sectional Study (mean age 11.9 ± 3.4 years), 100 echocardiograms and 50 CMR studies were assessed for measurement reproducibility; 124 subjects with paired studies were selected for comparison between modalities. Interobserver agreement for qualitative grading of ventricular function by echocardiography was modest for left ventricular (LV) morphology (kappa = 0.42) and weak for right ventricular (RV) morphology (kappa = 0.12). For quantitative assessment, high intraclass correlation coefficients were found for echocardiographic interobserver agreement (LV 0.87 to 0.92, RV 0.82 to 0.85 for systolic and diastolic volumes, respectively). In contrast, intraclass correlation coefficients for LV and RV mass were moderate (LV 0.78, RV 0.72). The corresponding intraclass correlation coefficients by CMR were high (LV 0.96, RV 0.85). Volumes by echocardiography averaged 70% of CMR values. Interobserver reproducibility for the EF was similar for the 2 modalities. Although the absolute mean difference between modalities for the EF was small (<2%), the 95% limits of agreement were wide. In conclusion, agreement between observers on qualitative FSV function by echocardiography is modest. Measurements of FSV volume by 2-dimensional echocardiography underestimate CMR measurements, but their reproducibility is high. Echocardiographic and CMR measurements of FSV EF demonstrate similar interobserver reproducibility, whereas measurements of FSV mass and LV diastolic volume are more reproducible by CMR.
Bishop, Julie Y; Jones, Grant L; Lewis, Brian; Pedroza, Angela
2015-04-01
In the treatment of distal third clavicle fractures, the Neer classification system, based on the location of the fracture in relation to the coracoclavicular ligaments, has traditionally been used to determine fracture pattern stability. To determine the intra- and interobserver reliability in the classification of distal third clavicle fractures via standard plain radiographs and the intra- and interobserver agreement in the preferred treatment of these fractures. Cohort study (Diagnosis); Level of evidence, 3. Thirty radiographs of distal clavicle fractures were randomly selected from patients treated for distal clavicle fractures between 2006 and 2011. The radiographs were distributed to 22 shoulder/sports medicine fellowship-trained orthopaedic surgeons. Fourteen surgeons responded and took part in the study. The evaluators were asked to measure the size of the distal fragment, classify the fracture pattern as stable or unstable, assign the Neer classification, and recommend operative versus nonoperative treatment. The radiographs were reordered and redistributed 3 months later. Inter- and intrarater agreement was determined for the distal fragment size, stability of the fracture, Neer classification, and decision to operate. Single-variable logistic regression was performed to determine what factors could most accurately predict the decision for surgery. Interrater agreement was fair for distal fragment size, moderate for stability, fair for Neer classification, slight for type IIB and III fractures, and moderate for treatment approach. Intrarater agreement was moderate for distal fragment size categories (κ = 0.50, P < .001) and Neer classification (κ = 0.42, P < .001) and substantial for stable fracture (κ = 0.65, P < .001) and decision to operate (κ = 0.65, P < .001). Fracture stability was the best predictor of treatment, with 89% accuracy (P < .001). Fracture stability determination and the decision to operate had the highest interobserver agreement.
Fracture stability was the key determinant of treatment, rather than the Neer classification system or the size of the distal fragment. © 2015 The Author(s).
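The verbal labels in this abstract (slight, fair, moderate, substantial) agree with the widely used Landis and Koch (1977) benchmarks for kappa. The abstract does not name its scale, so this mapping is an assumption. A minimal Python version, with a hypothetical function name:

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) descriptive benchmarks."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Intrarater kappas reported in the distal clavicle fracture abstract.
for name, k in [("fragment size", 0.50), ("Neer classification", 0.42), ("decision to operate", 0.65)]:
    print(name, k, landis_koch_label(k))
```

Applied to the intrarater values above (0.50, 0.42, 0.65), the function returns moderate, moderate, and substantial, which matches the abstract's wording. Other benchmark scales exist, and the cut-points between adjacent labels are conventions rather than statistical thresholds.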
Olsen, Cody S; Kuppermann, Nathan; Jaffe, David M; Brown, Kathleen; Babcock, Lynn; Mahajan, Prashant V; Leonard, Julie C
2015-04-01
The objective was to describe the interobserver agreement between trained chart reviewers and physician reviewers in a multicenter retrospective chart review study of children with cervical spine injuries (CSIs). Medical records of children younger than 16 years old who underwent cervical spine radiography at 17 Pediatric Emergency Care Applied Research Network (PECARN) hospitals from 2000 through 2004 were abstracted by trained reviewers for a study aimed at identifying predictors of CSIs in children. Independent physician-reviewers abstracted patient history and clinical findings from a random sample of study patient medical records at each hospital. Interobserver agreement was assessed using percent agreement and the weighted kappa (κ) statistic, with lower bounds of the 95% confidence intervals. Moderate or better agreement (κ > 0.4) was achieved for most candidate CSI predictors, including altered mental status (κ = 0.87); focal neurologic findings (κ = 0.74); posterior midline neck tenderness (κ = 0.74); any neck tenderness (κ = 0.89); torticollis (κ = 0.79); complaint of neck pain (κ = 0.83); history of loss of consciousness (κ = 0.89); nonambulatory status (κ = 0.74); and substantial injuries to the head (κ = 0.50), torso/trunk (κ = 0.48), and extremities (κ = 0.59). High-risk mechanisms showed near-perfect agreement (diving, κ = 1.0; struck by car, κ = 0.93; other motorized vehicle crash, κ = 0.93; fall, κ = 0.92; high-risk motor vehicle collision, κ = 0.89; hanging, κ = 0.80). Fair agreement was found for clotheslining mechanisms (κ = 0.36) and substantial face injuries (κ = 0.40). Most retrospectively assessed variables thought to be predictive of CSIs in blunt trauma-injured children had at least moderate interobserver agreement, suggesting that these data are sufficiently valid for use in identifying potential predictors of CSI. © 2015 by the Society for Academic Emergency Medicine.
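Weighted kappa gives partial credit when two raters choose neighboring ordinal categories. The abstract does not say whether linear or quadratic weights were used, so the minimal Python sketch below supports both. The function name is hypothetical, and the sketch assumes ratings already coded as integers 0..k-1 with k ≥ 2.

```python
def weighted_kappa(r1: list[int], r2: list[int], k: int, weights: str = "linear") -> float:
    """Weighted kappa for two raters using ordinal categories 0..k-1 (k >= 2).

    Disagreement weights are |i - j| / (k - 1) ("linear") or its square ("quadratic").
    Raises ZeroDivisionError if every rating falls in the same category.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1 / n  # joint proportion of (rater 1 = a, rater 2 = b)
    row = [sum(observed[i]) for i in range(k)]                       # rater 1 marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals

    def w(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_disagreement = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    chance_disagreement = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed_disagreement / chance_disagreement


# Hypothetical three-level severity ratings from a trained reviewer and a physician.
print(weighted_kappa([0, 1, 2, 2, 1, 0], [0, 1, 1, 2, 1, 0], k=3))
```

With two raters and two categories, both weighting schemes reduce to ordinary Cohen's kappa.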
Alsma, Jelmer; van Saase, Jan L C M; Nanayakkara, Prabath W B; Schouten, W E M Ineke; Baten, Anique; Bauer, Martijn P; Holleman, Frits; Ligtenberg, Jack J M; Stassen, Patricia M; Kaasjager, Karin H A H; Haak, Harm R; Bosch, Frank H; Schuit, Stephanie C E
2017-05-01
Capillary refill time (CRT) is a clinical test used to evaluate the circulatory status of patients; various methods are available to assess CRT. Conventional clinical research often demands large numbers of patients, making it costly, labor-intensive, and time-consuming. We studied interobserver agreement on CRT in a nationwide study using a novel research method called flash mob research (FMR). Physicians in the Netherlands were recruited through word-of-mouth referrals, conventional media, and social media to participate in a nationwide, single-day, "nine-to-five," multicenter, cross-sectional, observational study to evaluate CRT. Patients aged ≥ 18 years who presented to the ED or were hospitalized were eligible for inclusion. CRT was measured independently by two investigators at the patient's sternum and distal phalanx after application of pressure for 5 s (5s) and 15 s (15s). On October 29, 2014, a total of 458 investigators in 38 Dutch hospitals enrolled 1,734 patients. The mean CRTs measured at the distal phalanx were 2.3 s (5s, SD 1.1) and 2.4 s (15s, SD 1.3). The mean CRTs measured at the sternum were 2.6 s (5s, SD 1.1) and 2.7 s (15s, SD 1.1). Interobserver agreement was higher for the distal phalanx (κ value, 0.40) than for the sternum (κ value, 0.30). Interobserver agreement on CRT is, at best, moderate. CRT measured at the distal phalanx yielded higher interobserver agreement than sternal CRT measurements. FMR proved a valuable instrument to investigate a relatively simple clinical question in an inexpensive, quick, and reliable manner. Copyright © 2016 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.
Tramm, Trine; Di Caterino, Tina; Jylling, Anne-Marie B; Lelkaitis, Giedrius; Lænkholm, Anne-Vibeke; Ragó, Péter; Tabor, Tomasz P; Talman, Maj-Lis M; Vouza, Emmanouela
2018-01-01
In breast cancer, there is a growing body of evidence that tumor-infiltrating lymphocytes (TILs) may have clinical utility and may be able to direct clinical decisions for subgroups of patients. Clinical utility is, however, not sufficient to warrant the implementation of a new biomarker in routine practice; evaluation of its analytical validity is also needed, including testing the reproducibility of decentralized assessment of TILs. The aim of this study was to evaluate the inter-observer agreement of TILs assessment using a standardized method, as proposed by the International TILs Working Group 2014, applied to a cohort of breast cancers reflecting an average breast cancer population. Stromal TILs were assessed using full slide sections from 124 breast cancers with varying histology, malignancy grade, and ER and HER2 status. TILs were estimated by nine dedicated breast pathologists using scanned hematoxylin-eosin-stained slides. TILs results were categorized using various cutoffs, and inter-observer agreement was evaluated using the intraclass correlation coefficient (ICC), kappa statistics, and individual overall agreement with the median TILs value. Evaluation of TILs led to an ICC of 0.71 (95% CI: 0.65-0.77), corresponding to acceptable agreement. Kappa values ranged from 0.38 to 0.46, corresponding to fair to moderate agreement. The individual agreements increased when only two categories ('high' vs. 'low' TILs) and a cutoff of 50-60% were used. The results of the present study are in accordance with previous studies and show that the proposed methodology for standardized evaluation of TILs renders acceptable inter-observer agreement. The findings, however, indicate that assessment of TILs needs further refinement, and they support the latest St. Gallen Consensus that routine reporting of TILs for early breast cancer is not ready for implementation in a clinical setting.
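The abstract reports an ICC without naming its form. When a panel of raters scores every case, one common choice is ICC(2,1) in the Shrout and Fleiss notation: two-way random effects, absolute agreement, single rater. The minimal Python sketch below assumes that form and no missing ratings. The function name is hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave subject i (no missing values).
    """
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this form measures absolute agreement, it penalizes a rater who scores every case exactly one unit higher than a colleague. For the ratings `[[1, 2], [2, 3], [3, 4]]` the function returns 2/3, whereas a consistency-type ICC would be 1.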
Manga, Simon; Parham, Groesbeck; Benjamin, Nkoum; Nulah, Kathleen; Sheldon, Lisa Kennedy; Welty, Edith; Ogembo, Javier Gordon; Bradford, Leslie; Sando, Zacharie; Shields, Ray; Welty, Thomas
2015-10-01
The World Health Organization recommends visual inspection with acetic acid (VIA) for cervical cancer screening in resource-limited settings. In Cameroon, we use digital cervicography (DC) to capture images of the cervix after VIA. This study evaluated interobserver agreement of DC results, compared DC with histopathologic results, and examined interobserver agreement among screening methods. Three observers, blinded to each other's interpretations, evaluated 540 DC photographs as follows: (1) they classified each as negative/positive for acetowhite lesions or cancer and (2) they assigned a presumptive diagnosis of histopathologic lesion grade in the 91 cases that had a histopathologic diagnosis. Observer A was the actual screening nurse; B, a reproductive health nurse; C, a gynecologic oncologist; and D, the histopathologic diagnosis. We compared inter-rater agreement of DC impressions among observers A, B, and C, and with D, using Cohen kappas. For negative/positive interpretations of DC, strengths of agreement of paired observers were the following: A/B, moderate [κ, 0.54; 95% confidence interval (CI), 0.47-0.61]; A/C, fair (κ, 0.37; 95% CI, 0.29-0.44); and B/C, moderate (κ, 0.45; 95% CI, 0.37-0.53). For presumptive pathologic grading, strengths of agreement for weighted κ were as follows: A/B, moderate (κ, 0.42; 95% CI, 0.28-0.56); A/C, fair (κ, 0.33; 95% CI, 0.20-0.46); B/C, fair (κ, 0.54; 95% CI, 0.40-0.67); A/D, moderate (κ, 0.59; 95% CI, 0.45-0.74); B/D, moderate (κ, 0.58; 95% CI, 0.46-0.70); and C/D, moderate (κ, 0.50; 95% CI, 0.37-0.63). Interobserver agreement of DC interpretations was mostly moderate among the 3 observers and between them and histopathology, and was comparable to that of other visual-based screening methods, i.e., VIA, cytology, or colposcopy.
Validity and reliability of the iPhone to measure rib hump in scoliosis.
Balg, Frederic; Juteau, Mathieu; Theoret, Chantal; Svotelis, Amy; Grenier, Guillaume
2014-12-01
This was a prospective blinded validity and reliability analysis. The aim of this study was validation and reliability evaluation of the Scoligauge iPhone app. The scoliometer is used to clinically measure the rib hump in scoliosis as a means to evaluate axial trunk rotation. The increasing availability of smartphones with built-in accelerometers has led to the development of a vast number of applications to measure angles. Of these, the Scoligauge mimics a scoliometer. The aim of this study was to compare the validity of the Scoligauge iPhone application without an associated adapter with the traditional scoliometer and to test the reliability of the application in a clinical setting. Two observers measured the rib hump deformity in 34 consecutive patients with idiopathic scoliosis with an average Cobb angle of 24.2 ± 13.5 degrees (range, 4 to 65 degrees). Measurements were made with an iPhone without the adapter and with a scoliometer. The validity as well as the interobserver and intraobserver reliability were calculated using the intraclass correlation coefficient (ICC) and the Bland-Altman test. The mean difference between the scoliometer and the Scoligauge application was 0.4 degrees [95% confidence interval (CI) of ± 3.1 degrees] with an ICC of 0.947 (P < 0.001). The intraobserver and interobserver ICCs were 0.961 (P < 0.001) and 0.901 (P < 0.001), respectively. The mean intraobserver difference was 0.0 degrees (95% CI of ± 2.7 degrees) and the mean interobserver difference was 0.1 degrees (95% CI of ± 4.4 degrees). The intraobserver and interobserver reliability of the Scoligauge iPhone app, as well as its validity compared with the scoliometer, are excellent. The mean differences between measurements are small and clinically not significant. Thus, the Scoligauge application is valid for clinical evaluation even without a special adapter. Level I (Diagnostic Study).
Cervical vertebrae maturation method morphologic criteria: poor reproducibility.
Nestman, Trenton S; Marshall, Steven D; Qian, Fang; Holton, Nathan; Franciscus, Robert G; Southard, Thomas E
2011-08-01
The cervical vertebrae maturation (CVM) method has been advocated as a predictor of peak mandibular growth. A careful review of the literature showed potential methodologic errors that might influence the high reported reproducibility of the CVM method, and we recently established that the reproducibility of the CVM method was poor when these potential errors were eliminated. The purpose of this study was to further investigate the reproducibility of the individual vertebral patterns, that is, to determine which of the individual CVM vertebral patterns could be classified reliably and which could not. Ten practicing orthodontists, trained in the CVM method, evaluated the morphology of cervical vertebrae C2 through C4 from 30 cephalometric radiographs using questions based on the CVM method. The Fleiss kappa statistic was used to assess interobserver agreement when evaluating each cervical vertebral morphology question for each subject. The Kendall coefficient of concordance was used to assess the level of interobserver agreement when determining a "derived CVM stage" for each subject. Interobserver agreement was high for assessment of the lower borders of C2, C3, and C4 as either flat or curved, but it was low for assessment of the vertebral bodies of C3 and C4 as trapezoidal, rectangular horizontal, square, or rectangular vertical; this led to the overall poor reproducibility of the CVM method. These findings were reflected in the Fleiss kappa statistic. Furthermore, nearly 30% of the time, individual morphologic criteria could not be combined to generate a final CVM stage because of incompatible responses to the 5 questions. Intraobserver agreement in this study was only 62%, on average, when the inconclusive stagings were not counted as disagreements. Intraobserver agreement was worse (44%) when the inconclusive stagings were counted as disagreements.
For the group of subjects that could be assigned a CVM stage, the level of interobserver agreement as measured by the Kendall coefficient of concordance was only 0.45, indicating moderate agreement. The weakness of the CVM method results, in part, from difficulty in classifying the vertebral bodies of C3 and C4 as trapezoidal, rectangular horizontal, square, or rectangular vertical. This led to the overall poor reproducibility of the CVM method and our inability to support its use as a strict clinical guideline for the timing of orthodontic treatment. Copyright © 2011 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
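The Fleiss kappa used above extends kappa to more than two raters. A minimal sketch, assuming every subject is rated by the same number of raters and the ratings are summarized as a subjects × categories count table; the function name and the example table are illustrative and do not come from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects, each classified by the same number of raters.

    counts[i][j] is the number of raters who put subject i in category j.
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # P_i: proportion of agreeing rater pairs for subject i.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # P_e: agreement expected by chance, from the overall category proportions.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Illustrative data (not from the study): 10 subjects, 14 raters, 5 categories.
example = [
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0], [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(example), 3))  # 0.21
```

The Kendall coefficient of concordance reported for the derived CVM stage is a different statistic (agreement on rankings rather than on nominal categories) and is not computed here.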
Knols, Ruud H; Aufdemkampe, Geert; de Bruin, Eling D; Uebelhart, Daniel; Aaronson, Neil K
2009-01-01
Background Hand-held dynamometry is a portable and inexpensive method to quantify muscle strength. To determine if muscle strength has changed, an examiner must know what part of the difference between a patient's pre-treatment and post-treatment measurements is attributable to real change, and what part is due to measurement error. This study aimed to determine the relative and absolute reliability of intra- and inter-observer strength measurements with a hand-held dynamometer (HHD). Methods Two observers performed maximum voluntary peak torque measurements (MVPT) for isometric knee extension in 24 patients with haematological malignancies. For each patient, the measurements were carried out on the same day. The main outcome measures were the intraclass correlation coefficient (ICC ± 95%CI), the standard error of measurement (SEM), the smallest detectable difference (SDD), the relative values as % of the grand mean of the SEM and SDD, and the limits of agreement for the intra- and inter-observer '3 repetition average' and the 'highest value of 3 MVPT' knee extension strength measures. Results The intra-observer ICCs were 0.94 for the average of 3 MVPT (95%CI: 0.86–0.97) and 0.86 for the highest value of 3 MVPT (95%CI: 0.71–0.94). The ICCs for the inter-observer measurements were 0.89 for the average of 3 MVPT (95%CI: 0.75–0.95) and 0.77 for the highest value of 3 MVPT (95%CI: 0.54–0.90). The SEMs for the intra-observer measurements were 6.22 Nm (3.98% of the grand mean [GM]) and 9.83 Nm (5.88% of GM). For the inter-observer measurements, the SEMs were 9.65 Nm (6.65% of GM) and 11.41 Nm (6.73% of GM). The SDDs for the generated parameters varied from 17.23 Nm (11.04% of GM) to 27.26 Nm (17.09% of GM) for intra-observer measurements, and from 26.76 Nm (16.77% of GM) to 31.62 Nm (18.66% of GM) for inter-observer measurements, with similar results for the limits of agreement.
Conclusion The results indicate that there is acceptable relative reliability for evaluating knee strength with a HHD, while the measurement error observed was modest. The HHD may be useful in detecting changes in knee extension strength at the individual patient level. PMID:19272149
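The reported SDDs agree, to within about 0.02 Nm, with the common definition SDD = 1.96 × √2 × SEM applied to the reported SEMs (for example, 1.96 × √2 × 6.22 ≈ 17.24 Nm versus the reported 17.23 Nm). The abstract does not state its formulas, so the sketch below assumes that definition together with the usual SEM = SD × √(1 − ICC); the function names are illustrative.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_difference(sem: float, z: float = 1.96) -> float:
    """Smallest detectable difference at 95% confidence: z * sqrt(2) * SEM.

    The sqrt(2) reflects that a pre/post difference carries measurement
    error from both occasions.
    """
    return z * math.sqrt(2.0) * sem


# SEMs (Nm) reported in the abstract: two intra-observer, two inter-observer.
for sem in (6.22, 9.83, 9.65, 11.41):
    print(f"SEM {sem:5.2f} Nm -> SDD {smallest_detectable_difference(sem):5.2f} Nm")
```

A change in knee extension strength smaller than the SDD cannot be distinguished from measurement error for an individual patient.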
Molecular approaches for classifying endometrial carcinoma.
Piulats, Josep M; Guerra, Esther; Gil-Martín, Marta; Roman-Canal, Berta; Gatius, Sonia; Sanz-Pamplona, Rebeca; Velasco, Ana; Vidal, August; Matias-Guiu, Xavier
2017-04-01
Endometrial carcinoma is the most common cancer of the female genital tract. This review article discusses the usefulness of molecular techniques to classify endometrial carcinoma. Any proposal for molecular classification of neoplasms should integrate morphological features of the tumors. For that reason, we start with the current histological classification of endometrial carcinoma, discussing the correlation between genotype and phenotype and the most significant recent improvements. Second, we comment on some possible flaws of this classification, including interobserver variation in the pathologic interpretation of high-grade tumors, and discuss the value of molecular pathology in addressing them. Third, we discuss the importance of applying the TCGA molecular approach to clinical practice. We also comment on the impact of intratumor heterogeneity on classification, and finally, we briefly discuss the usefulness of the TCGA classification in tailoring immunotherapy for endometrial cancer patients. We suggest combining pathologic classification and the surrogate TCGA molecular classification for high-grade endometrial carcinomas as an option to improve assessment of prognosis. Copyright © 2016 Elsevier Inc. All rights reserved.
[Formal quality assessment of informed consent documents in 9 hospitals].
Calle-Urra, J E; Parra-Hidalgo, P; Saturno-Hernández, P J; Martínez-Martínez, M J; Navarro-Moya, F J
2013-01-01
Informed consent forms are very important in the process of providing medical information. The aim of this study was to design reliable formal quality criteria for these documents and to apply them in evaluating the forms used in the hospitals of a regional health service. Criteria were designed from an analysis of existing regulations, previous studies, and consultation with key experts. Interobserver concordance was assessed using the kappa index. The criteria were applied to 1425 documents from 9 hospitals. A total of 19 criteria for evaluating the quality of informed consent forms were obtained. Kappa values were higher than 0.60 for 17 of them and higher than 0.52 for the other 2. The average number of defects per document was 7.6, with a high-low ratio among hospitals of 1.84. More than 90% of the documents had defects in the information on consequences and contraindications, and about 90% did not mention that a copy is given to the patient. More than 60% did not comply with the criteria on stating the purpose of the procedure, including a statement that the patient had understood the information and had doubts clarified, and describing the treatment options. A tool has been obtained to reliably assess the formal quality of informed consent forms. The documents assessed have a wide margin for improvement regarding giving a copy to the patient and some aspects of the specific information that patients should receive. Copyright © 2012 SECA. Published by Elsevier España. All rights reserved.
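For two observers applying a yes/no criterion to the same documents, the kappa index compares observed agreement with the agreement expected by chance. The abstract does not say which kappa variant was used; the sketch below assumes Cohen's kappa for two raters, and the ratings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who classified the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings: does each of 10 documents meet one criterion?
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(round(cohen_kappa(a, b), 2))  # 0.6
```

Here the observers agree on 8 of 10 documents (0.8), chance agreement is 0.5, and kappa is (0.8 − 0.5) / (1 − 0.5) = 0.6, at the 0.60 level used above.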
Computerized image analysis: estimation of breast density on mammograms
NASA Astrophysics Data System (ADS)
Zhou, Chuan; Chan, Heang-Ping; Petrick, Nicholas; Sahiner, Berkman; Helvie, Mark A.; Roubidoux, Marilyn A.; Hadjiiski, Lubomir M.; Goodsitt, Mitchell M.
2000-06-01
An automated image analysis tool is being developed for estimation of mammographic breast density, which may be useful for risk estimation or for monitoring breast density change in a prevention or intervention program. A mammogram is digitized using a laser scanner and the resolution is reduced to a pixel size of 0.8 mm × 0.8 mm. Breast density analysis is performed in three stages. First, the breast region is segmented from the surrounding background by an automated breast boundary-tracking algorithm. Second, an adaptive dynamic range compression technique is applied to the breast image to reduce the range of the gray level distribution in the low frequency background and to enhance the differences in the characteristic features of the gray level histogram for breasts of different densities. Third, rule-based classification is used to classify the breast images into several classes according to the characteristic features of their gray level histogram. For each image, a gray level threshold is automatically determined to segment the dense tissue from the breast region. The area of segmented dense tissue as a percentage of the breast area is then estimated. In this preliminary study, we analyzed the interobserver variation of breast density estimation by two experienced radiologists using BI-RADS lexicon. The radiologists' visually estimated percent breast densities were compared with the computer's calculation. The results demonstrate the feasibility of estimating mammographic breast density using computer vision techniques and its potential to improve the accuracy and reproducibility in comparison with the subjective visual assessment by radiologists.
Teman, Carolin J.; Wilson, Andrew R.; Perkins, Sherrie L.; Hickman, Kimberly; Prchal, Josef T.; Salama, Mohamed E.
2010-01-01
Evaluation of bone marrow fibrosis and osteosclerosis in myeloproliferative neoplasms (MPN) is subject to interobserver inconsistency. Performance data for currently utilized fibrosis grading systems are lacking, and classification scales for osteosclerosis do not exist. Digital imaging can serve as a quantification method for fibrosis and osteosclerosis. We used digital imaging techniques for trabecular area assessment and reticulin-fiber quantification. Patients with all Philadelphia-negative MPN subtypes had higher trabecular volume than controls (p ≤ 0.0015). Results suggest that the degree of osteosclerosis helps differentiate primary myelofibrosis from other MPN. Numerical quantification of fibrosis correlated highly with subjective scores, and interobserver correlation was satisfactory. Digital imaging provides accurate quantification for osteosclerosis and fibrosis. PMID:20122729
Neumann, M; Friedl, S; Meining, A; Egger, K; Heldwein, W; Rey, J F; Hochberger, J; Classen, M; Hohenberger, W; Rösch, T
2002-10-01
In most European countries, training in GI endoscopy has largely been based on hands-on acquisition of experience in patients rather than on a structured training programme. With the development of training models, systematic hands-on training in a variety of diagnostic and therapeutic endoscopy techniques was achieved. Little, however, is known about methods of objectively assessing trainees' performance. We therefore developed an assessment 'score card' for upper GI endoscopy and tested it in endoscopists with various levels of experience. The aim of the study was to assess interobserver variations in the evaluation of trainees. On the basis of textbook and expert opinions, a consensus group of eight experienced endoscopists developed a score card for diagnostic upper GI endoscopy with biopsy. The score card includes an assessment of the individual steps of the procedure as well as of the times needed to complete each step. This score card was then evaluated in a further conference including ten experts who blindly assessed videotapes of 15 endoscopists performing upper GI endoscopy in a training bio-simulation model (the 'Erlangen Endo-Trainer'). On the basis of their previous experience (i.e. the number of endoscopies performed), these 15 endoscopists were classified into four groups: very experienced, experienced, having some experience and inexperienced. Interobserver variability (IOV) was tested for the various score card parameters (Kendall's rank-correlation coefficient 0.0-0.5 poor, 0.5-1.0 good agreement). In addition, the correlation between the score card assessment and the examiners' experience levels was analysed. Despite poor IOV results for all the parameters tested (Kendall coefficient < 0.3), the assessment parameters correlated well when the examiners' different experience levels were taken into account (correlation coefficient 0.59-0.89, p < 0.05).
The score card parameters were suitable for differentiating between the four groups of examiners with different levels of endoscopic experience. As expected with scores involving subjective assessment of performance, the variability between reviewers was substantial. Nevertheless, the assessment score was capable of distinguishing reliably between different experience levels, owing to good consistency within individual observers. The score card can therefore be used to document both training status and progress during endoscopy training courses using bio-simulation models, and this might improve quality assurance in GI endoscopy training.
Miyasaka, M; Hirakawa, M; Nakamura, K; Tanaka, F; Mimori, K; Mori, M; Honda, H
2011-08-01
Nonerosive reflux disease (NERD) is classified into grade M (minimal change, endoscopically; erythema without sharp demarcation, whitish turbidity, and/or invisibility of vessels due to these findings) and grade N (normal) in the modified Los Angeles classification system in Japan. However, grades M and N are not included in the original Los Angeles system because interobserver agreement for the conventional endoscopic diagnosis of grade M or N NERD is poor. Flexible spectral imaging color enhancement (FICE) is a virtual chromoendoscopy technique that enhances mucosal and vascular visibility. The aim of this study was to evaluate whether the endoscopic diagnosis of grade M or N NERD using FICE images is feasible. Between April 2006 and May 2008, 26 NERD patients and 31 controls were enrolled in the present study. First, an experienced endoscopist assessed the color pattern of minimal change by viewing conventional endoscopic images and FICE images side by side, and the proportion of minimal change was compared between the two groups. Second, three blinded endoscopists assessed the presence or absence of minimal change in both groups using conventional endoscopic images and FICE images separately. Intraobserver variability was compared using McNemar's test, and interobserver agreement was described using the kappa value. Minimal changes such as erythema and whitish turbidity, which were detected using conventional endoscopic images, appeared navy blue and pink-white, respectively, on FICE images in the present FICE mode. The NERD group had a higher proportion of minimal change than the control group (77% vs. 48%; P = 0.033). For all three readers, the detection rates of minimal change using FICE images were greater than those using conventional endoscopic images (P = 0.025, < 0.0001, and 0.034 for readers A, B, and C, respectively).
The kappa values for all pairs of three readers using FICE images were between 0.683 and 0.812, while those using conventional endoscopic images were between 0.364 and 0.624. Thus, the endoscopic diagnosis of grades M or N NERD using FICE images is feasible and may improve interobserver agreement. © 2011 Copyright the Authors. Journal compilation © 2011, Wiley Periodicals, Inc. and the International Society for Diseases of the Esophagus.
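McNemar's test compares two paired proportions, here each reader's detection of minimal change on FICE versus conventional images of the same patients, using only the discordant images. The abstract does not say whether the exact or the chi-square form was used; the sketch below assumes the exact (binomial) form, and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs positive only with method 1 (e.g. minimal change seen only on FICE)
    c: pairs positive only with method 2 (e.g. seen only on conventional images)
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to fall either way,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail, cap at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical reader: 9 images positive only on FICE, 1 only on conventional.
print(round(mcnemar_exact(9, 1), 4))  # 0.0215
```

`math.comb` requires Python 3.8 or later.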
Chang, Jeff; Ip, Matthew; Yang, Michael; Wong, Brendon; Power, Theresa; Lin, Lisa; Xuan, Wei; Phan, Tri Giang; Leong, Rupert W
2016-04-01
Confocal laser endomicroscopy can dynamically assess intestinal mucosal barrier defects and increased intestinal permeability (IP). These are functional features that do not have a corresponding appearance on histopathology. As such, previous pathology training may not be beneficial in learning these dynamic features. This study aims to evaluate the diagnostic accuracy, learning curve, and inter- and intraobserver agreement for identifying features of increased IP in experienced and inexperienced analysts and pathologists. A total of 180 endoscopic confocal laser endomicroscopy (Pentax EC-3870FK; Pentax, Tokyo, Japan) images of the terminal ileum, subdivided into 6 sets of 30, were evaluated by 6 experienced analysts, 13 inexperienced analysts, and 2 pathologists after a 30-minute teaching session. Cell-junction enhancement, fluorescein leak, and cell dropout were used to represent increased IP and were either present or absent in each image. For each image, the diagnostic accuracy, confidence, and quality were assessed. Diagnostic accuracy was significantly higher for experienced analysts than for inexperienced analysts in the first set (96.7% vs 83.1%, P < .001), but by the third set the difference was no longer significant (95% vs 89.7%, P = .127). No differences in accuracy were noted between inexperienced analysts and pathologists. Confidence (odds ratio, 8.71; 95% confidence interval, 5.58-13.57) and good image quality (odds ratio, 1.58; 95% confidence interval, 1.22-2.03) were associated with improved interpretation. Interobserver agreement κ values were high and improved with experience (experienced analysts, 0.83; inexperienced analysts, 0.73; and pathologists, 0.62). Intraobserver agreement was >0.86 for experienced observers. Features representative of increased IP can be rapidly learned with high inter- and intraobserver agreement. Confidence and image quality were significant predictors of accurate interpretation. Previous pathology training did not have an effect on learning.
Copyright © 2016 American Society for Gastrointestinal Endoscopy. Published by Elsevier Inc. All rights reserved.
Impact of Anatomical Location on Value of CT-PET Co-Registration for Delineation of Lung Tumors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fitton, Isabelle; Netherlands Cancer Institute-Antoni van Leeuwenhoek Hospital, Amsterdam; Steenbakkers, Roel J.H.M.
2008-04-01
Purpose: To derive guidelines for the need to use positron emission tomography (PET) for delineation of the primary tumor (PT) according to its anatomical location in the lung. Methods and Materials: In 22 patients with non-small-cell lung cancer, thoracic X-ray computed tomography (CT) and PET were performed. Eleven radiation oncologists delineated the PT on the CT and on the CT-PET registered scans. The PTs were classified into two groups. In Group I patients, the PT was surrounded by lung or visceral pleura, without venous invasion and without extension to the chest wall or the mediastinum over more than one quarter of its surface. In Group II patients, the PT invaded the hilar region, heart, great vessels, pericardium, or mediastinum over more than one quarter of its surface and/or was associated with atelectasis. Interobserver variability was compared for each group and expressed as a local standard deviation. Results: The comparison of delineations showed good reproducibility for Group I, with an average SD of 0.4 cm on CT and an average SD of 0.3 cm on CT-PET (p = 0.1628). By contrast, there was a significant improvement with CT-PET for Group II, with an average SD of 1.3 cm on CT and an SD of 0.4 cm on CT-PET (p = 0.0003). The improvement was mainly located at the atelectasis/tumor interface. At the tumor/lung and tumor/hilum interfaces, the observer variation was similar with both modalities. Conclusions: Using PET for PT delineation is mandatory to decrease interobserver variability in the hilar region, heart, great vessels, pericardium, mediastinum, and/or the region associated with atelectasis; however, it is not essential for delineation of a PT surrounded by lung or visceral pleura, without venous invasion or extension to the chest wall.
Biederer, J; Bolte, H; Schmidt, T; Charalambous, N; Both, M; Kopp, U; Hoffmann, B; Freitag-Wolf, S; Van Metter, R; Heller, M
2010-03-01
To evaluate, in anteroposterior (a.-p.) digital chest radiographs of an ex vivo system, whether increased latitude and enhanced image detail contrast (EVP) improve the accuracy of detecting artificial air space opacities in parts of the lung that are superimposed by the diaphragm. 19 porcine lungs were inflated inside a chest phantom, prepared with 20-50 ml gelatin-stabilized liquid to generate alveolar air space opacities, and examined with direct radiography (3.0 × 2.5 k detector, 125 kVp, 4 mAs). 276 a.-p. images with and without EVP of 1.0-3.0 were presented to 6 observers. 8 regions were read for opacities; the reference was defined by CT. Statistics included sensitivity/specificity, interobserver variability, and calculation of Az (area under the ROC curve). Behind the diaphragm (opacities in 32/92 regions), the median sensitivity increased from 0.35 without EVP to 0.53-0.56 at EVP 1.5-3.0 (significant in 5/6 observers). The specificity decreased from 0.96 to 0.90 (significant in 6/6), and the Az value and interobserver correlation increased from 0.66 to 0.74 and 0.39 to 0.48, respectively. Above the diaphragm, the median sensitivity for artificial opacities (136/276 regions) increased from 0.71 to 0.77-0.82 with EVP (significant in 4/6 observers). The specificity and Az value decreased from 0.76 to 0.62 and 0.74 to 0.70, respectively (significant in 3/6). In this ex vivo experiment, EVP improved the diagnostic accuracy (area under the ROC curve) for artificial air space opacities in the superimposed parts of the lung. Above the diaphragm, the accuracy was not affected because of a trade-off between sensitivity and specificity. © Georg Thieme Verlag KG Stuttgart · New York.
Shea, Beverley J; Hamel, Candyce; Wells, George A; Bouter, Lex M; Kristjansson, Elizabeth; Grimshaw, Jeremy; Henry, David A; Boers, Maarten
2009-10-01
Our purpose was to measure the agreement, reliability, construct validity, and feasibility of a measurement tool to assess systematic reviews (AMSTAR). We randomly selected 30 systematic reviews from a database. Each was assessed by two reviewers using: (1) the enhanced quality assessment questionnaire (Overview of Quality Assessment Questionnaire [OQAQ]); (2) Sacks' instrument; and (3) our newly developed measurement tool (AMSTAR). We report on reliability (interobserver kappas of the 11 AMSTAR items), intraclass correlation coefficients (ICCs) of the sum scores, construct validity (ICCs of the sum scores of AMSTAR compared with those of other instruments), and completion times. The interrater agreement of the individual items of AMSTAR was substantial with a mean kappa of 0.70 (95% confidence interval [CI]: 0.57, 0.83) (range: 0.38-1.0). Kappas recorded for the other instruments were 0.63 (95% CI: 0.38, 0.78) for enhanced OQAQ and 0.40 (95% CI: 0.29, 0.50) for the Sacks' instrument. The ICC of the total score for AMSTAR was 0.84 (95% CI: 0.65, 0.92) compared with 0.91 (95% CI: 0.82, 0.96) for OQAQ and 0.86 (95% CI: 0.71, 0.94) for the Sacks' instrument. AMSTAR proved easy to apply, each review taking about 15 minutes to complete. AMSTAR has good agreement, reliability, construct validity, and feasibility. These findings need confirmation by a broader range of assessors and a more diverse range of reviews.
NASA Astrophysics Data System (ADS)
Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali
2011-02-01
Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, has been shown to be subjective, time consuming, and prone to interobserver disagreement, and it often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least squares discriminant analysis is on par with more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Chen, Y-J; Chen, S-K; Huang, H-W; Yao, C-C; Chang, H-F
2004-09-01
To compare cephalometric landmark identification on softcopy and hardcopy of direct digital cephalography acquired by a storage-phosphor (SP) imaging system. Ten digital cephalograms and their conventional counterparts (hardcopies on transparent blue film) were obtained by an SP imaging system and a dye sublimation printer. Twelve orthodontic residents identified 19 cephalometric landmarks on monitor-displayed SP digital images with a computer-aided method and on their hardcopies with the conventional method. The x- and y-coordinates for each landmark, indicating the horizontal and vertical positions, were analysed to assess the reliability of landmark identification and to evaluate the concordance of landmark locations in softcopy and hardcopy of SP digital cephalometric radiography. For each of the 19 landmarks, the location differences as well as their horizontal and vertical components were statistically significant between SP digital cephalometric radiography and its hardcopy. Smaller interobserver errors on SP digital images than on their hardcopies were noted for all the landmarks, except point Go in the vertical direction. The scatter plots show the characteristic distribution of the interobserver error in both horizontal and vertical directions. Generally, the dispersion of interobserver error on SP digital cephalometric radiography is less than that on its hardcopy with the conventional method. SP digital cephalometric radiography could yield a better or comparable level of performance in landmark identification compared with its hardcopy, except for point Go in the vertical direction.
Bogaerts, Evelien; Van der Vekens, Elke; Verhoeven, Geert; de Rooster, Hilde; Van Ryssen, Bernadette; Samoy, Yves; Putcuyps, Ingrid; Van Tilburg, Johan; Devriendt, Nausikaa; Weekers, Frederik; Bertal, Mileva; Houdellier, Blandine; Scheemaeker, Stephanie; Versteken, Jeroen; Lamerand, Maryline; Feenstra, Laurien; Peelman, Luc; Nieuwerburgh, Filip Van; Saunders, Jimmy H; Broeckx, Bart J G
2018-04-28
Even though radiography is one of the most frequently used imaging techniques for orthopaedic disorders, it has been demonstrated that the interpretation can vary between assessors. As such, the purpose of this study was to examine the intraobserver and interobserver agreement and the influence of level of expertise on the interpretation of radiographs of the stifle in dogs with and without cranial cruciate ligament rupture (CCLR). Sixteen observers, divided into four groups according to their level of experience, evaluated 30 radiographs (15 cases with CCLR and 15 control stifles) twice. Each observer was asked to evaluate joint effusion, presence and location of degenerative joint disease, joint instability and whether CCLR was present or absent. Overall, intraobserver and interobserver agreement ranged from fair to almost perfect, with a trend towards increased agreement for more experienced observers. Additionally, stifles classified with high agreement had either overt disease characteristics or none at all, unlike those classified with low agreement. Overall, the agreement on radiographic interpretation of CCLR was high, which is important, as it is the basis of a correct diagnosis and treatment. © British Veterinary Association (unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Intra- and interobserver reliability of quantitative ultrasound measurement of the plantar fascia.
Rathleff, Michael Skovdal; Moelgaard, Carsten; Lykkegaard Olesen, Jens
2011-01-01
To determine intra- and interobserver reliability and measurement precision of sonographic assessment of plantar fascia thickness when using one, the mean of two, or the mean of three measurements. Two experienced observers scanned 20 healthy subjects twice with 60 minutes between test and retest. A GE LOGIQe ultrasound scanner was used in the study. The built-in software in the scanner was used to measure the thickness of the plantar fascia (PF). Reliability was calculated using the intraclass correlation coefficient (ICC) and limits of agreement (LOA). Intraobserver reliability (ICC) using one measurement was 0.50 for one observer and 0.52 for the other; using the mean of three measurements, intraobserver reliability increased to 0.77 and 0.67, respectively. Interobserver reliability (ICC) was 0.62 when using one measurement and increased to 0.82 when using the average of three measurements. When the average of three measurements was used, the LOA decreased to 0.6 mm, corresponding to 17.5% of the mean thickness of the PF. The results showed that reliability increases when using the mean of three measurements compared with one. Limits of agreement based on intratester reliability show that changes in thickness larger than 0.6 mm can be considered actual changes in thickness and not a result of measurement error. Copyright © 2011 Wiley Periodicals, Inc.
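Bland-Altman limits of agreement are the mean difference between paired measurements ± 1.96 times the standard deviation of those differences. The sketch below assumes that standard definition (the abstract does not give its formula) and also averages each subject's repeated scans before comparison; averaging k readings with independent errors divides the error variance by k, which is consistent with the narrower limits reported for the mean of three measurements. Function names and data are illustrative.

```python
import statistics
from typing import List, Sequence, Tuple


def mean_of_repeats(repeats: Sequence[Sequence[float]]) -> List[float]:
    """Average each subject's repeated readings (e.g. the mean of three scans)."""
    return [statistics.mean(r) for r in repeats]


def limits_of_agreement(
    first: Sequence[float], second: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower, upper) Bland-Altman 95% limits for paired data."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical PF thickness (mm): three scans per subject at test and retest.
test = mean_of_repeats([[3.4, 3.6, 3.5], [4.0, 3.8, 3.9], [3.1, 3.3, 3.2]])
retest = mean_of_repeats([[3.5, 3.5, 3.6], [3.9, 3.9, 3.8], [3.2, 3.2, 3.3]])
print(limits_of_agreement(test, retest))
```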
Zygmunt, Arkadiusz; Adamczewski, Zbigniew; Zygmunt, Agnieszka; Karbownik-Lewinska, Malgorzata; Lewinski, Andrzej
2017-01-01
Goitre incidence in school-aged children evaluated using ultrasonography is one of the essential indicators of iodine intake in a given area. The aim of the study was to examine the difference between the volume of the thyroid gland measured in the supine and sitting positions and to determine the intra-observer, inter-observer, and inter-position variations. The survey was conducted among 87 children (56 girls and 31 boys aged 7-13 years, mean age 10.44 ± 1.72 years). The thyroid volume measured in a sitting position was significantly lower than that measured in the supine position. The intra-observer variations for the total thyroid volume equalled 9.56-9.65%. The inter-observer variations were significantly higher and amounted to 34.5-35.7%. The way in which ultrasound evaluation is performed is important for the analysis of the results. It is crucial to aim for the smallest inter-observer variation, which can be achieved by strictly defining the methods of thyroid measurement and comparing one's measuring techniques with the reference method. The use of standards in ultrasound evaluation performed in the supine position, as well as the use of standards without a strict determination of the study method, can lead to erroneous conclusions. © 2017 S. Karger AG, Basel.
Cerciello, Simone; Monk, Andrew Paul; Visonà, Enrico; Carbone, Stefano; Edwards, Thomas Bradley; Maffulli, Nicola; Walch, Gilles
2017-07-01
Secondary cuff failure after shoulder replacement is disabling and often requires additional surgery. An increased critical shoulder angle (CSA) has been found in patients with cuff tears compared with normal subjects. The interobserver reliability of the CSA and the relationship between CSA and symptomatic secondary cuff failure after shoulder replacement were investigated. Nineteen patients with symptomatic cuff failure after anatomic shoulder replacement (mean FU 45 months) were compared to a control group of 29 patients showing no signs of symptomatic cuff failure (mean FU 105.7 months). The CSA was measured by two blinded surgeons at a mean follow-up of 45 and 105.7 months, respectively. Inter-observer reliability was calculated. The mean CSA in the study group in neutral, internal and external rotation was 33°, 34° and 34°, respectively. Corresponding values in the control group were 32°, 32° and 32°. The intraclass correlation coefficients for the whole population between the two examiners were 0.956 (P < 0.01), 0.964 (P < 0.01) and 0.955 (P < 0.01), respectively. There were no significant differences in CSA values between patients who had undergone shoulder replacement and experienced late cuff failure and those in whom the same procedure had been successful. Good inter-observer reliability was found for the CSA method.
Lopez, Mandi J; Davis, Kechia M; Jeffrey-Borger, Susan L; Markel, Mark D; Rettenmund, Christy
2009-12-01
To determine interobserver repeatability of measurements on computed tomography (CT) images of lax canine hip joints at different ages and in the presence of degenerative joint disease at maturity. Longitudinal observational investigation. Sibling crossbreed hounds. Pelvic CT was performed at 20, 24, 32, 48, 68, and 104 weeks of age. Measurements were made on 3 contiguous two-dimensional (2D) transverse CT images of both hips at each time point by 3 investigators. Center-edge angle (CEA), horizontal toit externe angle (HTEA), ventral (VASA), dorsal (DASA), and horizontal (HASA) acetabular sector angles, acetabular index (AI), and percent femoral head coverage (CPC) were measured. Interobserver repeatability was quantified with the intraclass correlation coefficient (ICC). Repeatability was considered satisfactory when ICC ≥ 0.75. DASA, CEA, and CPC were repeatable in all age groups. HASA and HTEA were repeatable for all but 1 time point. At 20 weeks of age, all measures but AI were repeatable, and at 104 weeks of age, DASA, CEA, CPC, and HASA were repeatable. Measures were repeatable in hips with and without degenerative changes, with the exceptions of AI and HASA in normal hips and VASA and HTEA in osteoarthritic hips. Most 2D CT measurements examined were repeatable regardless of age or joint disease. Two-dimensional CT measures may augment current techniques for assessing joint changes in lax canine hips.
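The abstract does not state which ICC form was used. As one common choice for agreement among several investigators measuring the same images, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measure, in Shrout and Fleiss's notation) from the ANOVA mean squares and applies the 0.75 threshold quoted above. The function name and the ratings table are illustrative, not study data.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of subject i by rater j (complete table).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects measured by 4 raters with systematic offsets.
example_ratings = [
    [9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7],
]
icc = icc_2_1(example_ratings)
print(round(icc, 2), "satisfactory" if icc >= 0.75 else "not satisfactory")
# 0.29 not satisfactory
```

Because ICC(2,1) measures absolute agreement, systematic differences between raters (large column effects, as in this table) lower it even when raters rank the subjects similarly.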
Karaman, Adem; Durur-Subasi, Irmak; Alper, Fatih; Durur-Karakaya, Afak; Subasi, Mahmut; Akgun, Metin
2017-10-01
To determine whether the use of necrosis/wall apparent diffusion coefficient (ADC) ratios in the differentiation of necrotic lung lesions is more reliable than measuring the wall alone. In this retrospective study, a total of 76 patients (54 males and 22 females, 71% vs. 29%, with a mean age of 53 ± 18 years, range, 18-84) were enrolled, 33 of whom had lung carcinoma and 43 of whom had a benign necrotic lung lesion. A 3T scanner was used. The necrosis/wall ADC ratio was calculated from ADC values measured in the necrosis and in the wall of the lesions on diffusion-weighted imaging (DWI). Statistical analyses were performed with the independent samples t-test and receiver operating characteristic analysis. Intraobserver and interobserver reliability were calculated for ADC values of wall and necrosis. The mean necrosis/wall ADC ratio was 1.67 ± 0.23 for malignant lesions and 0.75 ± 0.19 for benign lung lesions (P < 0.001). For estimating malignancy, the area under the curve (AUC) values for necrosis ADC, wall ADC, and the necrosis/wall ADC ratio were 0.720, 0.073, and 0.997, respectively. A necrosis/wall ADC ratio cutoff value of 1.12 demonstrated 100% sensitivity and 98% specificity in the estimation of malignancy. The positive predictive value was 100%, the negative predictive value 98%, and the diagnostic accuracy 99%. Intraobserver and interobserver reliability were good for both wall and necrosis ADC measurements. The necrosis/wall ADC ratio appears to be a more reliable and promising tool for discriminating lung carcinoma from benign necrotic lung lesions than measuring the wall alone. Level of Evidence: 4. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2017;46:1001-1006. © 2017 International Society for Magnetic Resonance in Medicine.
Melloul, Emmanuel; Raptis, Dimitri A; Boss, Andreas; Pfammater, Thomas; Tschuor, Christoph; Tian, Yinghua; Graf, Rolf; Clavien, Pierre-Alain; Lesurtel, Mickael
2014-04-01
To develop a noninvasive technique to assess liver volumetry and intrahepatic portal vein anatomy in a mouse model of liver regeneration. Fifty-two C57BL/6 male mice underwent magnetic resonance imaging (MRI) of the liver using a 4.7 T small animal MRI system after no treatment, 70% partial hepatectomy (PH), or selective portal vein embolization. The protocol consisted of the following sequences: a three-dimensional-encoded spoiled gradient-echo sequence (repetition time/echo time 15/2.7 ms, flip angle 20°) for volumetry, and a two-dimensional-encoded time-of-flight angiography sequence (repetition time/echo time 18/6.4 ms, flip angle 80°) for vessel visualization. Liver volume and portal vein segmentation were performed using dedicated postprocessing software. In animals with portal vein embolization, portography served as the reference standard. True liver volume was measured after sacrificing the animals. Measurements were carried out by two independent observers, and interobserver agreement was subsequently analyzed with Cohen's κ. MRI liver volumetry correlated highly with true liver volume measured by a conventional method, both in the untreated liver and in the liver remnant after 70% PH, with a high interobserver correlation coefficient of 0.94 (95% confidence interval, 0.80-0.98 for untreated liver [P < 0.001] and 0.90-0.97 after 70% PH [P < 0.001]). The diagnostic accuracy of magnetic resonance angiography for the occlusion of one branch of the portal vein was 0.95 (95% confidence interval, 0.84-1). The level of agreement between the two observers for the description of intrahepatic vascular anatomy was excellent (Cohen κ value = 0.925). This protocol may be used for noninvasive liver volumetry and visualization of portal vein anatomy in mice. It will support dynamic studies of new strategies to enhance liver regeneration in vivo. Copyright © 2014 Elsevier Inc. All rights reserved.
Iryo, Yasuhiko; Hirai, Toshinori; Nakamura, Masanobu; Inoue, Yasuteru; Watanabe, Masaki; Ando, Yukio; Azuma, Minako; Nishimura, Shinichiro; Shigematsu, Yoshinori; Kitajima, Mika; Yamashita, Yasuyuki
2015-09-01
To evaluate whether 3-T four-dimensional (4D) arterial spin-labelling (ASL)-based magnetic resonance angiography (MRA) is useful for assessing the collateral circulation via the circle of Willis in patients with carotid artery steno-occlusive disease. Institutional review board approval and prior written informed consent from all patients were obtained. Thirteen patients with carotid artery steno-occlusive disease fulfilled the inclusion criteria. All underwent 4D-ASL MRA at 3 T and digital subtraction angiography (DSA). The flow-sensitive alternating inversion recovery (FAIR) preparation scheme with Look-Locker sampling was used for spin labeling. Seven dynamic scans were obtained at 300-ms intervals with a spatial resolution of 0.5×0.5×0.6 mm³. The collateral flow via the circle of Willis was read on 4D-ASL MRA and DSA images by two sets of two independent readers each. κ statistics were used to assess interobserver and intermodality agreement. On DSA, collateral flow via the anterior communicating artery (AcomA) was observed in six patients, via the posterior communicating artery (PcomA) in four patients, and via both the AcomA and PcomA in three patients. In the qualitative evaluation of 4D-ASL MRA images, interobserver agreement was excellent for all items (κ=1). 4D-ASL MRA and DSA consensus readings agreed on the type of collateral flow pattern in 10 of the 13 patients (77%). Intermodality agreement was good (κ=0.606; 95% confidence interval (CI): 0.215-0.997). 3 T 4D-ASL MRA may be a useful tool for the evaluation of the collateral circulation in patients with carotid artery steno-occlusive disease. Copyright © 2015 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
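The κ values above measure agreement beyond chance between two readers assigning categorical labels. A minimal sketch of unweighted Cohen's κ follows, assuming nominal labels such as the three collateral-flow patterns; the function name and readings are hypothetical, and the study's confidence-interval calculation is not reproduced.

```python
import math
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, product of the two raters'
    # marginal proportions, summed over all categories either rater used.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = count_a.keys() | count_b.keys()
    p_chance = sum(count_a[c] * count_b[c] for c in categories) / n**2

    if p_chance == 1:
        return math.nan  # both raters used one identical label throughout: undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical collateral-flow readings for 10 patients by two readers.
reader_1 = ["AcomA", "AcomA", "PcomA", "both", "AcomA",
            "PcomA", "both", "AcomA", "PcomA", "AcomA"]
reader_2 = ["AcomA", "AcomA", "PcomA", "both", "PcomA",
            "PcomA", "both", "AcomA", "AcomA", "AcomA"]
print(round(cohen_kappa(reader_1, reader_2), 3))  # 0.677
```

Here the readers agree on 8 of 10 patients (80%), but because chance alone would produce 38% agreement given their marginal frequencies, κ is lower than the raw percentage.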
Matsuki, Keisuke; Watanabe, Atsuya; Ochiai, Shunsuke; Kenmoku, Tomonori; Ochiai, Nobuyasu; Obata, Takayuki; Toyone, Tomoaki; Wada, Yuichi; Okubo, Toshiyuki
2014-05-01
Although fatty degeneration of the rotator cuff muscles has been reported to affect the outcomes of rotator cuff repairs, only a few studies have attempted to evaluate this degeneration quantitatively. T2 mapping is a quantitative magnetic resonance imaging technique that can potentially evaluate the concentration of fat in muscles. The purpose of this study was to investigate fatty degeneration of the rotator cuff muscles using T2 mapping and to evaluate the reliability of T2 measurement. We obtained magnetic resonance images including T2 mapping from 184 shoulders (180 patients; 110 male patients [112 shoulders] and 70 female patients [72 shoulders]; mean age, 62 years [range, 16-84 years]). Eighty-three shoulders had no rotator cuff tear (group A), whereas 101 shoulders had tears, of which 62 were incomplete to medium (group B) and 39 were large to massive (group C). T2 values of the supraspinatus and infraspinatus muscles were measured and compared among groups. Intraobserver and interobserver variability were also examined. The mean T2 values of the supraspinatus in groups A, B, and C were 36.3 ± 4.7 milliseconds, 44.2 ± 11.3 milliseconds, and 57.0 ± 18.8 milliseconds, respectively. The mean T2 values of the infraspinatus in groups A, B, and C were 36.1 ± 5.1 milliseconds, 40.0 ± 11.1 milliseconds, and 51.9 ± 18.2 milliseconds, respectively. The T2 value increased significantly with the extent of the tear in both muscles. Both intraobserver and interobserver reliability coefficients exceeded 0.99. T2 mapping can be a reliable tool to quantify fatty degeneration of the rotator cuff muscles. Copyright © 2014 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Mosby, Inc. All rights reserved.
T2-mapping of the sacroiliac joints at 1.5 Tesla: a feasibility and reproducibility study.
Albano, Domenico; Chianca, Vito; Cuocolo, Renato; Bignone, Rodolfo; Ciccia, Francesco; Sconfienza, Luca Maria; Midiri, Massimo; Brunetti, Arturo; Lagalla, Roberto; Galia, Massimo
2018-04-20
To evaluate the reproducibility of T2 relaxation time measurements of the sacroiliac joints at 1.5 T. Healthy volunteers underwent an oblique axial multislice multiecho spin-echo sequence of the sacroiliac joints at 1.5 T. Two musculoskeletal radiologists manually drew regions of interest with dedicated software to include the cartilaginous part of the sacroiliac joints. A senior radiologist performed the measurement twice, while a resident measured once. Intra- and inter-observer reproducibility was tested using the Bland-Altman method. The association between sex and T2 relaxation times was tested using the Mann-Whitney U test. The correlation between T2 relaxation times and body mass index (BMI) was tested using Spearman's rho. Eighty sacroiliac joints of 40 subjects (mean age: 28 ± 4.8 years, range: 20-43; mean BMI: 23.3 ± 3.1, range: 18.9-30) were imaged. The mean T2 value obtained by the senior radiologist was 42 ± 4.4 ms in the first series of measurements and 40.7 ± 4.5 ms in the second series. The mean T2 value obtained by the radiology resident was 41.1 ± 4.2 ms. Intra-observer reproducibility was 88% (coefficient of repeatability = 3.8; bias = 1.28; p < .001), while inter-observer reproducibility was 86% (coefficient of repeatability = 4.7; bias = -.88; p < .001). There was a significant association between sex and T2 relaxation times (p = .024) and a significant inverse correlation between T2 relaxation times and BMI (r = -.340, p = .002). T2 relaxation time measurements of the sacroiliac joints appear to be highly reproducible at 1.5 T. Further studies could investigate the potential clinical application of this tool in the sacroiliac joints.
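The Bland-Altman figures above (bias and coefficient of repeatability) come from the paired differences between two series of measurements. A minimal sketch follows, assuming the coefficient of repeatability is taken as 1.96 × the standard deviation of the differences, which is one common convention; the abstract does not state its definition or how the percentage reproducibility values were derived, and the function name and T2 values are hypothetical.

```python
from statistics import fmean, stdev


def bland_altman(first: list[float], second: list[float]) -> dict[str, float]:
    """Bland-Altman summary for paired measurements of the same subjects.

    Differences are taken as first - second, so a positive bias means the
    first series reads higher on average.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired series with at least 2 subjects")
    diffs = [a - b for a, b in zip(first, second)]
    bias = fmean(diffs)            # mean difference (systematic bias)
    sd = stdev(diffs)              # sample SD of the differences
    repeatability = 1.96 * sd      # assumed definition of the coefficient of repeatability
    return {
        "bias": bias,
        "sd_diff": sd,
        "lower_loa": bias - repeatability,  # 95% limits of agreement
        "upper_loa": bias + repeatability,
        "coefficient_of_repeatability": repeatability,
    }


# Hypothetical T2 values (ms) from two reading sessions on four joints.
session_1 = [42.0, 40.0, 45.0, 39.0]
session_2 = [41.0, 40.0, 43.0, 40.0]
summary = bland_altman(session_1, session_2)
print({k: round(v, 2) for k, v in summary.items()})
# {'bias': 0.5, 'sd_diff': 1.29, 'lower_loa': -2.03, 'upper_loa': 3.03, 'coefficient_of_repeatability': 2.53}
```

With differences taken as first minus second series, a positive bias such as the 1.28 reported above means the first series of readings was higher on average.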
Fink, Christine; Uhlmann, Lorenz; Klose, Christina; Haenssle, Holger A
2018-05-17
Reliable and accurate assessment of psoriasis severity is essential for determining whether the indication criteria for initiating systemic treatment are met and for evaluating treatment efficacy. The most widely accepted tool for measuring the extent of psoriatic skin changes is the Psoriasis Area and Severity Index (PASI). However, PASI calculation can be tedious and subjective, and high intraobserver and interobserver variability is an important concern. Therefore, there is a great need for a standardised and objective method that guarantees reproducible PASI calculation. In this study, we will investigate the precision and reproducibility of automated, computer-guided PASI measurements in comparison with those of trained physicians to address these limitations. This non-interventional study will analyse PASI calculations performed either by physicians, in a prospective versus retrospective setting, or by an automated computer-guided algorithm in 120 patients with plaque psoriasis. All retrospective PASI calculations by physicians or by the computer algorithm are based on total body digital images. The primary objective of this study is to compare automated computer-guided PASI measurements by means of digital image analysis with conventional prospective or retrospective PASI assessments by physicians. Secondary endpoints include (1) the assessment of physicians' interobserver variance in PASI calculations, (2) the assessment of physicians' intraobserver variance in PASI assessments of the same patients' images after a time interval of at least 4 weeks, (3) the assessment of the deviation between physicians' prospective versus retrospective PASI calculations, and (4) the reproducibility of automated computer-guided PASI measurements, assessed using two sets of total body digital images of the same patients taken at one time point. Ethical approval was provided by the Ethics Committee of the Medical Faculty of the University of Heidelberg (ethics approval number S-379/2016). Trial registration number DRKS00011818; Results.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
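Because this protocol centres on the reproducibility of PASI, the conventional scoring formula shows where observer judgement enters: per body region, three severity ratings (0-4) and an area score (0-6) are combined with fixed region weights, giving a total from 0 to 72. A minimal sketch of that standard formula follows; the study's automated image-analysis algorithm is not described in the abstract and is not reproduced, and the dictionary keys and patient scores are hypothetical.

```python
# Conventional PASI body-region weights (head, upper limbs, trunk, lower limbs).
REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}
SEVERITY_ITEMS = ("erythema", "induration", "desquamation")


def area_score(percent_involved: float) -> int:
    """Map the % of a region's surface affected to the 0-6 PASI area score."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be between 0 and 100")
    if percent_involved == 0:
        return 0
    # Exclusive upper bounds of bands 1-5: <10, 10-29, 30-49, 50-69, 70-89 %.
    for score, upper in enumerate((10, 30, 50, 70, 90), start=1):
        if percent_involved < upper:
            return score
    return 6  # 90-100 %


def pasi(regions: dict[str, dict[str, float]]) -> float:
    """PASI = sum over regions of weight * (E + I + D) * area score; range 0-72."""
    total = 0.0
    for region, weight in REGION_WEIGHTS.items():
        scores = regions[region]
        for item in SEVERITY_ITEMS:
            if not 0 <= scores[item] <= 4:
                raise ValueError(f"{region}.{item} must be scored 0-4")
        severity = sum(scores[item] for item in SEVERITY_ITEMS)
        total += weight * severity * area_score(scores["area_percent"])
    return total


# Hypothetical patient scored by one physician.
patient = {
    "head": {"erythema": 1, "induration": 1, "desquamation": 0, "area_percent": 5},
    "upper_limbs": {"erythema": 2, "induration": 1, "desquamation": 1, "area_percent": 15},
    "trunk": {"erythema": 2, "induration": 2, "desquamation": 1, "area_percent": 20},
    "lower_limbs": {"erythema": 3, "induration": 2, "desquamation": 2, "area_percent": 35},
}
print(round(pasi(patient), 1))  # 13.2
```

A one-point difference between two physicians in a single lower-limb severity rating already shifts this patient's PASI by 0.4 × 3 = 1.2 points, which illustrates how observer variability propagates into the total.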
Singer, Adam D; Pattany, Pradip M; Fayad, Laura M; Tresley, Jonathan; Subhawong, Ty K
2016-01-01
To determine interobserver concordance of semiautomated three-dimensional volumetric and two-dimensional manual measurements of apparent diffusion coefficient (ADC) values in soft tissue masses (STMs) and to explore the standard deviation (SD) as a measure of tumor ADC heterogeneity. Concordance correlation coefficients for mean ADC increased with more extensive sampling. Agreement on the SD of tumor ADC values was better for large regions of interest and multislice methods. The correlation between mean and SD ADC was low, suggesting that these parameters are relatively independent. Mean ADC of STMs can be determined by volumetric quantification with high interobserver agreement. STM heterogeneity merits further investigation as a potential imaging biomarker that complements other functional magnetic resonance imaging parameters. Copyright © 2016 Elsevier Inc. All rights reserved.
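The abstract reports agreement as concordance correlation coefficients without giving a formula. Assuming Lin's concordance correlation coefficient, the usual choice, a minimal sketch follows; unlike Pearson's r, it penalises a systematic offset between observers. The function name and readings are hypothetical.

```python
from statistics import fmean


def lin_ccc(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient between two observers.

    Uses divide-by-n moments, as in Lin's original sample estimator
    (some software uses n - 1 instead).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired measurements for at least 2 subjects")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference in the denominator penalises systematic bias.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Perfectly correlated but offset readings: Pearson r would be 1, CCC is not.
print(round(lin_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]), 3))  # 0.571
```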
Hoffstetter, Patrick; Dornia, Christian; Schäfer, Stephan; Wagner, Merle; Dendl, Lena M; Stroszczynski, Christian; Schreyer, Andreas G
2014-01-01
Rib series (RS) is a dedicated radiographic technique for improving visualization of the bony parts of the chest. The aim of this study was to evaluate the diagnostic accuracy of rib series in minor thoracic trauma. This retrospective study included 56 patients who received RS, 39 of whom were additionally evaluated by plain chest film (PCF). All patients underwent computed tomography (CT) of the chest. RS and PCF were re-read independently by three radiologists, and the results were compared with CT as the gold standard. Sensitivity, specificity, and negative and positive predictive values were calculated. The significance of differences in findings was determined with the McNemar test, and interobserver variability with Cohen's kappa. Fifty-six patients were evaluated (34 men, 22 women, mean age 61 years). In 22 patients, one or more rib fractures were identified by CT. In 18 of these cases (82%), the correct diagnosis was made by RS, and in 16 cases (73%), the correct number of involved ribs was detected. These differences were significant (p = 0.03). Specificity was 100%, and the negative and positive predictive values were 85% and 100%, respectively. Kappa values for interobserver agreement ranged from 0.92 to 0.96. The sensitivity of PCF was 46%, significantly lower (p = 0.008) than that of CT. Rib series does not seem to be a useful examination for evaluating minor thoracic trauma. CT seems to be the method of choice for detecting rib fractures, but the clinical value of radiological confirmation needs to be discussed and investigated in larger follow-up studies.
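The accuracy figures above all derive from a 2 × 2 table of the index test (rib series) against the reference standard (CT). A minimal sketch of the standard definitions follows, using a hypothetical function name and hypothetical counts rather than the study's data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard accuracy measures from a 2x2 table against a reference standard.

    tp/fn: reference-positive cases called positive/negative by the index test;
    fp/tn: reference-negative cases called positive/negative by the index test.
    A zero denominator raises ZeroDivisionError rather than returning a guess.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts (not the study's data), with CT as the reference standard.
metrics = diagnostic_metrics(tp=8, fp=1, fn=2, tn=9)
print({k: round(v, 2) for k, v in metrics.items()})
# {'sensitivity': 0.8, 'specificity': 0.9, 'ppv': 0.89, 'npv': 0.82, 'accuracy': 0.85}
```

Sensitivity and specificity depend only on the test's behaviour within each reference group, whereas the predictive values also depend on how common fractures are in the sample.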
Expression of p53, p21 and cyclin D1 in penile cancer: p53 predicts poor prognosis.
Gunia, Sven; Kakies, Christoph; Erbersdobler, Andreas; Hakenberg, Oliver W; Koch, Stefan; May, Matthias
2012-03-01
To evaluate the role of p53, p21 and cyclin D1 expression in patients with penile cancer (PC). Paraffin-embedded tissues from PC specimens from six pathology departments were subjected to a central histopathological review by one pathologist. The tissue microarray technique was used for immunostaining, which was evaluated by two independent pathologists and correlated with cancer-specific survival (CSS). κ-statistics were used to assess interobserver variability. Uni- and multivariable Cox proportional hazards analyses were applied to assess the independent effects of several prognostic factors on CSS over a median of 32 months (IQR 6-66 months). Specimens and clinical data from 110 men treated surgically for primary PC were collected. p53 staining was positive in 30 and negative in 62 specimens. κ-statistics showed substantial interobserver reproducibility of p53 staining evaluation (κ=0.73; p<0.001). The 5-year CSS rate for the entire study cohort was 74%. Five-year CSS was 84% in p53-negative and 51% in p53-positive PC patients (p=0.003). Multivariable analysis identified p53 (HR=3.20; p=0.041) and pT-stage (HR=4.29; p<0.001) as independent significant prognostic factors for CSS. Cyclin D1 and p21 expression were not correlated with survival. However, incorporating p21 into a multivariable Cox model improved the model's quality for predicting CSS. In patients with PC, the expression of p53 in the primary tumour specimen can be reproducibly assessed and is negatively associated with cancer-specific survival.
A Stereological Method for the Quantitative Evaluation of Cartilage Repair Tissue
Nyengaard, Jens Randel; Lind, Martin; Spector, Myron
2015-01-01
Objective To implement stereological principles to develop an easily applicable algorithm for unbiased and quantitative evaluation of cartilage repair. Design Design-unbiased sampling was performed by systematically sectioning the defect perpendicular to the joint surface in parallel planes, providing 7 to 10 hematoxylin–eosin stained histological sections. Counting windows were systematically selected and converted into image files (40-50 per defect). The quantification was performed by two-step point counting: (1) calculation of defect volume and (2) quantitative analysis of tissue composition. Step 2 was performed by assigning each point to one of the following categories based on validated and easily distinguishable morphological characteristics: (1) hyaline cartilage (rounded cells in lacunae in hyaline matrix), (2) fibrocartilage (rounded cells in lacunae in fibrous matrix), (3) fibrous tissue (elongated cells in fibrous tissue), (4) bone, (5) scaffold material, and (6) others. The ability to discriminate between the tissue types was determined using conventional or polarized light microscopy, and the interobserver variability was evaluated. Results We describe the application of the stereological method. In the example, we assessed the defect repair tissue volume to be 4.4 mm³ (CE = 0.01). The tissue fractions were subsequently evaluated. Polarized light illumination of the slides improved discrimination between hyaline cartilage and fibrocartilage and increased the interobserver agreement compared with conventional transmitted light. Conclusion We have applied a design-unbiased method for quantitative evaluation of cartilage repair, and we propose this algorithm as a natural supplement to existing descriptive semiquantitative scoring systems. We also propose that polarized light is effective for discrimination between hyaline cartilage and fibrocartilage. PMID:26069715
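The two-step point counting described above can be illustrated with the textbook stereological estimators for systematic parallel sections: the Cavalieri estimator for volume (section spacing × area per test point × total points hitting the defect) and point-count ratios for tissue volume fractions. The abstract does not name its estimators, so this is a minimal sketch assuming those standard forms; the coefficient of error (CE) is not computed, and the function names, counts, spacings, and grid sizes are hypothetical.

```python
def cavalieri_volume(points_per_section: list[int],
                     section_spacing_mm: float,
                     area_per_point_mm2: float) -> float:
    """Cavalieri volume estimate: V = t * (a/p) * sum(P).

    section_spacing_mm is the distance between successive sampled sections
    (not the section thickness); area_per_point_mm2 is the tissue area
    associated with one test point of the counting grid.
    """
    return section_spacing_mm * area_per_point_mm2 * sum(points_per_section)


def volume_fractions(points_by_category: dict[str, int]) -> dict[str, float]:
    """Point-counting volume fractions: points on a category / all points counted."""
    total = sum(points_by_category.values())
    if total == 0:
        raise ValueError("no points were counted")
    return {name: hits / total for name, hits in points_by_category.items()}


# Hypothetical defect sampled on 8 systematic parallel sections.
volume = cavalieri_volume([12, 18, 25, 30, 28, 22, 15, 9],
                          section_spacing_mm=0.25, area_per_point_mm2=0.1)
fractions = volume_fractions({"hyaline cartilage": 40, "fibrocartilage": 30,
                              "fibrous tissue": 20, "bone": 10})
print(round(volume, 3), fractions["hyaline cartilage"])  # 3.975 0.4
```

Multiplying each fraction by the estimated defect volume gives an absolute volume per tissue type, which is how the two counting steps combine.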
Keller, Cesar A; Khoor, Andras; Arenberg, Douglas A; Smith, Michael A; Islam, Shaheen U
2018-05-29
Diagnosis of acute cellular rejection (ACR) in lung transplant recipients requires demonstration of perivascular lymphocytic infiltration in alveolar tissue samples from transbronchial biopsies (TBBs). Probe-based confocal laser endomicroscopy (pCLE) allows in vivo observation of alveolar, vascular, and cellular microstructures in the lung and has the potential to identify ACR. The objective of our prospective, blinded, multicenter observational study was to identify pCLE findings in patients with ACR diagnosed histopathologically by TBB. Lung transplant recipients undergoing diagnostic bronchoscopies within 1 year posttransplant for suspected ACR had pCLE video imaging obtained immediately prior to tissue sampling via TBB. Two pCLE criteria, abundant alveolar cellularity and perivascular cellularity (PVC), were assessed by 4 investigators familiar with pCLE and compared with histopathologic criteria of ACR to derive sensitivity, specificity, area under the receiver operating characteristic curve, and accuracy. Interobserver agreement was assessed by calculating the intraclass correlation coefficient and Fleiss κ. Findings were analyzed before and after a consensus meeting of investigators on interpreting images. Thirty pCLE procedures were performed on 24 patients, with 8 showing ACR on TBB. Diagnostic performance and interobserver agreement using pCLE to identify PVC were significantly higher than those of abundant alveolar cellularity (P<.01). The number of blood vessels identified with PVC on pCLE was significantly correlated with histopathologic activity grading of ACR (P<.01). PVC agreement among investigators improved significantly after the consensus meeting (P<.01). When found on pCLE, PVC is a feasible and reproducible criterion for assessment of ACR in vivo, but there is a learning curve for image interpretation.
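With 4 investigators rating the same pCLE findings, agreement on a categorical criterion such as PVC is commonly summarised with Fleiss' κ, which generalises chance-corrected agreement to more than two raters. A minimal sketch follows, assuming every subject is rated by all raters; the function name and rating table are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per subject.

    counts[i][j] is how many raters placed subject i in category j;
    every row must sum to the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Per-subject agreement: share of rater pairs that agree on that subject.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_subject) / n_subjects

    # Expected agreement from the pooled category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    return (p_bar - p_expected) / (1 - p_expected)


# Hypothetical: 4 investigators rate PVC (present, absent) on 4 pCLE videos.
ratings = [[4, 0], [0, 4], [3, 1], [2, 2]]
print(round(fleiss_kappa(ratings), 3))  # 0.407
```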
de Heide, John; Vroegh, C J; Szili Torok, T; Gobbens, R J J; Zijlstra, F; Takens-Lameijer, M; Lenzen, M J; Yap, S C; Scholte Op Reimer, W J M
Postprocedural complications after elective cardiac interventions include hematomas and infections. Telemedical wound assessment using mobile phones with integrated cameras may improve quality of care and help reduce costs. We aimed to study the feasibility of telemedical wound assessment using a mobile phone. The primary outcome was the number of patients who were able to upload their pictures. Secondary outcomes were image interpretability, agreement between nurse practitioners, and patient evaluation of the intervention. This is a prospective study of all consecutive patients who underwent an elective cardiac intervention. Patients were instructed to photograph their wound or puncture site after hospital discharge and upload the pictures to a secure email address 6 days after hospital discharge. Received photos were assessed by 2 nurse practitioners. The intervention was evaluated using a peer-reviewed questionnaire and photo assessment scheme. In total, 46 eligible patients were included in the study, with 5 screen failures (eg, clinical stay ≥ 6 days) and 1 patient lost to follow-up. Thirty-three of 40 patients (83%) were able to upload their pictures. Smartphone users were more successful in uploading their pictures than feature phone users (93% vs 55%, P < .01). Eighty-eight percent of the clinical pictures were interpretable. Interobserver agreement ranged from 93% to 97%. Patients are able to take mobile clinical photos and upload them to the secure email address, and the vast majority of the photos were interpretable. Smartphone users were more successful than feature phone users in uploading their pictures. Interobserver agreement was good.
Enhancing reproducibility of ultrasonic measurements by new users
NASA Astrophysics Data System (ADS)
Pramanik, Manojit; Gupta, Madhumita; Krishnan, Kajoli Banerjee
2013-03-01
Operator perception influences ultrasound image acquisition and processing. Lower costs are attracting new users to medical ultrasound. Anticipating an increase in this trend, we conducted a study to quantify the variability in ultrasonic measurements made by novice users and to identify methods to reduce it. We designed a protocol with four presets and trained four new users to scan a fetal phantom with an ultrasound scanner and manually measure its head circumference. In the first phase, the users followed this protocol in seven distinct sessions. They then received feedback on the quality of the scans from an expert. In the second phase, two of the users repeated the entire protocol aided by visual cues provided to them during scanning. We performed off-line measurements on all the images using a fully automated algorithm capable of measuring the head circumference from fetal phantom images. The ground truth (198.1±1.6 mm) was based on sixteen scans and measurements made by an expert. Our analysis shows that: (1) in the first phase, the inter-observer variability of manual measurements was 5.5 mm, whereas that of automated measurements was only 0.6 mm; (2) in the second phase, consistency of image appearance improved and mean manual measurements were 4-5 mm closer to the ground truth; and (3) automated measurements were more precise and accurate, and less sensitive to different presets, than manual measurements in both phases. Our results show that visual aids and automation can bring more reproducibility to ultrasonic measurements made by new users.
Al-Amiry, Bariq; Mahmood, Sarwar; Krupic, Ferid; Sayed-Noor, Arkan
2017-09-01
Background Restoration of femoral offset (FO) and leg length is an important goal in total hip arthroplasty (THA) because it improves functional outcome. Purpose To analyze whether the problem of postoperative leg lengthening and FO reduction is related to femoral stem positioning, acetabular cup positioning, or both. Material and Methods Between September 2010 and April 2013, 172 patients with unilateral primary osteoarthritis treated with THA were included. Postoperative leg-length discrepancy (LLD) and global FO (the sum of acetabular cup offset and FO) were measured by two observers using a standardized protocol for evaluation of antero-posterior plain hip radiographs. Patients with postoperative leg lengthening ≥10 mm (n = 41) or with reduced global FO >5 mm (n = 58) were studied further by comparing the stem and cup length of the operated side with the contralateral side in the lengthening group, and the stem and cup offset of the operated side with the contralateral side in the FO reduction group. We also evaluated the inter-observer and intra-observer reliability of the radiological measurements. Results Both observers found that leg lengthening was related to stem positioning, while FO reduction was related to the positioning of both the femoral stem and the acetabular cup. Both inter-observer reliability and intra-observer reproducibility were moderate to excellent (intraclass correlation coefficient, ICC ≥0.69). Conclusion Post-THA leg lengthening was mainly caused by improper femoral stem positioning, while global FO reduction resulted from improper positioning of both the femoral stem and the acetabular cup.
A Stereological Method for the Quantitative Evaluation of Cartilage Repair Tissue.
Foldager, Casper Bindzus; Nyengaard, Jens Randel; Lind, Martin; Spector, Myron
2015-04-01
To implement stereological principles to develop an easily applicable algorithm for unbiased and quantitative evaluation of cartilage repair. Design-unbiased sampling was performed by systematically sectioning the defect perpendicular to the joint surface in parallel planes, providing 7 to 10 hematoxylin-eosin stained histological sections. Counting windows were systematically selected and converted into image files (40-50 per defect). The quantification was performed by two-step point counting: (1) calculation of defect volume and (2) quantitative analysis of tissue composition. Step 2 was performed by assigning each point to one of the following categories based on validated and easily distinguishable morphological characteristics: (1) hyaline cartilage (rounded cells in lacunae in hyaline matrix), (2) fibrocartilage (rounded cells in lacunae in fibrous matrix), (3) fibrous tissue (elongated cells in fibrous tissue), (4) bone, (5) scaffold material, and (6) others. The ability to discriminate between the tissue types was determined using conventional or polarized light microscopy, and the interobserver variability was evaluated. We describe the application of the stereological method. In the example, we assessed the defect repair tissue volume to be 4.4 mm³ (CE = 0.01). The tissue fractions were subsequently evaluated. Polarized light illumination of the slides improved discrimination between hyaline cartilage and fibrocartilage and increased the interobserver agreement compared with conventional transmitted light. We have applied a design-unbiased method for quantitative evaluation of cartilage repair, and we propose this algorithm as a natural supplement to existing descriptive semiquantitative scoring systems. We also propose that polarized light is effective for discrimination between hyaline cartilage and fibrocartilage.
Fleury, Eduardo F C; Gianini, Ana Claudia; Marcomini, Karem; Oliveira, Vilmar
2018-01-01
To determine the applicability of a computer-aided diagnostic system for strain elastography in classifying breast masses diagnosed by ultrasound and scored using the criteria proposed by the Breast Imaging Reporting and Data System ultrasound lexicon, and to determine its diagnostic accuracy and interobserver variability. This prospective study was conducted between March 1, 2016, and May 30, 2016. A total of 83 breast masses subjected to percutaneous biopsy were included. Ultrasound elastography images obtained before biopsy were interpreted by 3 radiologists with and without the aid of the computer-aided diagnostic system for strain elastography. The parameters evaluated for each radiologist were sensitivity, specificity, and diagnostic accuracy, with and without the computer-aided diagnostic system for strain elastography. Interobserver variability was assessed using a weighted κ test and an intraclass correlation coefficient. The areas under the receiver operating characteristic curves were also calculated. The areas under the receiver operating characteristic curve were 0.835, 0.801, and 0.765 for readers 1, 2, and 3, respectively, without the computer-aided diagnostic system for strain elastography, and 0.900, 0.926, and 0.868, respectively, with it. The intraclass correlation coefficient between the 3 readers was 0.6713 without the computer-aided diagnostic system for strain elastography and 0.811 with it. The proposed computer-aided diagnostic system for strain elastography has the potential to improve the diagnostic performance of radiologists in breast examination using ultrasound combined with elastography.
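BI-RADS categories are ordinal, which is why a weighted κ is used above: disagreements between adjacent categories are penalised less than distant ones. The abstract does not state the weighting scheme, so this minimal sketch supports both linear and quadratic weights, with categories assumed to be coded 0 to k-1; the function name and readings are hypothetical.

```python
def weighted_kappa(rater_a: list[int], rater_b: list[int],
                   n_categories: int, weights: str = "linear") -> float:
    """Cohen's weighted kappa for ordinal categories coded 0..n_categories-1.

    The disagreement weight for categories i and j is |i - j| / (k - 1)
    ("linear") or the square of that ("quadratic").
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_categories < 2 or weights not in ("linear", "quadratic"):
        raise ValueError("need >= 2 categories and 'linear' or 'quadratic' weights")
    k, n = n_categories, len(rater_a)

    # Cross-tabulate the two raters' category assignments.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted observed disagreement vs. disagreement expected from the marginals.
    observed_dis = sum(weight(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(weight(i, j) * row_totals[i] * col_totals[j] / n
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - observed_dis / expected_dis


# Hypothetical BI-RADS readings for 6 masses, coded 0 = category 3, 1 = 4, 2 = 5.
reader_1 = [0, 0, 1, 1, 2, 2]
reader_2 = [0, 1, 1, 2, 2, 2]
print(round(weighted_kappa(reader_1, reader_2, 3, "linear"), 3))     # 0.625
print(round(weighted_kappa(reader_1, reader_2, 3, "quadratic"), 3))  # 0.75
```

Quadratic weights forgive one-category disagreements more than linear weights do, so they usually yield the higher value; with only two categories, both reduce to unweighted Cohen's κ.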
Reliability of the Robinson classification for displaced comminuted midshaft clavicular fractures.
Stegeman, Sylvia A; Fernandes, Nicole C; Krijnen, Pieta; Schipper, Inger B
2015-01-01
This study aimed to assess the reliability of the Robinson classification for displaced comminuted midshaft clavicular fractures. A total of 102 surgeons and 52 radiologists classified 15 displaced comminuted midshaft clavicular fractures twice on anteroposterior (AP) and 30-degree caudocephalad radiographs. For both surgeons and radiologists, inter-observer and intra-observer agreement improved significantly when the 30-degree caudocephalad view was shown in addition to the AP view. Radiologists had significantly higher inter- and intra-observer agreement than surgeons after judging both radiographs (multirater κ of 0.81 vs. 0.56; intra-observer κ of 0.73 vs. 0.44). We advise using two-plane radiography and routinely incorporating the Robinson classification into radiology reports. Copyright © 2015 Elsevier Inc. All rights reserved.
Zhang, Rui-Fang; Fu, Yu-Chuan; Lu, Yi; Zhang, Xiao-Xia; Hu, Yu-Min; Zhou, Yong-Jin; Tian, Nai-Feng; He, Jia-Wei; Yan, Zhi-Han
2017-02-01
Accurately evaluating the extent of trunk imbalance in the coronal plane is important for patients before and after treatment. We previously proposed a new method, the axis-line-angle technique (ALAT), for evaluating coronal trunk imbalance, which showed excellent intra-observer and interobserver reliability. Radiologists and surgeons were encouraged to use this method in clinical practice. However, the optimal cutoff value of the ALAT for determining the extent of coronal trunk imbalance has not yet been calculated. The purpose of this study was to identify the ALAT cutoff value that best predicts a positive measurement for assessing coronal balance or imbalance. A retrospective study was carried out at a university-affiliated hospital. A total of 130 patients with a C7-central sacral vertical line (CSVL) >0 mm and aged 10-18 years were recruited from September 2013 to December 2014. Data were analyzed to determine the optimal cutoff value of the ALAT measurement. Two radiologists each performed the C7-CSVL and ALAT measurements twice on plain films within a 2-week interval. The optimal cutoff value of the ALAT was determined by receiver operating characteristic (ROC) curve analysis. The C7-CSVL and ALAT measurements for evaluating trunk imbalance were compared with the chi-square test. The kappa agreement coefficient was used to test the intra-observer and interobserver agreement of C7-CSVL and ALAT. The area under the ROC curve for the ALAT was 0.82 (95% confidence interval: 0.753-0.894, p<.001). The maximum Youden index was 0.51, and the corresponding cutoff point was 2.59°. No statistical difference was found between the C7-CSVL and ALAT measurements for evaluating trunk imbalance (p>.05).
Intra-observer agreement values for the C7-CSVL measurements by observers 1 and 2 were 0.79 and 0.91 (p<.001), respectively, whereas the intra-observer agreement value for the ALAT measurements was 0.89 for both observers (p<.001). The interobserver agreement values for the first and second measurements with the C7-CSVL were 0.78 and 0.85 (p<.001), respectively, whereas the interobserver agreement values for the first and second measurements with the ALAT were 0.91 and 0.88 (p<.001), respectively. The newly developed ALAT provided an acceptable optimal cutoff value for evaluating trunk imbalance in the coronal plane with a high level of intra-observer and interobserver agreement, which suggests that the ALAT is suitable for clinical use. Copyright © 2016 Elsevier Inc. All rights reserved.
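The ALAT cutoff was chosen at the maximum Youden index, J = sensitivity + specificity - 1, on the ROC curve. The following is a minimal sketch of that selection rule, assuming a binary reference label per patient (e.g. imbalance by C7-CSVL) and treating a measurement at or above the threshold as positive. The function name `youden_cutoff` and the toy data are hypothetical illustrations, not the study's code or data.

```python
def youden_cutoff(scores, labels):
    """Return (best_threshold, best_J) maximizing J = sensitivity + specificity - 1.

    scores: continuous measurements (e.g. an angle in degrees).
    labels: 1 for reference-positive cases, 0 for reference-negative cases.
    Assumption: a case is called positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = None, float("-inf")
    # Every observed value is a candidate cutoff point on the empirical ROC curve.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:  # keep the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical toy data, not from the study.
angles = [0.8, 1.5, 2.1, 2.6, 3.0, 3.4, 1.9, 2.7]
imbalanced = [0, 0, 0, 1, 1, 1, 1, 0]
print(youden_cutoff(angles, imbalanced))
```

Scanning every observed value is the simplest exact approach for small samples; a statistics package would typically derive the same points from its ROC routine.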
Neben-Wittich, Michelle A.; Atherton, Pamela J.; Schwartz, David J.; Sloan, Jeff A.; Griffin, Patricia C.; Deming, Richard L.; Anders, Jon C.; Loprinzi, Charles L.; Burger, Kelli N.; Martenson, James A.; Miller, Robert C.
2012-01-01
Purpose Considerable interobserver variability exists among providers and between providers and patients when measuring subjective symptoms. In the recently published Phase III N06C4 trial of mometasone cream vs. placebo to prevent radiation dermatitis, the primary provider-assessed (PA) endpoint, using the Common Toxicity Criteria for Adverse Events (CTCAE), was negative. However, prospectively planned secondary analyses of patient-reported outcomes (PROs), using the Skindex-16 and Skin Toxicity Assessment Tool (STAT), were positive. This study assesses the relationship between PA outcomes and PROs. Methods and Materials Pearson correlation coefficients were calculated to compare the three tools. Correlation strengths were defined as follows: <0.5, mild; 0.5–0.7, moderate; and >0.7, strong. Results CTCAE dermatitis correlated moderately with STAT erythema, and CTCAE pruritus correlated strongly with STAT itching. CTCAE pruritus had a moderate correlation with Skindex-16 itching. Comparing the 2 PRO tools, Skindex-16 itching correlated moderately with STAT itching. Skindex-16 burning, hurting, irritation, and persistence all showed the strongest correlation with STAT burning; they showed moderate correlations with STAT itching and tenderness. Conclusions The PRO Skindex-16 correlated well with the PRO portions of STAT, but neither tool correlated well with CTCAE. PROs delineated a wider spectrum of toxicity than PA measures and provided more information on rash, redness, pruritus, and annoyance measures compared with CTCAE findings of rash and pruritus. PROs may provide a more complete measure of patient experience than single-symptom, PA endpoints in clinical trials assessing radiation skin toxicity. PMID:20888137
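The correlation bands quoted above can be applied mechanically once Pearson's r is computed. Below is a minimal sketch, assuming paired scores from two tools for the same patients; the function names are hypothetical, and two choices are assumptions not stated in the abstract: the band boundaries (0.5 and 0.7 are treated as moderate) and the use of |r| so that negative correlations are graded by magnitude.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label a correlation with the bands <0.5 mild, 0.5-0.7 moderate, >0.7 strong.

    Assumption: boundaries 0.5 and 0.7 count as moderate, and |r| is used.
    """
    r = abs(r)
    if r < 0.5:
        return "mild"
    if r <= 0.7:
        return "moderate"
    return "strong"

# Hypothetical paired scores (e.g. a provider grade vs. a patient-reported score).
print(strength(pearson_r([1, 2, 3, 4], [2, 4, 6, 8])))
```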
Alimoglu, Mustafa K.; Sarac, Didar B.; Alparslan, Derya; Karakas, Ayse A.; Altintas, Levent
2014-01-01
Background Efforts are made to enhance in-class learner engagement because it stimulates and enhances learning. However, learner engagement is not easy to quantify. This study aimed to develop and validate an observation tool for instructor and student behaviors to determine and compare in-class learner engagement levels in four different class types delivered by the same instructor. Methods Observer pairs observed instructor and student behaviors during lectures in large class (LLC, n=2) with third-year medical students, lectures in small class (LSC, n=6) and case-based teaching sessions (CBT, n=4) with fifth-year students, and problem-based learning (PBL) sessions (~7 hours) with second-year students. The observation tool was a revised form of STROBE, an instrument for recording the behaviors of an instructor and four randomly selected students as snapshots in 5-min cycles. Instructor and student behaviors were scored 1–5 on this tool, named the ‘in-class engagement measure (IEM)’. IEM scores reflected the degree to which each behavior contributed to active student engagement, so higher scores were associated with more in-class learner engagement. Additionally, the number of questions asked by the instructor and by students was recorded. A total of 203 5-min observations were performed (LLC 20, LSC 85, CBT 50, and PBL 48). Results Interobserver agreement on instructor and student behaviors was 93.7% (κ=0.87) and 80.6% (κ=0.71), respectively. Higher median IEM scores were found in student-centered and problem-oriented methods such as CBT and PBL. A moderate correlation was found between instructor and student behaviors (r=0.689). Conclusions This study provides some evidence for the validity of IEM scores as a measure of student engagement in different class types. PMID:25308966
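Interobserver agreement here is reported both as raw percent agreement and as κ. The sketch below shows how the two figures relate for a pair of observers, assuming unweighted Cohen's kappa over nominal codes; the abstract does not say which kappa variant was used (a weighted kappa would credit near-misses on the 1–5 scale), and the function names are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of paired observations on which the two observers assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two observers coding the same items.

    Undefined (division by zero) when both observers use one single category.
    """
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each observer's marginal proportions, summed over categories.
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical IEM-style codes from two observers for the same snapshots.
obs1 = ["A", "A", "B", "B"]
obs2 = ["A", "B", "B", "B"]
print(percent_agreement(obs1, obs2), cohens_kappa(obs1, obs2))
```

Kappa is always at or below percent agreement because it subtracts the agreement expected by chance from the observers' marginal code frequencies.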
Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad
2016-04-27
Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to the detection of stain colour in digital IHC images. The model was first trained on a large number of colour pixels collected semi-automatically. To speed up the training and detection processes, we removed the luminance (Y) channel of the YCbCr colour space and chose 128 histogram bins, which is the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides automatically as positively or negatively stained. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of the evaluation was to compare the computer model with human evaluation. Several large datasets with different colour stains were prepared from human oesophageal cancer, colon cancer and liver cirrhosis. Experimental results demonstrated that the model-based tool achieves more accurate results than colour deconvolution and the CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demonstrated that the proposed model shows little inter-dataset variation. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and freely available at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing of the tool by different users showed only minor inter-observer variations in results.
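The abstract names the ingredients (drop Y from YCbCr, 128 histogram bins, maximum likelihood classification) but not the tool's internals. The following is a minimal sketch under stated assumptions: full-range BT.601 chroma conversion, a 2-D (Cb, Cr) histogram with 128 bins per channel, and add-one smoothing. All of these choices and every function name are assumptions for illustration, not the ImageJ plugin's code.

```python
from collections import Counter

BINS = 128  # histogram bins per chroma channel, following the abstract's choice

def rgb_to_cbcr(r, g, b):
    """Chroma part of the full-range (JPEG) ITU-R BT.601 YCbCr transform; Y is dropped."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def to_bin(v):
    """Map a 0-255 chroma value to one of BINS histogram bins."""
    v = min(max(v, 0.0), 255.0)
    return min(int(v * BINS / 256), BINS - 1)

def train(pixels):
    """Build a (Cb, Cr) histogram likelihood from training pixels, with add-one smoothing."""
    counts = Counter()
    for r, g, b in pixels:
        cb, cr = rgb_to_cbcr(r, g, b)
        counts[(to_bin(cb), to_bin(cr))] += 1
    total = len(pixels) + BINS * BINS  # one pseudo-count for every histogram cell
    return lambda key: (counts[key] + 1) / total

def classify(pixel, positive_model, negative_model):
    """Maximum-likelihood decision: True if the pixel is likelier under the positive-stain model."""
    key = tuple(to_bin(c) for c in rgb_to_cbcr(*pixel))
    return positive_model(key) > negative_model(key)

# Hypothetical training pixels: brownish "positive" stain vs. bluish/background "negative".
pos = train([(150, 90, 40), (140, 80, 35), (160, 100, 50)])
neg = train([(60, 80, 160), (70, 90, 170), (230, 230, 230)])
print(classify((150, 90, 40), pos, neg), classify((60, 80, 160), pos, neg))
```

Dropping Y makes the decision depend only on hue-like chroma, which is one way to reduce sensitivity to staining intensity and illumination, and it shrinks the histogram from 3-D to 2-D.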
Yen, Po-Yin; Kelley, Marjorie; Lopetegui, Marcelo; Rosado, Amber L.; Migliore, Elaina M.; Chipps, Esther M.; Buck, Jacalyn
2016-01-01
A fundamental understanding of multitasking within nursing workflow is important in today’s dynamic and complex healthcare environment. We conducted a time motion study to understand nursing workflow, specifically multitasking and task switching activities. We used TimeCaT, a comprehensive electronic time capture tool, to capture observational data. We established inter-observer reliability prior to data collection. We completed 56 hours of observation of 10 registered nurses. We found, on average, nurses had 124 communications and 208 hands-on tasks per 4-hour block of time. They multitasked (having communication and hands-on tasks simultaneously) 131 times, representing 39.48% of all times; the total multitasking duration ranges from 14.6 minutes to 109 minutes, 44.98 minutes (18.63%) on average. We also reviewed workflow visualization to uncover the multitasking events. Our study design and methods provide a practical and reliable approach to conducting and analyzing time motion studies from both quantitative and qualitative perspectives. PMID:28269924
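Multitasking is defined here as communication and hands-on tasks occurring simultaneously, and the study reports total multitasking duration. A minimal sketch of how such a duration could be computed from timestamped task intervals follows, assuming each observed task is exported as a (start, end) pair in minutes; the export format and function names are hypothetical and not part of TimeCaT as described in the abstract.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals so concurrent tasks of one kind are not double-counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_minutes(communications, hands_on):
    """Total time during which at least one communication and one hands-on task run together."""
    a, b = merge(communications), merge(hands_on)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first; the other may still overlap later intervals.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

# Hypothetical observation: two conversations and one hands-on task (minutes).
print(overlap_minutes([(0, 5), (10, 12)], [(3, 11)]))
```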
DNA methylation-based classification of central nervous system tumours.
Capper, David; Jones, David T W; Sill, Martin; Hovestadt, Volker; Schrimpf, Daniel; Sturm, Dominik; Koelsche, Christian; Sahm, Felix; Chavez, Lukas; Reuss, David E; Kratz, Annekathrin; Wefers, Annika K; Huang, Kristin; Pajtler, Kristian W; Schweizer, Leonille; Stichel, Damian; Olar, Adriana; Engel, Nils W; Lindenberg, Kerstin; Harter, Patrick N; Braczynski, Anne K; Plate, Karl H; Dohmen, Hildegard; Garvalov, Boyan K; Coras, Roland; Hölsken, Annett; Hewer, Ekkehard; Bewerunge-Hudler, Melanie; Schick, Matthias; Fischer, Roger; Beschorner, Rudi; Schittenhelm, Jens; Staszewski, Ori; Wani, Khalida; Varlet, Pascale; Pages, Melanie; Temming, Petra; Lohmann, Dietmar; Selt, Florian; Witt, Hendrik; Milde, Till; Witt, Olaf; Aronica, Eleonora; Giangaspero, Felice; Rushing, Elisabeth; Scheurlen, Wolfram; Geisenberger, Christoph; Rodriguez, Fausto J; Becker, Albert; Preusser, Matthias; Haberler, Christine; Bjerkvig, Rolf; Cryan, Jane; Farrell, Michael; Deckert, Martina; Hench, Jürgen; Frank, Stephan; Serrano, Jonathan; Kannan, Kasthuri; Tsirigos, Aristotelis; Brück, Wolfgang; Hofer, Silvia; Brehmer, Stefanie; Seiz-Rosenhagen, Marcel; Hänggi, Daniel; Hans, Volkmar; Rozsnoki, Stephanie; Hansford, Jordan R; Kohlhof, Patricia; Kristensen, Bjarne W; Lechner, Matt; Lopes, Beatriz; Mawrin, Christian; Ketter, Ralf; Kulozik, Andreas; Khatib, Ziad; Heppner, Frank; Koch, Arend; Jouvet, Anne; Keohane, Catherine; Mühleisen, Helmut; Mueller, Wolf; Pohl, Ute; Prinz, Marco; Benner, Axel; Zapatka, Marc; Gottardo, Nicholas G; Driever, Pablo Hernáiz; Kramm, Christof M; Müller, Hermann L; Rutkowski, Stefan; von Hoff, Katja; Frühwald, Michael C; Gnekow, Astrid; Fleischhack, Gudrun; Tippelt, Stephan; Calaminus, Gabriele; Monoranu, Camelia-Maria; Perry, Arie; Jones, Chris; Jacques, Thomas S; Radlwimmer, Bernhard; Gessi, Marco; Pietsch, Torsten; Schramm, Johannes; Schackert, Gabriele; Westphal, Manfred; Reifenberger, Guido; Wesseling, Pieter; Weller, Michael; Collins, Vincent Peter; Blümcke, Ingmar; Bendszus, Martin; Debus, Jürgen; Huang, Annie; Jabado, Nada; Northcott, Paul A; Paulus, Werner; Gajjar, Amar; Robinson, Giles W; Taylor, Michael D; Jaunmuktane, Zane; Ryzhova, Marina; Platten, Michael; Unterberg, Andreas; Wick, Wolfgang; Karajannis, Matthias A; Mittelbronn, Michel; Acker, Till; Hartmann, Christian; Aldape, Kenneth; Schüller, Ulrich; Buslei, Rolf; Lichter, Peter; Kool, Marcel; Herold-Mende, Christel; Ellison, David W; Hasselblatt, Martin; Snuderl, Matija; Brandner, Sebastian; Korshunov, Andrey; von Deimling, Andreas; Pfister, Stefan M
2018-03-22
Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.
How Effective Are Patient Education Materials in Educating Patients?
Keçeci, Ayla; Toprak, Sadiye; Kiliç, Seçil
2017-11-01
The aim of this research was to evaluate the patient education materials prepared and published by nurses and physicians in terms of the qualitative properties of these materials, including readability, understandability, and actionability. A total of 38 patient education materials prepared by nurses and physicians in a university hospital in Turkey were evaluated. The readability of the materials was assessed using the formulas proposed by Atesman and Cetinkaya. The Patient Education Materials Assessment Tool (PEMAT) form was used to estimate understandability and actionability. Data were analyzed using percentages and mean values, and Kendall's tau-c and correlation tests were used to assess interobserver agreement. According to the readability formulas, 55.3% of the materials were moderately difficult, while 81.6% had instructional-level readability (U.S. Grades 8 and 9), with a moderate to low level of understandability and actionability. Overall, the patient education materials evaluated in our study had a moderate level of readability, understandability, and actionability.
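Kendall's tau-c (Stuart's tau-c) is suited to ordinal ratings laid out in a non-square contingency table. A minimal O(n²) sketch follows, assuming each observer's rating is coded as an ordered number per material; the function name is hypothetical, and taking m as the smaller number of distinct categories actually observed is an assumption (it matches common software practice but the abstract does not specify it).

```python
def kendall_tau_c(x, y):
    """Stuart-Kendall tau-c for two ordinal ratings of the same items.

    tau_c = 2 * m * (P - Q) / (n**2 * (m - 1)), where P and Q are the numbers of
    concordant and discordant pairs and m is the smaller number of categories.
    Requires m >= 2.
    """
    n = len(x)
    m = min(len(set(x)), len(set(y)))
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # s == 0 means a tie on at least one rating; such pairs count for neither.
    return 2 * m * (concordant - discordant) / (n * n * (m - 1))

# Hypothetical ordinal ratings (e.g. 1 = low, 3 = high) from two observers.
print(kendall_tau_c([1, 2, 3, 3, 2], [1, 2, 3, 2, 2]))
```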
Martí Gamboa, Sabina; Giménez, Olga Redrado; Mancho, Jara Pascual; Moros, María Lapresta; Sada, Julia Ruiz; Mateo, Sergio Castan
2017-04-01
Objective The objective of this study was to determine the ability of the FIGO 3-tier and 5-tier fetal heart rate (FHR) classification systems to detect neonatal acidemia, and the interobserver agreement with each system. Design This was a case-control study. Setting This study was set at the University Medical Center. Population A total of 202 FHR tracings of 102 women who delivered an acidemic fetus (umbilical arterial cord gas pH ≤ 7.10 and BE < -8) and 100 who delivered a nonacidemic fetus (umbilical arterial cord gas pH > 7.10) were assessed. A subanalysis was performed for those fetuses who suffered severe metabolic acidemia (pH ≤ 7.0 and BE < -12). Methods Two reviewers blinded to clinical and outcome data classified tracings according to the new 3-tier system proposed by the FIGO and the 5-tier system proposed by Parer and Ikeda. Main Outcome Measures Sensitivity and specificity for detecting neonatal acidemia and interobserver agreement in classifying FHR tracings into categories of both systems were studied. Results The 3-tier system showed greater sensitivity and lower specificity to detect neonatal acidemia (43.6% sensitivity, 82.5% specificity) and severe metabolic acidemia (71.4% sensitivity, 74.0% specificity) compared with the 5-tier system (36.3% sensitivity, 88% specificity and 61.9% sensitivity, 80.1% specificity, respectively). Both systems were compared by area under the receiver-operating characteristic curve, with comparable predictive ability for detecting neonatal acidemia (FIGO-area under the curve [AUC]: 0.63 [95% confidence interval [CI]: 0.57-0.68] and Parer-AUC: 0.62 [95% CI: 0.56-0.67]). Interobserver agreement was moderate for both systems, but performance at each specific category showed better agreement for the 5-tier system in identifying a pathological tracing (orange or red, κ: 0.625 vs. pathological category, κ: 0.538). 
Conclusion Both systems presented a comparable ability to predict neonatal acidemia, although the 5-tier system showed a better interobserver agreement identifying pathological tracings. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
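Sensitivity and specificity, as reported for both FHR systems, come from a 2×2 table of test result against the reference outcome (here, acidemia by cord gas). A minimal sketch follows; the function name and the counts in the usage line are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table (test result vs. reference outcome)."""
    return {
        "sensitivity": tp / (tp + fn),  # proportion of reference-positive cases the test flags
        "specificity": tn / (tn + fp),  # proportion of reference-negative cases the test clears
        "ppv": tp / (tp + fp),          # proportion of positive test results that are true positives
        "npv": tn / (tn + fn),          # proportion of negative test results that are true negatives
    }

# Hypothetical counts for illustration only.
print(diagnostic_metrics(tp=40, fp=20, fn=10, tn=80))
```

Sensitivity and specificity trade off against each other as the category threshold moves, which is why a system with more categories (such as a 5-tier scheme) can show lower sensitivity and higher specificity at its chosen cut point.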
Gottschalk, Hilton P; Bastrom, Tracey P; Edmonds, Eric W
2013-01-01
Standard elbow radiographs (AP and lateral views) are not accurate enough to measure the true displacement of medial epicondyle fractures of the humerus. The amount of perceived displacement has been used to determine treatment options. This study assesses the utility of internal oblique radiographs for measuring true displacement in these fractures. A medial epicondyle fracture was created in a cadaveric specimen. The fragment was displaced 5, 10, and 15 mm in line with the vector of the flexor pronator mass and temporarily sutured in place. Radiographs were obtained at 0 (AP), 15, 30, 45, 60, 75, and 90 degrees (lateral) of internal rotation, with the elbow in set positions of flexion. This was done with and without radio-opaque markers placed on the fragment and fracture bed. The 45- and 60-degree internal oblique radiographs were then presented to 5 separate reviewers (with different levels of training) to evaluate intraobserver and interobserver agreement. Change in elbow position did not affect the perceived displacement (P=0.82), with excellent intraobserver reliability (intraclass correlation coefficient range, 0.979 to 0.988) and interobserver agreement of 0.953. The intraclass correlation coefficient for intraobserver reliability on 45-degree internal oblique films for all groups ranged from 0.985 to 0.998, with interobserver agreement of 0.953. For predicting displacement, the observers were 60% accurate in predicting the true displacement on the 45-degree internal oblique films and only 35% accurate using the 60-degree internal oblique view. Standardizing to a 45-degree internal oblique radiograph of the elbow (regardless of elbow flexion) can augment the treating surgeon's ability to determine true displacement. At this degree of rotation, the measured number can be multiplied by 1.4 to better estimate displacement. 
The addition of a 45-degree internal oblique radiograph in medial humeral epicondyle fractures has good intraobserver and interobserver reliability and allows a more accurate estimate of the true displacement of these fractures. Diagnostic study, Level II (Development of diagnostic study with universally applied reference "gold" standard).
Infante, Fernando; Espada Vaquero, Mercedes; Bignardi, Tommaso; Lu, Chuan; Testa, Antonia C; Fauchon, David; Epstein, Elisabeth; Leone, Francesco P G; Van den Bosch, Thierry; Martins, Wellington P; Condous, George
2018-06-01
To assess interobserver reproducibility in detecting tubal ectopic pregnancies by reading data sets from 3-dimensional (3D) transvaginal ultrasonography (TVUS) and comparing it with real-time 2-dimensional (2D) TVUS. Images were initially classified as showing pregnancies of unknown location or tubal ectopic pregnancies on real-time 2D TVUS by an experienced sonologist, who acquired five 3D volumes. Data sets were analyzed offline by 5 observers who had to classify each case as ectopic pregnancy or pregnancy of unknown location. The interobserver reproducibility was evaluated by the Fleiss κ statistic. The performance of each observer in predicting ectopic pregnancies was compared to that of the experienced sonologist. Women were followed until they were reclassified as follows: (1) failed pregnancy of unknown location; (2) intrauterine pregnancy; (3) ectopic pregnancy; or (4) persistent pregnancy of unknown location. Sixty-one women were included. The agreement between reading offline 3D data sets and the first real-time 2D TVUS was very good (80%-82%; κ = 0.89). The overall interobserver agreement among observers reading offline 3D data sets was moderate (κ = 0.52). The diagnostic performance of experienced observers reading offline 3D data sets had accuracy of 78.3% to 85.0%, sensitivity of 66.7% to 81.3%, specificity of 79.5% to 88.4%, positive predictive value of 57.1% to 72.2%, and negative predictive value of 87.5% to 91.3%, compared to the experienced sonologist's real-time 2D TVUS: accuracy of 94.5%, sensitivity of 94.4%, specificity of 94.5%, positive predictive value of 85.0%, and negative predictive value of 98.1%. The diagnostic accuracy of 3D TVUS by reading offline data sets for predicting ectopic pregnancies is dependent on experience. Reading only static 3D data sets without clinical information does not match the diagnostic performance of real-time 2D TVUS combined with clinical information obtained during the scan. 
© 2017 by the American Institute of Ultrasound in Medicine.
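Fleiss' κ, used above for the five offline observers, generalizes chance-corrected agreement to more than two raters who each classify every case. A minimal sketch follows, assuming every case is rated by the same number of observers; the function name and the toy labels ("EP" for ectopic pregnancy, "PUL" for pregnancy of unknown location) are hypothetical.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists the category each of n raters gave subject i."""
    N = len(ratings)
    n = len(ratings[0])  # every subject must be rated by the same number of raters
    category_totals = Counter()
    p_bar = 0.0
    for subject in ratings:
        counts = Counter(subject)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        p_bar += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar /= N
    # Chance agreement from the overall proportion of ratings in each category.
    p_e = sum((t / (N * n)) ** 2 for t in category_totals.values())
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: two cases, five observers each, unanimous on both.
print(fleiss_kappa([["EP"] * 5, ["PUL"] * 5]))
```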
A proposed simple method for measurement in the anterior chamber angle: biometric gonioscopy.
Congdon, N G; Spaeth, G L; Augsburger, J; Klancnik, J; Patel, K; Hunter, D G
1999-11-01
To design a system of gonioscopy that will allow greater interobserver reliability and more clearly defined screening cutoffs for angle closure than current systems while being simple to teach and technologically appropriate for use in rural Asia, where the prevalence of angle-closure glaucoma is highest. Clinic-based validation and interobserver reliability trial. Study 1: 21 patients 18 years of age and older recruited from a university-based specialty glaucoma clinic; study 2: 32 patients 18 years of age and older recruited from the same clinic. In study 1, all participants underwent conventional gonioscopy by an experienced observer (GLS) using the Spaeth system and in the same eye also underwent Scheimpflug photography, ultrasonographic measurement of anterior chamber depth and axial length, automatic refraction, and biometric gonioscopy with measurement of the distance from iris insertion to Schwalbe's line using a reticule based in the slit-lamp ocular. In study 2, all participants underwent both conventional gonioscopy and biometric gonioscopy by an experienced gonioscopist (NGC) and a medical student with no previous training in gonioscopy (JK). Study 1: The association between biometric gonioscopy and conventional gonioscopy, Scheimpflug photography, and other factors known to correlate with the configuration of the angle. Study 2: Interobserver agreement using biometric gonioscopy compared to that obtained with conventional gonioscopy. In study 1, there was an independent, monotonic, statistically significant relationship between biometric gonioscopy and both Spaeth angle (P = 0.001, t test) and Spaeth insertion (P = 0.008, t test) grades. Biometric gonioscopy correctly identified six of six patients with occludable angles according to Spaeth criteria. Biometric gonioscopic grade was also significantly associated with the anterior chamber angle as measured by Scheimpflug photography (P = 0.005, t test). 
In study 2, the intraclass correlation coefficient between graders for biometric gonioscopy (0.97) was higher than for Spaeth angle grade (0.72) or Spaeth insertion grade (0.84). Biometric gonioscopy correlates well with other measures of the anterior chamber angle, shows a higher degree of interobserver reliability than conventional gonioscopy, and can readily be learned by an inexperienced observer.
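The intraclass correlation coefficient compares between-subject variance with rater and residual variance. The abstract does not state which ICC form was used, so the following is a sketch of one common choice, ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater). The function name and toy data are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    data[i][j] is the score rater j gave subject i (n subjects, k raters).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# Hypothetical grades: the second grader scores every eye one unit higher.
print(icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]]))
```

Because this form measures absolute agreement, a constant offset between graders lowers the ICC even when their rankings agree perfectly; a consistency form such as ICC(3,1) would not penalize that offset.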
Cooper, David T; Behrens, Claus F
2016-01-01
Objective: In cervical radiotherapy, it is essential that the uterine position is correctly determined prior to treatment delivery. The aim of this study was to evaluate an autoscan ultrasound (A-US) probe, a motorized transducer creating three-dimensional (3D) images by sweeping, by comparing it with a conventional ultrasound (C-US) probe, where manual scanning is required to acquire 3D images. Methods: Nine healthy volunteers were scanned by seven operators, using the Clarity® system (Elekta, Stockholm, Sweden). In total, 72 scans, 36 scans from the C-US and 36 scans from the A-US probes, were acquired. Two observers delineated the uterine structure, using the software-assisted segmentation in the Clarity workstation. The data of uterine volume, uterine centre of mass (COM) and maximum uterine lengths, in three orthogonal directions, were analyzed. Results: In 53% of the C-US scans, the whole uterus was captured, compared with 89% using the A-US. An F-test on the 36 scans demonstrated statistically significant differences in interobserver COM standard deviation (SD) when comparing the C-US with the A-US probe for the inferior–superior (p < 0.006), left–right (p < 0.012) and anteroposterior directions (p < 0.001). The median interobserver COM distance (Euclidean distance for 36 scans) was reduced from 8.5 mm (C-US) to 6.0 mm (A-US). An F-test on the 36 scans showed highly significant differences (p < 0.001) in the SD of the Euclidean interobserver distance when comparing the C-US with the A-US scans. The average Dice coefficient when comparing the two observers was 0.67 (C-US) and 0.75 (A-US). The predictive interval demonstrated better interobserver delineation concordance using the A-US probe. Conclusion: The A-US probe imaging might be a better choice of image-guided radiotherapy system for correcting for daily uterine positional changes in cervical radiotherapy. 
Advances in knowledge: Using a novel A-US probe might reduce the uncertainty in interoperator variability during ultrasound scanning. PMID:27452268
Baker, Mariwan; Cooper, David T; Behrens, Claus F
2016-10-01
In cervical radiotherapy, it is essential that the uterine position is correctly determined prior to treatment delivery. The aim of this study was to evaluate an autoscan ultrasound (A-US) probe, a motorized transducer creating three-dimensional (3D) images by sweeping, by comparing it with a conventional ultrasound (C-US) probe, where manual scanning is required to acquire 3D images. Nine healthy volunteers were scanned by seven operators, using the Clarity® system (Elekta, Stockholm, Sweden). In total, 72 scans, 36 scans from the C-US and 36 scans from the A-US probes, were acquired. Two observers delineated the uterine structure, using the software-assisted segmentation in the Clarity workstation. The data of uterine volume, uterine centre of mass (COM) and maximum uterine lengths, in three orthogonal directions, were analyzed. In 53% of the C-US scans, the whole uterus was captured, compared with 89% using the A-US. An F-test on the 36 scans demonstrated statistically significant differences in interobserver COM standard deviation (SD) when comparing the C-US with the A-US probe for the inferior-superior (p < 0.006), left-right (p < 0.012) and anteroposterior directions (p < 0.001). The median interobserver COM distance (Euclidean distance for 36 scans) was reduced from 8.5 mm (C-US) to 6.0 mm (A-US). An F-test on the 36 scans showed highly significant differences (p < 0.001) in the SD of the Euclidean interobserver distance when comparing the C-US with the A-US scans. The average Dice coefficient when comparing the two observers was 0.67 (C-US) and 0.75 (A-US). The predictive interval demonstrated better interobserver delineation concordance using the A-US probe. The A-US probe imaging might be a better choice of image-guided radiotherapy system for correcting for daily uterine positional changes in cervical radiotherapy. Using a novel A-US probe might reduce the uncertainty in interoperator variability during ultrasound scanning.
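The two interobserver measures reported above, the Dice coefficient and the Euclidean distance between centres of mass, can be sketched for two delineations stored as sets of voxel coordinates. This is a minimal illustration, not the Clarity workstation's implementation. It assumes unweighted centroids and isotropic voxels (real distances in mm would need the voxel spacing applied), and it treats two empty delineations as perfect agreement by convention. The function names are hypothetical.

```python
import math

def dice(a, b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) between two voxel sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty delineations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def centre_of_mass(voxels):
    """Unweighted centroid of a non-empty collection of (x, y, z) voxel coordinates."""
    xs, ys, zs = zip(*voxels)
    n = len(voxels)
    return (sum(xs) / n, sum(ys) / n, sum(zs) / n)

# Hypothetical delineations of the same structure by two observers.
obs1 = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
obs2 = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
print(dice(obs1, obs2), math.dist(centre_of_mass(obs1), centre_of_mass(obs2)))
```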
Inter-study reproducibility of cardiovascular magnetic resonance tagging
2013-01-01
Background The aim of this study is to determine the test-retest reliability of the measurement of regional myocardial function by cardiovascular magnetic resonance (CMR) tagging using spatial modulation of magnetization (SPAMM). Methods Twenty-five participants underwent CMR tagging twice over 12 ± 7 days. To assess the role of slice orientation on strain measurement, two healthy volunteers had a first exam, followed by image acquisition repeated with slices rotated ±15 degrees out of true short axis, followed by a second exam in the true short axis plane. To assess the role of slice location, two healthy volunteers had whole heart tagging. The harmonic phase (HARP) method was used to analyze the tagged images. Peak midwall circumferential strain (Ecc), radial strain (Err), Lambda 1, Lambda 2, and Angle α were determined in basal, mid and apical slices. LV torsion, systolic and early diastolic circumferential strain and torsion rates were also determined. Results LV Ecc and torsion had excellent intra-observer, interobserver, and inter-study intra-class correlation coefficients (ICC range, 0.7 to 0.9). Err, Lambda 1, Lambda 2 and Angle α had higher intra- and interobserver ICCs than inter-study ICCs. Angle α had the least inter-study reproducibility. Torsion rates had superior intra-observer, interobserver, and inter-study reproducibility to strain rates. The measurements of LV Ecc were comparable in all three slices with different short axis orientations (standard deviation of mean Ecc was 0.09, 0.18 and 0.16 at basal, mid and apical slices, respectively). The mean difference in LV Ecc between slices was more pronounced in most of the basal slices compared to the rest of the heart. Conclusions Intraobserver and interobserver reproducibility of all strain and torsion parameters was excellent. Inter-study reproducibility of CMR tagging by SPAMM varied between parameters and was highest for Ecc and LV torsion. 
The variation in LV Ecc measurement due to altered slice orientation is negligible compared to the variation due to slice location. Trial registration This trial is registered as NCT00005487 at the National Heart, Lung, and Blood Institute. PMID:23663535
Quantitative assessment of multiple sclerosis lesion load using CAD and expert input
NASA Astrophysics Data System (ADS)
Gertych, Arkadiusz; Wong, Alexis; Sangnil, Alan; Liu, Brent J.
2008-03-01
Multiple sclerosis (MS) is a frequently encountered neurological disease with a progressive but variable course affecting the central nervous system. Outline-based lesion quantification in the assessment of lesion load (LL) performed on magnetic resonance (MR) images is clinically useful and provides information about the development and change reflecting overall disease burden. Methods of LL assessment that rely on human input are tedious, have higher intra- and inter-observer variability and are more time-consuming than computerized automatic (CAD) techniques. At present it seems that methods based on human lesion identification preceded by non-interactive outlining by CAD are the best LL quantification strategies. We have developed a CAD that automatically quantifies MS lesions, displays a 3-D lesion map and appends radiological findings to the original images according to the current DICOM standard. The CAD is also capable of displaying and tracking changes and of comparing a patient's separate MRI studies to determine disease progression. The findings are exported to a separate imaging tool for review and final approval by an expert. Capture and standardized archiving of manual contours are also implemented. Similarity coefficients calculated from LL quantities in the collected exams show good correlation between CAD-derived results and the expert's readings. Combining the CAD approach with expert interaction may improve the diagnostic work-up of MS patients through better reproducibility of LL assessment and reduced reading time for single or comparative MR exams. Inclusion of CAD-generated outlines as DICOM-compliant overlays in the image data can serve as a better reference for tracking MS progression.
Sehgal, Arvind; Doctor, Tejas; Menahem, Samuel
2014-12-01
Existing data suggest subendocardial ischemia in preterm infants with patent ductus arteriosus (PDA) and alterations in cardiac function after indomethacin administration. This study aimed to explore the evolution of left ventricular function by conventional echocardiography and speckle-tracking echocardiography (STE) and to ascertain the interrelationship with coronary flow indices in response to indomethacin. A prospective observational study was performed with preterm infants receiving indomethacin for medical closure of PDA. Serial echocardiography was performed, and the results were analyzed using analysis of variance. Intra- and interobserver variability was assessed using the intraclass correlation coefficient. Indomethacin was administered to 18 infants born at a median gestational age of 25.8 weeks (interquartile range [IQR], 24.2-28.1 weeks) with a birth weight of 773 g (IQR, 704-1,002 g). The median age of the infants was 7.5 days (IQR, 4-17). Global longitudinal strain (GLS) values decreased significantly immediately after indomethacin infusion (pre-indomethacin GLS -19.1 ± 2.4% vs. post-infusion GLS -15.9 ± 1.7%; p < 0.0001) but had improved at reassessment after 1 h (-17.4 ± 1.8%). Conventional echocardiographic indices did not show significant alterations. A significant increase in arterial resistance in the coronary vasculature from 1.7 to 2.4 mmHg/cm/s was demonstrated. A significant correlation was noted between peak systolic GLS and flow resistance in the coronary vasculature. Significant changes in myocardial indices were observed immediately after indomethacin infusion. Compared with conventional methods, STE is a more sensitive tool for understanding hemodynamics in preterm infants.
May, Lindsay J; Ploutz, Michelle; Hollander, Seth A; Reinhartz, Olaf; Almond, Christopher S; Chen, Sharon; Maeda, Katsuhide; Kaufman, Beth D; Yeh, Justin; Rosenthal, David N
2015-04-01
The evolution of pharmacologic therapies and mechanical support including ventricular assist devices (VADs) has broadened the scope of care available to children with advanced heart failure. At present, there are only limited means of quantifying disease severity or the concomitant morbidity for this population. This study describes the development of a novel pediatric treatment intensity score (TIS), designed to quantify the burden of illness and clinical trajectory in children on VAD support. Five clinical domains were assessed: nutrition, respiratory support, activity level, cardiovascular medications, and care environment. A scale was developed through expert consensus. Higher scores indicate greater morbidity as reflected by intensity of medical management. To evaluate feasibility and face validity, the TIS was applied retrospectively to a subset of pediatric inpatients with VADs. The Bland-Altman method was used to assess limits of agreement. The study comprised 39 patients with 42 implantations. Bland-Altman interobserver and intraobserver comparisons showed good agreement (mean differences in scores of 0.02, limits of agreement ±0.12). Trends in TIS were concordant with the overall clinical impression of improvement. Scores remained ≥0.6 before VAD implantation and peaked at 0.71 three days after VAD implantation. We describe a pediatric VAD scoring tool designed to assess global patient morbidity and clinical recovery. We demonstrate the feasibility of using this TIS in a test population of inpatients on VAD support. Copyright © 2015 International Society for Heart and Lung Transplantation. Published by Elsevier Inc. All rights reserved.
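The Bland-Altman method summarizes agreement between two sets of paired scores by their mean difference (bias) and limits of agreement. A minimal sketch, assuming the conventional limits of bias ± 1.96 standard deviations of the paired differences (the abstract reports ±0.12 but does not state the multiplier); the function name and the example scores are hypothetical.

```python
import statistics


def bland_altman(scores_a, scores_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired scores from two observers."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical treatment intensity scores from two observers for five patients
observer_1 = [0.60, 0.65, 0.71, 0.58, 0.66]
observer_2 = [0.58, 0.66, 0.70, 0.60, 0.65]
bias, lower, upper = bland_altman(observer_1, observer_2)
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

In a full Bland-Altman analysis the differences are also plotted against the pairwise means to check that the bias does not change with score magnitude.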
NASA Astrophysics Data System (ADS)
Alfano, R.; Soetemans, D.; Bauman, G. S.; Gibson, E.; Gaed, M.; Moussa, M.; Gomez, J. A.; Chin, J. L.; Pautler, S.; Ward, A. D.
2018-02-01
Multi-parametric MRI (mp-MRI) is becoming a standard in contemporary prostate cancer screening and diagnosis and has been shown to aid physicians in cancer detection. It offers many advantages over traditional systematic biopsy, which has been shown to have very high clinical false-negative rates of up to 23% at all stages of the disease. Despite these benefits, mp-MRI is relatively complex to interpret and suffers from inter-observer variability in lesion localization and grading. Computer-aided diagnosis (CAD) systems have been developed to address this, as they can perform deterministic quantitative image analysis. We measured the accuracy of such a system, validated using accurately co-registered whole-mount digitized histology. We trained a logistic linear classifier (LOGLC), support vector machine (SVC), k-nearest neighbour (KNN) and random forest classifier (RFC) in a four-part, ROI-based experiment: 1) cancer vs. non-cancer, 2) high-grade (Gleason score ≥4+3) vs. low-grade cancer (Gleason score <4+3), 3) high-grade vs. other tissue components and 4) high-grade vs. benign tissue, selecting the classifier with the highest AUC using 1-10 features from forward feature selection. The CAD model was able to classify malignant vs. benign tissue and detect high-grade cancer with high accuracy. Once fully validated, this work will form the basis for a tool that enhances the radiologist's ability to detect malignancies, potentially improving biopsy guidance, treatment selection, and focal therapy for prostate cancer patients, maximizing the potential for cure and increasing quality of life.
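Classifier selection here rests on the area under the ROC curve. As a minimal sketch, the AUC can be computed as the Mann-Whitney statistic, the probability that a randomly chosen positive ROI scores higher than a randomly chosen negative one, and the candidate with the highest value is kept. Training the classifiers and forward feature selection are not shown; the model names and scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier outputs for cancer (positive) and non-cancer (negative) ROIs
candidates = {
    "model_a": ([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]),
    "model_b": ([0.7, 0.6, 0.9], [0.2, 0.4, 0.3]),
}
best = max(candidates, key=lambda name: auc(*candidates[name]))
print(best, auc(*candidates[best]))  # model_b 1.0
```

The pairwise loop is O(n·m); for large ROI sets a rank-based formula gives the same value more efficiently.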
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wehrschuetz, M., E-mail: martin.wehrschuetz@klinikum-graz.at; Aschauer, M.; Portugaller, H.
The purpose of this study was to assess interobserver variability and accuracy in the evaluation of renal artery stenosis (RAS) with gadolinium-enhanced MR angiography (MRA) and digital subtraction angiography (DSA) in patients with hypertension. The authors found that source images are more accurate than maximum intensity projection (MIP) images for depicting renal artery stenosis. Two independent radiologists reviewed MRA and DSA from 38 patients with hypertension. Studies were postprocessed to display images as MIPs and source images. DSA was the standard for comparison in each patient. For each main renal artery, percentage stenosis was estimated for any stenosis detected by the two radiologists. To calculate sensitivity, specificity and accuracy, MRA studies and stenoses were categorized as normal, mild (1-39%), moderate (40-69%), severe (≥70%), or occluded. DSA stenosis estimates of 70% or greater were considered hemodynamically significant. Analysis of variance demonstrated that MIP estimates of stenosis were greater than source image estimates for both readers. Differences in estimates for MIP versus DSA reached significance for one reader. Interobserver agreement for MIP, source images and DSA was excellent (0.80 < κ ≤ 0.90). The specificity of source images was high (97%) but lower for MIP (87%); average accuracy was 92% for MIP and 98% for source images. In this study, source images were significantly more accurate than MIP images for one reader, and a similar trend was observed for the second reader. Interobserver agreement was excellent. When renal artery stenosis is a consideration, high accuracy can only be obtained when source images are examined.
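The grading bands and the 70% significance threshold above translate directly into a small lookup. A minimal sketch, with hypothetical function names; treating an occluded artery as hemodynamically significant is an assumption, since the abstract only states the ≥70% rule.

```python
def stenosis_category(percent, occluded=False):
    """Map a percentage stenosis estimate to the grading bands used in the study."""
    if occluded:
        return "occluded"
    if percent <= 0:
        return "normal"
    if percent < 40:
        return "mild"        # 1-39%
    if percent < 70:
        return "moderate"    # 40-69%
    return "severe"          # >= 70%


def hemodynamically_significant(percent, occluded=False):
    # >= 70% is the study's threshold; counting occlusion as significant is an assumption
    return occluded or percent >= 70


readings = [0, 25, 45, 70, 90]
print([stenosis_category(p) for p in readings])
# ['normal', 'mild', 'moderate', 'severe', 'severe']
```

Binarizing each reading this way, for both the MRA reader and the DSA reference, gives the paired positive/negative calls from which sensitivity and specificity are tallied.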
Accuracy of abdominal auscultation for bowel obstruction.
Breum, Birger Michael; Rud, Bo; Kirkegaard, Thomas; Nordentoft, Tyge
2015-09-14
To investigate the accuracy and inter-observer variation of bowel sound assessment in patients with clinically suspected bowel obstruction. Bowel sounds were recorded in patients with suspected bowel obstruction using a Littmann(®) Electronic Stethoscope. The recordings were processed to yield 25-s sound sequences, which were presented in random order on PCs. Observers, recruited from doctors within the department, classified the sound sequences as either normal or pathological. The reference tests for bowel obstruction were intraoperative and endoscopic findings and clinical follow-up. Sensitivity and specificity were calculated for each observer and compared between junior and senior doctors. Interobserver variation was measured using the kappa statistic. Bowel sound sequences from 98 patients were assessed by 53 (33 junior and 20 senior) doctors. Laparotomy was performed in 47 patients, 35 of whom had bowel obstruction. Two patients underwent colorectal stenting due to large bowel obstruction. The median sensitivity and specificity were 0.42 (range: 0.19-0.64) and 0.78 (range: 0.35-0.98), respectively. There was no significant difference in accuracy between junior and senior doctors. The median frequency with which doctors classified bowel sounds as abnormal did not differ significantly between patients with and without bowel obstruction (26% vs 23%, P = 0.08). The 53 doctors made up 1378 unique pairs, and the median kappa value was 0.29 (range: -0.15 to 0.66). Accuracy and inter-observer agreement were generally low. Clinical decisions in patients with possible bowel obstruction should not be based on auscultatory assessment of bowel sounds.
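The 1378 unique pairs follow from 53 × 52 / 2, one pair for every combination of two doctors. A minimal sketch of summarizing pairwise agreement as the median of unweighted Cohen's kappa over all observer pairs; the abstract does not specify the kappa variant, so unweighted kappa on the binary normal/pathological labels is an assumption, and the function names and labels are hypothetical.

```python
from itertools import combinations
from statistics import median


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two observers rating the same items."""
    n = len(ratings_1)
    categories = set(ratings_1) | set(ratings_2)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    expected = sum(
        (ratings_1.count(c) / n) * (ratings_2.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def median_pairwise_kappa(ratings_by_observer):
    """Median kappa over every unique pair of observers, plus the number of pairs."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(ratings_by_observer, 2)]
    return median(kappas), len(kappas)


# Hypothetical normal (N) / pathological (P) labels from three observers
obs = [
    ["N", "P", "P", "N"],
    ["N", "P", "N", "N"],
    ["P", "P", "N", "N"],
]
print(median_pairwise_kappa(obs))  # (0.5, 3)
```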
Barra, Filipe Ramos; de Souza, Fernanda Freire; Camelo, Rosimara Eva Ferreira Almeida; Ribeiro, Andrea Campos de Oliveira; Farage, Luciano
2017-01-01
To assess the feasibility of contrast-enhanced spectral mammography (CESM) of the breast for assessing the size of residual tumors after neoadjuvant chemotherapy (NAC). In breast cancer patients who underwent NAC between 2011 and 2013, we evaluated residual tumor measurements obtained with CESM and full-field digital mammography (FFDM). We determined the concordance between the methods, as well as their level of agreement with the pathology. Three radiologists analyzed eight CESM and FFDM measurements separately, considering the size of the residual tumor at its largest diameter and correlating it with that determined in the pathological analysis. Interobserver agreement was also evaluated. The sensitivity, specificity, positive predictive value, and negative predictive value were higher for CESM than for FFDM (83.33%, 100%, 100%, and 66% vs. 50%, 50%, 50%, and 25%, respectively). The CESM measurements showed a strong, consistent correlation with the pathological findings (correlation coefficient = 0.76-0.92; intraclass correlation coefficient = 0.692-0.886). The correlation between the FFDM measurements and the pathological findings was not statistically significant, with questionable consistency (intraclass correlation coefficient = 0.488-0.598). The limits of agreement with the pathological findings were narrower for CESM measurements than for FFDM measurements. Interobserver agreement was higher for CESM than for FFDM (0.94 vs. 0.88). CESM is a feasible means of evaluating residual tumor size after NAC, showing good correlation and agreement with pathological findings. For CESM measurements, the interobserver agreement was excellent.
Evaluating causes of error in landmark-based data collection using scanners
Shearer, Brian M.; Cooke, Siobhán B.; Halenar, Lauren B.; Reber, Samantha L.; Plummer, Jeannette E.; Delson, Eric
2017-01-01
In this study, we assess the precision, accuracy, and repeatability of craniodental landmarks (Types I, II, and III, plus curves of semilandmarks) on a single macaque cranium digitally reconstructed with three different surface scanners and a microCT scanner. Nine researchers with varying degrees of osteological and geometric morphometric knowledge landmarked ten iterations of each scan (40 total) to test the effects of scan quality, researcher experience, and landmark type on levels of intra- and interobserver error. Two researchers additionally landmarked ten specimens from seven different macaque species using the same landmark protocol to test the effects of the previously listed variables relative to species-level morphological differences (i.e., observer variance versus real biological variance). Error rates within and among researchers by scan type were calculated to determine whether data collected by different individuals or on different digitally rendered crania are consistent enough to be used in a single dataset. Results indicate that scan type does not impact the rate of intra- or interobserver error. Interobserver error is far greater than intraobserver error among all individuals, and is similar in variance to that found among different macaque species. Additionally, experience with osteology and morphometrics both positively contribute to precision in multiple landmarking sessions, even where less experienced researchers have been trained in point acquisition. Individual training increases precision (although not necessarily accuracy), and is highly recommended in any situation where multiple researchers will be collecting data for a single project. PMID:29099867
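The abstract does not define its error metric. As one common way to separate the two sources of error, this sketch assumes intraobserver error is the mean Euclidean distance of one researcher's repeated placements of a landmark from their centroid, and interobserver error is the same spread computed over each researcher's mean placement. The function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) landmark coordinates."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def mean_distance_to_centroid(points):
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def intraobserver_error(trials):
    """Spread of one researcher's repeated placements of a single landmark."""
    return mean_distance_to_centroid(trials)


def interobserver_error(trials_by_observer):
    """Spread of each researcher's mean placement around the group mean."""
    return mean_distance_to_centroid([centroid(t) for t in trials_by_observer])


# Hypothetical repeated placements (mm) of one landmark by two researchers
researcher_a = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
researcher_b = [(1.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
print(intraobserver_error(researcher_a), interobserver_error([researcher_a, researcher_b]))
```

In this toy case each researcher is precise (small intraobserver error) but the two disagree systematically (larger interobserver error), the pattern the study reports.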
Online Studies on Variation in Orthopedic Surgery: Computed Tomography in MPEG4 Versus DICOM Format.
Mellema, Jos J; Mallee, Wouter H; Guitton, Thierry G; van Dijk, C Niek; Ring, David; Doornberg, Job N
2017-10-01
The purpose of this study was to compare the observer participation and satisfaction as well as interobserver reliability between two online platforms, Science of Variation Group (SOVG) and Traumaplatform Study Collaborative, for the evaluation of complex tibial plateau fractures using computed tomography in MPEG4 and DICOM format. A total of 143 observers started with the online evaluation of 15 complex tibial plateau fractures via either the SOVG or Traumaplatform Study Collaborative websites using MPEG4 videos or a DICOM viewer, respectively. Observers were asked to indicate the absence or presence of four tibial plateau fracture characteristics and to rate their satisfaction with the evaluation as provided by the respective online platforms. The observer participation rate was significantly higher in the SOVG (MPEG4 video) group compared to that in the Traumaplatform Study Collaborative (DICOM viewer) group (75 and 43%, respectively; P < 0.001). The median observer satisfaction with the online evaluation was seven (range, 0-10) using MPEG4 video compared to six (range, 1-9) using DICOM viewer (P = 0.11). The interobserver reliability for recognition of fracture characteristics in complex tibial plateau fractures was higher for the evaluation using MPEG4 video. In conclusion, observer participation and interobserver reliability for the characterization of tibial plateau fractures were greater with MPEG4 videos than with a standard DICOM viewer, while there was no difference in observer satisfaction. Future reliability studies should account for the method of delivering images.
Yagi, Kazuyoshi; Saka, Akiko; Nozawa, Yujiro; Nakamura, Atsuo
2014-04-01
To reduce the incidence of metachronous gastric carcinoma after endoscopic resection of early gastric cancer, Helicobacter pylori eradication therapy has been endorsed. It is not unusual for such patients to be H. pylori negative, whether after eradication or for other reasons. If it were possible to predict H. pylori status using endoscopy alone, it would be very useful in clinical practice. To clarify the accuracy of endoscopic judgment of H. pylori status, we evaluated it in the stomach after endoscopic submucosal dissection (ESD) of gastric cancer. Fifty-six patients treated by ESD were enrolled. The diagnostic criteria for H. pylori status by conventional endoscopy and narrow-band imaging (NBI)-magnifying endoscopy were defined, and H. pylori status was judged by two endoscopists. Using the H. pylori stool antigen test as the diagnostic gold standard, conventional endoscopy and NBI-magnifying endoscopy were compared for their sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Interobserver agreement was assessed in terms of κ value. Interobserver agreement was moderate (0.56) for conventional endoscopy and substantial (0.77) for NBI-magnifying endoscopy. The sensitivity, specificity, PPV, and NPV were 0.79, 0.52, 0.70, and 0.63 for conventional endoscopy and 0.91, 0.83, 0.88, and 0.86 for NBI-magnifying endoscopy, respectively. Prediction of H. pylori status using NBI-magnifying endoscopy is practical, and interobserver agreement is substantial. © 2013 John Wiley & Sons Ltd.
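Sensitivity, specificity, PPV, and NPV all come from the same 2 × 2 table of endoscopic calls against the reference test. A minimal sketch with hypothetical function names and hypothetical paired judgments (True = H. pylori positive), not data from the study:

```python
def confusion_counts(test_positive, reference_positive):
    """Tally TP, FP, FN, TN from paired booleans (index test vs. reference test)."""
    tp = fp = fn = tn = 0
    for t, r in zip(test_positive, reference_positive):
        if t and r:
            tp += 1
        elif t and not r:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def diagnostic_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly called positive
        "specificity": tn / (tn + fp),  # negatives correctly called negative
        "ppv": tp / (tp + fp),          # positive calls that are truly positive
        "npv": tn / (tn + fn),          # negative calls that are truly negative
    }


# Hypothetical endoscopic judgments vs. stool antigen results
endoscopy = [True, True, False, True, False, False]
stool_antigen = [True, False, False, True, True, False]
print(diagnostic_metrics(*confusion_counts(endoscopy, stool_antigen)))
```

Unlike sensitivity and specificity, PPV and NPV also depend on how common H. pylori infection is in the sample.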
Larrabide, Ignacio; Cruz Villa-Uriol, Maria; Cárdenes, Rubén; Pozo, Jose Maria; Macho, Juan; San Roman, Luis; Blasco, Jordi; Vivas, Elio; Marzo, Alberto; Hose, D Rod; Frangi, Alejandro F
2011-05-01
Morphological descriptors are practical and essential biomarkers for diagnosis and treatment selection in intracranial aneurysm management according to the current guidelines in use. Nevertheless, relatively little work has been dedicated to improving the three-dimensional quantification of aneurysmal morphology, automating the analysis, and hence reducing the inherent intra- and interobserver variability of manual analysis. In this paper we propose a methodology for the automated isolation and morphological quantification of saccular intracranial aneurysms based on a 3D representation of the vascular anatomy. This methodology is based on the analysis of the vasculature skeleton's topology and the subsequent application of concepts from deformable cylinders. These are expanded inside the parent vessel to identify different regions and discriminate the aneurysm sac from the parent vessel wall. The method outputs a surface representation of the isolated aneurysm sac, which can then be quantified automatically. The proposed method provides the means for identifying the aneurysm neck in a deterministic way. The results obtained by the method were assessed in two ways: they were compared to manual measurements obtained by three independent clinicians, as normally done during diagnosis, and to automated measurements from aneurysms manually isolated by three independent operators, non-clinician experts in vascular image analysis. All the measurements were obtained using in-house tools. The results were qualitatively and quantitatively compared for a set of saccular intracranial aneurysms (n = 26). Measurements performed on a synthetic phantom showed that the automated measurements obtained from manually isolated aneurysms were the most accurate. The differences between the measurements obtained by the clinicians and the manually isolated sacs were statistically significant (neck width: p <0.001, sac height: p = 0.002).
When comparing clinicians' measurements to automatically isolated sacs, only the differences for the neck width were significant (neck width: p <0.001, sac height: p = 0.95). However, correlation and agreement were found between the measurements obtained from manually and automatically isolated aneurysms (neck width: p = 0.43; sac height: p = 0.95). The proposed method allows the automated isolation of intracranial aneurysms, eliminating interobserver variability. On average, the computational cost of the automated method (2 min 36 s) was similar to the time required by a manual operator (measurement by clinicians: 2 min 51 s, manual isolation: 2 min 21 s), while eliminating human interaction. The automated measurements are independent of the viewing angle, eliminating any bias or difference between observer criteria. Finally, the qualitative assessment of the results showed acceptable agreement between manually and automatically isolated aneurysms.
Disability after encephalitis: development and validation of a new outcome score
Begum, Ashia; Ooi, Mong How; Faragher, Brian; Lai, Boon Foo; Sandaradura, Indunil; Mohan, Anand; Mandhan, Gaurav; Meharwade, Pratibha; Subhashini, S; Abhishek, Gulia; Begum, Asma; Penkulinti, Srihari; Shankar, M Veera; Ravikumar, R; Young, Carolyn; Cardosa, Mary Jane; Ravi, V; Wong, See Chang; Kneen, Rachel; Solomon, Tom
2010-01-01
Objective To develop a simple tool for assessing the severity of disability resulting from Japanese encephalitis and for determining whether, as a result, a child is likely to be dependent. Methods A new outcome score based on a 15-item questionnaire was developed after a literature review, examination of current assessment tools, discussion with experts and a pilot study. The score was used to evaluate 100 children in Malaysia (56 Japanese encephalitis patients, 2 patients with encephalitis of unknown etiology and 42 controls) and 95 in India (36 Japanese encephalitis patients, 41 patients with encephalitis of unknown etiology and 18 controls). Inter- and intra-observer variability in the outcome score was determined and the score was compared with full clinical assessment. Findings There was good inter-observer agreement on using the new score to identify likely dependency (κ = 0.942 for Malaysian children; κ = 0.786 for Indian children) and good intra-observer agreement (κ = 1.000 and 0.902, respectively). Agreement between the new score and clinical assessment was also good (κ = 0.906 and 0.762, respectively). The sensitivity and specificity of the new score for identifying children likely to be dependent were 100% and 98.4% in Malaysia and 100% and 93.8% in India. Positive and negative predictive values were 84.2% and 100% in Malaysia and 65.6% and 100% in India. Conclusion The new tool for assessing disability in children after Japanese encephalitis was simple to use and scores correlated well with clinical assessment. PMID:20680123
Kim, Shin-Jeong; Kim, Sunghee; Kang, Kyung-Ah; Oh, Jina; Lee, Myung-Nam
2016-02-01
The lack of reliable and valid tools to evaluate learning outcomes during simulations has limited the adoption and progress of simulation-based nursing education. This study had two aims: (a) to develop a simulation evaluation tool (SET(c-dehydration)) to assess students' clinical judgment in caring for children with dehydration based on the Lasater Clinical Judgment Rubric (LCJR) and (b) to examine its reliability and validity. Undergraduate nursing students from two nursing schools in South Korea participated in this study from March 3 through June 10, 2014. The SET(c-dehydration) was developed, and the clinical judgment of 120 nursing students was evaluated. Descriptive statistics, Cronbach's alpha, Cohen's kappa coefficient, and confirmatory factor analysis (CFA) were used to analyze the data. A 41-item version of the SET(c-dehydration) with three subscales was developed. Cohen's kappa (measuring inter-observer reliability) of the sessions ranged from .73 to .95, and Cronbach's alpha was .87. The mean total rating of the SET(c-dehydration) by the instructors was 1.92 (±.25), and the mean scores for the four LCJR dimensions of clinical judgment were as follows: noticing (1.74±.27), interpreting (1.85±.43), responding (2.17±.32), and reflecting (1.79±.35). CFA, which was performed to test construct validity, showed that the four dimensions of the SET(c-dehydration) constituted an appropriate framework. The SET(c-dehydration) provides a means to evaluate clinical judgment in simulation education. Its reliability and validity should be examined further. Copyright © 2015 Elsevier Ltd. All rights reserved.
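Internal consistency of the tool was reported as Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. A minimal sketch with a hypothetical function name and hypothetical rubric ratings (rows are students, columns are items):

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[i][j] is respondent i's score on item j."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 4 students x 3 rubric items
ratings = [
    [2, 2, 1],
    [3, 3, 2],
    [1, 2, 1],
    [2, 3, 2],
]
print(round(cronbach_alpha(ratings), 3))  # 0.9
```

Sample variances are used throughout; because the same estimator appears in numerator and denominator, using population variances instead gives the same alpha.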
CLARIPED: a new tool for risk classification in pediatric emergencies.
Magalhães-Barbosa, Maria Clara de; Prata-Barbosa, Arnaldo; Alves da Cunha, Antonio José Ledo; Lopes, Cláudia de Souza
2016-09-01
To present a new pediatric risk classification tool, CLARIPED, and describe its development steps. Development steps: (i) first round of discussion among experts, first prototype; (ii) pre-test of reliability, 36 hypothetical cases; (iii) second round of discussion to perform adjustments; (iv) team training; (v) pre-test with patients in real time; (vi) third round of discussion to perform new adjustments; (vii) final pre-test of validity (20% of medical treatments in five days). CLARIPED features five urgency categories: Red (emergency), Orange (very urgent), Yellow (urgent), Green (little urgent) and Blue (not urgent). The first classification step includes the measurement of four vital signs (Vipe score); the second step consists of the urgency discrimination assessment. Each step assigns a color, and the more urgent of the two is selected for the final classification. Each color corresponds to a maximum waiting time for medical care and referral to the most appropriate physical area for the patient's clinical condition. The interobserver agreement was substantial (kappa = 0.79), and the final pre-test, with 82 medical treatments, showed good correlation between the proportion of patients in each urgency category and the number of resources used (p<0.001). CLARIPED is an objective and easy-to-use tool for simple risk classification, whose pre-tests suggest good reliability and validity. Larger-scale studies on its validity and reliability in different health contexts are ongoing and can contribute to the implementation of a nationwide pediatric risk classification system. Copyright © 2016 Sociedade de Pediatria de São Paulo. Published by Elsevier Editora Ltda. All rights reserved.
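The final CLARIPED color is the more urgent of the two step results. A minimal sketch of that combination rule, using an ordered enumeration of the five categories; how the Vipe score or the discriminator assessment produces each color is not described here, so both inputs are assumed to be already assigned, and the class and function names are hypothetical.

```python
from enum import IntEnum


class Urgency(IntEnum):
    """CLARIPED color categories, ordered from least to most urgent."""
    BLUE = 1    # not urgent
    GREEN = 2   # little urgent
    YELLOW = 3  # urgent
    ORANGE = 4  # very urgent
    RED = 5     # emergency


def final_classification(vital_signs_color, discriminator_color):
    """Keep the more urgent of the two step results."""
    return max(vital_signs_color, discriminator_color)


print(final_classification(Urgency.GREEN, Urgency.ORANGE).name)  # ORANGE
```

Encoding the colors as ordered integers makes "most urgent" a plain maximum, so adding a third assessment step would need no new logic.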
Brown, Charlotte A.; Bogers, Johnannes; Sahebali, Shaira; Depuydt, Christophe E.; De Prins, Frans; Malinowski, Douglas P.
2012-01-01
Since the Pap test was introduced in the 1940s, organized and opportunistic screening programs have reduced the incidence of squamous cell cervical cancers by approximately 70% in many developed countries. The efficacy of the Pap test, however, is hampered by high interobserver variability and high false-negative and false-positive rates. The use of biomarkers has demonstrated the ability to overcome these issues, leading to improved positive predictive value of cervical screening results. In addition, the introduction of HPV primary screening programs will necessitate a follow-up test with high specificity to triage the high number of HPV-positive tests. This paper focuses on protein biomarkers currently available for use in cervical cancer screening that appear to improve the detection of women at greatest risk for developing cervical cancer, including Ki-67, p16INK4a, BD ProEx C, and Cytoactiv HPV L1. PMID:22481919
Use of Digitally Stained Multimodal Confocal Mosaic Images to Screen for Nonmelanoma Skin Cancer
Mu, Euphemia W.; Lewin, Jesse M.; Stevenson, Mary L.; Meehan, Shane A.; Carucci, John A.; Gareau, Daniel S.
2017-01-01
IMPORTANCE Confocal microscopy has the potential to provide rapid bedside pathologic analysis, but clinical adoption has been limited in part by the need for physician retraining to interpret grayscale images. Digitally stained confocal mosaics (DSCMs) mimic the colors of routine histologic specimens and may ease adoption of this technology. OBJECTIVE To evaluate the accuracy and precision of 3 physicians using DSCMs before and after training to detect basal cell carcinoma (BCC) and squamous cell carcinoma (SCC) in Mohs micrographic surgery fresh-tissue specimens. DESIGN This retrospective study used 133 DSCMs from 64 Mohs tissue excisions, which included clear margins, residual BCC, or residual SCC. Discarded tissue from Mohs surgical excisions from the dermatologic surgery units at Memorial Sloan Kettering Cancer Center and Oregon Health & Science University was collected for confocal imaging from 2006 to 2011. Final data analysis and interpretation took place between 2014 and 2016. Two Mohs surgeons and a Mohs fellow, who were blinded to the correlating gold standard frozen section diagnoses, independently reviewed the DSCMs for residual nonmelanoma skin cancer (NMSC) before and after a brief training session (about 5 minutes). The 2 assessments were separated by a 6-month washout period. MAIN OUTCOMES AND MEASURES Diagnostic accuracy was characterized by sensitivity and specificity of detecting NMSC using DSCMs vs standard frozen histopathologic specimens. The diagnostic precision was calculated based on interobserver agreement and κ scores. Paired 2-sample t tests were used for comparative means analyses before and after training.
RESULTS The average respective sensitivities and specificities of detecting NMSC were 90% (95% CI, 89%-91%) and 79% (95% CI, 52%-100%) before training and 99% (95% CI, 99%-99%) (P = .001) and 93% (95% CI, 90%-96%) (P = .18) after training; for BCC, they were 83% (95% CI, 59%-100%) and 92% (95% CI, 81%-100%) before training and 98% (95% CI, 98%-98%) (P = .18) and 97% (95% CI, 95%-100%) (P = .15) after training; for SCC, they were 73% (95% CI, 65%-81%) and 89% (95% CI, 72%-100%) before training and 100% (P = .004) and 98% (95% CI, 95%-100%) (P = .21) after training. The pretraining interobserver agreement was 72% (κ = 0.58), and the posttraining interobserver agreement was 98% (κ = 0.97) (P = .04). CONCLUSIONS AND RELEVANCE Diagnostic use of DSCMs shows promising correlation to frozen histologic analysis, but image quality was affected by variations in image contrast and mosaic-stitching artifact. With training, physicians were able to read DSCMs with significantly improved accuracy and precision to detect NMSC. PMID:27603676
Fledelius, Joan; Khalil, Azza; Hjorthaug, Karin; Frøkiær, Jørgen
2016-12-01
The purpose of this study is to determine whether a qualitative approach or a semi-quantitative approach provides the more robust method for early response evaluation with 2'-deoxy-2'-[(18)F]fluoro-D-glucose (F-18-FDG) positron emission tomography combined with whole body computed tomography (PET/CT) in non-small cell lung cancer (NSCLC). In this study, eight Nuclear Medicine consultants analyzed F-18-FDG PET/CT scans from 35 patients with locally advanced NSCLC. Scans were performed at baseline and after 2 cycles of chemotherapy. Each observer used two different methods for evaluation: (1) PET response criteria in solid tumors (PERCIST) 1.0 and (2) a qualitative approach. Both methods allocate patients into one of four response categories (complete and partial metabolic response (CMR and PMR) and stable and progressive metabolic disease (SMD and PMD)). The inter-observer agreement was evaluated using Fleiss' kappa for multiple raters, Cohen's kappa for comparison of the two methods, and intraclass correlation coefficients (ICC) for comparison of lean body mass corrected standardized uptake value (SUL) peak measurements. The agreement between observers when determining the percentage change in SULpeak was "almost perfect", with ICC = 0.959. There was strong agreement among observers allocating patients to the different response categories, with a Fleiss kappa of 0.76 (0.71-0.81). In 22 of the 35 patients, complete agreement was observed with PERCIST 1.0. Agreement was lower (moderate) with the qualitative method, with a Fleiss kappa of 0.60 (0.55-0.64). Complete agreement was achieved in only 10 of the 35 patients. The difference between the two methods was statistically significant (p < 0.005, chi-squared test). Comparing the two methods for each individual observer showed Cohen's kappa values ranging from 0.64 to 0.79, indicating strong agreement between the two methods.
PERCIST 1.0 provides a higher overall agreement between observers than the qualitative approach in categorizing early treatment response in NSCLC patients. The inter-observer agreement is in fact strong when using PERCIST 1.0 even when the level of instruction is purposely kept to a minimum in order to mimic the everyday situation. The variability is largely owing to the subjective elements of the method.
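Agreement among the eight observers on response categories was summarized with Fleiss' kappa, which extends kappa to any fixed number of raters per subject. A minimal sketch, assuming every patient was categorized by every observer; the function name and the three-patient example are hypothetical, not study data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists the category each rater assigned to subject i."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    counts = [Counter(r) for r in ratings]
    categories = set().union(*counts)
    # Overall proportion of all assignments falling in each category
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories]
    # Per-subject agreement: share of rater pairs that agree on that subject
    p_i = [
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return 1.0  # every rating fell in a single category
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical response categories from three observers for three patients
example = [
    ["PMR", "PMR", "PMR"],
    ["SMD", "SMD", "PMR"],
    ["PMD", "PMD", "PMD"],
]
print(round(fleiss_kappa(example), 3))  # 0.654
```

With only two raters per subject this reduces to a Scott's-pi-style statistic rather than Cohen's kappa, because expected agreement is pooled across raters.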
NASA Astrophysics Data System (ADS)
Gavrielides, Marios A.; Ronnett, Brigitte M.; Vang, Russell; Seidman, Jeffrey D.
2015-03-01
Studies have shown that different cell types of ovarian carcinoma have different molecular profiles and exhibit different behavior, and that patients could benefit from type-specific treatment. Different cell types display different histopathology features, and different criteria are used for each cell type classification. Inter-observer variability for the task of classifying ovarian cancer cell types is an under-examined area of research. This study served as a pilot study to quantify observer variability related to the classification of ovarian cancer cell types and to extract valuable data for designing a validation study of digital pathology (DP) for this task. Three observers with expertise in gynecologic pathology reviewed 114 cases of ovarian cancer with optical microscopy, with specific guidelines for classifications into distinct cell types. For 93 cases all 3 pathologists agreed on the same cell type, for 18 cases 2 out of 3 agreed, and for 3 cases there was no agreement. Across cell types with a minimum sample size of 10 cases, agreement between all three observers was 91.1%, 80.0%, 90.0%, 78.6%, 100.0%, and 61.5% for the high-grade serous carcinoma, low-grade serous carcinoma, endometrioid, mucinous, clear cell, and carcinosarcoma cell types, respectively. These results indicate that unanimous agreement varied over a fairly wide range. However, additional research is needed to determine the importance of these differences in comparison studies. These results will be used to aid in the design and sizing of such a study comparing optical and digital pathology. In addition, the results will help in understanding the potential role computer-aided diagnosis has in helping to improve the agreement of pathologists for this task.
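The 93 / 18 / 3 breakdown above (unanimous, two of three, no agreement; 114 cases in total, so about 81.6% unanimous) can be tallied by taking, for each case, the size of the largest group of pathologists who chose the same cell type. A minimal sketch with hypothetical function names and hypothetical diagnoses:

```python
from collections import Counter


def agreement_level(labels):
    """Size of the largest group of observers who chose the same cell type."""
    return Counter(labels).most_common(1)[0][1]


def tally_agreement(cases):
    """Count cases by agreement level (3 = unanimous, 2 = two of three, 1 = none)."""
    return Counter(agreement_level(c) for c in cases)


# Hypothetical diagnoses from three pathologists
cases = [
    ["high-grade serous", "high-grade serous", "high-grade serous"],
    ["endometrioid", "high-grade serous", "endometrioid"],
    ["mucinous", "clear cell", "endometrioid"],
]
print(tally_agreement(cases))
```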
Abdelfattah, Adham; Otto, Randall J; Simon, Peter; Christmas, Kaitlyn N; Tanner, Gregory; LaMartina, Joey; Levy, Jonathan C; Cuff, Derek J; Mighell, Mark A; Frankle, Mark A
2018-04-01
Revision of unstable reverse shoulder arthroplasty (RSA) remains a significant challenge. The purpose of this study was to determine the reliability of a new treatment-guiding classification for instability after RSA, to describe the clinical outcomes of patients stabilized operatively, and to identify those with higher risk of recurrence. All patients undergoing revision for instability after RSA were identified at our institution. Demographic, clinical, radiographic, and intraoperative data were collected. A classification was developed using all identified causes of instability after RSA and allocating them to 1 of 3 defined treatment-guiding categories. Eight surgeons reviewed all data and applied the classification scheme to each case. Interobserver and intraobserver reliability was used to evaluate the classification scheme. Preoperative clinical outcomes were compared with final follow-up in stabilized shoulders. Forty-three revision cases in 34 patients met the inclusion criteria for the study. Five patients remained unstable after revision. Persistent instability most commonly occurred in cases of persistent deltoid dysfunction and postoperative acromial fractures but also in 1 case of soft tissue impingement. Twenty-one patients remained stable at minimum 2 years of follow-up and had significant improvement of clinical outcome scores and range of motion. Reliability of the classification scheme showed substantial and almost perfect interobserver and intraobserver agreement among all the participants (κ = 0.699 and κ = 0.851, respectively). Instability after RSA can be successfully treated with revision surgery using the reliable treatment-guiding classification scheme presented herein. However, more understanding is needed for patients with greater risk of recurrent instability after revision surgery. Copyright © 2017 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
Zucker, Evan J; Cheng, Joseph Y; Haldipur, Anshul; Carl, Michael; Vasanawala, Shreyas S
2018-01-01
To assess the feasibility and performance of free-breathing ultrashort echo time (UTE) chest magnetic resonance imaging (MRI) with a conical k-space trajectory versus four-dimensional (4D) flow, and to assess the effects of 50% data subsampling and soft-gated motion correction. Thirty-two consecutive children (mean age: 5.4 years, range: 6 days to 15.7 years) who underwent both 4D flow and ferumoxytol-enhanced UTE chest MR in a single 3T examination were recruited. From the UTE k-space data, three image sets were reconstructed: 1) one with all data, 2) one using the first 50% of data, and 3) a final set with soft-gating motion correction, leveraging the signal magnitude immediately after each excitation. Two radiologists in blinded fashion independently scored image quality of anatomical landmarks on a 5-point scale. Ratings were compared using Wilcoxon rank-sum, Wilcoxon signed-rank, and Kruskal-Wallis tests. Interobserver agreement was assessed with the intraclass correlation coefficient (ICC). For fully sampled UTE, mean scores for all structures were ≥4 (good-excellent). Full UTE surpassed 4D flow for lungs and airways (P < 0.001), with similar pulmonary artery (PA) quality (P = 0.62). Fifty-percent subsampling degraded all landmarks only slightly (P < 0.001), as did motion correction. Subsegmental PA visualization was possible in >93% of scans for all techniques (P = 0.27). Interobserver agreement was excellent for combined scores (ICC = 0.83). High-quality free-breathing conical UTE chest MR is feasible, surpassing 4D flow for lungs and airways, with equivalent PA visualization. Data subsampling only mildly degraded images, favoring shorter scan times. Soft-gating motion correction did not improve overall image quality. Level of Evidence: 2. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2018;47:200-209. © 2017 International Society for Magnetic Resonance in Medicine.
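The abstract names the intraclass correlation coefficient but not its form. The sketch below assumes ICC(2,1) in the Shrout-Fleiss notation (two-way random effects, absolute agreement, single rater), a common choice when the same raters score every subject; the function name and the choice of form are assumptions for illustration. It works from a subjects × raters table, such as each child's 5-point score from each of the two radiologists.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: one row per subject (e.g. patient), one column per rater (radiologist)
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this form measures absolute agreement, a rater who is systematically one point higher is penalized: `icc_2_1([[1, 2], [2, 3], [3, 4]])` gives 2/3 even though the two raters rank the subjects identically. A consistency form such as ICC(3,1) would ignore that offset.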
Lawless, Margaret E; Tretiakova, Maria S; True, Lawrence D; Vakar-Lopez, Funda
2018-03-01
Distinguishing urothelial carcinoma in situ (CIS) from other flat lesions of the urinary bladder with cytologic atypia is critically important for the management of patients with bladder neoplasia. However, there is high interpathologist variability in making these distinctions. The aim of this study was to assess interobserver agreement between general and specialized genitourinary pathologists, and to compare these diagnoses with those rendered after an immunohistochemical panel was performed. We hypothesized that adding a set of immunohistochemical stains would reduce the number of cases classified within the intermediate categories of atypia of uncertain significance and low-grade dysplasia. Two genitourinary pathologists independently assessed haematoxylin and eosin (H&E)-stained sections of 127 bladder biopsies from each of the 4 International Society of Urological Pathology/World Health Organization categories of flat lesions diagnosed by general pathologists. A subset of biopsies from 49 patients was reassessed after staining with a 3-antibody panel (CD44, CK20, and p53), and the results were correlated with patient follow-up. Based on these immunohistochemistry (IHC) stains, 26 cases (53.1%) were recategorized. Of most clinical importance, 5 of 27 cases (18.5%) originally diagnosed as either atypia of uncertain significance or low-grade dysplasia were recategorized as CIS, and recurrent disease was identified on subsequent biopsies. None of the 10 cases diagnosed as CIS based on H&E stains were recategorized. This triad of IHC stains can improve the precision of pathologic diagnosis of histologically atypical urothelial lesions of flat bladder mucosa. We recommend that pathologists apply this set of IHC stains to such lesions when they find them problematic on H&E stains.
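Figures such as "26 cases (53.1%) were recategorized" and "5 of 27 cases (18.5%) ... were recategorized as CIS" come from cross-tabulating each biopsy's H&E diagnosis against its post-IHC diagnosis. The sketch below shows one way to do that tally; the function names and the shorthand labels in the usage note ("AUS" for atypia of uncertain significance, "LGD" for low-grade dysplasia) are hypothetical, and no study data are reproduced beyond the reported counts.

```python
from collections import Counter


def reclassification_table(before, after):
    """Count (H&E diagnosis, post-IHC diagnosis) pairs for the same biopsies."""
    if len(before) != len(after):
        raise ValueError("paired diagnosis lists must have the same length")
    return Counter(zip(before, after))


def n_recategorized(table, from_labels=None, to_label=None):
    """Number of cases whose diagnosis changed, optionally filtered by origin/destination."""
    return sum(
        count
        for (b, a), count in table.items()
        if b != a
        and (from_labels is None or b in from_labels)
        and (to_label is None or a == to_label)
    )


def percent(part, whole, ndigits=1):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, ndigits)
```

With the reported counts, `percent(26, 49)` gives 53.1 and `percent(5, 27)` gives 18.5. On a toy table, `n_recategorized(table, {"AUS", "LGD"}, "CIS")` would count only the upgrades from the intermediate categories to CIS.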
Virtual clinics in glaucoma care: face-to-face versus remote decision-making.
Clarke, Jonathan; Puertas, Renata; Kotecha, Aachal; Foster, Paul J; Barton, Keith
2017-07-01
To examine the agreement between clinical decisions on glaucoma status made in a virtual glaucoma clinic and those made during a face-to-face consultation. A trained nurse and technicians prospectively entered data for 204 patients into a proforma. A subsequent face-to-face clinical assessment was completed by either a glaucoma consultant or a fellow. Proformas were reviewed remotely by one of two additional glaucoma consultants and, 12 months later, by the clinicians who had undertaken the original clinical examination. The interobserver and intraobserver decision-making agreement of virtual assessment versus standard care was calculated. We identified adverse disagreement between face-to-face and virtual review in 7/204 (3.4%, 95% CI 0.9% to 5.9%) patients, in whom virtual review failed to predict a need for accelerated follow-up identified in face-to-face review. Misclassification events were rare, occurring in 1.9% (95% CI 0.3% to 3.8%) of assessments. Interobserver κ (95% CI) showed only fair agreement (0.24 (0.04 to 0.43)); this improved to moderate agreement when only consultant decisions were compared against each other (κ=0.41 (0.16 to 0.65)). The intraobserver κ (95% CI) was 0.274 (0.073 to 0.476) for the consultant and 0.264 (0.031 to 0.497) for the fellow. The low rate of adverse misclassification, combined with the slowly progressive nature of most glaucoma and the fact that all patients will be regularly reassessed, suggests that virtual clinics offer a safe, logistically viable option for selected patients with glaucoma. Published by the BMJ Publishing Group Limited.
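The abstract does not name its confidence-interval method. However, a normal-approximation (Wald) interval reproduces the reported figure for 7/204 adverse disagreements. The sketch below shows that calculation together with Cohen's kappa for two raters, which is the usual statistic for comparing two sets of categorical decisions; that this exact kappa variant was used is an assumption. Function names are illustrative.

```python
import math


def wald_ci(events, n, z=1.959964):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


p, lo, hi = wald_ci(7, 204)
print(f"{100 * p:.1f}% ({100 * lo:.1f}% to {100 * hi:.1f}%)")  # 3.4% (0.9% to 5.9%)
```

For small counts like this, a Wilson or exact (Clopper-Pearson) interval is often preferred because the Wald interval can be too narrow near 0%. The sketch uses Wald only because it matches the reported numbers.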
Gabriele, Alex; Marco, Valeria; Gatto, Laura; Paoletti, Giulia; Di Vito, Luca; Castriota, Fausto; Romagnoli, Enrico; Ricciardi, Andrea; Prati, Francesco
2014-10-01
Optical coherence tomography (OCT) evaluation of stent anatomy requires the inspection of sequential cross sections (CS). However, stent coils cannot be appreciated in the conventional format, because OCT CS simply display stent struts, which are poorly representative of the stent architecture. The aim of the present study was to validate a new software tool (Carpet View), which unfolds the stented segment, reconstructing it as an open structure and displaying the stent meshwork. Twenty-one patients were studied with frequency domain OCT after the deployment of different stents: seven bio-absorbable scaffolds (Dream), seven bare metal stents (Vision/Multilink8), and seven drug-eluting stents (Cre8). Conventional CS reconstructions were post-processed with the Carpet View software and analyzed twice by the same reader (intra-observer variability) and by two different readers (inter-observer variability). The average difference in the total number of struts between the two methods (conventional vs Carpet View reconstruction) was small. Using Carpet View, high intra-observer and inter-observer correlations were found for the number of struts in each coil, with Pearson correlation values of 0.98 (p = 0.0001) and 0.96 (p = 0.0001), respectively. The same number of coils was found when analyses were repeated by the same reader or by a different reader, whilst mild differences were reported in the count of stent junctions. Carpet View can be used to assess stent geometry with high reproducibility. This approach enables matching of the same stent portion across serial time points and promises to improve stent assessment.
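The study summarizes reproducibility with Pearson correlations of per-coil strut counts and separately reports the average difference between methods. The sketch below computes both; the function names and the toy counts in the usage note are hypothetical. It illustrates why both numbers are useful: correlation measures how well two readings track each other, not whether they agree.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two readers' paired counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_difference(x, y):
    """Average signed difference (reader 1 minus reader 2): the systematic bias."""
    return sum(a - b for a, b in zip(x, y)) / len(x)
```

If a second reader always counts two more struts per coil, for example `[10, 12, 14, 16]` versus `[12, 14, 16, 18]`, then `pearson_r` returns 1.0 while `mean_difference` returns -2.0. A high r therefore does not by itself show that two reads agree.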
Hanna, Gerard G; McAleese, Jonathan; Carson, Kathryn J; Stewart, David P; Cosgrove, Vivian P; Eakin, Ruth L; Zatari, Ashraf; Lynch, Tom; Jarritt, Peter H; Young, V A Linda; O'Sullivan, Joe M; Hounsell, Alan R
2010-05-01
In addition to computed tomography (CT), positron emission tomography (PET) has an effect on target volume definition for radical radiotherapy (RT) for non-small-cell lung cancer (NSCLC). In patients with NSCLC previously staged with PET-CT, we assessed the effect of using an additional planning PET-CT scan for gross tumor volume (GTV) definition. A total of 28 patients with Stage IA-IIIB NSCLC were enrolled. All patients had undergone staging PET-CT to ensure suitability for radical RT. Of the 28 patients, 14 received induction chemotherapy. In place of an RT planning CT scan, patients underwent scanning on a PET-CT scanner. In a virtual planning study, four oncologists independently delineated the GTV on the CT scan alone and then on the PET-CT scan. Intraobserver and interobserver variability were assessed using the concordance index (CI), and the results were compared using the Wilcoxon signed-rank test. For matched cases, PET-CT improved the CI between observers compared with CT alone (median CI, 0.57 for CT and 0.64 for PET-CT, p = .032). The median of the mean percentage volume change from GTV(CT) to GTV(FUSED) was -5.21% for the induction chemotherapy group and 18.88% for the RT-alone group; this difference was significant by the Mann-Whitney U test (p = .001). A PET-CT RT planning scan, in addition to a staging PET-CT scan, reduces interobserver variability in GTV definition for NSCLC. Compared with CT, GTV size with PET-CT increased in the RT-alone group and decreased in the induction chemotherapy group.
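In contouring studies, the concordance index is commonly defined as the volume common to the delineations divided by the volume enclosed by any of them. The abstract does not say whether the four observers' CI was computed pairwise or across all contours at once. The sketch below assumes the all-observers intersection-over-union form, with contours represented as sets of voxel indices (a hypothetical representation). It also computes the percentage volume change from GTV(CT) to GTV(FUSED).

```python
def concordance_index(*contours):
    """Volume common to all contours divided by the volume enclosed by any of them.

    Each contour is a collection of voxel indices, e.g. (x, y, z) tuples.
    """
    if len(contours) < 2:
        raise ValueError("need at least two observers' contours")
    voxel_sets = [set(c) for c in contours]
    encompassing = set.union(*voxel_sets)
    if not encompassing:
        raise ValueError("all contours are empty")
    common = set.intersection(*voxel_sets)
    return len(common) / len(encompassing)


def percent_volume_change(gtv_ct, gtv_fused):
    """Percentage change from the CT-only GTV to the PET-CT (fused) GTV."""
    return 100.0 * (gtv_fused - gtv_ct) / gtv_ct
```

A CI of 1.0 means identical contours and 0.0 means no overlap. Counting voxels stands in for volume here on the assumption that all voxels are the same size. A negative `percent_volume_change`, as in the induction chemotherapy group's median of -5.21%, means that the PET-CT contour was smaller than the CT-only contour.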
Arrhythmia discrimination by physician and defibrillator: importance of atrial channel.
Diemberger, Igor; Martignani, Cristian; Biffi, Mauro; Frabetti, Lorenzo; Valzania, Cinzia; Cooke, Robin M T; Rapezzi, Claudio; Branzi, Angelo; Boriani, Giuseppe
2012-01-26
Many ICD carriers experience inappropriate shocks, but the relative merits of dual- versus single-chamber devices for arrhythmia discrimination remain unclear. We explored possible advantages of the atrial data provided by dual-chamber implantable defibrillators (ICDs) for discrimination of real-life supraventricular/ventricular tachyarrhythmias (SVT/VT). One hundred dual-chamber traces from 24 ICDs were blindly reviewed by five electrophysiologists in dual-chamber and simulated single-chamber (with/without discriminator data) reading modes; the reviewers determined the chamber of origin and provided Likert-scale "confidence" ratings. We assessed 1) intra-/interobserver concordance; 2) diagnostic accuracy, using expert diagnoses as a reference standard; and 3) ROC curves of sensitivity/specificity of "likelihood perception" scores, generated by combining chamber-of-origin diagnostic judgments with Likert-scale "confidence" ratings. We also assessed the diagnostic accuracy of automated discrimination by all possible dual-/single-chamber algorithm configurations. Interobserver concordance was "substantial" (modified Cohen kappa values for dual-/single-chamber, 0.79/0.68); intraobserver concordance was "almost complete" (kappa ≥ 0.89). Dual-chamber mode provided the best diagnostic sensitivity/specificity (99%/92%) and the highest reader confidence (p<0.001). The area under the ROC curve for the "likelihood perception" score (representing electrophysiologists' perceptions of the likelihood that an episode was of ventricular origin) was highest in dual-chamber mode (0.98 vs. 0.93 for both single-chamber modes; p<0.001). Regarding automated discrimination, all four dual-chamber configurations conferred 100% sensitivity (specificity ranged from 39% to 88%), whereas single-chamber configurations appeared inferior (best sensitivity/specificity combination, 89%/64%).
Availability of the atrial channel helps reduce inappropriate ICD therapies by providing relevant advantages for both the cardiologist's appropriate post hoc discrimination of SVT/VT (improving program tailoring) and automated arrhythmia discrimination. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
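The "likelihood perception" score combines each reader's chamber-of-origin call with a Likert confidence rating, and the ROC area then summarizes how well that score separates true VT from SVT. The abstract does not give the combination rule or the Likert range. The sketch below assumes a simple signed score (+confidence for a VT call, -confidence for an SVT call) on a 1-5 scale; these choices and all function names are hypothetical. The AUC is computed as the Mann-Whitney probability that a true-VT episode outscores a true-SVT episode.

```python
def likelihood_score(called_vt, confidence):
    """Hypothetical signed score: +confidence if the reader calls VT, -confidence if SVT.

    confidence is a Likert rating, assumed here to run from 1 (guess) to 5 (certain).
    """
    if not 1 <= confidence <= 5:
        raise ValueError("confidence must be on the assumed 1-5 Likert scale")
    return confidence if called_vt else -confidence


def roc_auc(vt_scores, svt_scores):
    """AUC as P(true-VT score > true-SVT score), with ties counted as 1/2."""
    if not vt_scores or not svt_scores:
        raise ValueError("need at least one episode of each kind")
    wins = 0.0
    for v in vt_scores:
        for s in svt_scores:
            wins += 1.0 if v > s else 0.5 if v == s else 0.0
    return wins / (len(vt_scores) * len(svt_scores))


def sensitivity_specificity(truth_vt, called_vt):
    """Sensitivity and specificity for VT against reference-standard labels."""
    pairs = list(zip(truth_vt, called_vt))
    tp = sum(1 for t, c in pairs if t and c)
    fn = sum(1 for t, c in pairs if t and not c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if not t and c)
    return tp / (tp + fn), tn / (tn + fp)
```

The signed score lets a confident correct call count more than a hesitant one. A reader who is right but unsure therefore yields a lower AUC than one who is right and certain, even when both have the same sensitivity and specificity.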