observer reliability study: Topics by Science.gov

Sample records for observer reliability study

Generalizability and decision studies to inform observational and experimental research in classroom settings.

PubMed

Bottema-Beutel, Kristen; Lloyd, Blair; Carter, Erik W; Asmus, Jennifer M

2014-11-01

Attaining reliable estimates of observational measures can be challenging in school and classroom settings, as behavior can be influenced by multiple contextual factors. Generalizability (G) studies can enable researchers to estimate the reliability of observational data, and decision (D) studies can inform how many observation sessions are necessary to achieve a criterion level of reliability. We conducted G and D studies using observational data from a randomized control trial focusing on social and academic participation of students with severe disabilities in inclusive secondary classrooms. Results highlight the importance of anchoring observational decisions to reliability estimates from existing or pilot data sets. We outline steps for conducting G and D studies and address options when reliability estimates are lower than desired.
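As a rough illustration of the D-study step described in this abstract, the sketch below projects reliability for a given number of observation sessions and solves for the number of sessions needed to reach a criterion. It assumes a simple single-facet (persons by sessions) design for relative decisions. The variance components and the function names are hypothetical and are not taken from the study.

```python
import math


def g_coefficient(var_person, var_rel_error, n_sessions):
    """Projected generalizability coefficient for the mean of n_sessions."""
    return var_person / (var_person + var_rel_error / n_sessions)


def sessions_needed(var_person, var_rel_error, target=0.80):
    """Smallest number of sessions whose projected coefficient reaches target."""
    # Solve var_person / (var_person + var_rel_error / n) >= target for n.
    n = (target * var_rel_error) / ((1.0 - target) * var_person)
    return max(1, math.ceil(n))


# Hypothetical variance components, of the kind a G study would estimate.
VAR_PERSON = 3.0
VAR_REL_ERROR = 7.0

for n in (1, 5, 10):
    print(n, round(g_coefficient(VAR_PERSON, VAR_REL_ERROR, n), 3))
print("sessions needed:", sessions_needed(VAR_PERSON, VAR_REL_ERROR))
```

Designs with more facets (for example observers as well as sessions) would need additional variance components in the error term.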
Surgeon Reliability for the Assessment of Lumbar Spinal Stenosis on MRI: The Impact of Surgeon Experience.

PubMed

Marawar, Satyajit V; Madom, Ian A; Palumbo, Mark; Tallarico, Richard A; Ordway, Nathaniel R; Metkar, Umesh; Wang, Dongliang; Green, Adam; Lavelle, William F

2017-01-01

The treating surgeon's visual assessment of axial MRI images to ascertain the degree of stenosis has a critical impact on surgical decision-making. The purpose of this study was to prospectively analyze the impact of surgeon experience on inter-observer and intra-observer reliability of assessing severity of spinal stenosis on MRIs by spine surgeons directly involved in surgical decision-making. Seven fellowship-trained spine surgeons reviewed MRI studies of 30 symptomatic patients with lumbar stenosis and graded the stenosis in the central canal, the lateral recess and the foramen at T12-L1 to L5-S1 as none, mild, moderate or severe. No specific instructions were provided as to what constituted mild, moderate, or severe stenosis. Two surgeons were "senior" (>15 years of practice experience), two were "intermediate" (>4 years of practice experience), and three were "junior" (<1 year of practice experience). The concordance correlation coefficient (CCC) was calculated to assess inter-observer reliability. Seven MRI studies were duplicated and randomly re-read to evaluate intra-observer reliability. Surgeon experience was found to be a strong predictor of inter-observer reliability. Inter-observer reliability in the senior group was significantly higher than in the "junior" group when assessing central (p<0.001), foraminal (p=0.005) and lateral (p=0.001) stenosis. The senior group also showed significantly higher inter-observer reliability than the intermediate group when assessing foraminal stenosis (p=0.036). For intra-observer reliability, the results were contrary to those found for inter-observer reliability. Inter-observer reliability of assessing stenosis on MRIs increases with surgeon experience. Lower intra-observer reliability values among the senior group, although not clearly explained, may be due to the small number of MRIs evaluated and the quality of the MRI images. Level of evidence: Level 3.
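For readers unfamiliar with the concordance correlation coefficient mentioned in this abstract, a minimal sketch of Lin's CCC for two raters follows. It assumes the four stenosis grades are coded 0 to 3, which is an illustrative encoding rather than the study's stated procedure. The example grades are invented.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (1/n) moments, as in Lin's original definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalises systematic over- or under-grading.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Invented grades (0 = none, 1 = mild, 2 = moderate, 3 = severe).
surgeon_a = [0, 1, 2, 3, 2, 1, 3, 0]
surgeon_b = [0, 1, 3, 3, 2, 2, 3, 1]
print(round(concordance_correlation(surgeon_a, surgeon_b), 3))
```

Unlike a plain Pearson correlation, the coefficient drops when one rater grades consistently higher than the other, even if the two sets of grades are perfectly correlated.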
Reliability of joint count assessment in rheumatoid arthritis: a systematic literature review.

PubMed

Cheung, Peter P; Gossec, Laure; Mak, Anselm; March, Lyn

2014-06-01

Joint counts are central to the assessment of rheumatoid arthritis (RA) but reliability is an issue. To evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability through a systematic literature review. Articles reporting joint count reliability or agreement in RA in PubMed, EMBase, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability such as intraclass correlation coefficients (ICCs) were extracted. Data analysis was primarily descriptive due to high heterogeneity. Twenty-eight studies on health care professionals (HCP) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49-0.98). Inter-observer reliability between HCPs for TJCs was higher than for SJCs (range of ICC: 0.64-0.88 vs. 0.29-0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31-0.91) compared to SJCs (0.16-0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs. Intra- and inter-observer reliability was high for TJCs for HCPs and patients: among all groups, reliability was better for TJCs than SJCs. Inter-observer reliability of SJCs was poorer for patients than HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation for patient-reported joint counts as an outcome measure. 
© 2013 Published by Elsevier Inc.
RELIABILITY AND VALIDITY OF SUBJECTIVE ASSESSMENT OF LUMBAR LORDOSIS IN CONVENTIONAL RADIOGRAPHY.

PubMed

Ruhinda, E; Byanyima, R K; Mugerwa, H

2014-10-01

Reliability and validity studies of different lumbar curvature analysis and measurement techniques have been documented; however, there is limited literature on the reliability and validity of subjective visual analysis. Radiological assessment of the lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. To ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. A blinded, repeated-measures diagnostic test was carried out on lumbar spine x-ray radiographs. Radiology Department at Joint Clinical Research Centre (JCRC), Mengo, Kampala, Uganda. Seventy (70) lateral lumbar x-ray films were used for this study and were obtained from the archive of the JCRC radiology department at Butikiro house, Mengo, Kampala. Poor observer agreement, both inter- and intra-observer, with kappa values of 0.16 was found. Inter-observer agreement was poorer than intra-observer agreement. Kappa values rose significantly when the lumbar lordosis was clustered into four categories without grading each abnormality. The results confirm that subjective assessment of lumbar lordosis has low reliability and validity. Film quality has limited influence on observer reliability. This study further shows that fewer scale categories of lordosis abnormalities produce better observer reliability.
Hand assessment in older adults with musculoskeletal hand problems: a reliability study.

PubMed

Myers, Helen L; Thomas, Elaine; Hay, Elaine M; Dziedzic, Krysia S

2011-01-07

Musculoskeletal hand pain is common in the general population. This study aims to investigate the inter- and intra-observer reliability of two trained observers conducting a simple clinical interview and physical examination for hand problems in older adults. The reliability of applying the American College of Rheumatology (ACR) criteria for hand osteoarthritis to community-dwelling older adults will also be investigated. Fifty-five participants aged 50 years and over with a current self-reported hand problem and registered with one general practice were recruited from a previous health questionnaire study. Participants underwent a standardised, structured clinical interview and physical examination by two independent trained observers and again by one of these observers a month later. Agreement beyond chance was summarised using Kappa statistics and intra-class correlation coefficients. Median values for inter- and intra-observer reliability for clinical interview questions were found to be "substantial" and "moderate" respectively [median agreement beyond chance (Kappa) was 0.75 (range: -0.03, 0.93) for inter-observer ratings and 0.57 (range: -0.02, 1.00) for intra-observer ratings]. Inter- and intra-observer reliability for physical examination items was variable, with good reliability observed for some items, such as grip and pinch strength, and poor reliability observed for others, notably assessment of altered sensation, pain on resisted movement and judgements based on observation and palpation of individual features at single joints, such as bony enlargement, nodes and swelling. Moderate agreement was observed both between and within observers when applying the ACR criteria for hand osteoarthritis. Standardised, structured clinical interview is reliable for taking a history in community-dwelling older adults with self reported hand problems. Agreement between and within observers for physical examination items is variable. 
Low Kappa values may have resulted, in part, from a low prevalence of clinical signs and symptoms in the study participants. The decision to use clinical interview and hand assessment variables in clinical practice or further research in primary care should include consideration of clinical applicability and training alongside reliability. Further investigation is required to determine the relationship between these clinical questions and assessments and the clinical course of hand pain and hand problems in community-dwelling older adults.
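The point about low prevalence depressing kappa can be illustrated with a small sketch of Cohen's kappa computed from an agreement table. Both tables below are invented and share the same raw agreement (90%). Only the prevalence of the positive finding differs.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table: rows = observer 1, columns = observer 2."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_props = [sum(row) / n for row in table]
    col_props = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the observers rated independently.
    p_expected = sum(r * c for r, c in zip(row_props, col_props))
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented counts: [[both present, only obs 1], [only obs 2, both absent]].
balanced_sign = [[45, 5], [5, 45]]   # sign present in about half of participants
rare_sign = [[2, 5], [5, 88]]        # sign present in about 7% of participants

print(round(cohen_kappa(balanced_sign), 2))
print(round(cohen_kappa(rare_sign), 2))
```

With identical raw agreement, the rare-sign table yields a much lower kappa because chance agreement on "absent" is already very high.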
The intra- and inter-observer reliability of the physical examination methods used to assess patients with patellofemoral joint instability.

PubMed

Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T

2012-08-01

An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there was very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign presenting with fair to moderate agreement. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and the Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability, inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups are indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
Reliability of anthropometric measurements in European preschool children: the ToyBox-study.

PubMed

De Miguel-Etayo, P; Mesana, M I; Cardon, G; De Bourdeaudhuij, I; Góźdź, M; Socha, P; Lateva, M; Iotova, V; Koletzko, B V; Duvinage, K; Androutsos, O; Manios, Y; Moreno, L A

2014-08-01

The ToyBox-study aims to develop and test an innovative and evidence-based obesity prevention programme for preschoolers in six European countries: Belgium, Bulgaria, Germany, Greece, Poland and Spain. In multicentre studies, anthropometric measurements using standardized procedures that minimize errors in the data collection are essential to maximize reliability of measurements. The aim of this paper is to describe the standardization process and reliability (intra- and inter-observer) of height, weight and waist circumference (WC) measurements in preschoolers. All technical procedures and devices were standardized and centralized training was given to the fieldworkers. At least seven children per country participated in the intra- and inter-observer reliability testing. Intra-observer technical error ranged from 0.00 to 0.03 kg for weight and from 0.07 to 0.20 cm for height, with the overall reliability being above 99%. A second training was organized for WC due to low reliability observed in the first training. Intra-observer technical error for WC ranged from 0.12 to 0.71 cm during the first training and from 0.05 to 1.11 cm during the second training, and reliability above 92% was achieved. Epidemiological surveys need standardized procedures and training of researchers to reduce measurement error. In the ToyBox-study, very good intra- and inter-observer agreement was achieved for all anthropometric measurements performed. © 2014 World Obesity.
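The technical error and percentage reliability reported in this abstract are commonly computed as in the sketch below, which assumes two repeated measurements per child and the usual definitions TEM = sqrt(sum(d^2) / 2N) and R = 1 - TEM^2 / SD^2. Whether the study used exactly these definitions is an assumption, and the height values are invented.

```python
import math
import statistics


def technical_error(first, second):
    """Technical error of measurement for two repeated measurements per subject."""
    n = len(first)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_sq_diff / (2 * n))


def reliability_coefficient(first, second):
    """Share of between-subject variance that is not measurement error."""
    tem = technical_error(first, second)
    # Assumption: total variance is the sample variance of all pooled readings.
    total_variance = statistics.variance(list(first) + list(second))
    return 1.0 - tem ** 2 / total_variance


# Invented heights in cm, measured twice by the same fieldworker.
height_first = [100.0, 102.0, 98.0, 105.0]
height_second = [100.2, 101.8, 98.2, 104.8]

print(round(technical_error(height_first, height_second), 3))
print(round(100 * reliability_coefficient(height_first, height_second), 1))
```

The second value is expressed as a percentage, matching the way reliability is reported in the abstract.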
Generalizability and Decision Studies to Inform Observational and Experimental Research in Classroom Settings

ERIC Educational Resources Information Center

Bottema-Beutel, Kristen; Lloyd, Blair; Carter, Erik W.; Asmus, Jennifer M.

2014-01-01

Attaining reliable estimates of observational measures can be challenging in school and classroom settings, as behavior can be influenced by multiple contextual factors. Generalizability (G) studies can enable researchers to estimate the reliability of observational data, and decision (D) studies can inform how many observation sessions are…
The reliability of four widely used patellar height ratios.

PubMed

van Duijvenbode, Dennis; Stavenuiter, Michel; Burger, Bart; van Dijke, Cees; Spermon, Jacco; Hoozemans, Marco

2016-03-01

The objective of this study was to evaluate the inter-observer reliability and the intra-observer reliability of four patellar height ratios: Insall-Salvati (IS), modified Insall-Salvati (MIS), Blackburne-Peel (BP) and Caton-Deschamps (CD). The patellar height ratios were assessed by four independent examiners using weight-bearing lateral knee radiographs in 30° flexion. Intra-class correlation coefficients and Fleiss' kappas were determined. The inter-observer reliability was excellent for the IS and moderate for the other ratios. When the ratio values were categorized, the inter-observer reliability was strong for the IS, moderate for the MIS and BP, and poor for the CD. The intra-observer reliability was excellent for the IS, MIS and CD, and strong for the BP. When the ratio values were categorized, the intra-observer reliability was strong for the IS and MIS, and moderate for the other ratios. Although the IS showed the best reliability, we advise using the MIS, as it showed the second-best reliability but is, according to the literature, associated with better validity.
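For the categorized ratio values, agreement among several examiners is summarised with Fleiss' kappa. A minimal sketch follows, taking for each radiograph the number of examiners who chose each category. The three categories and the counts are invented for illustration and do not come from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing subject i in category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    # Per-subject agreement: proportion of rater pairs that agree.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects
    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    category_props = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in category_props)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented example: 4 examiners, categories (low, normal, high) per radiograph.
ratings = [
    [4, 0, 0],
    [2, 2, 0],
    [0, 4, 0],
    [0, 1, 3],
]
print(round(fleiss_kappa(ratings), 3))
```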
Accuracy and reliability of observational gait analysis data: judgments of push-off in gait after stroke.

PubMed

McGinley, Jennifer L; Goldie, Patricia A; Greenwood, Kenneth M; Olney, Sandra J

2003-02-01

Physical therapists routinely observe gait in clinical practice. The purpose of this study was to determine the accuracy and reliability of observational assessments of push-off in gait after stroke. Eighteen physical therapists and 11 subjects with hemiplegia following a stroke participated in the study. Measurements of ankle power generation were obtained from subjects following stroke using a gait analysis system. Concurrent videotaped gait performances were observed by the physical therapists on 2 occasions. Ankle power generation at push-off was scored as either normal or abnormal using two 11-point rating scales. These observational ratings were correlated with the measurements of peak ankle power generation. A high correlation was obtained between the observational ratings and the measurements of ankle power generation (mean Pearson r=.84). Interobserver reliability was moderately high (mean intraclass correlation coefficient [ICC (2,1)]=.76). Intraobserver reliability also was high, with a mean ICC (2,1) of .89 obtained. Physical therapists were able to make accurate and reliable judgments of push-off in videotaped gait of subjects following stroke using observational assessment. Further research is indicated to explore the accuracy and reliability of data obtained with observational gait analysis as it occurs in clinical practice.
Reliability of videotaped observational gait analysis in patients with orthopedic impairments

PubMed Central

Brunnekreef, Jaap J; van Uden, Caro JT; van Moorsel, Steven; Kooloos, Jan GM

2005-01-01

Background: In clinical practice, visual gait observation is often used to determine gait disorders and to evaluate treatment. Several reliability studies on observational gait analysis have been described in the literature and generally showed moderate reliability. However, patients with orthopedic disorders have received little attention. The objective of this study is to determine the reliability levels of visual observation of gait in patients with orthopedic disorders. Methods: The gait of thirty patients referred to a physical therapist for gait treatment was videotaped. Ten raters (4 experienced, 4 inexperienced and 2 experts) individually evaluated these videotaped gait patterns twice, using a structured gait analysis form. Reliability levels were established by calculating the Intraclass Correlation Coefficient (ICC), using a two-way random design and based on absolute agreement. Results: The inter-rater reliability among experienced raters (ICC = 0.42; 95%CI: 0.38–0.46) was comparable to that of the inexperienced raters (ICC = 0.40; 95%CI: 0.36–0.44). The expert raters reached a higher inter-rater reliability level (ICC = 0.54; 95%CI: 0.48–0.60). The average intra-rater reliability of the experienced raters was 0.63 (ICCs ranging from 0.57 to 0.70). The inexperienced raters reached an average intra-rater reliability of 0.57 (ICCs ranging from 0.52 to 0.62). The two expert raters attained ICC values of 0.70 and 0.74 respectively. Conclusion: Structured visual gait observation by use of a gait analysis form as described in this study was found to be moderately reliable. Clinical experience appears to increase the reliability of visual gait analysis. PMID:15774012
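The ICC described here (two-way random effects, absolute agreement, single rater) is usually labelled ICC(2,1). The sketch below computes it from the mean squares of a subjects-by-raters table. The ratings are illustrative only and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 subjects scored by 4 raters.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))
```

Because absolute agreement is required, systematic differences between raters (the between-rater mean square) lower the coefficient, whereas a consistency-type ICC would ignore them.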
Strengthening the reliability and credibility of observational epidemiology studies by creating an Observational Studies Register.

PubMed

Swaen, Gerard M H; Carmichael, Neil; Doe, John

2011-05-01

To evaluate the need for the creation of a system in which observational epidemiology studies are registered; an Observational Studies Register (OSR). The current scientific process for observational epidemiology studies is described. Next, a parallel is made with the clinical trials area, where the creation of clinical trial registers has greatly restored and improved their credibility and reliability. Next, the advantages and disadvantages of an OSR are compared. The advantages of an OSR outweigh its disadvantages. The creation of an OSR, similar to the existing Clinical Trials Registers, will improve the assessment of publication bias and will provide an opportunity to compare the original study protocol with the results reported in the publication. Reliability, credibility, and transparency of observational epidemiology studies are strengthened by the creation of an OSR. We propose a structured, collaborative, and coordinated approach for observational epidemiology studies that can provide solutions for existing weaknesses and will strengthen credibility and reliability, similar to the approach currently used in clinical trials, where Clinical Trials Registers have played a key role in strengthening their scientific value. Copyright © 2011 Elsevier Inc. All rights reserved.
Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

ERIC Educational Resources Information Center

Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca

2018-01-01

Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Inter-Observer, Intra-Observer and Intra-Individual Reliability of Uroflowmetry Tests in Aged Men: A Generalizability Theory Approach.

PubMed

Liu, Ying-Buh; Yang, Stephen S; Hsieh, Cheng-Hsing; Lin, Chia-Da; Chang, Shang-Jen

2014-05-01

To evaluate the inter-observer, intra-observer and intra-individual reliability of uroflowmetry and post-void residual urine (PVR) tests in adult men. Healthy volunteers aged over 40 years were enrolled. Every participant underwent two sets of uroflowmetry and PVR tests with a 2-week interval between the tests. The uroflowmetry tests were interpreted by four urologists independently. Uroflowmetry curves were classified as bell-shaped, bell-shaped with tail, obstructive, restrictive, staccato, interrupted and tower-shaped and scored from 1 (highly abnormal) to 5 (absolutely normal). The agreements between the observers, interpretations and tests within individuals were analyzed using kappa statistics and intraclass correlation coefficients. Generalizability theory with decision analysis was used to determine how many observers, tests, and interpretations were needed to obtain an acceptable reliability (> 0.80). Of 108 volunteers, we randomly selected the uroflowmetry results from 25 participants for the evaluation of reliability. The mean age of the studied adults was 55.3 years. The intra-individual and intra-observer reliability on uroflowmetry tests ranged from good to very good. However, the inter-observer reliability on normalcy and specific type of flow pattern were relatively lower. In generalizability theory, three observers were needed to obtain an acceptable reliability on normalcy of uroflow pattern if the patient underwent uroflowmetry tests twice with one observation. The intra-individual and intra-observer reliability on uroflowmetry tests were good while the inter-observer reliability was relatively lower. To improve inter-observer reliability, the definition of uroflowmetry should be clarified by the International Continence Society. © 2013 Wiley Publishing Asia Pty Ltd.
Inter- and intra- observer reliability of risk assessment of repetitive work without an explicit method.

PubMed

Eliasson, Kristina; Palm, Peter; Nyman, Teresia; Forsman, Mikael

2017-07-01

A common way to conduct practical risk assessments is to observe a job and report the observed long-term risks for musculoskeletal disorders. The aim of this study was to evaluate the inter- and intra-observer reliability of ergonomists' risk assessments without the support of an explicit risk assessment method. Twenty-one experienced ergonomists assessed the risk level (low, moderate, high risk) of eight upper body regions, as well as the global risk of 10 video-recorded work tasks. Intra-observer reliability was assessed by having nine of the ergonomists repeat the procedure at least three weeks after the first assessment. The ergonomists made their risk assessments based on their own experience and knowledge. The statistical parameters of reliability included agreement in %, kappa, linearly weighted kappa, intraclass correlation and Kendall's coefficient of concordance. The average inter-observer agreement of the global risk was 53% and the corresponding weighted kappa (κw) was 0.32, indicating fair reliability. The intra-observer agreement was 61% and 0.41 (κw). This study indicates that risk assessments of the upper body, without the use of an explicit observational method, have non-acceptable reliability. It is therefore recommended to use systematic risk assessment methods to a higher degree. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
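A linearly weighted kappa for a three-level ordinal scale such as low, moderate and high risk can be sketched as follows. The coding 0, 1, 2 and the example ratings are assumptions made for illustration, not the study's data.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted kappa for ordinal ratings coded 0 .. n_categories - 1."""
    n = len(rater_a)
    if n != len(rater_b) or n == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    span = n_categories - 1
    # Observed disagreement: mean of |i - j| / (k - 1) over rated items.
    observed = sum(abs(a - b) / span for a, b in zip(rater_a, rater_b)) / n
    # Expected disagreement if the two raters were independent.
    prop_a = [rater_a.count(c) / n for c in range(n_categories)]
    prop_b = [rater_b.count(c) / n for c in range(n_categories)]
    expected = sum(
        abs(i - j) / span * prop_a[i] * prop_b[j]
        for i in range(n_categories)
        for j in range(n_categories)
    )
    return 1.0 - observed / expected


# Invented ratings of 8 work tasks (0 = low, 1 = moderate, 2 = high risk).
ergonomist_1 = [0, 1, 2, 1, 0, 2, 1, 2]
ergonomist_2 = [0, 1, 1, 1, 0, 2, 2, 2]
print(round(linear_weighted_kappa(ergonomist_1, ergonomist_2, 3), 3))
```

Linear weights treat a low-versus-high disagreement as twice as serious as a disagreement between adjacent categories, which is why the weighted value can differ from an unweighted kappa on the same ratings.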
The sizing of hamstring grafts for anterior cruciate reconstruction: intra- and inter-observer reliability.

PubMed

Dwyer, Tim; Whelan, Daniel B; Khoshbin, Amir; Wasserstein, David; Dold, Andrew; Chahal, Jaskarndip; Nauth, Aaron; Murnaghan, M Lucas; Ogilvie-Harris, Darrell J; Theodoropoulos, John S

2015-04-01

The objective of this study was to establish the intra- and inter-observer reliability of hamstring graft measurement using cylindrical sizing tubes. Hamstring tendons (gracilis and semitendinosus) were harvested from ten cadavers by a single surgeon and whip stitched together to create ten 4-strand hamstring grafts. Ten sports medicine surgeons and fellows sized each graft independently using either hollow cylindrical sizers or block sizers in 0.5-mm increments; the sizing technique used was applied consistently to each graft. Surgeons moved sequentially from graft to graft and measured each hamstring graft twice. Surgeons were asked to state the measured proximal (femoral) and distal (tibial) diameter of each graft, as well as the diameter of the tibial and femoral tunnels that they would drill if performing an anterior cruciate ligament (ACL) reconstruction using that graft. Reliability was established using intra-class correlation coefficients. Overall, both the inter-observer and intra-observer agreement were >0.9, demonstrating excellent reliability. The inter-observer reliability for drill sizes was also excellent (>0.9). Excellent correlation was seen between cylindrical sizing and drill sizes (>0.9). Sizing of hamstring grafts by multiple surgeons demonstrated excellent inter-observer and intra-observer reliability, potentially validating clinical studies exploring ACL reconstruction outcomes by hamstring graft diameter when standard techniques are used. Level of evidence: III.
Evaluation of General Classes of Reliability Estimators Often Used in Statistical Analyses of Quasi-Experimental Designs

NASA Astrophysics Data System (ADS)

Saini, K. K.; Sehgal, R. K.; Sethi, B. L.

2008-10-01

In this paper, major reliability estimators are analyzed and their comparative results are discussed. Their strengths and weaknesses are evaluated in this case study. Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers. As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different occasions. Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower in value than the parallel forms and internal consistency ones because they involve measuring at different times or with different raters. Reliability estimates are often used in statistical analyses of quasi-experimental designs.
Clinical assessment of scapular positioning in musicians: an intertester reliability study.

PubMed

Struyf, Filip; Nijs, Jo; De Coninck, Kris; Giunta, Marco; Mottram, Sarah; Meeusen, Romain

2009-01-01

The reliability of the measurement of the distance between the posterior border of the acromion and the wall and the reliability of the modified lateral scapular slide test have not been studied. Overall, the reliability of the clinical tools used to assess scapular positioning has not been studied in musicians. To examine the intertester reliability of scapular observation and 2 clinical tests for the assessment of scapular positioning in musicians. Intertester reliability study. University research laboratory. Thirty healthy student musicians at a single university. Two assessors performed a standardized observation protocol, the measurement of the distance between the posterior border of the acromion and the wall, and the modified lateral scapular slide test. Each assessor was blinded to the other's findings. The intertester reliability coefficients (kappa) for the observation in relaxed position, during unloaded movement, and during loaded movement were 0.41, 0.63, and 0.36, respectively. The kappa values for the observation of tilting and winging at rest were 0.48 and 0.42, respectively; during unloaded movement, the kappa values were 0.52 and 0.78, respectively; and with a 1-kg load, the kappa values were 0.24 and 0.50, respectively. The intraclass correlation coefficient (ICC) of the measurement of the acromial distance was 0.72 in relaxed position and 0.75 with the participant actively retracting both shoulders. The ICCs for the modified lateral scapular slide test varied between 0.63 and 0.58. Our results demonstrated that the modified lateral scapular slide test was not a reliable tool to assess scapular positioning in these participants. Our data indicated that scapular observation in the relaxed position and during unloaded abduction in the frontal plane was a reliable assessment tool. The reliability of the measurement of the distance between the posterior border of the acromion and the wall in healthy musicians was moderate.
Clinical Assessment of Scapular Positioning in Musicians: An Intertester Reliability Study

PubMed Central

Struyf, Filip; Nijs, Jo; De Coninck, Kris; Giunta, Marco; Mottram, Sarah; Meeusen, Romain

2009-01-01

Abstract Context: The reliability of the measurement of the distance between the posterior border of the acromion and the wall and the reliability of the modified lateral scapular slide test have not been studied. Overall, the reliability of the clinical tools used to assess scapular positioning has not been studied in musicians. Objective: To examine the intertester reliability of scapular observation and 2 clinical tests for the assessment of scapular positioning in musicians. Design: Intertester reliability study. Setting: University research laboratory. Patients or Other Participants: Thirty healthy student musicians at a single university. Main Outcome Measure(s): Two assessors performed a standardized observation protocol, the measurement of the distance between the posterior border of the acromion and the wall, and the modified lateral scapular slide test. Each assessor was blinded to the other's findings. Results: The intertester reliability coefficients (κ) for the observation in relaxed position, during unloaded movement, and during loaded movement were 0.41, 0.63, and 0.36, respectively. The κ values for the observation of tilting and winging at rest were 0.48 and 0.42, respectively; during unloaded movement, the κ values were 0.52 and 0.78, respectively; and with a 1-kg load, the κ values were 0.24 and 0.50, respectively. The intraclass correlation coefficient (ICC) of the measurement of the acromial distance was 0.72 in relaxed position and 0.75 with the participant actively retracting both shoulders. The ICCs for the modified lateral scapular slide test varied between 0.63 and 0.58. Conclusions: Our results demonstrated that the modified lateral scapular slide test was not a reliable tool to assess scapular positioning in these participants. Our data indicated that scapular observation in the relaxed position and during unloaded abduction in the frontal plane was a reliable assessment tool. 
The reliability of the measurement of the distance between the posterior border of the acromion and the wall in healthy musicians was moderate. PMID:19771291
Development of Creative Behavior Observation Form: A Study on Validity and Reliability

ERIC Educational Resources Information Center

Dere, Zeynep; Ömeroglu, Esra

2018-01-01

In this study, the Creative Behavior Observation Form was developed to assess the creativity of children. For the reliability and validity study group, a total of 257 children aged 5-6 were sampled using a stratified sampling method. Content Validity Index (CVI) and…

Reliability Stress-Strength Models for Dependent Observations with Applications in Clinical Trials

NASA Technical Reports Server (NTRS)

Kushary, Debashis; Kulkarni, Pandurang M.

1995-01-01

We consider the applications of stress-strength models in studies involving clinical trials. When studying the effects and side effects of certain procedures (treatments), it is often the case that observations are correlated due to subject effects, repeated measurements, and the simultaneous observation of many characteristics. We develop the maximum likelihood estimator (MLE) and the uniform minimum variance unbiased estimator (UMVUE) of the reliability, which in clinical trial studies could be interpreted as the chance of increased side effects due to a particular procedure compared to another. The results developed apply to both univariate and multivariate situations. For the univariate situations, we also develop simple-to-use lower confidence bounds for the reliability. Further, we consider the cases in which both stress and strength constitute time-dependent processes. We define the future reliability and obtain methods of constructing lower confidence bounds for this reliability. Finally, we conduct simulation studies to evaluate all the procedures developed and to compare the MLE and the UMVUE.
The reliability of the Glasgow Coma Scale: a systematic review.

PubMed

Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R

2016-01-01

The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
The Effect of Observation Length and Presentation Order on the Reliability and Validity of an Observational Measure of Teaching Quality

ERIC Educational Resources Information Center

Mashburn, Andrew J.; Meyer, J. Patrick; Allen, Joseph P.; Pianta, Robert C.

2014-01-01

Observational methods are increasingly being used in classrooms to evaluate the quality of teaching. Operational procedures for observing teachers are somewhat arbitrary in existing measures and vary across different instruments. To study the effect of different observation procedures on score reliability and validity, we conducted an experimental…
Reliability of two social cognition tests: The combined stories test and the social knowledge test.

PubMed

Thibaudeau, Élisabeth; Cellard, Caroline; Legendre, Maxime; Villeneuve, Karèle; Achim, Amélie M

2018-04-01

Deficits in social cognition are common in psychiatric disorders. Validated social cognition measures with good psychometric properties are necessary to assess and target social cognitive deficits. Two recent social cognition tests, the Combined Stories Test (COST) and the Social Knowledge Test (SKT), respectively assess theory of mind and social knowledge. Previous studies have shown good psychometric properties for these tests, but the test-retest reliability has never been documented. The aim of this study was to evaluate the test-retest reliability and the inter-rater reliability of the COST and the SKT. The COST and the SKT were administered twice to a group of forty-two healthy adults, with a delay of approximately four weeks between the assessments. Excellent test-retest reliability was observed for the COST, and good test-retest reliability was observed for the SKT. There was no evidence of a practice effect. Furthermore, excellent inter-rater reliability was observed for both tests. This study shows good reliability of the COST and the SKT, which adds to the good validity previously reported for these two tests. These good psychometric properties thus support the COST and the SKT as adequate measures for the assessment of social cognition. Copyright © 2018. Published by Elsevier B.V.
Inter-rater reliability of direct observations of the physical and psychosocial working conditions in eldercare: An evaluation in the DOSES project.

PubMed

Karstad, Kristina; Rugulies, Reiner; Skotte, Jørgen; Munch, Pernille Kold; Greiner, Birgit A; Burdorf, Alex; Søgaard, Karen; Holtermann, Andreas

2018-05-01

The aim of the study was to develop and evaluate the reliability of the "Danish observational study of eldercare work and musculoskeletal disorders" (DOSES) observation instrument to assess physical and psychosocial risk factors for musculoskeletal disorders (MSD) in eldercare work. During 1.5 years, sixteen raters conducted 117 inter-rater observations from 11 nursing homes. Reliability was evaluated using percent agreement and Gwet's AC1 coefficient. Of the 18 examined items, inter-rater reliability was excellent for 7 items (AC1 > 0.75), fair to good for 7 items (AC1 0.40-0.75), and poor for 2 items (AC1 0-0.40). For 2 items there was no agreement between the raters (AC1 < 0). The reliability did not differ between the first and second half of the data collection period, and the inter-rater observations were representative regarding occurrence of events in eldercare work. The instrument is appropriate for assessing physical and psychosocial risk factors for MSD among eldercare workers. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
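The abstract above reports Gwet's AC1 but does not give a formula. As a minimal sketch, assuming two raters and nominal categories, AC1 is commonly computed as (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and p_e is a chance-agreement term built from the raters' averaged category proportions. The function name `gwet_ac1`, the optional `n_categories` parameter, and the example ratings are hypothetical and are not taken from the DOSES study.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, n_categories=None):
    """Gwet's AC1 for two raters scoring the same items on a nominal scale.

    n_categories is the number of categories on the scale; if omitted,
    the number of categories actually observed is used (an assumption).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    observed = set(rater_a) | set(rater_b)
    q = n_categories if n_categories is not None else len(observed)
    if q < 2:
        return 1.0  # a single category leaves no room for disagreement

    # Observed proportion of items on which the raters agree.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # pi_k: average of the two raters' marginal proportions for category k.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pis = [(count_a[c] + count_b[c]) / (2 * n) for c in observed]

    # Chance agreement under Gwet's model.
    p_e = sum(pi * (1 - pi) for pi in pis) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical example: 1 = event observed, 0 = not observed.
example_a = [1, 1, 1, 1, 0]
example_b = [1, 1, 1, 0, 0]
example_ac1 = gwet_ac1(example_a, example_b)
```

For these five made-up items the raters agree on four, so p_a = 0.8 and p_e = 0.42, giving an AC1 of about 0.66, which would fall in the abstract's "fair to good" band.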
Reliability and Validity of the Dyadic Observed Communication Scale (DOCS).

PubMed

Hadley, Wendy; Stewart, Angela; Hunter, Heather L; Affleck, Katelyn; Donenberg, Geri; Diclemente, Ralph; Brown, Larry K

2013-02-01

We evaluated the reliability and validity of the Dyadic Observed Communication Scale (DOCS) coding scheme, which was developed to capture a range of communication components between parents and adolescents. Adolescents and their caregivers were recruited from mental health facilities for participation in a large, multi-site family-based HIV prevention intervention study. Seventy-one dyads were randomly selected from the larger study sample and coded using the DOCS at baseline. Preliminary validity and reliability of the DOCS was examined using various methods, such as comparing results to self-report measures and examining interrater reliability. Results suggest that the DOCS is a reliable and valid measure of observed communication among parent-adolescent dyads that captures both verbal and nonverbal communication behaviors that are typical intervention targets. The DOCS is a viable coding scheme for use by researchers and clinicians examining parent-adolescent communication. Coders can be trained to reliably capture individual and dyadic components of communication for parents and adolescents and this complex information can be obtained relatively quickly.
Intra- and inter-tester reliability and validity of normal finger size measurement using the Japanese ring gauge system.

PubMed

Suzuki, T; Sato, Y; Sotome, S; Arai, H; Arai, A; Yoshida, H

2017-06-01

This study was designed to investigate the reliability and validity of measurements of finger diameters with a ring gauge. A reliability study enrolled two independent samples (50 participants and seven examiners in Study I; 26 participants and 26 examiners in Study II). The sizes of each participant's little fingers were measured twice with a ring gauge by each examiner. To investigate the validity of the measurements, five hand therapists compared the finger size and hand volume of 30 participants with the ring gauge and with a figure-of-eight technique (Study III). The intra-class correlation coefficient for intra-observer reliability ranged from 0.97 to 0.99 in Study I, and 0.90 to 0.97 in Study II. The intra-class correlation coefficient for inter-observer reliability was 0.95 in Study I and 0.94 in Study II. The validity study showed a Pearson product moment correlation coefficient of 0.75. The ring gauge showed high reliability and validity for measurement of finger size. Level of evidence: III, diagnostic.
The use and reliability of SymNose for quantitative measurement of the nose and lip in unilateral cleft lip and palate patients.

PubMed

Mosmuller, David; Tan, Robin; Mulder, Frans; Bachour, Yara; de Vet, Henrica; Don Griot, Peter

2016-10-01

It is essential to have a reliable assessment method in order to compare the results of cleft lip and palate surgery. In this study, the computer-based program SymNose, a method for quantitative assessment of the nose and lip, is assessed for usability and reliability. The symmetry of the nose and lip was measured twice in 50 six-year-old complete and incomplete unilateral cleft lip and palate patients by four observers. For the frontal view the asymmetry levels of the nose and upper lip were evaluated, and for the basal view the asymmetry levels of the nose and nostrils were evaluated. The mean inter-observer reliability when tracing each image once or twice was 0.70 and 0.75, respectively. Tracing the photographs with 2 observers and 4 observers gave a mean inter-observer score of 0.86 and 0.92, respectively. The mean intra-observer reliability varied between 0.80 and 0.84. SymNose is a practical and reliable tool for the retrospective assessment of large caseloads of 2D photographs of cleft patients for research purposes. Moderate to high single inter-observer reliability was found. For future research with SymNose, reliable outcomes can be achieved by using the average outcomes of single tracings of two observers. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
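The gain in reliability from averaging over more observers, as reported in the SymNose abstract above, can be illustrated with the Spearman-Brown prophecy formula, r_k = k·r / (1 + (k − 1)·r). Whether the authors used this formula is not stated, so the sketch below is an assumption; the function name `spearman_brown` is hypothetical. Applied to the reported single-observer value of 0.75, it gives values close to the reported 0.86 and 0.92 for two and four observers.

```python
def spearman_brown(single_reliability, k):
    """Predicted reliability of the mean of k observers (Spearman-Brown).

    single_reliability: reliability of one observer's score (between 0 and 1).
    k: number of observers whose scores are averaged.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Starting from the single-observer value of 0.75 reported in the abstract.
two_observers = spearman_brown(0.75, 2)   # 1.5 / 1.75
four_observers = spearman_brown(0.75, 4)  # 3.0 / 3.25
```

The same function can be used in planning: increasing `k` until the predicted value reaches a chosen criterion gives a rough estimate of how many observers to average.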
Inter- and intra-observer reliability of clinical movement-control tests for marines

PubMed Central

2012-01-01

Background Musculoskeletal disorders particularly in the back and lower extremities are common among marines. Here, movement-control tests are considered clinically useful for screening and follow-up evaluation. However, few studies have addressed the reliability of clinical tests, and no such published data exists for marines. The present aim was therefore to determine the inter- and intra-observer reliability of clinically convenient tests emphasizing movement control of the back and hip among marines. A secondary aim was to investigate the sensitivity and specificity of these clinical tests for discriminating musculoskeletal pain disorders in this group of military personnel. Methods This inter- and intra-observer reliability study used a test-retest approach with six standardized clinical tests focusing on movement control for back and hip. Thirty-three marines (age 28.7 yrs, SD 5.9) on active duty volunteered and were recruited. They followed an in-vivo observation test procedure that covered both low- and high-load (threshold) tasks relevant for marines on operational duty. Two independent observers simultaneously rated performance as “correct” or “incorrect” following a standardized assessment protocol. Re-testing followed 7–10 days thereafter. Reliability was analysed using kappa (κ) coefficients, while discriminative power of the best-fitting tests for back- and lower-extremity pain was assessed using a multiple-variable regression model. Results Inter-observer reliability for the six tests was moderate to almost perfect with κ-coefficients ranging between 0.56-0.95. Three tests reached almost perfect inter-observer reliability with mean κ-coefficients > 0.81. However, intra-observer reliability was fair-to-moderate with mean κ-coefficients between 0.22-0.58. Three tests achieved moderate intra-observer reliability with κ-coefficients > 0.41. 
Combinations of one low- and one high-threshold test best discriminated prior back pain, but results were inconsistent for lower-extremity pain. Conclusions Our results suggest that clinical tests of movement control of back and hip are reliable for use in screening protocols using several observers with marines. However, test-retest reproducibility was less accurate, which should be considered in follow-up evaluations. The results also indicate that combinations of low- and high-threshold tests have discriminative validity for prior back pain, but were inconclusive for lower-extremity pain. PMID:23273285
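The κ coefficients in the abstract above summarize agreement between two observers who each rate a performance as "correct" or "incorrect". As a minimal sketch, Cohen's kappa is (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each observer's marginal proportions. The verbal labels follow the Landis and Koch bands, which the abstract's wording ("fair", "moderate", "almost perfect") appears to use; that reading is an assumption. Function names and example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(obs_a, obs_b):
    """Cohen's kappa for two observers rating the same items."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("both observers must rate the same non-empty set of items")
    n = len(obs_a)
    # Observed agreement.
    p_o = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    # Chance agreement from the product of marginal proportions per category.
    count_a, count_b = Counter(obs_a), Counter(obs_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    if p_e == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Verbal label for kappa using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of ten performances: 1 = correct, 0 = incorrect.
observer_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohen_kappa(observer_1, observer_2)
label = landis_koch_label(kappa)
```

In this made-up example the observers agree on 8 of 10 items (p_o = 0.80) with p_e = 0.52, so κ is about 0.58, labelled "moderate". The same function applies to intra-observer reliability if the two lists hold one observer's test and retest ratings.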
Intra- and inter-observer reliability of ten major histological scoring systems used for the evaluation of in vivo cartilage repair.

PubMed

Bonasia, Davide Edoardo; Marmotti, Antongiulio; Massa, Alessandro Domenico Felice; Ferro, Andrea; Blonna, Davide; Castoldi, Filippo; Rossi, Roberto

2015-09-01

In the last two decades, many surgical techniques have been described for articular cartilage repair. Reliable histological scoring systems are fundamental tools to evaluate new procedures. Several histological scoring systems have been described, and these can be divided into elementary and comprehensive scores, according to the number of sub-items. The aim of this study was to test the inter- and intra-observer reliability of ten main scores used for the histological evaluation of in vivo cartilage repair. The authors tested the starting hypothesis that elementary scores would show superior intra- and inter-observer reliability compared with comprehensive scores. Fifty histological sections obtained from the trochlea of New Zealand rabbits and stained with Safranin-O fast green were used. The histological sections were analysed by 4 observers: 2 experienced in cartilage histology and 2 inexperienced. Histological evaluations were performed at time 1 and time 2, separated by a 30-day interval. The following scores were used: Mankin, O'Driscoll, Pineda, Wakitani, Fortier, Sellers, ICRS, ICRSII, Oswestry (OsScore) and modified O'Driscoll. Intra- and inter-observer reliability were evaluated for each score. In addition, the floor-ceiling effect and the Bland-Altman Coefficient of Repeatability were evaluated for each sub-item of every score. Intra-observer reliability was high for all observers in every score, even though the reliability was significantly lower for non-expert observers compared with expert counterparts. In terms of Coefficient of Repeatability, some scores performed better (O'Driscoll, Modified O'Driscoll and ICRSII) than others (Fortier, Sellers). Inter-observer reliability was high for all observers in every score, but significantly lower for non-expert compared with expert observers. In expert hands, all the scores showed high intra- and inter-observer reliability, independently of the complexity. 
Although every score has advantages and disadvantages, ICRSII, O'Driscoll and Modified O'Driscoll scores should be preferred for the evaluation of in vivo cartilage repair in animal models.
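The abstract above refers to the Bland-Altman Coefficient of Repeatability without defining it. One common definition, assumed here, is 1.96 times the standard deviation of the paired differences between two sessions, with the mean difference taken to be zero; the abstract does not say which variant the authors used. The function name and the example scores are hypothetical.

```python
import math


def coefficient_of_repeatability(time1, time2):
    """Bland-Altman coefficient of repeatability for paired repeated scores.

    Uses 1.96 * sqrt(mean(d^2)), i.e. the standard deviation of the
    differences computed about zero (mean difference assumed to be zero).
    """
    if len(time1) != len(time2) or not time1:
        raise ValueError("need the same non-empty set of items at both times")
    diffs = [a - b for a, b in zip(time1, time2)]
    sd_about_zero = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return 1.96 * sd_about_zero


# Hypothetical scores given by one observer to four sections, 30 days apart.
scores_time1 = [10, 12, 14, 16]
scores_time2 = [11, 11, 15, 15]
cr = coefficient_of_repeatability(scores_time1, scores_time2)
```

Every difference in this made-up example is ±1 point, so the coefficient is 1.96: about 95% of repeat scores would be expected to fall within roughly 2 points of the first score. A smaller coefficient indicates better repeatability.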
A Turkish Version of the Critical-Care Pain Observation Tool: Reliability and Validity Assessment.

PubMed

Aktaş, Yeşim Yaman; Karabulut, Neziha

2017-08-01

The study aim was to evaluate the validity and reliability of the Critical-Care Pain Observation Tool in critically ill patients. A repeated measures design was used for the study. A convenience sample of 66 patients who had undergone open-heart surgery in the cardiovascular surgery intensive care unit in Ordu, Turkey, was recruited for the study. The patients were evaluated by using the Critical-Care Pain Observation Tool at rest, during a nociceptive procedure (suctioning), and 20 minutes after the procedure while they were conscious and intubated after surgery. The Turkish version of the Critical-Care Pain Observation Tool has shown statistically acceptable levels of validity and reliability. Inter-rater reliability was supported by moderate-to-high-weighted κ coefficients (weighted κ coefficient = 0.55 to 1.00). For concurrent validity, significant associations were found between the scores on the Critical-Care Pain Observation Tool and the Behavioral Pain Scale scores. Discriminant validity was also supported by higher scores during suctioning (a nociceptive procedure) versus non-nociceptive procedures. The internal consistency of the Critical-Care Pain Observation Tool was 0.72 during a nociceptive procedure and 0.71 during a non-nociceptive procedure. The validity and reliability of the Turkish version of the Critical-Care Pain Observation Tool was determined to be acceptable for pain assessment in critical care, especially for patients who cannot communicate verbally. Copyright © 2016 American Society of PeriAnesthesia Nurses. Published by Elsevier Inc. All rights reserved.
Score Reliability of Adolescent Alcohol Screening Measures: A Meta-Analytic Inquiry

ERIC Educational Resources Information Center

Shields, Alan L.; Campfield, Delia C.; Miller, Christopher S.; Howell, Ryan T.; Wallace, Kimberly; Weiss, Roger D.

2008-01-01

This study describes the reliability reporting practices in empirical studies using eight adolescent alcohol screening tools and characterizes and explores variability in internal consistency estimates across samples. Of 119 observed administrations of these instruments, 40 (34%) reported usable reliability information. The Personal Experience…
Colour evaluation in scars: tristimulus colorimeter, narrow-band simple reflectance meter or subjective evaluation?

PubMed

Draaijers, Lieneke J; Tempelman, Fenike R H; Botman, Yvonne A M; Kreis, Robert W; Middelkoop, Esther; van Zuijlen, Paul P M

2004-03-01

The evaluation of scar colour is, at present, usually limited to an assessment according to a scar assessment scale. Although useful, these assessment scales only evaluate subjectively the degree of scar colour. In this study, the reliability of the subjective assessment of scar colour by observers is compared to the reliability of the measurements of two objective colour measurement instruments. Four independent observers subjectively assessed the vascularisation and pigmentation of 49 scar areas in 20 patients. The degree of vascularisation and pigmentation was scored according to a scale ranging from '1', when it appeared to be like healthy skin, to '10', which corresponds to the worst imaginable outcome of vascularisation or pigmentation. The observers also scored the pigmentation categories of the scar (hypopigmentation, hyperpigmentation or mixed pigmentation). Finally, each observer measured the scar areas with a tristimulus colorimeter (Minolta Chromameter) and a narrow-band simple reflectance meter (DermaSpectrometer). A single observer could reliably carry out measurements of the DermaSpectrometer and the Minolta Chromameter for the evaluation of scar colour (r = 0.72). The vascularisation of scars could also be assessed reliably with a single observer (r = 0.76), whereas for a reliable assessment of pigmentation at least three observers were necessary (r ≥ 0.77). The agreement between the observers for the pigmentation categories also turned out to be unacceptably low (κ = 0.349). This study shows that an overall evaluation of scar colour with the DermaSpectrometer and the Minolta Chromameter is more reliable than the evaluation of scar colour with observers. Of both instruments for measuring scar colour, we prefer, because of its feasibility, the DermaSpectrometer.
Orofacial Pain during Mastication in People with Dementia: Reliability Testing of the Orofacial Pain Scale for Non-Verbal Individuals.

PubMed

de Vries, Merlijn W; Visscher, Corine; Delwel, Suzanne; van der Steen, Jenny T; Pieper, Marjoleine J C; Scherder, Erik J A; Achterberg, Wilco P; Lobbezoo, Frank

2016-01-01

Objectives. The aim of this study was to establish the reliability of the "chewing" subscale of the OPS-NVI, a novel tool designed to estimate presence and severity of orofacial pain in nonverbal patients. Methods. The OPS-NVI consists of 16 items for observed behavior, classified into four categories and a subjective estimate of pain. Two observers used the OPS-NVI for 237 video clips of people with dementia in Dutch nursing homes during their meal to observe their behavior and to estimate the intensity of orofacial pain. Six weeks later, the same observers rated the video clips a second time. Results. Bottom and ceiling effects for some items were found. This resulted in exclusion of these items from the statistical analyses. The categories which included the remaining items (n = 6) showed reliability varying between fair-to-good and excellent (interobserver reliability, ICC: 0.40-0.47; intraobserver reliability, ICC: 0.40-0.92). Conclusions. The "chewing" subscale of the OPS-NVI showed a fair-to-good to excellent interobserver and intraobserver reliability in this dementia population. This study contributes to the validation process of the OPS-NVI as a whole and stresses the need for further assessment of the reliability of the OPS-NVI with subjects that might already show signs of orofacial pain.
HARBO, a simple computer-aided observation method for recording work postures.

PubMed

Wiktorin, C; Mortimer, M; Ekenvall, L; Kilbom, A; Hjelm, E W

1995-12-01

The aim of the study was to present an observation method focusing on the positions of the hands relative to the body and to evaluate whether this simple observation technique gives a reliable estimate of the total time spent in each of five work postures during one workday. In the first part of the study the interobserver reliability of the observation method was tested with eight blue-collar workers. In the second part the observed time spent with work above the shoulder level was tested in relation to an upper-arm position analyzer, and observed time spent in work below knuckle level was tested in relation to a trunk flexion analyzer, both with 72 blue-collar workers. The interobserver reliability for full-day registrations was high. The intraclass correlation coefficients ranged from 0.99 to 1.00. The observed duration of work with hands above shoulder level correlated well with the measured duration of pronounced arm elevation (> 75 degrees). The product moment correlation coefficient was 0.97. The observed duration of work with hands below knuckle level correlated well with the measured duration of pronounced trunk flexion angles (> 40 degrees). The product moment correlation coefficient was 0.98. The present observation method, designed to make postural observations continuously for several hours, is easy to learn and seems reliable.
Online Studies on Variation in Orthopedic Surgery: Computed Tomography in MPEG4 Versus DICOM Format.

PubMed

Mellema, Jos J; Mallee, Wouter H; Guitton, Thierry G; van Dijk, C Niek; Ring, David; Doornberg, Job N

2017-10-01

The purpose of this study was to compare the observer participation and satisfaction as well as interobserver reliability between two online platforms, Science of Variation Group (SOVG) and Traumaplatform Study Collaborative, for the evaluation of complex tibial plateau fractures using computed tomography in MPEG4 and DICOM format. A total of 143 observers started with the online evaluation of 15 complex tibial plateau fractures via either the SOVG or Traumaplatform Study Collaborative websites using MPEG4 videos or a DICOM viewer, respectively. Observers were asked to indicate the absence or presence of four tibial plateau fracture characteristics and to rate their satisfaction with the evaluation as provided by the respective online platforms. The observer participation rate was significantly higher in the SOVG (MPEG4 video) group compared to that in the Traumaplatform Study Collaborative (DICOM viewer) group (75 and 43%, respectively; P < 0.001). The median observer satisfaction with the online evaluation was seven (range, 0-10) using MPEG4 video compared to six (range, 1-9) using DICOM viewer (P = 0.11). The interobserver reliability for recognition of fracture characteristics in complex tibial plateau fractures was higher for the evaluation using MPEG4 video. In conclusion, observer participation and interobserver reliability for the characterization of tibial plateau fractures was greater with MPEG4 videos than with a standard DICOM viewer, while there was no difference in observer satisfaction. Future reliability studies should account for the method of delivering images.
Utility and Reliability of an App for the System for Observing Play and Recreation in Communities (iSOPARC®)

ERIC Educational Resources Information Center

Santos, Maria P. M.; Rech, Cassiano R.; Alberico, Claudia O.; Fermino, Rogério C.; Rios, Ana P.; David, João; Reis, Rodrigo S.; Sarmiento, Olga L.; McKenzie, Thomas L.; Mota, Jorge

2016-01-01

The app for the System for Observing Play and Recreation in Communities (iSOPARC®) was developed to enhance System for Observing Play and Recreation in Communities data collection and management. The study aim was to examine the usability and inter-rater reliability of iSOPARC®. Trained observers collected data in 16 park areas in two Latin…
Systematic review of methods for quantifying teamwork in the operating theatre

PubMed Central

Marshall, D.; Sykes, M.; McCulloch, P.; Shalhoub, J.; Maruthappu, M.

2018-01-01

Background Teamwork in the operating theatre is becoming increasingly recognized as a major factor in clinical outcomes. Many tools have been developed to measure teamwork. Most fall into two categories: self‐assessment by theatre staff and assessment by observers. A critical and comparative analysis of the validity and reliability of these tools is lacking. Methods MEDLINE and Embase databases were searched following PRISMA guidelines. Content validity was assessed using measurements of inter‐rater agreement, predictive validity and multisite reliability, and interobserver reliability using statistical measures of inter‐rater agreement and reliability. Quantitative meta‐analysis was deemed unsuitable. Results Forty‐eight articles were selected for final inclusion; self‐assessment tools were used in 18 and observational tools in 28, and there were two qualitative studies. Self‐assessment of teamwork by profession varied with the profession of the assessor. The most robust self‐assessment tool was the Safety Attitudes Questionnaire (SAQ), although this failed to demonstrate multisite reliability. The most robust observational tool was the Non‐Technical Skills (NOTECHS) system, which demonstrated both test–retest reliability (P > 0·09) and interobserver reliability (Rwg = 0·96). Conclusion Self‐assessment of teamwork by the theatre team was influenced by professional differences. Observational tools, when used by trained observers, circumvented this.
The Reliability and Validity of the Thin Slice Technique: Observational Research on Video Recorded Medical Interactions

ERIC Educational Resources Information Center

Foster, Tanina S.

2014-01-01

Introduction: The thin slice technique has been routinely incorporated into observational research methods; however, there is limited evidence supporting use of this technique compared to full interaction coding. The purpose of this study was to determine if this technique could be reliably coded, if ratings are…
Being Reliable: Issues in Determining the Reliability and Making Sense of Observations of Adults with Congenital Deafblindness?

ERIC Educational Resources Information Center

Prain, M. I.; McVilly, K. R.; Ramcharan, P.

2012-01-01

Background: Most research into interactions with people who are congenitally deafblind involves observational data. In order for practitioners and researchers to have confidence in the findings of observational studies, researchers need to demonstrate that the processes employed are replicable and trustworthy. This paper draws on data from an…

Reliability of movement control tests in the lumbar spine

PubMed Central

Luomajoki, Hannu; Kool, Jan; de Bruin, Eling D; Airaksinen, Olavi

2007-01-01

Background Movement control dysfunction [MCD] reduces active control of movements. Patients with MCD might form an important subgroup among patients with non specific low back pain. The diagnosis is based on the observation of active movements. Although widely used clinically, only a few studies have been performed to determine the test reliability. The aim of this study was to determine the inter- and intra-observer reliability of movement control dysfunction tests of the lumbar spine. Methods We videoed patients performing a standardized test battery consisting of 10 active movement tests for motor control in 27 patients with non specific low back pain and 13 patients with other diagnoses but without back pain. Four physiotherapists independently rated test performances as correct or incorrect per observation, blinded to all other patient information and to each other. The study was conducted in a private physiotherapy outpatient practice in Reinach, Switzerland. Kappa coefficients, percentage agreements and confidence intervals for inter- and intra-rater results were calculated. Results The kappa values for inter-tester reliability ranged between 0.24 – 0.71. Six tests out of ten showed a substantial reliability [k > 0.6]. Intra-tester reliability was between 0.51 – 0.96, all tests but one showed substantial reliability [k > 0.6]. Conclusion Physiotherapists were able to reliably rate most of the tests in this series of motor control tasks as being performed correctly or not, by viewing films of patients with and without back pain performing the task. PMID:17850669
Validity and reliability of the Paprosky acetabular defect classification.

PubMed

Yu, Raymond; Hofstaetter, Jochen G; Sullivan, Thomas; Costi, Kerry; Howie, Donald W; Solomon, Lucian B

2013-07-01

The Paprosky acetabular defect classification is widely used but has not been appropriately validated. Reliability of the Paprosky system has not been evaluated in combination with standardized techniques of measurement and scoring. This study evaluated the reliability, teachability, and validity of the Paprosky acetabular defect classification. Preoperative radiographs from a random sample of 83 patients undergoing 85 acetabular revisions were classified by four observers, and their classifications were compared with quantitative intraoperative measurements. Teachability of the classification scheme was tested by dividing the four observers into two groups. The observers in Group 1 underwent three teaching sessions; those in Group 2 underwent one session and the influence of teaching on the accuracy of their classifications was ascertained. Radiographic evaluation showed statistically significant relationships with intraoperative measurements of anterior, medial, and superior acetabular defect sizes. Interobserver reliability improved substantially after teaching and did not improve without it. The weighted kappa coefficient went from 0.56 at Occasion 1 to 0.79 after three teaching sessions in Group 1 observers, and from 0.49 to 0.65 after one teaching session in Group 2 observers. The Paprosky system is valid and shows good reliability when combined with standardized definitions of radiographic landmarks and a structured analysis. Level II, diagnostic study.
Reliability and validity of the symptoms of major depressive illness.

PubMed

Mazure, C; Nelson, J C; Price, L H

1986-05-01

In two consecutive studies, we examined the interrater reliability and then the concurrent validity of interview ratings for individual symptoms of major depressive illness. The concurrent validity of symptoms was determined by assessing the degree to which symptoms observed or reported during an interview were observed in daily behavior. Results indicated that most signs and symptoms of major depression and melancholia can be reliably rated by clinicians during a semistructured interview. Ratings of observable symptoms (signs) assessed during the interview were valid indicators of dysfunction observed in daily behavior. Several but not all ratings based on patient report of symptoms were at variance with observation. These discordant patient-reported symptoms may have value as subjective reports but were not accurate descriptions of observed dysfunction.
Effect of individual shades on reliability and validity of observers in colour matching.

PubMed

Lagouvardos, P E; Diamanti, H; Polyzois, G

2004-06-01

The effect of individual shades in shade guides on the reliability and validity of measurements in a colour matching process is very important. Observers' agreement on shades and the sensitivity/specificity of shades can give an estimate of each shade's effect on observers' reliability and validity. In the present study, a group of 16 students matched 15 shades of a Kulzer guide and 10 human incisors to Kulzer and/or Vita shade tabs in 4 different tests. The results showed that shades I, B10, C40, A35 and A10 had the highest reliability and validity values. In conclusion, a) the matching process with shades of different materials was not accurate enough, b) some shades produce a more reliable and valid match than others and c) teeth are matched with relative difficulty.
Radiologic analysis of hindfoot alignment: Comparison of Méary, long axial, and hindfoot alignment views.

PubMed

Neri, T; Barthelemy, R; Tourné, Y

2017-12-01

Among radiographic views available for assessing hindfoot alignment, the antero-posterior weight-bearing view with metal cerclage of the hindfoot (Méary view) is the most widely used in France. Internationally, the long axial view (LAV) and hindfoot alignment view (HAV) are also used. The objective of this study was to compare the reliability of these three views. The hypothesis was that the Méary view with cerclage of the hindfoot is as reliable as the LAV and HAV for assessing hindfoot alignment. All three views were obtained in each of 22 prospectively included patients. Intra-observer and inter-observer reliabilities were assessed by having two observers collect the radiographic measurements and then computing the intra-class correlation coefficients (ICCs). The intra-observer and inter-observer ICCs were 0.956 and 0.988 with the Méary view, 0.990 and 0.765 with the HAV, and 0.997 and 0.991 with the LAV, respectively. Correlations were far stronger between the LAV and HAV than between each of these and the Méary view. Compared to the LAV and HAV, the Méary view indicated a greater degree of hindfoot valgus. Intra-observer reliability was excellent with both the LAV and HAV, whereas inter-observer reliability was better with the LAV. Excellent reliability was also obtained with the Méary view. Combining the Méary view, to obtain a radiographic image of the clinical deformity, with the LAV, to measure the angular deviation of the hindfoot axis, may be useful when assessing hindfoot malalignment. A comparison of the three views in a larger population is needed before clinical recommendations can be made. Level of evidence: II, prospective study. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
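The abstract does not say which form of the intra-class correlation coefficient was computed. As one possibility, the sketch below implements the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), from a subjects-by-observers table. The function name and the angle values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per subject and one column per observer."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(r) != k for r in scores):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 observers")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), observers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical angle measurements (degrees): observer B reads 1 degree higher throughout.
angles = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_2_1(angles), 3))
```

Because this form measures absolute agreement, a constant offset between observers lowers the coefficient even when their rankings of patients are identical; a consistency-type ICC would not penalise that offset.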
Validity and inter-observer reliability of subjective hand-arm vibration assessments.

PubMed

Coenen, Pieter; Formanoy, Margriet; Douwes, Marjolein; Bosch, Tim; de Kraker, Heleen

2014-07-01

Exposure to mechanical vibrations at work (e.g., due to handling powered tools) is a potential occupational risk as it may cause upper extremity complaints. However, reliable and valid assessment methods for vibration exposure at work are lacking. Measuring hand-arm vibration objectively is often difficult and expensive, while the commonly used information provided by manufacturers lacks detail. Therefore, a subjective hand-arm vibration assessment method was tested for validity and inter-observer reliability. In an experimental protocol, sixteen tasks involving powered tools were executed by two workers. Hand-arm vibration was assessed subjectively by 16 observers according to the proposed subjective assessment method. As a gold standard reference, hand-arm vibration was measured objectively using a vibration measurement device. Weighted κ's were calculated to assess validity, and intra-class correlation coefficients (ICCs) were calculated to assess inter-observer reliability. Inter-observer reliability of the subjective assessments, depicting the agreement among observers, can be expressed by an ICC of 0.708 (0.511-0.873). The validity of the subjective assessments as compared to the gold-standard reference can be expressed by a weighted κ of 0.535 (0.285-0.785). In addition, the percentage of exact agreement of the subjective assessment compared to the objective measurement was relatively low (i.e., 52% of all tasks). This study shows that subjectively assessed hand-arm vibrations are fairly reliable among observers and moderately valid. This assessment method is a first attempt to use subjective risk assessments of hand-arm vibration. Although this assessment method can benefit from some future improvement, it can be of use in future studies and in field-based ergonomic assessments. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Inter-observer reliability of animal-based welfare indicators included in the Animal Welfare Indicators welfare assessment protocol for dairy goats.

PubMed

Vieira, A; Battini, M; Can, E; Mattiello, S; Stilwell, G

2018-01-08

This study was conducted within the context of the Animal Welfare Indicators (AWIN) project and the underlying scientific motivation for the development of the study was the scarcity of data regarding inter-observer reliability (IOR) of welfare indicators, particularly given the importance of reliability as a further step for developing on-farm welfare assessment protocols. The objective of this study is therefore to evaluate IOR of animal-based indicators (at group and individual-level) of the AWIN welfare assessment protocol (prototype) for dairy goats. In the design of the study, two pairs of observers, one in Portugal and another in Italy, visited 10 farms each and applied the AWIN prototype protocol. Farms in both countries were visited between January and March 2014, and all the observers received the same training before the farm visits were initiated. Data collected during farm visits, and analysed in this study, include group-level and individual-level observations. The results of our study allow us to conclude that most of the group-level indicators presented the highest IOR level ('substantial', 0.85 to 0.99) in both field studies, pointing to a usable set of animal-based welfare indicators that were therefore included in the first level of the final AWIN welfare assessment protocol for dairy goats. Inter-observer reliability of individual-level indicators was lower, but the majority of them still reached 'fair to good' (0.41 to 0.75) and 'excellent' (0.76 to 1) levels. In the paper we explore reasons for the differences found in IOR between the group and individual-level indicators, including how the number of individual-level indicators to be assessed on each animal and the restraining method may have affected the results. Furthermore, we discuss the differences found in the IOR of individual-level indicators in both countries: the Portuguese pair of observers reached a higher level of IOR, when compared with the Italian observers. 
We argue that these differences may stem from the restraining method applied, or from the different background and experience of the observers. Finally, the discussion of the results emphasizes the importance of considering that reliability is not an absolute attribute of an indicator, but derives from an interaction between the indicators, the observers and the situation in which the assessment is taking place. This highlights the importance of further considering the indicators' reliability while developing welfare assessment protocols.
Inter-rater reliability for movement pattern analysis (MPA): measuring patterning of behaviors versus discrete behavior counts as indicators of decision-making style

PubMed Central

Connors, Brenda L.; Rende, Richard; Colton, Timothy J.

2014-01-01

The unique yield of collecting observational data on human movement has received increasing attention in a number of domains, including the study of decision-making style. As such, interest has grown in the nuances of core methodological issues, including the best ways of assessing inter-rater reliability. In this paper we focus on one key topic – the distinction between establishing reliability for the patterning of behaviors as opposed to the computation of raw counts – and suggest that reliability for each be compared empirically rather than determined a priori. We illustrate by assessing inter-rater reliability for key outcome measures derived from movement pattern analysis (MPA), an observational methodology that records body movements as indicators of decision-making style with demonstrated predictive validity. While reliability ranged from moderate to good for raw counts of behaviors reflecting each of two Overall Factors generated within MPA (Assertion and Perspective), inter-rater reliability for patterning (proportional indicators of each factor) was significantly higher and excellent (ICC = 0.89). Furthermore, patterning, as compared to raw counts, provided better prediction of observable decision-making process assessed in the laboratory. These analyses support the utility of using an empirical approach to inform the consideration of measuring patterning versus discrete behavioral counts of behaviors when determining inter-rater reliability of observable behavior. They also speak to the substantial reliability that may be achieved via application of theoretically grounded observational systems such as MPA that reveal thinking and action motivations via visible movement patterns. PMID:24999336
A Reliability Generalization Meta-Analysis of Coefficient Alpha for the Maslach Burnout Inventory

ERIC Educational Resources Information Center

Wheeler, Denna L.; Vassar, Matt; Worley, Jody A.; Barnes, Laura L. B.

2011-01-01

The purpose of this study was to synthesize internal consistency reliability for the subscale scores on the Maslach Burnout Inventory (MBI). The authors addressed three research questions: (a) What is the mean subscale score reliability for the MBI across studies? (b) What factors are associated with observed variance in MBI subscale score…
Reliability of Total Test Scores When Considered as Ordinal Measurements

ERIC Educational Resources Information Center

Biswas, Ajoy Kumar

2006-01-01

This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
Factors Influencing the Reliability of the Glasgow Coma Scale: A Systematic Review.

PubMed

Reith, Florence Cm; Synnot, Anneliese; van den Brande, Ruben; Gruen, Russell L; Maas, Andrew Ir

2017-06-01

The Glasgow Coma Scale (GCS) characterizes patients with diminished consciousness. In a recent systematic review, we found overall adequate reliability across different clinical settings, but reliability estimates varied considerably between studies, and methodological quality of studies was overall poor. Identifying and understanding factors that can affect its reliability is important, in order to promote high standards for clinical use of the GCS. The aim of this systematic review was to identify factors that influence reliability and to provide an evidence base for promoting consistent and reliable application of the GCS. A comprehensive literature search was undertaken in MEDLINE, EMBASE, and CINAHL from 1974 to July 2016. Studies assessing the reliability of the GCS in adults or describing any factor that influences reliability were included. Two reviewers independently screened citations, selected full texts, and undertook data extraction and critical appraisal. Methodological quality of studies was evaluated with the consensus-based standards for the selection of health measurement instruments checklist. Data were synthesized narratively and presented in tables. Forty-one studies were included for analysis. Factors identified that may influence reliability are education and training, the level of consciousness, and type of stimuli used. Conflicting results were found for experience of the observer, the pathology causing the reduced consciousness, and intubation/sedation. No clear influence was found for the professional background of observers. Reliability of the GCS is influenced by multiple factors and as such is context dependent. This review points to the potential for improvement from training and education and standardization of assessment methods, for which recommendations are presented. Copyright © 2017 by the Congress of Neurological Surgeons.
Assessing Peer Entry and Play in Preschoolers at Risk for Maladjustment

ERIC Educational Resources Information Center

Brotman, Laurie Miller; Gouley, Kathleen Kiely; Chesir-Teran, Daniel

2005-01-01

This study evaluated the psychometric properties of an observational rating system for assessing preschoolers' peer entry and play skills: Observed Peer Play in Unfamiliar Settings (OPPUS). Participants were 84 preschoolers at risk for psychopathology. Reliability and concurrent validity are reported. The 30-min paradigm yielded reliable indexes…
Identifying and classifying hyperostosis frontalis interna via computerized tomography.

PubMed

May, Hila; Peled, Nathan; Dar, Gali; Hay, Ori; Abbas, Janan; Masharawi, Youssef; Hershkovitz, Israel

2010-12-01

The aim of this study was to recognize the radiological characteristics of hyperostosis frontalis interna (HFI) and to establish a valid and reliable method for its identification and classification. A reliability test was carried out on 27 individuals who had undergone a head computerized tomography (CT) scan. Intra-observer reliability was obtained by examining the images three times, by the same researcher, with a 2-week interval between each sample ranking. The inter-observer test was performed by three independent researchers. A validity test was carried out using two methods for identifying and classifying HFI: 46 cadaver skullcaps were ranked twice via computerized tomography scans and then by direct observation. Reliability and validity were calculated using Kappa test (SPSS 15.0). Reliability tests of ranking HFI via CT scans demonstrated good results (K > 0.7). As for validity, a very good consensus was obtained between the CT and direct observation, when moderate and advanced types of HFI were present (K = 0.82). The suggested classification method for HFI, using CT, demonstrated a sensitivity of 84%, specificity of 90.5%, and positive predictive value of 91.3%. In conclusion, volume rendering is a reliable and valid tool for identifying HFI. The suggested three-scale classification is most suitable for radiological diagnosis of the phenomena. Considering the increasing awareness of HFI as an early indicator of a developing malady, this study may assist radiologists in identifying and classifying the phenomena.
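Sensitivity, specificity and positive predictive value all come from a 2x2 table comparing the CT classification with direct observation. The abstract does not report the underlying counts, so the counts in the sketch below are hypothetical: they were chosen only because they sum to 46 skullcaps and happen to reproduce the reported percentages, and they should not be read as the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and positive predictive value from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the test detects
        "specificity": tn / (tn + fp),  # share of non-cases the test clears
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
    }

# Hypothetical counts (NOT from the paper), consistent with 46 specimens in total.
metrics = diagnostic_metrics(tp=21, fp=2, fn=4, tn=19)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```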
Cultivating cohort studies for observational translational research.

PubMed

Ransohoff, David F

2013-04-01

"Discovery" research about molecular markers for diagnosis, prognosis, or prediction of response to therapy has frequently produced results that were not reproducible in subsequent studies. What are the reasons, and can observational cohorts be cultivated to provide strong and reliable answers to those questions? Selected examples are used to illustrate: (i) What features of research design provide strength and reliability in observational studies about markers of diagnosis, prognosis, and response to therapy? (ii) How can those design features be cultivated in existing observational cohorts, for example, within randomized controlled clinical trials (RCTs), other existing observational research studies, or practice settings like health maintenance organizations (HMOs)? Examples include a study of RNA expression profiles of tumor tissue to predict prognosis of breast cancer, a study of serum proteomics profiles to diagnose ovarian cancer, and a study of stool-based DNA assays to screen for colon cancer. Strengths and weaknesses of observational study design features are discussed, along with lessons about how features that help assure strength might be "cultivated" in the future. By considering these examples and others, it may be possible to develop a process of "cultivating cohorts" in ongoing RCTs, observational cohort studies, and practice settings like HMOs that have strong features of study design. Such an effort could produce sources of data and specimens to reliably answer questions about the use of molecular markers in diagnosis, prognosis, and response to therapy.
Scoring haemophilic arthropathy on X-rays: improving inter- and intra-observer reliability and agreement using a consensus atlas.

PubMed

Foppen, Wouter; van der Schaaf, Irene C; Beek, Frederik J A; Verkooijen, Helena M; Fischer, Kathelijn

2016-06-01

The radiological Pettersson score (PS) is widely applied for classification of arthropathy to evaluate costly haemophilia treatment. This study aims to assess and improve inter- and intra-observer reliability and agreement of the PS. Two series of X-rays (bilateral elbows, knees, and ankles) of 10 haemophilia patients (120 joints) with haemophilic arthropathy were scored by three observers according to the PS (maximum score 13/joint). Subsequently, (dis-)agreement in scoring was discussed until consensus. Example images were collected in an atlas. Thereafter, a second series of 120 joints was scored using the atlas. One observer rescored the second series after three months. Reliability was assessed by intraclass correlation coefficients (ICC), agreement by limits of agreement (LoA). Median Pettersson score at joint level (PSjoint) of affected joints was 6 (interquartile range 3-9). Using the consensus atlas, inter-observer reliability of the PSjoint improved significantly from 0.94 (95% confidence interval (CI) 0.91-0.96) to 0.97 (CI 0.96-0.98). LoA improved from ±1.7 to ±1.1 for the PSjoint. Therefore, true differences in arthropathy were differences in the PSjoint of >2 points. Intra-observer reliability of the PSjoint was 0.98 (CI 0.97-0.98), intra-observer LoA were ±0.9 points. Reliability and agreement of the PS improved by using a consensus atlas.
• Reliability of the Pettersson score significantly improved using the consensus atlas.
• The presented consensus atlas improved the agreement among observers.
• The consensus atlas could be recommended to obtain a reproducible Pettersson score.
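Limits of agreement describe how far two observers' scores for the same joint typically differ. A minimal sketch of the usual Bland-Altman calculation (mean difference plus or minus 1.96 standard deviations of the differences) is given below. It assumes paired scores from two observers; the study's exact procedure for three observers is not described in the abstract, and the scores are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(scores_a, scores_b):
    """Return (bias, lower, upper) 95% limits of agreement for paired scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)            # systematic difference between observers
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits
    return bias, bias - spread, bias + spread

# Hypothetical Pettersson scores for five joints from two observers.
bias, lower, upper = limits_of_agreement([6, 3, 9, 4, 7], [5, 3, 10, 4, 8])
print(f"bias = {bias:.2f}, limits = {lower:.2f} to {upper:.2f}")
```

A difference between two scorings that falls outside these limits is unlikely to be observer noise alone, which is the reasoning behind the abstract's threshold for a true difference.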
The development of a reliable amateur boxing performance analysis template.

PubMed

Thomson, Edward; Lamb, Kevin; Nicholas, Ceri

2013-01-01

The aim of this study was to devise a valid performance analysis system for the assessment of the movement characteristics associated with competitive amateur boxing and assess its reliability using analysts of varying experience of the sport and performance analysis. Key performance indicators to characterise the demands of an amateur contest (offensive, defensive and feinting) were developed and notated using a computerised notational analysis system. Data were subjected to intra- and inter-observer reliability assessment using median sign tests and calculating the proportion of agreement within predetermined limits of error. For all performance indicators, intra-observer reliability revealed non-significant differences between observations (P > 0.05) and high agreement was established (80-100%) regardless of whether exact or the reference value of ±1 was applied. Inter-observer reliability was less impressive for both analysts (amateur boxer and experienced analyst), with the proportion of agreement ranging from 33-100%. Nonetheless, there was no systematic bias between observations for any indicator (P > 0.05), and the proportion of agreement within the reference range (±1) was 100%. A reliable performance analysis template has been developed for the assessment of amateur boxing performance and is available for use by researchers, coaches and athletes to classify and quantify the movement characteristics of amateur boxing.
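The "proportion of agreement within predetermined limits of error" can be sketched as follows: for each performance indicator, two observations count as agreeing when their frequency counts differ by no more than a tolerance (0 for exact agreement, 1 for the ±1 reference value). The function name and counts are hypothetical.

```python
def proportion_within(counts_a, counts_b, tolerance=0):
    """Share of paired counts whose absolute difference is <= tolerance."""
    if len(counts_a) != len(counts_b) or not counts_a:
        raise ValueError("need two equal-length, non-empty lists of counts")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(counts_a, counts_b))
    return hits / len(counts_a)

# Hypothetical punch counts for four rounds, as notated by two analysts.
analyst_1 = [12, 7, 3, 20]
analyst_2 = [12, 8, 3, 19]
print(proportion_within(analyst_1, analyst_2))               # exact agreement
print(proportion_within(analyst_1, analyst_2, tolerance=1))  # within ±1
```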
First impressions: gait cues drive reliable trait judgements.

PubMed

Thoresen, John C; Vuong, Quoc C; Atkinson, Anthony P

2012-09-01

Personality trait attribution can underpin important social decisions and yet requires little effort; even a brief exposure to a photograph can generate lasting impressions. Body movement is a channel readily available to observers and allows judgements to be made when facial and body appearances are less visible; e.g., from great distances. Across three studies, we assessed the reliability of trait judgements of point-light walkers and identified motion-related visual cues driving observers' judgements. The findings confirm that observers make reliable, albeit inaccurate, trait judgements, and these were linked to a small number of motion components derived from a Principal Component Analysis of the motion data. Parametric manipulation of the motion components linearly affected trait ratings, providing strong evidence that the visual cues captured by these components drive observers' trait judgements. Subsequent analyses suggest that reliability of trait ratings was driven by impressions of emotion, attractiveness and masculinity. Copyright © 2012 Elsevier B.V. All rights reserved.
Observation and Classification of Prehension in Preschool Children: A Reliability Study.

ERIC Educational Resources Information Center

Moss, S. C.; Hogg, J.

1981-01-01

The variety of hand grips of 12 children, most of whom were moderately or severely retarded, were classified in order to begin an analysis of hand function. Test reliability was not as great when items were presented to the children as compared to when children were observed or rated by videotape. (FG)
Developing an Observation Instrument to Support Authentic Independent Reading Time during School in a Data-Driven World

ERIC Educational Resources Information Center

Williams, Lunetta M.; Hall, Katrina W.; Hedrick, Wanda B.; Lamkin, Marcia; Abendroth, Jennifer

2013-01-01

The purpose of the present study was to develop an instrument to measure reading during in-school independent reading (ISIR). Procedures to establish validity and reliability of the instrument included videotaping and observing students during ISIR, gathering feedback from literacy experts, establishing interrater reliability, crosschecking…

A Practical Solution to Optimizing the Reliability of Teaching Observation Measures under Budget Constraints

ERIC Educational Resources Information Center

Meyer, J. Patrick; Liu, Xiang; Mashburn, Andrew J.

2014-01-01

Researchers often use generalizability theory to estimate relative error variance and reliability in teaching observation measures. They also use it to plan future studies and design the best possible measurement procedures. However, designing the best possible measurement procedure comes at a cost, and researchers must stay within their budget…
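To make the planning step concrete, here is a minimal decision-study sketch for the simplest case, a persons-by-sessions design, where the generalizability coefficient is the person variance divided by the person variance plus the relative error variance averaged over sessions. The variance components and the 0.8 criterion are hypothetical, the article's own designs are likely richer, and this sketch does not model the budget constraint it discusses.

```python
def g_coefficient(var_person, var_residual, n_sessions):
    """Generalizability coefficient for a mean over n_sessions observations."""
    relative_error = var_residual / n_sessions
    return var_person / (var_person + relative_error)

def sessions_needed(var_person, var_residual, criterion=0.8, max_sessions=100):
    """Smallest number of sessions reaching the criterion, or None if not reached."""
    for n in range(1, max_sessions + 1):
        if g_coefficient(var_person, var_residual, n) >= criterion:
            return n
    return None

# Hypothetical variance components from a pilot G study.
print(sessions_needed(var_person=1.0, var_residual=2.2))
```

A budget-aware version would compare the cost of each candidate design against the reliability it buys, which is the trade-off the article addresses.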
Reliability of visual and instrumental color matching.

PubMed

Igiel, Christopher; Lehmann, Karl Martin; Ghinea, Razvan; Weyhrauch, Michael; Hangx, Ysbrand; Scheller, Herbert; Paravina, Rade D

2017-09-01

The aim of this investigation was to evaluate intra-rater and inter-rater reliability of visual and instrumental shade matching. Forty individuals with normal color perception participated in this study. The right maxillary central incisor of a teaching model was prepared and restored with 10 feldspathic all-ceramic crowns of different shades. A shade matching session consisted of the observer (rater) visually selecting the best match by using VITA classical A1-D4 (VC) and VITA Toothguide 3D Master (3D) shade guides and the VITA Easyshade Advance intraoral spectrophotometer (ES) to obtain both VC and 3D matches. Three shade matching sessions were held with 4 to 6 weeks between sessions. Intra-rater reliability was assessed based on the percentage of agreement for the three sessions for the same observer, whereas the inter-rater reliability was calculated as mean percentage of agreement between different observers. The Fleiss' Kappa statistical analysis was used to evaluate visual inter-rater reliability. The mean intra-rater reliability for the visual shade selection was 64(11) for VC and 48(10) for 3D. The corresponding ES values were 96(4) for both VC and 3D. The percentages of observers who matched the same shade with VC and 3D were 55(10) and 43(12), respectively, while corresponding ES values were 88(8) for VC and 92(4) for 3D. The results for visual shade matching exhibited a high to moderate level of inconsistency for both intra-rater and inter-rater comparisons. The VITA Easyshade Advance intraoral spectrophotometer exhibited significantly better reliability compared with visual shade selection. This study evaluates the ability of observers to consistently match the same shade visually and with a dental spectrophotometer in different sessions. 
The intra-rater and inter-rater reliability (agreement of repeated shade matching) of visual and instrumental tooth color matching strongly suggest the use of color matching instruments as a supplementary tool in everyday dental practice to enhance the esthetic outcome. © 2017 Wiley Periodicals, Inc.
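Fleiss' kappa extends chance-corrected agreement to more than two raters choosing among several categories, such as shade tabs. The sketch below takes a table with one row per specimen and one column per category, where each cell holds the number of raters who chose that category. The function name and the counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects-by-categories table of rater counts."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # Mean per-subject agreement: share of rater pairs that chose the same category.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall proportion of ratings in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical table: four crowns, three raters, two candidate shades.
print(round(fleiss_kappa([[2, 1], [1, 2], [3, 0], [0, 3]]), 3))
```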
Development of a Peer Teaching-Assessment Program and a Peer Observation and Evaluation Tool

PubMed Central

Trujillo, Jennifer M.; Barr, Judith; Gonyeau, Michael; Van Amburgh, Jenny A.; Matthews, S. James; Qualters, Donna

2008-01-01

Objectives: To develop a formalized, comprehensive, peer-driven teaching assessment program and a valid and reliable assessment tool. Methods: A volunteer taskforce was formed and a peer-assessment program was developed using a multistep, sequential approach and the Peer Observation and Evaluation Tool (POET). A pilot study was conducted to evaluate the efficiency and practicality of the process and to establish interrater reliability of the tool. Intra-class correlation coefficients (ICC) were calculated. Results: ICCs for 8 separate lectures evaluated by 2-3 observers ranged from 0.66 to 0.97, indicating good interrater reliability of the tool. Conclusion: Our peer assessment program for large classroom teaching, which includes a valid and reliable evaluation tool, is comprehensive, feasible, and can be adopted by other schools of pharmacy. PMID:19325963
A Study on the Reliability of Sasang Constitutional Body Trunk Measurement

PubMed Central

Jang, Eunsu; Kim, Jong Yeol; Lee, Haejung; Kim, Honggie; Baek, Younghwa; Lee, Siwoo

2012-01-01

Objective. Body trunk measurement plays an important diagnostic role not only in conventional medicine but also in Sasang constitutional medicine (SCM). The Sasang constitutional body trunk measurement (SCBTM) consists of the 5 widths and the 8 circumferences, which are the standard locations currently employed in the SCM community. This study examines the extent to which comprehensive training can improve the reliability of the SCBTM. Methods. We recruited 10 male subjects and 5 male observers with no experience of anthropometric measurement. We conducted measurements twice before and after comprehensive training. The relative technical error of measurement (%TEM) was calculated to assess intra- and inter-observer reliability. Results. Post-training intra-observer %TEMs of the SCBTM ranged from 0.27% to 1.85%, down from 0.27% to 6.26% before training. Post-training inter-observer %TEMs ranged from 0.56% to 1.66%, down from 1.00% to 9.60% before training. Post-training total %TEMs, which represent overall reliability, ranged from 0.68% to 2.18%, down from a pre-training maximum of 10.18%. Conclusion. Comprehensive training makes the SCBTM more reliable and hence a sufficiently dependable diagnostic tool. Comprehensive training is strongly recommended before taking the SCBTM. PMID:21822442
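The relative technical error of measurement can be sketched for the common two-trial case: the TEM is the square root of the summed squared differences between paired measurements divided by twice the number of pairs, and %TEM expresses it as a percentage of the overall mean. The abstract does not give the exact formulas used for the inter-observer or total values, and the circumferences below are hypothetical.

```python
from math import sqrt

def relative_tem(first, second):
    """Relative technical error of measurement (%TEM) for two repeated trials."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty measurement lists")
    n = len(first)
    # Absolute TEM: sqrt(sum of squared paired differences / 2n).
    tem = sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return 100 * tem / grand_mean

# Hypothetical repeated circumference measurements (cm) on four subjects.
trial_1 = [30.0, 32.0, 28.0, 35.0]
trial_2 = [30.5, 31.5, 28.0, 35.5]
print(f"%TEM = {relative_tem(trial_1, trial_2):.2f}")
```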
Development and Reliability Testing of a Fast-Food Restaurant Observation Form.

PubMed

Rimkus, Leah; Ohri-Vachaspati, Punam; Powell, Lisa M; Zenk, Shannon N; Quinn, Christopher M; Barker, Dianne C; Pugach, Oksana; Resnick, Elissa A; Chaloupka, Frank J

2015-01-01

To develop a reliable observational data collection instrument to measure characteristics of the fast-food restaurant environment likely to influence consumer behaviors, including product availability, pricing, and promotion. The study used observational data collection. Restaurants were in the Chicago Metropolitan Statistical Area. A total of 131 chain fast-food restaurant outlets were included. Interrater reliability was measured for product availability, pricing, and promotion measures on a fast-food restaurant observational data collection instrument. Analysis was done with Cohen's κ coefficient and proportion of overall agreement for categorical variables and intraclass correlation coefficient (ICC) for continuous variables. Interrater reliability, as measured by average κ coefficient, was .79 for menu characteristics, .84 for kids' menu characteristics, .92 for food availability and sizes, .85 for beverage availability and sizes, .78 for measures on the availability of nutrition information, .75 for characteristics of exterior advertisements, and .62 and .90 for exterior and interior characteristics measures, respectively. For continuous measures, average ICC was .88 for food pricing measures, .83 for beverage prices, and .65 for counts of exterior advertisements. Over 85% of measures demonstrated substantial or almost perfect agreement. Although some measures required revision or protocol clarification, results from this study suggest that the instrument may be used to reliably measure the fast-food restaurant environment.
Reliability and validity of the Pragmatics Observational Measure (POM): a new observational measure of pragmatic language for children.

PubMed

Cordier, Reinie; Munro, Natalie; Wilkes-Gillan, Sarah; Speyer, Renée; Pearce, Wendy M

2014-07-01

There is a need for a reliable and valid assessment of childhood pragmatic language skills during peer-peer interactions. This study aimed to evaluate the psychometric properties of a newly developed pragmatic assessment, the Pragmatic Observational Measure (POM). The psychometric properties of the POM were investigated from observational data of two studies - study 1 involved 342 children aged 5-11 years (108 children with ADHD; 108 typically developing playmates; 126 children in the control group), and study 2 involved 9 children with ADHD who attended a 7-week play-based intervention. The psychometric properties of the POM were determined based on the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) taxonomy of psychometric properties and definitions for health-related outcomes; the Pragmatic Protocol was used as the reference tool against which the POM was evaluated. The POM demonstrated sound psychometric properties in all the reliability, validity and interpretability criteria against which it was assessed. The findings showed that the POM is a reliable and valid measure of pragmatic language skills of children with ADHD between the age of 5 and 11 years and has clinical utility in identifying children with pragmatic language difficulty. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.
Inter-rater reliability of PATH observations for assessment of ergonomic risk factors in hospital work.

PubMed

Park, Jung-Keun; Boyer, Jon; Tessler, Jamie; Casey, Jeffrey; Schemm, Linda; Gore, Rebecca; Punnett, Laura

2009-07-01

This study examined the inter-rater reliability of expert observations of ergonomic risk factors by four analysts. Ten jobs were observed at a hospital using a newly expanded version of the PATH method (Buchholz et al. 1996), to which selected upper extremity exposures had been added. Two of the four raters simultaneously observed each worker onsite for a total of 443 observation pairs containing 18 categorical exposure items each. For most exposure items, kappa coefficients were 0.4 or higher. For some items, agreement was higher both for the jobs with less rapid hand activity and for the analysts with a higher level of ergonomic job analysis experience. These upper extremity exposures could be characterised reliably with real-time observation, given adequate experience and training of the observers. The revised version of PATH is applicable to the analysis of jobs where upper extremity musculoskeletal strain is of concern.
Development and testing of the cancer multidisciplinary team meeting observational tool (MDT-MOT)

PubMed Central

Harris, Jenny; Taylor, Cath; Sevdalis, Nick; Jalil, Rozh; Green, James S.A.

2016-01-01

Objective. To develop a tool for independent observational assessment of cancer multidisciplinary team meetings (MDMs), and test criterion validity, inter-rater reliability/agreement and describe performance. Design. Clinicians and experts in teamwork used a mixed-methods approach to develop and refine the tool. Study 1 observers rated pre-determined optimal/sub-optimal MDM film excerpts and Study 2 observers independently rated video-recordings of 10 MDMs. Setting. Study 2 included 10 cancer MDMs in England. Participants. Testing was undertaken by 13 health service staff and a clinical and non-clinical observer. Intervention. None. Main Outcome Measures. Tool development, validity, reliability/agreement and variability in MDT performance. Results. Study 1: Observers were able to discriminate between optimal and sub-optimal MDM performance (P ≤ 0.05). Study 2: Inter-rater reliability was good for 3/10 domains. Percentage of absolute agreement was high (≥80%) for 4/10 domains and percentage agreement within 1 point was high for 9/10 domains. Four MDTs performed well (scored 3+ in at least 8/10 domains), 5 MDTs performed well in 6–7 domains and 1 MDT performed well in only 4 domains. Leadership and chairing of the meeting, the organization and administration of the meeting, and clinical decision-making processes all varied significantly between MDMs (P ≤ 0.01). Conclusions. MDT-MOT demonstrated good criterion validity. Agreement between clinical and non-clinical observers (within one point on the scale) was high but this was inconsistent with reliability coefficients and warrants further investigation. If further validated MDT-MOT might provide a useful mechanism for the routine assessment of MDMs by the local workforce to drive improvements in MDT performance. PMID:27084499
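The distinction between absolute agreement and agreement within one point can be sketched as follows. The scores are hypothetical (the abstract does not publish item-level ratings or the full scale range), and `percent_agreement` is an illustrative helper, not part of MDT-MOT.

```python
def percent_agreement(scores_a, scores_b, tolerance=0):
    """Percentage of paired ratings that differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)

# Hypothetical domain scores for ten meetings from two observers.
clinical = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]
non_clinical = [3, 3, 2, 5, 4, 4, 3, 3, 4, 2]

print(percent_agreement(clinical, non_clinical))               # 70.0 (absolute)
print(percent_agreement(clinical, non_clinical, tolerance=1))  # 90.0 (within 1 point)
```

Agreement percentages of this kind do not correct for chance or for restricted score ranges, which is one commonly cited reason they can look high while reliability coefficients stay low; whether that explains the inconsistency reported above is not established by the abstract.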
Development and testing of the cancer multidisciplinary team meeting observational tool (MDT-MOT).

PubMed

Harris, Jenny; Taylor, Cath; Sevdalis, Nick; Jalil, Rozh; Green, James S A

2016-06-01

To develop a tool for independent observational assessment of cancer multidisciplinary team meetings (MDMs), and test criterion validity, inter-rater reliability/agreement and describe performance. Clinicians and experts in teamwork used a mixed-methods approach to develop and refine the tool. Study 1 observers rated pre-determined optimal/sub-optimal MDM film excerpts and Study 2 observers independently rated video-recordings of 10 MDMs. Study 2 included 10 cancer MDMs in England. Testing was undertaken by 13 health service staff and a clinical and non-clinical observer. None. Tool development, validity, reliability/agreement and variability in MDT performance. Study 1: Observers were able to discriminate between optimal and sub-optimal MDM performance (P ≤ 0.05). Study 2: Inter-rater reliability was good for 3/10 domains. Percentage of absolute agreement was high (≥80%) for 4/10 domains and percentage agreement within 1 point was high for 9/10 domains. Four MDTs performed well (scored 3+ in at least 8/10 domains), 5 MDTs performed well in 6-7 domains and 1 MDT performed well in only 4 domains. Leadership and chairing of the meeting, the organization and administration of the meeting, and clinical decision-making processes all varied significantly between MDMs (P ≤ 0.01). MDT-MOT demonstrated good criterion validity. Agreement between clinical and non-clinical observers (within one point on the scale) was high but this was inconsistent with reliability coefficients and warrants further investigation. If further validated MDT-MOT might provide a useful mechanism for the routine assessment of MDMs by the local workforce to drive improvements in MDT performance. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care; all rights reserved.
Reliability of the Cardiff Test of basic life support and automated external defibrillation version 3.1.

PubMed

Whitfield, Richard H; Newcombe, Robert G; Woollard, Malcolm

2003-12-01

The introduction of the European Resuscitation Guidelines (2000) for cardiopulmonary resuscitation (CPR) and automated external defibrillation (AED) prompted the development of an up-to-date and reliable method of assessing the quality of performance of CPR in combination with the use of an AED. The Cardiff Test of basic life support (BLS) and AED version 3.1 was developed to meet this need and uses standardised checklists to retrospectively evaluate performance from analyses of video recordings and data drawn from a laptop computer attached to a training manikin. This paper reports the inter- and intra-observer reliability of this test. Data used to assess reliability were obtained from an investigation of CPR and AED skill acquisition in a lay responder AED training programme. Six observers were recruited to evaluate performance in 33 data sets, repeating their evaluation after a minimum interval of 3 weeks. More than 70% of the 42 variables considered in this study had a kappa score of 0.70 or above for inter-observer reliability or were drawn from computer data and therefore not subject to evaluator variability. 85% of the 42 variables had kappa scores for intra-observer reliability of 0.70 or above or were drawn from computer data. The standard deviations for inter- and intra-observer measures of time to first shock were 11.6 and 7.7 s, respectively. The inter- and intra-observer reliability for the majority of the variables in the Cardiff Test of BLS and AED version 3.1 is satisfactory. However, reliability is less acceptable with respect to shaking when checking for responsiveness, initial check/clearing of the airway, checks for signs of circulation, time to first shock and performance of interventions in the correct sequence. Further research is required to determine if modifications to the method of assessing these variables can increase reliability.
IRR (Inter-Rater Reliability) of a COP (Classroom Observation Protocol)--A Critical Appraisal

ERIC Educational Resources Information Center

Rui, Ning; Feldman, Jill M.

2012-01-01

Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…
The Effect of Different Cultural Lenses on Reliability and Validity in Observational Data: The Example of Chinese Immigrant Parent-Toddler Dinner Interactions

ERIC Educational Resources Information Center

Wang, Yan Z.; Wiley, Angela R.; Zhou, Xiaobin

2007-01-01

This study used a mixed methodology to investigate reliability, validity, and analysis level with Chinese immigrant observational data. European-American and Chinese coders quantitatively rated 755 minutes of Chinese immigrant parent-toddler dinner interactions on parental sensitivity, intrusiveness, detachment, negative affect, positive affect,…
[Reconsidering evaluation criteria regarding health care research: toward an integrative framework of quantitative and qualitative criteria].

PubMed

Miyata, Hiroaki; Kai, Ichiro

2006-05-01

Debate about the relationship between quantitative and qualitative paradigms is often muddled and confused, and the clutter of terms and arguments has left the concepts obscure and unrecognizable. It is therefore very important to reconsider evaluation criteria regarding rigor in social science. As Lincoln & Guba have already compared quantitative paradigms (validity, reliability, neutrality, generalizability) with qualitative paradigms (credibility, dependability, confirmability, transferability), we discuss the use of evaluation criteria from a pragmatic perspective. Validity/Credibility concerns the observational framework, Reliability/Dependability refers to the range of stability in observations, Neutrality/Confirmability reflects influences between observers and subjects, and Generalizability/Transferability differ epistemologically in the way findings are applied. Qualitative studies, however, do not always choose the qualitative paradigms. If stability can be assumed to some extent, it is better to use the quantitative paradigm (reliability). Moreover, as a quantitative study cannot always guarantee a perfect observational framework, with stability in all phases of observation, it is useful to use qualitative paradigms to enhance the rigor of the study.
Training induces scapular dyskinesis in pain-free competitive swimmers: a reliability and observational study.

PubMed

Madsen, Pernille H; Bak, Klaus; Jensen, Susanne; Welter, Ulrik

2011-03-01

Scapular dyskinesis is a major etiological factor in overhead athletes' shoulder problems. Our hypotheses were to evaluate if (1) visual observation of scapular dyskinesis during scaption has substantial interobserver reliability, and (2) scapular dyskinesis may be induced by swim training in pain-free swimmers. A reliability and observational study. Bachelor project at a college institution and at a private sports orthopedic hospital. Seventy-eight competitive swimmers with no history of shoulder pain were included in the study. Fourteen swimmers were evaluated regarding reliability. Inclusion criteria were competitive swimmers with high training volume who previously had no shoulder pain. Observations of scapular dyskinesis (yes/no) during simple scaption. The interobserver reliability of scaption and wall push-up was evaluated in 14 swimmers using kappa analysis. Prevalence of scapular dyskinesis at 4 time intervals during a swim training session. The scaption test resulted in a weighted kappa value of 0.75. Scapular dyskinesis was seen in 29 shoulders (37%) after the first time interval, in another 24 (cumulated prevalence 68%) after one-half of the training session, and in an additional 4 swimmers (cumulated prevalence 73%) after three-quarters of the training session. During the last quarter of the training session, another 7 swimmers had dyskinesis, resulting in a cumulated prevalence of 82%. The prevalence of abnormal scapular kinesis during a normal training session is high in previously pain-free swimmers. The prevalence increases with more training and occurs early during the training session.
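The cumulated prevalences quoted above can be checked with simple arithmetic. The sketch below assumes the denominator is the 78 included swimmers and treats each reported count as new cases in that interval; the abstract mixes "shoulders" and "swimmers", so this reading is an assumption, although it does reproduce the published percentages.

```python
def cumulative_prevalence(new_cases, total):
    """Running percentage of affected participants after each interval."""
    running = 0
    percentages = []
    for count in new_cases:
        running += count
        percentages.append(round(100 * running / total))
    return percentages

# New cases per quarter of the training session, as reported in the abstract.
print(cumulative_prevalence([29, 24, 4, 7], total=78))  # [37, 68, 73, 82]
```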
Assessment of the intraobserver and interobserver reliability of a communicating vessels volumeter to measure wrist-hand volume.

PubMed

de Carvalho, Rogério Mendonca; Perez, Maria Del Carmen Janerio; Miranda, Fausto

2012-10-01

Traditional volumetry based on Archimedes' principle is the gold standard for the measurement of limb volume, but the routine use of this technique is discouraged because of several disadvantages. The purpose of this study was to evaluate intraobserver and interobserver reliability of direct measurements of wrist-hand volume using a new communicating vessels volumeter based on Pascal's law. A reliability study was conducted. To evaluate the reliability of the communicating vessels volumeter in generating measurements, 30 hands of 15 participants (9 women, 6 men) were measured 3 times each by 3 observers, totaling 270 volumetric results. Measurement time was short (mean = 3 minutes 42 seconds). The intraclass correlation coefficient (ICC) was .9977 for observer 1 and .9976 for observers 2 and 3. The interobserver ICC was .9998. The standard error of measurement was about 3 mL for all observers; the interobserver result was 1 mL. The interrater coefficient of variation (CV) was 1.15% for the series of 9 measurements collected for each segment; the intrarater CV was 1.20%. Limitations. No swollen hands were measured, and measurements were not compared with the gold standard technique. Thus, accuracy of the new volumeter was not determined in this study. A new device has been developed for plethysmography of the extremities, and the results of its use to measure the volume of the wrist-hand segment were reliable in both intraobserver and interobserver analyses.
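The abstract does not say which ICC form or which standard-error formula was used. Purely as an illustration, the sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), from ANOVA mean squares, and the common approximation SEM = SD × sqrt(1 − ICC). The ratings matrix is a small illustrative data set, not the study's volumes.

```python
import math

def icc_2_1(ratings):
    """ICC(2,1) for a subjects-by-raters table (rows = subjects, columns = raters)."""
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def standard_error_of_measurement(sd, icc):
    """Assumed approximation: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)

# Illustrative ratings: 6 subjects scored by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))  # 0.29
```

Other ICC forms (one-way, consistency, or average-measures) give different values on the same data, so a reported ICC is only interpretable alongside the model that produced it.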
RELIABILITY AND VALIDITY OF A BIOMECHANICALLY BASED ANALYSIS METHOD FOR THE TENNIS SERVE

PubMed Central

Kibler, W. Ben; Lamborn, Leah; Smith, Belinda J.; English, Tony; Jacobs, Cale; Uhl, Tim L.

2017-01-01

Background. An observational tennis serve analysis (OTSA) tool was developed using previously established body positions from three-dimensional kinematic motion analysis studies. These positions, defined as nodes, have been associated with efficient force production and minimal joint loading. However, the tool has yet to be examined scientifically. Purpose. The primary purpose of this investigation was to determine the inter-observer reliability for each node between two health care professionals (HCPs) who developed the OTSA, and secondarily to investigate the validity of the OTSA. Methods. Two separate studies were performed to meet these objectives. An inter-observer reliability study preceded the validity study by examining 28 videos of players serving. Two HCPs graded each video and scored the presence or absence of obtaining each node. Discriminant validity was determined in 33 tennis players using videotaped records of three first serves. Serve mechanics were graded using the OTSA, and players were categorized into those with good (≥ 5) and poor (≤ 4) mechanics. Participants performed a series of field tests to evaluate trunk flexibility, lower extremity and trunk power, and dynamic balance. Results. The group with good mechanics demonstrated greater backward trunk flexibility (p=0.02), greater rotational power (p=0.02), and higher single leg countermovement jump (p=0.05). Reliability of the OTSA ranged from K = 0.36-1.0, with the majority of all the nodes displaying substantial reliability (K>0.61). Conclusion. This study provides HCPs with a valid and reliable field tool used to assess serve mechanics. Physical characteristics of trunk mobility and power appear to discriminate serve mechanics between players. Future intervention studies are needed to determine if improvement in physical function contributes to improved serve mechanics. Level of Evidence. 3. PMID:28593098
Measuring the Process and Quality of Informed Consent for Clinical Research: Development and Testing

PubMed Central

Cohn, Elizabeth Gross; Jia, Haomiao; Smith, Winifred Chapman; Erwin, Katherine; Larson, Elaine L.

2013-01-01

Purpose/Objectives. To develop and assess the reliability and validity of an observational instrument, the Process and Quality of Informed Consent (P-QIC). Design. A pilot study of the psychometrics of a tool designed to measure the quality and process of the informed consent encounter in clinical research. The study used professionally filmed, simulated consent encounters designed to vary in process and quality. Setting. A major urban teaching hospital in the northeastern region of the United States. Sample. 63 students enrolled in health-related programs participated in psychometric testing, 16 students participated in test-retest reliability, and 5 investigator-participant dyads were observed for the actual consent encounters. Methods. For reliability and validity testing, students watched videotaped simulations of four consent encounters intentionally varied in process and content and rated them with the proposed instrument. Test-retest reliability was established by raters watching the videotaped simulations twice. Inter-rater reliability was demonstrated by two simultaneous but independent raters observing an actual consent encounter. Main Research Variables. The essential elements of information and communication for informed consent. Findings. The initial testing of the P-QIC demonstrated reliable and valid psychometric properties in both the simulated standardized consent encounters and actual consent encounters in the hospital setting. Conclusions. The P-QIC is an easy-to-use observational tool that provides a quick assessment of the areas of strength and areas that need improvement in a consent encounter. It can be used in the initial trainings of new investigators or consent administrators and in ongoing programs of improvement for informed consent. Implications for Nursing. The development of a validated observational instrument will allow investigators to assess the consent process more accurately and evaluate strategies designed to improve it. PMID:21708532
A novel standardized algorithm using SPECT/CT evaluating unhappy patients after unicondylar knee arthroplasty--a combined analysis of tracer uptake distribution and component position.

PubMed

Suter, Basil; Testa, Enrique; Stämpfli, Patrick; Konala, Praveen; Rasch, Helmut; Friederich, Niklaus F; Hirschmann, Michael T

2015-03-20

The introduction of a standardized SPECT/CT algorithm including a localization scheme, which allows accurate identification of specific patterns and thresholds of SPECT/CT tracer uptake, could lead to a better understanding of the bone remodeling and specific failure modes of unicondylar knee arthroplasty (UKA). The purpose of the present study was to introduce a novel standardized SPECT/CT algorithm for patients after UKA and evaluate its clinical applicability, usefulness and inter- and intra-observer reliability. Tc-HDP-SPECT/CT images of consecutive patients (median age 65, range 48-84 years) with 21 knees after UKA were prospectively evaluated. The tracer activity on SPECT/CT was localized using a specific standardized UKA localization scheme. For tracer uptake analysis (intensity and anatomical distribution pattern) a 3D volumetric quantification method was used. The maximum intensity values were recorded for each anatomical area. In addition, ratios between the respective value in the measured area and the background tracer activity were calculated. The femoral and tibial component position (varus-valgus, flexion-extension, internal and external rotation) was determined in 3D-CT. The inter- and intraobserver reliability of the localization scheme, grading of the tracer activity and component measurements were determined by calculating the intraclass correlation coefficients (ICC). The localization scheme, grading of the tracer activity and component measurements showed high inter- and intra-observer reliabilities for all regions (tibia, femur and patella). For measurement of component position there was strong agreement between the readings of the two observers; the ICC for the orientation of the femoral component was 0.73-1.00 (intra-observer reliability) and 0.91-1.00 (inter-observer reliability). The ICC for the orientation of the tibial component was 0.75-1.00 (intra-observer reliability) and 0.77-1.00 (inter-observer reliability). 
The SPECT/CT algorithm presented combining the mechanical information on UKA component position, alignment and metabolic data is highly reliable and proved to be a valuable, consistent and useful tool for analysing postoperative knees after UKA. Using this standardized approach in clinical studies might be helpful in establishing the diagnosis in patients with pain after UKA.
Clinical assessment of effusion in knee osteoarthritis—A systematic review

PubMed Central

Maricar, Nasimah; Callaghan, Michael J.; Parkes, Matthew J.; Felson, David T.; O'Neill, Terence W.

2016-01-01

Objective. The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. Methods. MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. Results. A total of 10 publications were reviewed. Eight of these considered reliability and four considered validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from −0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign −0.11 to 0.82, patellar tap −0.02 to 0.75 and bulge sign kappa −0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data looked at performance of individual clinical tests with sensitivity ranging 18.2–85.7% and specificity 35.3–93.3%, both higher with larger effusions. Conclusion. The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. PMID:26581486
Clinical assessment of effusion in knee osteoarthritis-A systematic review.

PubMed

Maricar, Nasimah; Callaghan, Michael J; Parkes, Matthew J; Felson, David T; O'Neill, Terence W

2016-04-01

The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. A total of 10 publications were reviewed. Eight of these considered reliability and four considered validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from -0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign -0.11 to 0.82, patellar tap -0.02 to 0.75 and bulge sign kappa -0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data looked at performance of individual clinical tests with sensitivity ranging 18.2-85.7% and specificity 35.3-93.3%, both higher with larger effusions. The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

[A systematic social observation tool: methods and results of inter-rater reliability].

PubMed

Freitas, Eulilian Dias de; Camargos, Vitor Passos; Xavier, César Coelho; Caiaffa, Waleska Teixeira; Proietti, Fernando Augusto

2013-10-01

Systematic social observation has been used as a health research methodology for collecting information from the neighborhood physical and social environment. The objectives of this article were to describe the operationalization of direct observation of the physical and social environment in urban areas and to evaluate the instrument's reliability. The systematic social observation instrument was designed to collect information in several domains. A total of 1,306 street segments belonging to 149 different neighborhoods in Belo Horizonte, Minas Gerais, Brazil, were observed. For the reliability study, 149 segments (1 per neighborhood) were re-audited, and Fleiss kappa was used to assess inter-rater agreement. Mean agreement was 0.57 (SD = 0.24); 53% had substantial or almost perfect agreement, and 20.4%, moderate agreement. The instrument appears to be appropriate for observing neighborhood characteristics that are not time-dependent, especially urban services, property characterization, pedestrian environment, and security.
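As a hedged illustration of the statistic named above, the sketch below computes Fleiss' kappa from a table of category counts per rated unit. The counts are invented; the abstract does not report how many raters audited each segment or how categories were coded, so the layout is an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters placing unit i in category j."""
    n_units = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every unit needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_units * n_raters
    # Overall proportion of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-unit agreement: share of rater pairs that agree on that unit.
    p_unit = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_unit) / n_units
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 4 street segments, 3 raters, item coded present / absent.
table = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(table), 3))  # 0.333
```

Here two segments show full agreement and two show a 2-to-1 split, giving observed agreement of about 0.67 against chance agreement of 0.5.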
Inter- and intra-observer reliability of measurement of pedicle screw breach assessed by postoperative CT scans.

PubMed

Lavelle, William F; Ranade, Ashish; Samdani, Amer F; Gaughan, John P; D'Andrea, Linda P; Betz, Randal R

2014-01-01

Pedicle screws are used increasingly in spine surgery. Concerns about complications associated with screw breach necessitate accurate pedicle screw placement. Postoperative CT imaging helps to detect screw malposition and assess its severity. However, accuracy is dependent on the reading of the CT scans. Inter- and intra-observer variability could affect the reliability of CT scans to assess multiple screw types and sites. The purpose of this study was to assess the reliability of multi-observer analysis of CT scans for determining pedicle screw breach for various screw types and sites in patients with spinal deformity or degenerative pathologies. Axial CT scan images of 23 patients (286 screws) were read by four experienced spine surgeons. Pedicle screw placement was considered 'In' when the screw was fully contained and/or the pedicle wall breach was ≤2 mm. 'Out' was defined as a breach in the medial or lateral pedicle wall >2 mm. Intraclass correlation coefficients (ICC) were calculated to assess the inter- and intra-observer reliability. Marked inter- and intra-observer variability was noticed. The overall inter-observer ICC was 0.45 (95% confidence limits 0.25 to 0.65). The intra-observer ICC was 0.49 (95% confidence limits 0.29 to 0.69). Underlying spinal pathology, screw type, and patient age did not seem to impact the reliability of our CT assessments. Our results indicate the evaluation of pedicle screw breach on CT by a single surgeon is highly variable, and care should be taken when using individual CT evaluations of millimeters of breach as a basis for screw removal. This was a Level III study.
Inter-observer and intra-observer reliability in the radiographic diagnosis of avascular necrosis of the femoral head following reconstructive hip surgery in children with cerebral palsy.

PubMed

Hesketh, Kim; Sankar, Wudbhav; Joseph, Benjamin; Narayanan, Unni; Mulpuri, Kishore

2016-04-01

The incidence of avascular necrosis (AVN) following reconstructive hip surgery in cerebral palsy (CP) ranges from 0 to 69% in the current literature. The purpose of this study was to determine the inter- and intra-observer reliability of radiographically diagnosing AVN in children with CP after hip surgery. A retrospective review of 65 children with CP who had reconstructive hip surgery between 2009 and 2012 at BC Children's Hospital was completed. Anterior-posterior and lateral radiographs were presented to four pediatric orthopaedic surgeons over two rounds. Surgeons were asked to review the set of unidentified radiographs and comment 'yes' or 'no' for the presence of AVN. Two weeks later the same set of radiographs was sent in a different order and the surgeons were again asked to comment on AVN. Inter- and intra-observer reliability was determined using kappa statistics. The intra-observer reliability ranged from 0.65 to 0.88 with an average score of 0.76. Inter-observer reliability showed greater variability, ranging from 0.41 to 0.77 with an average score of 0.56 across all surgeons. Although the intra-rater reliability produced a strength of "good" and the inter-rater reliability a strength of "moderate" agreement, the variability within these scores is clinically important as it demonstrates the difficulty in identifying AVN. This may explain the variability in AVN that is reported in the literature. Further education and research in the diagnosis of AVN in children with CP who have undergone reconstructive hip surgery are clinically necessary.
The Reliability of Classification Decisions for the Furtado-Gallagher Computerized Observational Movement Pattern Assessment System--FG-COMPASS

ERIC Educational Resources Information Center

Furtado, Ovande, Jr.; Gallagher, Jere D.

2012-01-01

Mastery of fundamental movement skills (FMS) is an important factor in preventing weight gain and increasing physical activity. To master FMS, performance evaluation is necessary. In this study, we investigated the reliability of a new observational assessment tool. In Phase I, 110 video clips of children performing five locomotor, and six…
Using Generalizability Theory to Examine Sources of Variance in Observed Behaviors within High School Classrooms

ERIC Educational Resources Information Center

Abry, Tashia; Cash, Anne H.; Bradshaw, Catherine P.

2014-01-01

Generalizability theory (GT) offers a useful framework for estimating the reliability of a measure while accounting for multiple sources of error variance. The purpose of this study was to use GT to examine multiple sources of variance in and the reliability of school-level teacher and high school student behaviors as observed using the tool,…
New definitions of 6 clinical signs of perceptual disorder in children with cerebral palsy: an observational study through reliability measures.

PubMed

Ferrari, A; Sghedoni, A; Alboresi, S; Pedroni, E; Lombardi, F

2014-12-01

Recently authors have begun to emphasize the non-motor aspects of Cerebral Palsy and their influence on motor control and recovery prognosis. Much has been written about single clinical signs (i.e., startle reaction) but so far no definitions of the six perceptual signs presented in this study have appeared in literature. This study defines 6 signs (startle reaction, upper limbs in startle position, frequent eye blinking, posture freezing, averted eye gaze, grimacing) suggestive of perceptual disorders in children with cerebral palsy and measures agreement on sign recognition among independent observers and consistency of opinions over time. Observational study with both cross-sectional and prospective components. Fifty-six videos presented to observers in random order. Videos were taken from 19 children with a bilateral form of cerebral palsy referred to the Children Rehabilitation Unit in Reggio Emilia. Thirty-five rehabilitation professionals from all over Italy: 9 doctors and 26 physiotherapists. Measure of agreement among 35 independent observers was compiled from a sample of 56 videos. Interobserver reliability was determined using the K index of Fleiss, and intraobserver reliability was calculated using the Spearman rank correlation index (rho - ρ). Percentage of agreement between observers and Gold Standard was used as criterion validity. Interobserver reliability was moderate for startle reaction, upper limb in startle position, averted eye gaze and eye-blinking and fair for posture freezing and grimacing. Intraobserver reliability remained consistent over time. Criterion validity revealed very high agreement between independent observer evaluation and gold standard. Semiotics of perceptual disorders can be used as a specific and sensitive instrument in order to identify a new class of patients within existing heterogeneous clinical types of bilateral cerebral palsy forms and could help clinicians in identifying functional prognosis.
To provide clinicians with a definition of 6 clinical signs found in children with cerebral palsy in routine rehabilitation settings. Future research should explore the link between these signs and motor prognosis (i.e., time to independent walking).
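For readers unfamiliar with the Fleiss index mentioned above, the following is a minimal sketch of Fleiss' kappa for many raters. The input layout (one row per video, one column per category, cells holding the number of raters choosing that category), the function name, and the five-rater example are all assumptions made for illustration and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_items * n_raters
    # Share of all assignments that fell in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-item agreement: share of rater pairs that agree on that item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0  # every rating fell in a single category
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical example: 5 videos, 5 raters, columns = [sign present, sign absent].
video_counts = [[5, 0], [4, 1], [1, 4], [0, 5], [3, 2]]
kappa = fleiss_kappa(video_counts)
print(round(kappa, 2))
```

Unlike Cohen's kappa, this statistic does not require the same raters to score every item, only the same number of ratings per item.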
INFLUENCES OF RESPONSE RATE AND DISTRIBUTION ON THE CALCULATION OF INTEROBSERVER RELIABILITY SCORES

PubMed Central

Rolider, Natalie U.; Iwata, Brian A.; Bullock, Christopher E.

2012-01-01

We examined the effects of several variations in response rate on the calculation of total, interval, exact-agreement, and proportional reliability indices. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates during separate sessions so that reliability results based on the four calculations could be compared across a range of values. Total reliability was uniformly high, interval reliability was spuriously high for high-rate responding, proportional reliability was somewhat lower for high-rate responding, and exact-agreement reliability was the lowest of the measures, especially for high-rate responding. In Study 2, we examined the separate effects of response rate per se, bursting, and end-of-interval responding. Response rate and bursting had little effect on reliability scores; however, the distribution of some responses at the end of intervals decreased interval reliability somewhat, proportional reliability noticeably, and exact-agreement reliability markedly. PMID:23322930
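The four indices compared in this abstract can be sketched as follows. The abstract does not define them, so the formulas below assume the conventional definitions from the behavior-analytic literature: total agreement as the smaller session total divided by the larger, interval agreement as the share of intervals where both observers agree on occurrence versus non-occurrence, exact agreement as the share of intervals with identical counts, and proportional agreement as the mean per-interval ratio of smaller to larger count. The function name and the example counts are hypothetical.

```python
def ioa_indices(obs1, obs2):
    """Four interobserver agreement indices from per-interval response counts."""
    if len(obs1) != len(obs2) or not obs1:
        raise ValueError("count lists must be non-empty and of equal length")
    n = len(obs1)
    pairs = list(zip(obs1, obs2))
    t1, t2 = sum(obs1), sum(obs2)
    # Total: smaller session total divided by larger session total.
    total = 1.0 if max(t1, t2) == 0 else min(t1, t2) / max(t1, t2)
    # Interval: both observers scored occurrence, or both scored non-occurrence.
    interval = sum((a > 0) == (b > 0) for a, b in pairs) / n
    # Exact: both observers recorded exactly the same count in the interval.
    exact = sum(a == b for a, b in pairs) / n
    # Proportional: per-interval smaller/larger ratio, averaged over intervals.
    proportional = sum(
        1.0 if a == b else min(a, b) / max(a, b) for a, b in pairs
    ) / n
    return {
        "total": total,
        "interval": interval,
        "exact": exact,
        "proportional": proportional,
    }


# Hypothetical counts for five intervals from two observers.
observer_1 = [2, 0, 5, 1, 3]
observer_2 = [2, 0, 4, 2, 3]
scores = ioa_indices(observer_1, observer_2)
print(scores)
```

In this made-up session the two observers record the same session total and agree on occurrence in every interval, so total and interval agreement are both 1.0, while exact agreement drops to 0.6 and proportional agreement to 0.86. The ordering is in line with the pattern the abstract describes, where exact agreement is the most stringent measure.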
Inter-examiner classification reliability of Mechanical Diagnosis and Therapy for extremity problems - Systematic review.

PubMed

Takasaki, Hiroshi; Okuyama, Kousuke; Rosedale, Richard

2017-02-01

Mechanical Diagnosis and Therapy (MDT) is used in the treatment of extremity problems. Classifying clinical problems is one method of providing effective treatment to a target population. Classification reliability is a key factor to determine the precise clinical problem and to direct an appropriate intervention. To explore inter-examiner reliability of the MDT classification for extremity problems in three reliability designs: 1) vignette reliability using surveys with patient vignettes, 2) concurrent reliability, where multiple assessors decide a classification by observing someone's assessment, 3) successive reliability, where multiple assessors independently assess the same patient at different times. Systematic review with data synthesis in a quantitative format. Agreement of MDT subgroups was examined using the Kappa value, with the operational definition of acceptable reliability set at ≥ 0.6. The level of evidence was determined considering the methodological quality of the studies. Six studies were included and all studies met the criteria for high quality. Kappa values for the vignette reliability design (five studies) were ≥ 0.7. There was data from two cohorts in one study for the concurrent reliability design and the Kappa values ranged from 0.45 to 1.0. Kappa values for the successive reliability design (data from three cohorts in one study) were < 0.6. The current review found strong evidence of acceptable inter-examiner reliability of MDT classification for extremity problems in the vignette reliability design, limited evidence of acceptable reliability in the concurrent reliability design and unacceptable reliability in the successive reliability design. Copyright © 2017 Elsevier Ltd. All rights reserved.
Is computed tomography an accurate and reliable method for measuring total knee arthroplasty component rotation?

PubMed

Figueroa, José; Guarachi, Juan Pablo; Matas, José; Arnander, Magnus; Orrego, Mario

2016-04-01

Computed tomography (CT) is widely used to assess component rotation in patients with poor results after total knee arthroplasty (TKA). The purpose of this study was to simultaneously determine the accuracy and reliability of CT in measuring TKA component rotation. TKA components were implanted in dry-bone models and assigned to two groups. The first group (n = 7) had variable femoral component rotations, and the second group (n = 6) had variable tibial tray rotations. CT images were then used to assess component rotation. Accuracy of CT rotational assessment was determined by mean difference, in degrees, between implanted component rotation and CT-measured rotation. Intraclass correlation coefficient (ICC) was applied to determine intra-observer and inter-observer reliability. Femoral component accuracy showed a mean difference of 2.5° and the tibial tray a mean difference of 3.2°. There was good intra- and inter-observer reliability for both components, with a femoral ICC of 0.8 and 0.76, and tibial ICC of 0.68 and 0.65, respectively. CT rotational assessment accuracy can differ from true component rotation by approximately 3° for each component. It does, however, have good inter- and intra-observer reliability.
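The intraclass correlation coefficient reported here can be computed from a subjects-by-raters table. The abstract does not state which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name and the rotation values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1); ratings[i][j] = measurement of subject i by rater j."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical component rotations (degrees) measured on CT by two observers.
rotations = [[3.0, 3.5], [-2.0, -1.0], [5.0, 5.5], [0.0, 1.0], [7.0, 6.5]]
icc = icc_2_1(rotations)
print(round(icc, 2))
```

Because this form penalizes systematic offsets between raters, it is generally lower than a consistency-type ICC computed on the same table.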
Real-Time Observation of Apathy in Long-Term Care Residents With Dementia: Reliability of the Person-Environment Apathy Rating Scale.

PubMed

Jao, Ying-Ling; Mogle, Jacqueline; Williams, Kristine; McDermott, Caroline; Behrens, Liza

2018-04-01

Apathy is prevalent in individuals with dementia. Lack of responsiveness to environmental stimulation is a key characteristic of apathy. The Person-Environment Apathy Rating (PEAR) scale consists of environment and apathy subscales, which allow for examination of environmental impact on apathy. The interrater reliability of the PEAR scale was examined via real-time observation. The current study included 45 observations of 15 long-term care residents with dementia. Each participant was observed at three time points for 10 minutes each. Two raters observed the participant and surrounding environment and independently rated the participant's apathy and environmental stimulation using the PEAR scale. Weighted Kappa was 0.5 to 0.82 for the PEAR-Environment subscale and 0.5 to 0.8 for the PEAR-Apathy subscale. Overall, with the exception of three items with relatively weak reliability (0.50 to 0.56), the PEAR scale showed moderate to strong interrater reliability (0.63 to 0.82). The results support the use of the PEAR scale to measure environmental stimulation and apathy via real-time observation in long-term care residents with dementia. [Journal of Gerontological Nursing, 44(4), 23-28.]. Copyright 2018, SLACK Incorporated.
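Weighted kappa, as used for ordinal items like these, gives partial credit when two raters choose neighbouring categories. The sketch below is illustrative only: the abstract does not say whether linear or quadratic weights were applied, so both are offered as an assumption, and the function name, the four-level scale, and the example ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Weighted kappa for ordinal ratings coded 0 .. n_categories - 1."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    power = 1 if weights == "linear" else 2
    n = len(rater_a)
    k = n_categories
    # Contingency table of joint ratings.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        table[a][b] += 1
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 for identical categories, 1 for the two extremes of the scale.
        return (abs(i - j) / (k - 1)) ** power

    d_obs = sum(
        disagreement(i, j) * table[i][j] for i in range(k) for j in range(k)
    ) / n
    d_exp = sum(
        disagreement(i, j) * row_tot[i] * col_tot[j]
        for i in range(k)
        for j in range(k)
    ) / n ** 2
    if d_exp == 0:
        return 1.0  # both raters used one identical category throughout
    return 1 - d_obs / d_exp


# Hypothetical ratings on a four-level item for six observations.
rater_1 = [0, 1, 2, 3, 0, 3]
rater_2 = [0, 1, 2, 3, 1, 2]
kappa_w = weighted_kappa(rater_1, rater_2, n_categories=4)
print(round(kappa_w, 2))
```

With quadratic weights, large disagreements are penalized more heavily relative to near misses, so the two weighting schemes can give different values for the same ratings.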
Observer Use of Standardized Observation Protocols in Consequential Observation Systems

ERIC Educational Resources Information Center

Bell, Courtney A.; Yi, Qi; Jones, Nathan D.; Lewis, Jennifer M.; McLeod, Monica; Liu, Shuangshuang

2014-01-01

Evidence from a handful of large-scale studies suggests that although observers can be trained to score reliably using observation protocols, there are concerns related to initial training and calibration activities designed to keep observers scoring accurately over time (e.g., Bell, et al, 2012; BMGF, 2012). Studies offer little insight into how…
Reliability of plain radiographic parameters for developmental dysplasia of the hip in children.

PubMed

Upasani, Vidyadhar V; Bomar, James D; Parikh, Gaurav; Hosalkar, Harish

2012-07-01

Few studies have evaluated the reliability and reproducibility of the femoral neck-shaft angle (NSA), center-edge angle (CEA), and acetabular index (AI) in young children with developmental dysplasia of the hip (DDH). We wanted to determine whether these parameters could be used reliably by practitioners. Fifty radiographs from 21 children with DDH were reviewed. Analysis was performed by three observers, at two time periods. The intra- and inter-observer reliability for each measure was assessed. At time period one, we noted a "high" level of agreement between observers when measuring the NSA, a "low" level when measuring the CEA, and a "moderate" level when measuring the AI. At time period two, we noted a "very high" level of agreement between observers when measuring the NSA and a "high" level when measuring the CEA and AI. When comparing the measurements of observer 1 at the two different time periods, we noted nearly "very high" agreement when measuring the NSA, a "moderate" agreement when measuring the CEA, and a "high" agreement for the AI. In comparing the measurements of observer 2, we noted "very high" agreement for the NSA and "high" agreement for the CEA and AI. In comparing the measurements for observer 3, we noted nearly "very high" agreement for the NSA, nearly "high" agreement for the CEA, and "high" agreement for the AI. It is difficult to reliably measure three-dimensional pelvic morphology on a frontal plane radiograph, especially when important pelvic landmarks have yet to ossify.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Roach, Mack, E-mail: mroach@radonc.ucsf.edu; Ceron Lizarraga, Tania L.; Lazar, Ann A.

Purpose: The optimal treatment of clinically localized prostate cancer is controversial. Most studies focus on biochemical (PSA) failure when comparing radical prostatectomy (RP) with radiation therapy (RT), but this endpoint has not been validated as predictive of overall survival (OS) or cause-specific survival (CSS). We analyzed the available literature to determine whether reliable conclusions could be made concerning the effectiveness of RP compared with RT with or without androgen deprivation therapy (ADT), assuming current treatment standards. Methods: Articles published between February 29, 2004, and March 1, 2015, that compared OS and CSS after RP or RT with or without ADT were included. Because the GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) system emphasis is on randomized controlled clinical trials, a reliability score (RS) was explored to further understand the issues associated with the study quality of observational studies, including appropriateness of treatment, source of data, clinical characteristics, and comorbidity. Lower RS values indicated lower reliability. Results: Fourteen studies were identified, and 13 were completely evaluable. Thirteen of the 14 studies (93%) were observational studies with low-quality evidence. The median RS was 12 (range, 5-18); the median difference in 10-year OS and CSS favored RP over RT: 10% and 4%, respectively. In studies with a RS ≤12 (average RS 9) the 10-year OS and CSS median differences were 17% and 6%, respectively. For studies with a RS >12 (average RS 15.5), the 10-year OS and CSS median differences were 5.5% and 1%, respectively. Thus, we observed an association between low RS and a higher percentage difference in OS and CSS. Conclusions: Reliable evidence that RP provides a superior CSS to RT with ADT is lacking. The most reliable studies suggest that the differences in 10-year CSS between RP and RT are small, possibly <1%.
Reliability and sources of variation of the ABILHAND-Kids questionnaire in children with cerebral palsy.

PubMed

de Jong, Lex D; van Meeteren, Annemiek; Emmelot, Cornelis H; Land, Nanne E; Dijkstra, Pieter U

2018-03-01

To determine reliability of the ABILHAND-Kids, explore sources of variation associated with these measurement results, and generate repeatability coefficients. A reliability study with a repeated measures design was performed in an ambulatory rehabilitation care department from a rehabilitation center, and a center for special education. A physician, an occupational therapist, and parents of 27 children with spastic cerebral palsy independently rated the children's manual capacity when performing 21 standardized tasks of the ABILHAND-Kids from video recordings twice with a three-week time interval (27 first and 25 second video recordings available). Parents additionally rated their children's performance based on their own perception of their child's ability to perform manual activities in everyday life, resulting in eight ratings per child. ABILHAND-Kids ratings were systematically different between observers, sessions, and rating method. Participant × observer interaction (66%) and residual variance (20%) contributed the most to error variance (9%). Test-retest reliability was 0.92. Repeatability coefficients (between 0.81 and 1.82 logit points) were largest for the parents' performance-based ratings. ABILHAND-Kids scores can be reliably used as a performance- and capacity-based rating method across different raters. Parents' performance-based ratings are less reliable than their capacity-based ratings. Resulting repeatability coefficients can be used to interpret ABILHAND-Kids ratings with more confidence. Implications for Rehabilitation: (1) The ABILHAND-Kids is a valuable tool to assess a child's unimanual and bimanual upper limb activities. (2) The reliability of the ABILHAND-Kids is good across different observers as a performance- and capacity-based rating method. (3) Parents' performance-based ratings are less reliable than their capacity-based ones. (4) This study has generated repeatability coefficients for clinical decision making.
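A repeatability coefficient is commonly read as the largest difference expected between two repeated ratings of an unchanged child in about 95% of cases. The sketch below uses the simple paired-measurement form, 1.96 × √2 × within-subject SD. This is an assumption made for illustration: the study derived its coefficients from a variance-components analysis that the abstract does not detail, and the function name and logit scores here are hypothetical.

```python
import math


def repeatability_coefficient(first, second):
    """Repeatability coefficient from paired repeated measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("measurement lists must be non-empty and equal length")
    n = len(first)
    # Within-subject variance from two measurements each: sum(d^2) / (2n).
    within_var = sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n)
    # 1.96 * sqrt(2) * within-subject SD bounds about 95% of differences.
    return 1.96 * math.sqrt(2) * math.sqrt(within_var)


# Hypothetical logit scores for four children rated on two occasions.
session_1 = [1.0, 2.0, 0.5, -0.5]
session_2 = [1.4, 1.8, 0.9, -0.7]
rc = repeatability_coefficient(session_1, session_2)
print(round(rc, 2))
```

Under this reading, a change between two ratings that is smaller than the coefficient cannot be distinguished from measurement error.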
Monkeys and humans take local uncertainty into account when localizing a change.

PubMed

Devkar, Deepna; Wright, Anthony A; Ma, Wei Ji

2017-09-01

Since sensory measurements are noisy, an observer is rarely certain about the identity of a stimulus. In visual perception tasks, observers generally take their uncertainty about a stimulus into account when doing so helps task performance. Whether the same holds in visual working memory tasks is largely unknown. Ten human and two monkey subjects localized a single change in orientation between a sample display containing three ellipses and a test display containing two ellipses. To manipulate uncertainty, we varied the reliability of orientation information by making each ellipse more or less elongated (two levels); reliability was independent across the stimuli. In both species, a variable-precision encoding model equipped with an "uncertainty-indifferent" decision rule, which uses only the noisy memories, fitted the data poorly. In both species, a much better fit was provided by a model in which the observer also takes the levels of reliability-driven uncertainty associated with the memories into account. In particular, a measured change in a low-reliability stimulus was given lower weight than the same change in a high-reliability stimulus. We did not find strong evidence that observers took reliability-independent variations in uncertainty into account. Our results illustrate the importance of studying the decision stage in comparison tasks and provide further evidence for evolutionary continuity of working memory systems between monkeys and humans.
Monkeys and humans take local uncertainty into account when localizing a change

PubMed Central

Devkar, Deepna; Wright, Anthony A.; Ma, Wei Ji

2017-01-01

Since sensory measurements are noisy, an observer is rarely certain about the identity of a stimulus. In visual perception tasks, observers generally take their uncertainty about a stimulus into account when doing so helps task performance. Whether the same holds in visual working memory tasks is largely unknown. Ten human and two monkey subjects localized a single change in orientation between a sample display containing three ellipses and a test display containing two ellipses. To manipulate uncertainty, we varied the reliability of orientation information by making each ellipse more or less elongated (two levels); reliability was independent across the stimuli. In both species, a variable-precision encoding model equipped with an “uncertainty–indifferent” decision rule, which uses only the noisy memories, fitted the data poorly. In both species, a much better fit was provided by a model in which the observer also takes the levels of reliability-driven uncertainty associated with the memories into account. In particular, a measured change in a low-reliability stimulus was given lower weight than the same change in a high-reliability stimulus. We did not find strong evidence that observers took reliability-independent variations in uncertainty into account. Our results illustrate the importance of studying the decision stage in comparison tasks and provide further evidence for evolutionary continuity of working memory systems between monkeys and humans. PMID:28877535
Intra- and Inter-Observer Reliability of the Trunk Impairment Scale for Children with Cerebral Palsy

ERIC Educational Resources Information Center

Saether, Rannei; Jorgensen, Lone

2011-01-01

Standardized scales to evaluate qualities of trunk movements in children with dysfunction are sparse. An examination of the reliability of scales that may be useful in the clinic is important. The aim of this study was to examine the reliability of the Trunk Impairment Scale (TIS) for children with cerebral palsy (CP). Standardized scales are…
Reliability and criterion validity of an observation protocol for working technique assessments in cash register work.

PubMed

Palm, Peter; Josephson, Malin; Mathiassen, Svend Erik; Kjellberg, Katarina

2016-06-01

We evaluated the intra- and inter-observer reliability and criterion validity of an observation protocol, developed in an iterative process involving practicing ergonomists, for assessment of working technique during cash register work for the purpose of preventing upper extremity symptoms. Two ergonomists independently assessed 17 15-min videos of cash register work on two occasions each, as a basis for examining reliability. Criterion validity was assessed by comparing these assessments with meticulous video-based analyses by researchers. Intra-observer reliability was acceptable (i.e. proportional agreement >0.7 and kappa >0.4) for 10/10 questions. Inter-observer reliability was acceptable for only 3/10 questions. An acceptable inter-observer reliability combined with an acceptable criterion validity was obtained only for one working technique aspect, 'Quality of movements'. Thus, major elements of the cashiers' working technique could not be assessed with an acceptable accuracy from short periods of observations by one observer, such as often desired by practitioners. Practitioner Summary: We examined an observation protocol for assessing working technique in cash register work. It was feasible in use, but inter-observer reliability and criterion validity were generally not acceptable when working technique aspects were assessed from short periods of work. We recommend the protocol to be used for educational purposes only.
Examining the accuracy of students' self-reported academic grades from a correlational and a discrepancy perspective: Evidence from a longitudinal study.

PubMed

Sticca, Fabio; Goetz, Thomas; Bieg, Madeleine; Hall, Nathan C; Eberle, Franz; Haag, Ludwig

2017-01-01

The present longitudinal study examined the reliability of self-reported academic grades across three phases in four subject domains for a sample of 916 high-school students. Self-reported grades were found to be highly positively correlated with actual grades in all academic subjects and across grades 9 to 11 underscoring the reliability of self-reported grades as an achievement indicator. Reliability of self-reported grades was found to differ across subject areas (e.g., mathematics self-reports more reliable than language studies), with a slight yet consistent tendency to over-report achievement levels also observed across grade levels and academic subjects. Overall, the absolute value of over- and underreporting was low and these patterns were not found to differ between mathematics and verbal subjects. In sum, study findings demonstrate the consistent predictive utility of students' self-reported achievement across grade levels and subject areas with the observed tendency to over-report academic grades and slight differences between domains nonetheless warranting consideration in future education research.
Examining the accuracy of students’ self-reported academic grades from a correlational and a discrepancy perspective: Evidence from a longitudinal study

PubMed Central

Goetz, Thomas

2017-01-01

The present longitudinal study examined the reliability of self-reported academic grades across three phases in four subject domains for a sample of 916 high-school students. Self-reported grades were found to be highly positively correlated with actual grades in all academic subjects and across grades 9 to 11 underscoring the reliability of self-reported grades as an achievement indicator. Reliability of self-reported grades was found to differ across subject areas (e.g., mathematics self-reports more reliable than language studies), with a slight yet consistent tendency to over-report achievement levels also observed across grade levels and academic subjects. Overall, the absolute value of over- and underreporting was low and these patterns were not found to differ between mathematics and verbal subjects. In sum, study findings demonstrate the consistent predictive utility of students’ self-reported achievement across grade levels and subject areas with the observed tendency to over-report academic grades and slight differences between domains nonetheless warranting consideration in future education research. PMID:29112979

Assessment of Interobserver Reliability in Nutrition Studies that Use Direct Observation of School Meals

PubMed Central

BAGLIO, MICHELLE L.; BAXTER, SUZANNE DOMEL; GUINN, CAROLINE H.; THOMPSON, WILLIAM O.; SHAFFER, NICOLE M.; FRYE, FRANCESCA H. A.

2005-01-01

This article (a) provides a general review of interobserver reliability (IOR) and (b) describes our method for assessing IOR for items and amounts consumed during school meals for a series of studies regarding the accuracy of fourth-grade children's dietary recalls validated with direct observation of school meals. A widely used validation method for dietary assessment is direct observation of meals. Although many studies utilize several people to conduct direct observations, few published studies indicate whether IOR was assessed. Assessment of IOR is necessary to determine that the information collected does not depend on who conducted the observation. Two strengths of our method for assessing IOR are that IOR was assessed regularly throughout the data collection period and that IOR was assessed for foods at the item and amount level instead of at the nutrient level. Adequate agreement among observers is essential to the reasoning behind using observation as a validation tool. Readers are encouraged to question the results of studies that fail to mention and/or to include the results for assessment of IOR when multiple people have conducted observations. PMID:15354155
Psychometric considerations in the measurement of event-related brain potentials: Guidelines for measurement and reporting.

PubMed

Clayson, Peter E; Miller, Gregory A

2017-01-01

Failing to consider psychometric issues related to reliability and validity, differential deficits, and statistical power potentially undermines the conclusions of a study. In research using event-related brain potentials (ERPs), numerous contextual factors (population sampled, task, data recording, analysis pipeline, etc.) can impact the reliability of ERP scores. The present review considers the contextual factors that influence ERP score reliability and the downstream effects that reliability has on statistical analyses. Given the context-dependent nature of ERPs, it is recommended that ERP score reliability be formally assessed on a study-by-study basis. Recommended guidelines for ERP studies include 1) reporting the threshold of acceptable reliability and reliability estimates for observed scores, 2) specifying the approach used to estimate reliability, and 3) justifying how trial-count minima were chosen. A reliability threshold for internal consistency of at least 0.70 is recommended, and a threshold of 0.80 is preferred. The review also advocates the use of generalizability theory for estimating score dependability (the generalizability theory analog to reliability) as an improvement on classical test theory reliability estimates, suggesting that the latter is less well suited to ERP research. To facilitate the calculation and reporting of dependability estimates, an open-source Matlab program, the ERP Reliability Analysis Toolbox, is presented. Copyright © 2016 Elsevier B.V. All rights reserved.
Increasing Reliability of Direct Observation Measurement Approaches in Emotional and/or Behavioral Disorders Research Using Generalizability Theory

ERIC Educational Resources Information Center

Gage, Nicholas A.; Prykanowski, Debra; Hirn, Regina

2014-01-01

Reliability of direct observation outcomes ensures the results are consistent, dependable, and trustworthy. Typically, reliability of direct observation measurement approaches is assessed using interobserver agreement (IOA) and the calculation of observer agreement (e.g., percentage of agreement). However, IOA does not address intraobserver…
A study on the reproducibility of cephalometric landmarks when undertaking a three-dimensional (3D) cephalometric analysis

PubMed Central

Llamas, José M.; Cibrián, Rosa; Gandia, José L.; Paredes, Vanessa

2012-01-01

Objectives: Cone Beam Computerized Tomography (CBCT) allows the possibility of modifying some of the diagnostic tools used in orthodontics, such as cephalometry. The first step must be to study the characteristics of these devices in terms of accuracy and reliability of the most commonly used landmarks. The aims were (1) to assess intra- and inter-observer reliability in the location of anatomical landmarks belonging to hard tissues of the skull in images taken with a CBCT device, (2) to determine which of those landmarks are more vs. less reliable and (3) to introduce planes of reference so as to create cephalometric analyses appropriate to the 3D reality. Study design: Fifteen patients who had a CBCT (i-CAT®) as a diagnostic register were selected. To assess the reproducibility of landmark location and the differences in the measurements of two observers at different times, 41 landmarks were defined on the three spatial axes (X, Y, Z) and located. A total of 3,690 measurements were taken and, as each determination has 3 coordinates, 11,070 data points were processed with the SPSS® statistical package. To assess the reproducibility of the method for landmark location, an ANOVA was undertaken using two variation factors: time (t1, t2 and t3) and observer (Ob1 and Ob2) for each axis (X, Y and Z) and landmark. The order of the CBCT scans submitted to the observers (Ob1, Ob2) at t1, t2, and t3 was different and randomly allocated. Multiple comparisons were undertaken using the Bonferroni test. The intra- and inter-examiner ICCs were calculated. Results: Intra- and inter-examiner reliability was high, both being ICC ≥ 0.99, with the best frequency on axis Z. Conclusions: The most reliable landmarks were: Nasion, Sella, Basion, left Porion, point A, anterior nasal spine, Pogonion, Gnathion, Menton, frontozygomatic sutures, first lower molars and upper and lower incisors. Those with less reliability were the supraorbitals, right zygion and posterior nasal spine.
Key words: Cone Beam Computed Tomography, cephalometry, landmark, orthodontics, reliability. PMID:22322503
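The measurement counts reported in this abstract are consistent with a fully crossed design in which each of the two observers located every landmark in every patient at each of the three time points. The short check below works through that arithmetic. The variable names are illustrative, and the fully crossed interpretation is an inference from the numbers rather than something the abstract states outright.

```python
# Design factors taken from the abstract.
patients = 15
landmarks = 41
observers = 2   # Ob1 and Ob2
sessions = 3    # t1, t2 and t3
axes = 3        # X, Y and Z

# One landmark location per patient, landmark, observer and session.
measurements = patients * landmarks * observers * sessions
# Each location contributes one value per spatial axis.
coordinate_values = measurements * axes

print(measurements, coordinate_values)  # prints "3690 11070"
```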
Superior Temporal Activation as a Function of Linguistic Knowledge: Insights from Deaf Native Signers Who Speechread

ERIC Educational Resources Information Center

Capek, Cheryl M.; Woll, Bencie; MacSweeney, Mairead; Waters, Dafydd; McGuire, Philip K.; David, Anthony S.; Brammer, Michael J.; Campbell, Ruth

2010-01-01

Studies of spoken and signed language processing reliably show involvement of the posterior superior temporal cortex. This region is also reliably activated by observation of meaningless oral and manual actions. In this study we directly compared the extent to which activation in posterior superior temporal cortex is modulated by linguistic…
Validity and reliability of the Japanese version of the FIM + FAM in patients with cerebrovascular accident.

PubMed

Miki, Emi; Yamane, Shingo; Yamaoka, Mai; Fujii, Hiroe; Ueno, Hiroka; Kawahara, Toshie; Tanaka, Keiko; Tamashiro, Hiroaki; Inoue, Eiji; Okamoto, Takatsugu; Kuriyama, Masaru

2016-09-01

The study aim was to investigate the validity and reliability of the Functional Independence Measure and Functional Assessment Measure (FIM + FAM), which is unfamiliar in Japan, by using its Japanese version (FIM + FAM-j) in patients with cerebrovascular accident (CVA). Forty-two CVA patients participated. Criterion validity was examined by correlating the full scale and subscales of FIM + FAM-j with several well-established measurements using Spearman's correlation coefficient. Reliability was evaluated by internal consistency (tested by Cronbach's alpha coefficient) and intra-rater reliability (tested by Kendall's tau correlation coefficient). Good-to-excellent criterion validity was found between the full scale and motor subscales of the FIM + FAM-j and the Barthel Index, National Institutes of Health Stroke Scale, modified Rankin Scale, and lower extremity Brunnstrom Recovery Stage. High internal consistency was observed within the full-scale FIM + FAM-j and the motor and cognitive subscales (Cronbach's alphas were 0.968, 0.954, and 0.948, respectively). Additionally, good intra-rater reliability was observed within the full scale and motor subscales, and excellent reliability for the cognitive subscales (taus were 0.83, 0.80, and 0.98, respectively). This study showed that the FIM + FAM-j demonstrated acceptable levels of validity and reliability when used for CVA as a measure of disability.
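Cronbach's alpha, the internal-consistency statistic reported above, compares the sum of the item variances with the variance of the total score. The following is a minimal sketch using the standard formula with sample variances. The function name and the three-item example scores are hypothetical and unrelated to the FIM + FAM-j data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[i][j] = score of respondent i on item j."""
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent needs a score on every item")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores of five patients on three items.
item_scores = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2], [3, 3, 4]]
alpha = cronbach_alpha(item_scores)
print(round(alpha, 3))
```

Alpha approaches 1 when items rise and fall together across respondents, which is the pattern the very high values reported in the abstract would reflect.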
Measures of Reliability in Behavioral Observation: The Advantage of "Real Time" Data Acquisition.

ERIC Educational Resources Information Center

Hollenbeck, Albert R.; Slaby, Ronald G.

Two observers who were using an electronic digital data acquisition system were spot-checked for reliability at random times over a four-month period. Between- and within-observer reliability was assessed for frequency, duration, and duration-per-event measures of four infant behaviors. The results confirmed the problem of observer drift--the…
Reliability measures of functional magnetic resonance imaging in a longitudinal evaluation of mild cognitive impairment.

PubMed

Zanto, Theodore P; Pa, Judy; Gazzaley, Adam

2014-01-01

As the aging population grows, it has become increasingly important to carefully characterize amnestic mild cognitive impairment (aMCI), a preclinical stage of Alzheimer's disease (AD). Functional magnetic resonance imaging (fMRI) is a valuable tool for monitoring disease progression in selectively vulnerable brain regions associated with AD neuropathology. However, the reliability of fMRI data in longitudinal studies of older adults with aMCI is largely unexplored. To address this, aMCI participants completed two visual working tasks, a Delayed-Recognition task and a One-Back task, on three separate scanning sessions over a three-month period. Test-retest reliability of the fMRI blood oxygen level dependent (BOLD) activity was assessed using an intraclass correlation (ICC) analysis approach. Results indicated that brain regions engaged during the task displayed greater reliability across sessions compared to regions that were not utilized by the task. During task-engagement, differential reliability scores were observed across the brain such that the frontal lobe, medial temporal lobe, and subcortical structures exhibited fair to moderate reliability (ICC=0.3-0.6), while temporal, parietal, and occipital regions exhibited moderate to good reliability (ICC=0.4-0.7). Additionally, reliability across brain regions was more stable when three fMRI sessions were used in the ICC calculation relative to two fMRI sessions. In conclusion, the fMRI BOLD signal is reliable across scanning sessions in this population and thus a useful tool for tracking longitudinal change in observational and interventional studies in aMCI. © 2013.
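The abstract does not state which ICC form was used, so the following is only a sketch of one common choice, the one-way random-effects ICC(1,1), computed from between-subject and within-subject mean squares. The function name and data layout are assumptions made for illustration.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1).

    data has one row per subject (or brain region) and one column per
    scanning session; all rows must have the same number of columns.
    """
    n = len(data)          # number of subjects
    k = len(data[0])       # number of sessions per subject
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    # Between-subject mean square.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-subject mean square (session-to-session variation).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

With this form, identical values across sessions give an ICC of 1, and adding a third session changes both mean squares, which is one way to see why a three-session estimate can behave differently from a two-session one.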
German version, inter- and intrarater reliability and internal consistency of the "Agitated Behavior Scale" (ABS-G) in patients with moderate to severe traumatic brain injury.

PubMed

Hellweg, Stephanie; Schuster-Amft, Corina

2016-07-19

Agitation is frequently observed during early recovery after traumatic brain injury (TBI). Agitated behaviour often interferes with a goal-orientated rehabilitation and can be a substantial hindrance to therapy. Despite the relatively high occurrence of agitation in the TBI population, there is no objective assessment available in German (G). An existing scale with excellent psychometric properties is the "Agitated Behavior Scale (ABS)" developed by Corrigan in 1989. The aim of the study was to translate the Agitated Behavior Scale (ABS) into German (ABS-G) and investigate the inter- and intrarater reliability and internal consistency in patients with moderate to severe TBI. A formal nine-step translation and cross-cultural adaptation procedure (TCCA) was applied. Subsequently a prospective observational patient study was conducted. To examine the interrater reliability and internal consistency, two therapists rated 20 patients independently after a therapy session. This procedure was repeated twice on a weekly basis. The intrarater reliability was assessed through video recordings from three patients. Nine raters scored the demonstrated behaviour on the videotape with the ABS-G independently twice within one month. The inter- and intrarater reliability were evaluated with the Spearman rank correlation coefficient and the quadratic weighted kappa. The internal consistency was tested with Cronbach's alpha. Behaviour of 20 patients (18 males; mean age 41 ± 20.7; mean Functional Independence Measure (FIM) cognitive score on admission 7.1 ± 4.04; mean ABS-G score at first observation 17.3 ± 2.83) was assessed threefold. Interrater reliability yielded a correlation coefficient for ABS-G total score of all 60 paired observations of r_s = 0.845 and a weighted Kappa of 0.738. Intrarater reliability for ABS-G total score ranged between r_s = 0.719 and 0.953 and showed a weighted Kappa between 0.871 and 0.953. Cronbach's alpha indicated moderate internal consistency with 0.661.
This study demonstrates that the ABS-G is a reliable instrument for evaluating agitation in patients with moderate to severe TBI. Hereby it would be possible to monitor agitation objectively and optimise the management of agitated patients according to international recommendations.
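The quadratic weighted kappa named in this abstract penalizes disagreements by the squared distance between the two ratings. A minimal sketch follows, assuming ratings are coded as integers 0 to n_categories − 1; the function name and coding scheme are assumptions for illustration, not the authors' procedure.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with quadratic disagreement weights.

    rater_a and rater_b are equal-length lists of integer category
    codes in the range 0 .. n_categories - 1.
    """
    n = len(rater_a)
    # Observed contingency table of paired ratings.
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_tot = [sum(row) for row in observed]
    col_tot = [
        sum(observed[i][j] for i in range(n_categories))
        for j in range(n_categories)
    ]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2 / (n_categories - 1) ** 2
            num += w * observed[i][j]
            den += w * row_tot[i] * col_tot[j] / n
    return 1.0 - num / den
```

Because the weights grow with the square of the distance, two raters who differ by one scale point are penalized far less than raters who differ by several points, which suits ordinal totals such as the ABS-G score.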
Reliability of 3D laser-based anthropometry and comparison with classical anthropometry.

PubMed

Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Broda, Anja; Scholz, Markus

2016-05-26

Anthropometric quantities are widely used in epidemiologic research as possible confounders, risk factors, or outcomes. 3D laser-based body scans (BS) allow evaluation of dozens of quantities in short time with minimal physical contact between observers and probands. The aim of this study was to compare BS with classical manual anthropometric (CA) assessments with respect to feasibility, reliability, and validity. We performed a study on 108 individuals with multiple measurements of BS and CA to estimate intra- and inter-rater reliabilities for both. We suggested BS equivalents of CA measurements and determined validity of BS considering CA the gold standard. Throughout the study, the overall concordance correlation coefficient (OCCC) was chosen as indicator of agreement. BS was slightly more time consuming but better accepted than CA. For CA, OCCCs for intra- and inter-rater reliability were greater than 0.8 for all nine quantities studied. For BS, 9 of 154 quantities showed reliabilities below 0.7. BS proxies for CA measurements showed good agreement (minimum OCCC > 0.77) after offset correction. Thigh length showed higher reliability in BS while upper arm length showed higher reliability in CA. Except for these issues, reliabilities of CA measurements and their BS equivalents were comparable.
Assessing physical activity during youth sport: the Observational System for Recording Activity in Children: Youth Sports.

PubMed

Cohen, Alysia; McDonald, Samantha; McIver, Kerry; Pate, Russell; Trost, Stewart

2014-05-01

The purpose of this study was to evaluate the validity and interrater reliability of the Observational System for Recording Activity in Children: Youth Sports (OSRAC:YS). Children (N = 29) participating in a parks and recreation soccer program were observed during regularly scheduled practices. Physical activity (PA) intensity and contextual factors were recorded by momentary time-sampling procedures (10-second observe, 20-second record). Two observers simultaneously observed and recorded children's PA intensity, practice context, social context, coach behavior, and coach proximity. Interrater reliability was based on agreement (Kappa) between the observers' coding for each category, and the Intraclass Correlation Coefficient (ICC) for percent of time spent in MVPA. Validity was assessed by calculating the correlation between OSRAC:YS estimated and objectively measured MVPA. Kappa statistics for each category demonstrated substantial to almost perfect interobserver agreement (Kappa = 0.67-0.93). The ICC for percent time in MVPA was 0.76 (95% C.I. = 0.49-0.90). A significant correlation (r = .73) was observed for MVPA recorded by observation and MVPA measured via accelerometry. The results indicate the OSRAC:YS is a reliable and valid tool for measuring children's PA and contextual factors during a youth soccer practice.
Reliability of the Adult Myopathy Assessment Tool in Individuals with Myositis

PubMed Central

Harris-Love, Michael O.; Joe, Galen; Davenport, Todd E.; Koziol, Deloris; Rose, Kristen Abbett; Shrader, Joseph A.; Vasconcelos, Olavo M.; McElroy, Beverly; Dalakas, Marinos C.

2015-01-01

Objective: The Adult Myopathy Assessment Tool (AMAT) is a 13-item performance-based battery developed to assess functional status and muscle endurance. The purpose of this study was to determine the intrarater and interrater reliability of the AMAT in adults with myositis. Methods: Nineteen raters (13 physical therapists and 6 physicians) scored videotaped recordings of patients with myositis performing the AMAT for a total of 114 tests and 1,482 item observations per session. Raters rescored the AMAT test and item observations during a follow-up session (19 ± 6 days between scoring sessions). All raters completed a single, self-directed, electronic training module prior to the initial scoring session. Results: Intrarater and interrater reliability correlation coefficients were .94 or greater for the AMAT Functional Subscale, Endurance Subscale, and Total score (all p < 0.02 for H0: ρ ≤ 0.75). All AMAT items had satisfactory intrarater agreement (Kappa statistics with Fleiss-Cohen weights, Kw = .57-1.00). Interrater agreement was acceptable for each AMAT item (K = .56-.89) except the sit up (K = .16). The standard error of measurement and 95% confidence interval range for the AMAT Total scores did not exceed 2 points across all observations (AMAT Total score range = 0-45). Conclusions: The AMAT is a reliable, domain-specific assessment of functional status and muscle endurance for adult subjects with myositis. Results of this study suggest that physicians and physical therapists may reliably score the AMAT following a single training session. The AMAT Functional Subscale, Endurance Subscale, and Total score exhibit interrater and intrarater reliability suitable for clinical and research use. PMID:25201624
Medial tibial stress syndrome can be diagnosed reliably using history and physical examination.

PubMed

Winters, M; Bakker, E W P; Moen, M H; Barten, C C; Teeuwen, R; Weir, A

2017-02-08

The majority of sporting injuries are clinically diagnosed using history and physical examination as the cornerstone. There are no studies supporting the reliability of making a clinical diagnosis of medial tibial stress syndrome (MTSS). Our aim was to assess if MTSS can be diagnosed reliably, using history and physical examination. We also investigated if clinicians were able to reliably identify concurrent lower leg injuries. A clinical reliability study was performed at multiple sports medicine sites in The Netherlands. Athletes with non-traumatic lower leg pain were assessed for having MTSS by two clinicians, who were blinded to each other's diagnoses. We calculated the prevalence, percentage of agreement, observed percentage of positive agreement (Ppos), observed percentage of negative agreement (Pneg) and Kappa-statistic with 95%CI. Forty-nine athletes participated in this study, of whom 46 completed both assessments. The prevalence of MTSS was 74%. The percentage of agreement was 96%, with Ppos and Pneg of 97% and 92%, respectively. The inter-rater reliability was almost perfect; k=0.89 (95% CI 0.74 to 1.00), p<0.000001. Of the 34 athletes with MTSS, 11 (32%) had a concurrent lower leg injury, which was reliably noted by our clinicians, k=0.73, 95% CI 0.48 to 0.98, p<0.0001. Our findings show that MTSS can be reliably diagnosed clinically using history and physical examination, in clinical practice and research settings. We also found that concurrent lower leg injuries are common in athletes with MTSS. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
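The agreement statistics listed in this abstract (percentage of agreement, Ppos, Pneg, and Cohen's kappa) can all be derived from a 2×2 table of two clinicians' diagnoses. The sketch below shows the arithmetic; the function name is hypothetical and the cell counts are illustrative assumptions, since the abstract does not report the underlying table.

```python
def two_by_two_agreement(a, b, c, d):
    """Agreement statistics for two raters making a yes/no diagnosis.

    a: both raters positive, d: both raters negative,
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    po = (a + d) / n                    # overall percentage of agreement
    ppos = 2 * a / (2 * a + b + c)      # positive agreement
    pneg = 2 * d / (2 * d + b + c)      # negative agreement
    # Agreement expected by chance from the raters' marginal totals.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return {"agreement": po, "ppos": ppos, "pneg": pneg, "kappa": kappa}

# Illustrative counts only (an assumption, not the study's table).
result = two_by_two_agreement(33, 1, 1, 11)
print({name: round(value, 2) for name, value in result.items()})
```

Reporting Ppos and Pneg alongside kappa is useful when prevalence is high, because kappa alone can look low even when raters rarely disagree.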
Reliability of the Phi angle to assess rotational alignment of the talar component in total ankle replacement.

PubMed

Manzi, Luigi; Villafañe, Jorge Hugo; Indino, Cristian; Tamini, Jacopo; Berjano, Pedro; Usuelli, Federico Giuseppe

2017-11-08

The purpose of this study was to investigate the test-retest reliability of the Phi angle in patients undergoing total ankle replacement (TAR) for end stage ankle osteoarthritis (OA) to assess the rotational alignment of the talar component. Retrospective observational cross-sectional study of prospectively collected data. Post-operative anteroposterior radiographs of the foot of 170 patients who underwent TAR for ankle OA were evaluated. Three physicians measured Phi on the 170 randomly sorted and anonymized radiographs on two occasions, one week apart (test and retest conditions); inter- and intra-observer agreement were evaluated. Test-retest reliability of Phi angle measurement was excellent for patients with Hintegra TAR (ICC=0.995; p<0.001) and Zimmer TAR (ICC=0.995; p<0.001) on radiographs of subjects with ankle OA. There were no significant differences in the reliability of the Phi angle measurement between patients with Hintegra vs. Zimmer implants (p>0.05). Measurement of Phi angle on weight-bearing dorsoplantar radiograph showed an excellent reliability among orthopaedic surgeons in determining the position of the talar component in the axial plane. Level II, cross sectional study. Copyright © 2017 European Foot and Ankle Society. Published by Elsevier Ltd. All rights reserved.
Using the Hemophilia Joint Health Score for assessment of children: Reliability of the Spanish version.

PubMed

R, Cuesta-Barriuso; A, Torres-Ortuño; S, Pérez-Alenda; J, Carrasco Juan; F, Querol; J, Nieto-Munuera; Ja, López-Pina

2018-02-27

Numerous measuring instruments for the evaluation of hemophilic arthropathy have been developed. One of the most used systems is the Hemophilia Joint Health Score (HJHS) given its sensitivity to clinical changes appearing in the joints because of recurrent hemarthrosis. This reliability study assessed the interrater reliability of the Spanish version of the HJHS (version 2.1) in children with hemophilia. A sample of 36 children aged 7-13 years diagnosed with hemophilia A or B was used. Two physiotherapists performed physical assessments with the Spanish version of the HJHS. Descriptive statistics (range, mean, standard deviation) and the analysis of interrater reliability were calculated. The interrater reliability was heterogeneous: the Kappa coefficient (κ), although significant (p < 0.001), ranged from 0.31 to 1.00 across the variables of the HJHS (swelling, duration of swelling, muscle atrophy, crepitus on motion, flexion loss, extension loss, joint pain, strength, and global gait). In assessing the bias of observers with the Bland and Altman method, observer 1 scored 0.41 (CI [-0.67, 1.49]) units above observer 2, and the difference between the two was significant (t(36) = 4.48, p < 0.001). The interrater reliability of the Spanish population version of the HJHS is high. This scale should be used generically in evaluating musculoskeletal pediatric patients with hemophilia.
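The Bland and Altman method mentioned above summarizes observer bias as the mean of the paired differences, with limits of agreement at roughly ±1.96 standard deviations around it. The following is a minimal sketch under that conventional definition; the function name is an assumption, and the abstract does not say whether its reported interval was computed this way.

```python
from statistics import mean, stdev

def bland_altman(obs1, obs2):
    """Return (bias, lower limit, upper limit) for paired scores.

    bias is the mean of obs1 - obs2; the limits of agreement are
    bias ± 1.96 × the sample standard deviation of the differences.
    """
    diffs = [x - y for x, y in zip(obs1, obs2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A positive bias means the first observer tends to score higher than the second, which is the direction of the 0.41-unit difference reported in the abstract.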
Psychometric Properties of a Standardized Observation Protocol to Quantify Pediatric Physical Therapy Actions.

PubMed

Sonderer, Patrizia; Akhbari Ziegler, Schirin; Gressbach Oertle, Barbara; Meichtry, André; Hadders-Algra, Mijna

2017-07-01

Pediatric physical therapy (PPT) is characterized by heterogeneity. This blurs the evaluation of effective components of PPT. The Groningen Observation Protocol (GOP) was developed to quantify contents of PPT. This study assesses the reliability and completeness of the GOP. Sixty infant PPT sessions were video-taped. Two random samples of 10 videos were used to determine interrater and intrarater reliability using intraclass correlation coefficients (ICCs) with 95% confidence intervals. Completeness of GOP 2.0 was based on 60 videos. Interrater reliability of quantifying PPT actions was excellent (ICC, 0.75-1.0) in 71% and sufficient to good (ICC, 0.4-0.74) in 24% of PPT actions. Intrarater reliability was excellent in 94% and sufficient to good in 6% of PPT actions. Completeness was good for greater than 90% of PPT actions. GOP 2.0 has good reliability and completeness. After appropriate training, it is a useful tool to quantify PPT for children with developmental disorders.
Measuring physical activity in preschoolers: Reliability and validity of The System for Observing Fitness Instruction Time for Preschoolers (SOFIT-P)

PubMed Central

Sharma, Shreela; Chuang, Ru-Jye; Skala, Katherine; Atteberry, Heather

2012-01-01

The purpose of this study is to describe the initial feasibility, reliability, and validity of an instrument to measure physical activity in preschoolers using direct observation. The System for Observing Fitness Instruction Time for Preschoolers was developed and tested among 3- to 6-year-old children over fall 2008 for feasibility and reliability (Phase I, n=67) and in fall 2009 for concurrent validity (Phase II, n=27). Phase I showed that preschoolers spent >75% of their active time at preschool in light physical activity. The mean inter-observer agreement scores were ≥.75 for physical activity level and type. Correlation coefficients, measuring construct validity between the lesson context and physical activity types with the activity levels, were moderately strong. Phase II showed moderately strong correlations ranging from .50 to .54 between the System for Observing Fitness Instruction Time for Preschoolers and Actigraph accelerometers for physical activity levels. The System for Observing Fitness Instruction Time for Preschoolers shows promising initial results as a new method for measuring physical activity among preschoolers. PMID:22485071
[Reliability and reproducibility of the Fitzpatrick phototype scale for skin sensitivity to ultraviolet light].

PubMed

Sánchez, Guillermo; Nova, John; Arias, Nilsa; Peña, Bibiana

2008-12-01

The Fitzpatrick phototype scale has been used to determine skin sensitivity to ultraviolet light. The reliability of this scale in estimating sensitivity permits risk evaluation of skin cancer based on phototype. Reliability and changes in intra- and inter-observer concordance were determined for the Fitzpatrick phototype scale after the assessment methods for establishing the phototype were standardized. An analytical study of intra- and inter-observer concordance was performed. The Fitzpatrick phototype scale was standardized using focus group methodology. To determine intra- and inter-observer agreement, the weighted kappa statistical method was applied. The standardization effect was measured using the equal kappa contrast hypothesis and Wald test for dependent measurements. The phototype scale was applied to 155 patients over 15 years of age who were assessed four times by two independent observers. The sample was drawn from patients of the Centro Dermatológico Federico Lleras Acosta. During the pre-standardization phase, the baseline and six-week inter-observer weighted kappa were 0.31 and 0.40, respectively. The intra-observer kappa values for observers A and B were 0.47 and 0.51, respectively. After the standardization process, the baseline and six-week inter-observer weighted kappa values were 0.77 and 0.82, respectively. Intra-observer kappa coefficients for observers A and B were 0.78 and 0.82. Statistically significant differences were found between coefficients before and after standardization (p<0.001) in all comparisons. Following a standardization exercise, the Fitzpatrick phototype scale yielded reliable, reproducible and consistent results.
Is Ultrasound a Valid and Reliable Imaging Modality for Airway Evaluation?: An Observational Computed Tomographic Validation Study Using Submandibular Scanning of the Mouth and Oropharynx.

PubMed

Abdallah, Faraj W; Yu, Eugene; Cholvisudhi, Phantila; Niazi, Ahtsham U; Chin, Ki J; Abbas, Sherif; Chan, Vincent W

2017-01-01

Ultrasound (US) imaging of the airway may be useful in predicting difficulty of airway management (DAM); but its use is limited by lack of proof of its validity and reliability. We sought to validate US imaging of the airway by comparison to CT-scan, and to assess its inter- and intra-observer reliability. We used submandibular sonographic imaging of the mouth and oropharynx to examine how well the ratio of tongue thickness to oral cavity height correlates with the ratio of tongue volume to oral cavity volume, an established tomographic measure of DAM. A cohort of 34 patients undergoing CT-scan was recruited. Study standardized assessments included CT-measured ratios of tongue volume to oropharyngeal cavity volume; tongue thickness to oral cavity height; and US-measured ratio of tongue thickness to oral cavity height. Two sonographers independently performed US imaging of the airway before and after CT-scan. Our findings indicate that the US-measured ratio of tongue thickness to oral cavity height highly correlates with the CT-measured ratio of tongue volume to oral cavity volume. US measurements also demonstrated strong inter- and intra-observer reliability. This study suggests that US is a valid and reliable tool for imaging the oral and oropharyngeal parts of the airway, as well as for measuring the volumetric relationship between the tongue and oral cavity, and may therefore be a useful predictor of DAM. © 2016 by the American Institute of Ultrasound in Medicine.
Reliability of a four-column classification for tibial plateau fractures.

PubMed

Martínez-Rondanelli, Alfredo; Escobar-González, Sara Sofía; Henao-Alzate, Alejandro; Martínez-Cano, Juan Pablo

2017-09-01

A four-column classification system offers a different way of evaluating tibial plateau fractures. The aim of this study is to compare the intra-observer and inter-observer reliability between four-column and classic classifications. This is a reliability study, which included patients presenting with tibial plateau fractures between January 2013 and September 2015 in a level-1 trauma centre. Four orthopaedic surgeons blindly classified each fracture according to four different classifications: AO, Schatzker, Duparc and four-column. Kappa, intra-observer and inter-observer concordance were calculated for the reliability analysis. Forty-nine patients were included. The mean age was 39 ± 14.2 years, with no gender predominance (men: 51%; women: 49%), and 67% of the fractures included at least one of the posterior columns. The intra-observer and inter-observer concordance were calculated for each classification: four-column (84%/79%), Schatzker (60%/71%), AO (50%/59%) and Duparc (48%/58%), with a statistically significant difference among them (p = 0.001/p = 0.003). Kappa coefficients for intra-observer and inter-observer evaluations were: Schatzker 0.48/0.39, four-column 0.61/0.34, Duparc 0.37/0.23, and AO 0.34/0.11. The proposed four-column classification showed the highest intra- and inter-observer agreement. When taking into account the agreement that occurs by chance, Schatzker classification showed the highest inter-observer kappa, but again the four-column had the highest intra-observer kappa value. The proposed classification is a more inclusive classification for the posteromedial and posterolateral fractures. We suggest, therefore, that it be used in addition to one of the classic classifications in order to better understand the fracture pattern, as it allows more attention to be paid to the posterior columns, it improves the surgical planning and allows the surgical approach to be chosen more accurately.

Intra- and inter-observer reliability of quantitative analysis of the infra-patellar fat pad and comparison between fat- and non-fat-suppressed imaging--Data from the osteoarthritis initiative.

PubMed

Steidle-Kloc, E; Wirth, W; Ruhdorfer, A; Dannhauer, T; Eckstein, F

2016-03-01

The infra-patellar fat pad (IPFP), as intra-articular adipose tissue represents a potential source of pro-inflammatory cytokines and its size has been suggested to be associated with osteoarthritis (OA) of the knee. This study examines inter- and intra-observer reliability of fat-suppressed (fs) and non-fat-suppressed (nfs) MR imaging for determination of IPFP morphological measurements as novel biomarkers. The IPFP of nine right knees of healthy Osteoarthritis Initiative participants was segmented by five readers, using fs and nfs baseline sagittal MRIs. The intra-observer reliability was determined from baseline and 1-year follow-up images. All segmentations were quality controlled (QC) by an expert reader. Reliability was expressed as root mean square coefficient of variation (RMS CV%). After QC, the inter-observer reliability for fs (nfs) imaging was 2.0% (1.1%) for IPFP volume, 2.1%/2.5% (1.6%/1.8%) for anterior/posterior surface areas, 1.8% (1.8%) for depth, and 2.1% (2.4%) for maximum sagittal area. The intra-observer reliability was 3.1% (5.0%) for volume, 2.3%/2.8% (2.5%/2.9%) for anterior/posterior surfaces, 1.9% (3.5%) for depth, and 3.3% (4.5%) for maximum sagittal area. IPFP volume from nfs images was systematically greater (+7.3%) than from fs images, but highly correlated (r=0.98). The results suggest that quantitative measurements of IPFP morphology can be performed with satisfactory reliability when expert QC is implemented. The IPFP is more clearly depicted in nfs images, and there is a small systematic off-set versus analysis from fs images. However, the high linear relationship between fs and nfs imaging suggests that fs images can be used to analyze IPFP morphology, when nfs images are not available. Copyright © 2015 Elsevier GmbH. All rights reserved.
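Reliability in this abstract is expressed as a root mean square coefficient of variation (RMS CV%). One common way to compute it is to take the coefficient of variation of the repeated measurements for each knee and then the root mean square of those values across knees. The sketch below follows that definition; the function name and data layout are assumptions, and the authors' exact computation may differ.

```python
from math import sqrt
from statistics import mean, stdev

def rms_cv_percent(measurements):
    """RMS coefficient of variation, in percent.

    measurements has one row per knee; each row holds the repeated
    values for that knee (for example, one value per reader).
    """
    # Coefficient of variation per knee: SD divided by mean.
    cvs = [stdev(row) / mean(row) for row in measurements]
    # Root mean square of the per-knee CVs, scaled to percent.
    return 100 * sqrt(sum(cv ** 2 for cv in cvs) / len(cvs))
```

Lower values indicate better reliability, so the inter-observer figures of about 1% to 2.5% reported above correspond to small reader-to-reader spread relative to the size of the measurement.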
Intra- and inter-observer reliability of quantitative analysis of the infra-patellar fat pad and comparison between fat- and non-fat-suppressed imaging—Data from the osteoarthritis initiative

PubMed Central

Steidle-Kloc, E.; Wirth, W.; Ruhdorfer, A.; Dannhauer, T.; Eckstein, F.

2015-01-01

The infra-patellar fat pad (IPFP), as intra-articular adipose tissue represents a potential source of pro-inflammatory cytokines and its size has been suggested to be associated with osteoarthritis (OA) of the knee. This study examines inter- and intra-observer reliability of fat-suppressed (fs) and non-fat-suppressed (nfs) MR imaging for determination of IPFP morphological measurements as novel biomarkers. The IPFP of nine right knees of healthy Osteoarthritis Initiative participants was segmented by five readers, using fs and nfs baseline sagittal MRIs. The intra-observer reliability was determined from baseline and 1-year follow-up images. All segmentations were quality controlled (QC) by an expert reader. Reliability was expressed as root mean square coefficient of variation (RMS CV%). After QC, the inter-observer reliability for fs (nfs) imaging was 2.0% (1.1%) for IPFP volume, 2.1%/2.5% (1.6%/1.8%) for anterior/posterior surface areas, 1.8% (1.8%) for depth, and 2.1% (2.4%) for maximum sagittal area. The intra-observer reliability was 3.1% (5.0%) for volume, 2.3%/2.8% (2.5%/2.9%) for anterior/posterior surfaces, 1.9% (3.5%) for depth, and 3.3% (4.5%) for maximum sagittal area. IPFP volume from nfs images was systematically greater (+7.3%) than from fs images, but highly correlated (r = 0.98). The results suggest that quantitative measurements of IPFP morphology can be performed with satisfactory reliability when expert QC is implemented. The IPFP is more clearly depicted in nfs images, and there is a small systematic off-set versus analysis from fs images. However, the high linear relationship between fs and nfs imaging suggests that fs images can be used to analyze IPFP morphology, when nfs images are not available. PMID:26569532
Reliability of the Serbian version of the International Physical Activity Questionnaire for older adults.

PubMed

Milanović, Zoran; Pantelić, Saša; Trajković, Nebojša; Jorgić, Bojan; Sporiš, Goran; Bratić, Milovan

2014-01-01

The purpose of this study was to determine the test-retest reliability of the International Physical Activity Questionnaire (IPAQ) for older adults in Serbia. Six hundred and sixty older adults (352 men, 53%; 308 women, 47%; mean age 67.65±5.76 years) participated in the study. To examine test-retest reliability, the participants were asked to complete the IPAQ on two occasions 2 weeks apart. Moderate reliability was observed between the repeated IPAQ, with intraclass correlation coefficients ranging from 0.53 to 0.91. The least reliability was established in leisure time activity (0.53) and the most reliability in the transport domain (0.91). Men and women had similar intraclass correlation coefficients for total physical activity (0.71 versus 0.74, respectively), while the biggest difference was obtained for housework in men (0.68) and in women (0.90). Our study shows that the long version of the IPAQ is a reliable instrument for assessing physical activity levels in older adults and that it may be useful for generating internationally comparable data.
Clinical Knowledge from Observational Studies: Everything You Wanted to Know but Were Afraid to Ask.

PubMed

Gershon, Andrea S; Jafarzadeh, S Reza; Wilson, Kevin C; Walkey, Allan J

2018-05-07

Well-done randomized trials provide accurate estimates of treatment effect by producing groups that are similar on all measures except for the intervention of interest. However, inferences of efficacy in tightly-controlled experimental settings may not translate into similar effectiveness in real-world settings. Observational studies generally enable inferences over a wider range of patient characteristics and evaluation of a broader range of outcomes over a longer period than randomized trials. However, clinicians are often reluctant to incorporate the findings of observational studies into clinical practice. Reasons for uncertainty regarding observational studies include a lack of familiarity with observational research methods, occasional disagreements between results of observational studies and randomized trials, the perceived risk of spurious results from systematic bias, and prior teaching that randomized trials are the most reliable source of medical evidence. We propose that a better understanding of observational research will enhance clinicians' ability to distinguish reliable observational studies from those that are subjected to biases and, therefore, provide more confidence to apply observational research results into clinical practice when appropriate. Herein, we explain why observational studies may be perceived as less conclusive than randomized trials, address situations in which observational research and randomized trials produced different findings, and provide information on observational study design so that quality can be evaluated. We conclude that observational research is a valuable source of medical evidence and that clinical action is strongest when supported by both high quality observational studies and randomized trials.
Standardization for Ki-67 Assessment in Moderately Differentiated Breast Cancer. A Retrospective Analysis of the SAKK 28/12 Study

PubMed Central

Varga, Zsuzsanna; Cassoly, Estelle; Li, Qiyu; Oehlschlegel, Christian; Tapia, Coya; Lehr, Hans Anton; Klingbiel, Dirk; Thürlimann, Beat; Ruhstaller, Thomas

2015-01-01

Background Proliferative activity (Ki-67 Labelling Index) in breast cancer increasingly serves as an additional tool in the decision for or against adjuvant chemotherapy in midrange hormone receptor positive breast cancer. Ki-67 Index has been previously shown to suffer from high inter-observer variability especially in midrange (G2) breast carcinomas. In this study we conducted a systematic approach using different Ki-67 assessments on large tissue sections in order to identify the method with the highest reliability and the lowest variability. Materials and Methods Five breast pathologists retrospectively analyzed proliferative activity of 50 G2 invasive breast carcinomas using large tissue sections by assessing Ki-67 immunohistochemistry. Ki-67-assessments were done on light microscopy and on digital images following these methods: 1) assessing five regions, 2) assessing only darkly stained nuclei and 3) considering only condensed proliferative areas (‘hotspots’). An individual review (the first described assessment from 2008) was also performed. The assessments on light microscopy were done by estimating. All measurements were performed three times. Inter-observer and intra-observer reliabilities were calculated using the approach proposed by Eliasziw et al. Clinical cutoffs (14% and 20%) were tested using Fleiss’ Kappa. Results There was a good intra-observer reliability in 5 of 7 methods (ICC: 0.76–0.89). The two highest inter-observer reliabilities were fair to moderate (ICC: 0.71 and 0.74), found in 2 methods (region-analysis and individual-review) on light microscopy. Fleiss’-kappa-values (14% cut-off) were the highest (moderate) using the original recommendation on light-microscope (Kappa 0.58). Fleiss’ kappa values (20% cut-off) were the highest (Kappa 0.48 each) in analyzing hotspots on light-microscopy and digital-analysis. No methodologies using digital-analysis were superior to the methods on light microscope.
Conclusion Our results show that all methods on light-microscopy for Ki-67 assessment in large tissue sections resulted in a good intra-observer reliability. Region analysis and individual review (the original recommendation) on light-microscopy yielded the highest inter-observer reliability. These results show slight improvement to previously published data on poor-reproducibility and thus might be a practical-pragmatic way for routine assessment of Ki-67 Index in G2 breast carcinomas. PMID:25885288
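As an illustration of the cutoff analysis described in this abstract, the following is a minimal sketch of Fleiss' kappa for several raters who each classify a case as below or at/above a clinical cutoff. The readings, the helper names (`dichotomize`, `fleiss_kappa`) and the choice of "at or above the cutoff" as the positive class are assumptions made for illustration; they are not taken from the study.

```python
def dichotomize(readings, cutoff):
    """Turn one case's readings (one per rater) into [below, at_or_above] counts."""
    high = sum(1 for value in readings if value >= cutoff)
    return [len(readings) - high, high]


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing case i in category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    total = n_cases * n_raters
    # Share of all assignments that fall in each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement within each case, then averaged over cases.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical Ki-67 percentages: 4 cases, each read by 5 raters.
hypothetical_readings = [
    [5, 8, 10, 12, 9],
    [10, 12, 13, 11, 15],
    [13, 15, 16, 12, 18],
    [22, 25, 19, 30, 21],
]
table = [dichotomize(case, cutoff=14) for case in hypothetical_readings]
print(round(fleiss_kappa(table), 3))  # 0.495
```

Dichotomizing first and then computing kappa mirrors the idea of testing a clinical cutoff: disagreement only counts when raters fall on opposite sides of the threshold.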
Nutrition Environment Measures Survey in stores (NEMS-S): development and evaluation.

PubMed

Glanz, Karen; Sallis, James F; Saelens, Brian E; Frank, Lawrence D

2007-04-01

Eating, or nutrition, environments are believed to contribute to obesity and chronic diseases. There is a need for valid, reliable measures of nutrition environments. This article reports on the development and evaluation of measures of nutrition environments in retail food stores. The Nutrition Environment Measures Study developed observational measures of the nutrition environment within retail food stores (NEMS-S) to assess availability of healthy options, price, and quality. After pretesting, measures were completed by independent raters to evaluate inter-rater reliability and across two occasions to assess test-retest reliability in grocery and convenience stores in four neighborhoods differing on income and community design in the Atlanta metropolitan area. Data were collected and analyzed in 2004 and 2005. Ten food categories (e.g., fruits) or indicator food items (e.g., ground beef) were evaluated in 85 stores. Inter-rater reliability and test-retest reliability of availability were high: inter-rater reliability kappas were 0.84 to 1.00, and test-retest reliabilities were .73 to 1.00. Inter-rater reliability for quality across fresh produce was moderate (kappas, 0.44 to 1.00). Healthier options were higher priced for hot dogs, lean ground beef, and baked chips. More healthful options were available in grocery than convenience stores and in stores in higher income neighborhoods. The NEMS-S tool was found to have a high degree of inter-rater and test-retest reliability, and to reveal significant differences across store types and neighborhoods of high and low socioeconomic status. These observational measures of nutrition environments can be applied in multilevel studies of community nutrition, and can inform new approaches to conducting and evaluating nutrition interventions.
VFS interjudge reliability using a free and directed search.

PubMed

Bryant, Karen N; Finnegan, Eileen; Berbaum, Kevin

2012-03-01

Reports in the literature suggest that clinicians demonstrate poor reliability in rating videofluoroscopic swallow (VFS) variables. Contemporary perception theories suggest that the methods used in VFS reliability studies constrain subjects to make judgments in an abnormal way. The purpose of this study was to determine whether a directed search or a free search approach to rating swallow studies results in better interjudge reliability. Ten speech pathologists served as judges. Five clinical judges were assigned to the directed search group (use checklist) and five to the free search group (unguided observations). Clinical judges interpreted 20 VFS examinations of swallowing. Interjudge reliability of ratings of dysphagia severity, affected stage of swallow, dysphagia symptoms, and attributes identified by clinical judges using a directed search was compared with that using a free search approach. Interjudge reliability for rating the presence of aspiration and penetration was significantly better using a free search ("substantial" to "almost perfect" agreement) compared to a directed search ("moderate" agreement). Reliability of dysphagia severity ratings ranged from "moderate" to "almost perfect" agreement for both methods of search. Reliability for reporting all other symptoms and attributes of dysphagia was variable and was not significantly different between the groups.
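The agreement labels quoted in this abstract ("moderate", "substantial", "almost perfect") match the widely used Landis and Koch bands for kappa. The abstract does not say which benchmark the authors applied, so the mapping below is a sketch under that assumption, and `landis_koch_label` is a hypothetical helper name.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch descriptive band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    return "almost perfect"  # not reached; kept so every path returns a label


for value in (0.37, 0.55, 0.78, 0.94):
    print(value, landis_koch_label(value))
```

These bands are a convention rather than a statistical test, so the same kappa can be described differently by studies that use other benchmarks.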
Inter-Observer Reliability of DSM-5 Substance Use Disorders*

PubMed Central

Denis, Cécile M.; Gelernter, Joel; Hart, Amy B.; Kranzler, Henry R.

2015-01-01

Aims Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence of the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Methods Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Results Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. Conclusions For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. PMID:26048641
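The two statistics named in this abstract, percent agreement and Cohen's kappa, can be sketched for a pair of interviewers as follows. The diagnoses are invented for illustration, and the function names are hypothetical; the sketch only shows how kappa discounts the agreement expected by chance from each interviewer's own base rates.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of subjects on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical lifetime diagnoses for 10 subjects from two interviewers.
interviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
interviewer_2 = ["yes", "yes", "no", "yes", "yes", "no", "no", "no", "no", "yes"]

print(percent_agreement(interviewer_1, interviewer_2))      # 0.8
print(round(cohen_kappa(interviewer_1, interviewer_2), 2))  # 0.6
```

In this made-up sample the interviewers agree on 80% of subjects, but half of that agreement is expected by chance, which is why kappa is lower than the raw percentage.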
Use of the Environment and Policy Evaluation and Observation as a Self-Report Instrument (EPAO-SR) to measure nutrition and physical activity environments in child care settings: validity and reliability evidence.

PubMed

Ward, Dianne S; Mazzucca, Stephanie; McWilliams, Christina; Hales, Derek

2015-09-26

Early care and education (ECE) centers are important settings influencing young children's diet and physical activity (PA) behaviors. To better understand their impact on diet and PA behaviors as well as to evaluate public health programs aimed at ECE settings, we developed and tested the Environment and Policy Assessment and Observation - Self-Report (EPAO-SR), a self-administered version of the previously validated, researcher-administered EPAO. Development of the EPAO-SR instrument included modification of items from the EPAO, community advisory group and expert review, and cognitive interviews with center directors and classroom teachers. Reliability and validity data were collected across 4 days in 3-5 year old classrooms in 50 ECE centers in North Carolina. Center teachers and directors completed relevant portions of the EPAO-SR on multiple days according to a standardized protocol, and trained data collectors completed the EPAO for 4 days in the centers. Reliability and validity statistics calculated included percent agreement, kappa, correlation coefficients, coefficients of variation, deviations, mean differences, and intraclass correlation coefficients (ICC), depending on the response option of the item. Data demonstrated a range of reliability and validity evidence for the EPAO-SR instrument. Reporting from directors and classroom teachers was consistent and similar to the observational data. Items that produced strongest reliability and validity estimates included beverages served, outside time, and physical activity equipment, while items such as whole grains served and amount of teacher-led PA had lower reliability (observation and self-report) and validity estimates. To overcome lower reliability and validity estimates, some items need administration on multiple days. This study demonstrated appropriate reliability and validity evidence for use of the EPAO-SR in the field. 
The self-administered EPAO-SR is an advancement of the measurement of ECE settings and can be used by researchers and practitioners to assess the nutrition and physical activity environments of ECE settings.
A method for recording verbal behavior in free-play settings.

PubMed

Nordquist, V M

1971-01-01

The present study attempted to test the reliability of a new method of recording verbal behavior in a free-play preschool setting. Six children, three normal and three speech impaired, served as subjects. Videotaped records of verbal behavior were scored by two experimentally naive observers. The results suggest that the system provides a means of obtaining reliable records of both normal and impaired speech, even when the subjects exhibit nonverbal behaviors (such as hyperactivity) that interfere with direct observation techniques.
Investigations of the reliability of observational gait analysis for the assessment of lameness in horses.

PubMed

Hewetson, M; Christley, R M; Hunt, I D; Voute, L C

2006-06-24

The objectives of this study were to assess the reliability of a numerical rating scale (NRS) and a verbal rating scale (VRS) for the assessment of lameness in horses and to determine whether they can be used interchangeably. Sixteen independent observers graded the severity of lameness in 20 videotaped horses, and the agreement between and within observers, correlation and bias were determined for each scale. The observers agreed with each other in 56 per cent of the observations with the NRS and in 60 per cent of the observations with the VRS, and the associated Kendall coefficient of concordance was high. Similar trends were evident in the agreement between two observations by each observer. The correlation between and within observers was high for both scales. There were no significant differences (bias) among the observers' mean scores when using either scale. There was a significant correlation between the lameness scores attributed when using the two scales, but the differences between the scores when plotted against their overall mean were unacceptable for clinical purposes. The results indicate that the NRS and VRS are only moderately reliable when used to assess lameness severity in the horse, and that they should not be used interchangeably.
Five times sit-to-stand test in subjects with total knee replacement: Reliability and relationship with functional mobility tests.

PubMed

Medina-Mirapeix, Francesc; Vivo-Fernández, Iván; López-Cañizares, Juan; García-Vidal, José A; Benítez-Martínez, Josep Carles; Del Baño-Aledo, María Elena

2018-01-01

The objective was to determine the inter-observer and test-retest reliability of the "Five-repetition sit-to-stand" (5STS) test in patients with total knee replacement (TKR), and to explore the correlation between 5STS and two mobility tests. A reliability study was conducted among 24 outpatients with TKR (mean age 72.13, S.D. 10.67; 50% were women). They were recruited from a traumatology unit of a public hospital via convenience sampling. A physiotherapist and a trauma physician assessed each patient at the same time. The same physiotherapist performed a second 5STS measurement 45-60 min after the first one. Reliability was assessed with intraclass correlation coefficients (ICCs) and Bland-Altman plots. The Pearson coefficient was calculated to assess the correlation between 5STS, the timed up and go test (TUG) and four-meter gait speed (4MGS). ICCs for inter-observer and test-retest reliability of the 5STS were 0.998 (95% confidence interval [CI], 0.995-0.999) and 0.982 (95% CI, 0.959-0.992), respectively. The inter-observer Bland-Altman plot showed limits between -0.82 and 1.06 with a mean of 0.11 and no heteroscedasticity within the data. The test-retest Bland-Altman plot showed limits between -1.76 and 4.16, a mean of 1.20 and heteroscedasticity within the data. The Pearson correlation coefficient revealed significant correlation between 5STS and TUG (r=0.7, p<0.001) and 4MGS (r=-0.583, p=0.003). This study demonstrates that the 5STS has excellent inter-observer and test-retest reliability when used in people with TKR, and also significant correlation with other functional mobility tests. These findings support the use of 5STS as an outcome measure in the TKR population. Copyright © 2017 Elsevier B.V. All rights reserved.
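For readers unfamiliar with Bland-Altman limits of agreement, here is a minimal sketch of how the mean difference and its limits are usually computed (mean difference plus or minus 1.96 standard deviations of the paired differences). The timings are invented, and `bland_altman_limits` is a hypothetical helper; the abstract does not describe the authors' exact computation.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (mean difference, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)  # sample SD of the differences
    return bias, bias - half_width, bias + half_width


# Hypothetical 5STS times in seconds from two observers timing the same patients.
observer_1 = [12.1, 14.3, 10.8, 16.0, 13.5]
observer_2 = [11.9, 14.0, 11.2, 15.1, 13.0]

bias, lower, upper = bland_altman_limits(observer_1, observer_2)
print(f"bias={bias:.2f}s, limits of agreement=({lower:.2f}s, {upper:.2f}s)")
```

Because the limits are built symmetrically around the mean difference, the midpoint of the reported limits should equal the reported mean, which is a quick consistency check when reading such results.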
Multidimensional measures validated for home health needs of older persons: A systematic review.

PubMed

de Rossi Figueiredo, Daniela; Paes, Lucilene Gama; Warmling, Alessandra Martins; Erdmann, Alacoque Lorenzini; de Mello, Ana Lúcia Schaefer Ferreira

2018-01-01

To conduct a systematic review of the literature on valid and reliable multidimensional instruments to assess home health needs of older persons. Systematic review. Electronic databases, PubMed/Medline, Web of Science, Scopus, Cumulative Index to Nursing and Allied Health Literature, Scientific Electronic Library Online and the Latin American and Caribbean Health Sciences Information. All English, Portuguese and Spanish literature which included studies of reliability and validity of instruments that assessed at least two dimensions: physical, psychological, social support and functional independence, self-rated health behaviors and contextual environment and if such instruments proposed interventions after evaluation and/or monitoring changes over a period of time. Older persons aged 60 years or older. Of the 2397 studies identified, 32 were considered eligible. Two-thirds of the instruments proposed the physical, psychological, social support and functional independence dimensions. Inter-observer and intra-observer reliability and internal consistency values were 0.7 or above. More than two-thirds of the studies included validity (n=26) and more than one validity was tested in 15% (n=4) of these. Only 7% (n=2) proposed interventions after evaluation and/or monitoring changes over a period of time. Although the multidimensional assessment was performed, and the reliability values of the reviewed studies were satisfactory, different validity tests were not present in several studies. A gap at the instrument conception was observed related to interventions after evaluation and/or monitoring changes over a period of time. Further studies with this purpose are necessary for home health needs of the older persons. Copyright © 2017 Elsevier Ltd. All rights reserved.
Evaluating the intra- and interobserver reliability of three-dimensional ultrasound and power Doppler angiography (3D-PDA) for assessment of placental volume and vascularity in the second trimester of pregnancy.

PubMed

Jones, Nia W; Raine-Fenning, Nick J; Mousa, Hatem A; Bradley, Eileen; Bugg, George J

2011-03-01

Three-dimensional (3-D) power Doppler angiography (3-D-PDA) allows visualisation of Doppler signals within the placenta and their quantification is possible by the generation of vascular indices by the 4-D View software programme. This study aimed to investigate intra- and interobserver reproducibility of 3-D-PDA analysis of stored datasets at varying gestations with the ultimate goal being to develop a tool for predicting placental dysfunction. Women with an uncomplicated, viable singleton pregnancy were scanned at 12, 16 or 20 weeks gestational age groups. 3-D-PDA datasets acquired of the whole placenta were analysed using the VOCAL software processing tool. Each volume was analysed by three observers twice in the A plane. Intra- and interobserver reliability was assessed by intraclass correlation coefficients (ICCs) and Bland Altman plots. At each gestational age group, 20 low risk women were scanned resulting in 60 datasets in total. The ICC demonstrated a high level of measurement reliability at each gestation with intraobserver values >0.90 and interobserver values of >0.6 for the vascular indices. Bland Altman plots also showed high levels of agreement. Systematic bias was seen at 20 weeks in the vascular indices obtained by different observers. This study demonstrates that 3-D-PDA data can be measured reliably by different observers from stored datasets up to 18 weeks gestation. Measurements become less reliable as gestation advances with bias between observers evident at 20 weeks. Copyright © 2011 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
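Several records in this list report intraclass correlation coefficients. The sketch below computes one simple form, the one-way random-effects ICC(1,1), from an analysis-of-variance decomposition. The abstract does not state which ICC form was used, so this choice, the function name `icc_one_way`, and the sample values are all assumptions for illustration.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC(1,1).

    ratings[i] holds the k repeated measurements of subject i.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, each with the same number (>= 2) of ratings")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical vascular index values: 4 datasets, each measured twice.
hypothetical_index = [[10.2, 10.6], [14.1, 13.8], [8.9, 9.4], [12.5, 12.1]]
print(round(icc_one_way(hypothetical_index), 3))
```

The ICC is high when differences between subjects dominate the disagreement within a subject's repeated measurements, which is why the same measurement error yields a lower ICC in a more homogeneous sample.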
Design, application and testing of the Work Observation Method by Activity Timing (WOMBAT) to measure clinicians' patterns of work and communication.

PubMed

Westbrook, Johanna I; Ampt, Amanda

2009-04-01

Evidence regarding how health information technologies influence clinicians' patterns of work and support efficient practices is limited. Traditional paper-based data collection methods are unable to capture clinical work complexity and communication patterns. The use of electronic data collection tools for such studies is emerging yet is rarely assessed for reliability or validity. Our aim was to design, apply and test an observational method which incorporated the use of an electronic data collection tool for work measurement studies which would allow efficient, accurate and reliable data collection, and capture greater degrees of work complexity than current approaches. We developed an observational method and software for personal digital assistants (PDAs) which captures multiple dimensions of clinicians' work tasks, namely what task, with whom, and with what; tasks conducted in parallel (multi-tasking); interruptions and task duration. During field-testing over 7 months across four hospital wards, fifty-two nurses were observed for 250 h. Inter-rater reliability was tested and validity was measured by (i) assessing whether observational data reflected known differences in clinical role work tasks and (ii) by comparing observational data with participants' estimates of their task time distribution. Observers took 15-20 h of training to master the method and data collection process. Only 1% of tasks observed did not match the classification developed and were classified as 'other'. Inter-rater reliability scores of observers were maintained at over 85%. The results discriminated between the work patterns of enrolled and registered nurses consistent with differences in their roles. Survey data (n=27) revealed consistent ratings of tasks by nurses, and their rankings of most to least time-consuming tasks were significantly correlated with those derived from the observational data. 
Over 40% of nurses' time was spent in direct care or professional communication, with 11.8% of time spent multi-tasking. Nurses were interrupted approximately every 49 min. One quarter of interruptions occurred while nurses were preparing or administering medications. This method efficiently produces reliable and valid data. The multi-dimensional nature of the data collected provides greater insights into patterns of clinicians' work and communication than has previously been possible using other methods.
The Assumption of a Reliable Instrument and Other Pitfalls to Avoid When Considering the Reliability of Data

PubMed Central

Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

2012-01-01

The purpose of this article is to help researchers avoid common pitfalls associated with reliability including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it with focus on its impact on effect sizes. Second, we review how reliability is assessed with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data. PMID:22518107
Remote Digital Preoperative Assessments for Cleft Lip and Palate May Improve Clinical and Economic Impact in Global Plastic Surgery.

PubMed

Hughes, Christopher; Campbell, Jacob; Mukhopadhyay, Swagoto; McCormack, Susan; Silverman, Richard; Lalikos, Janice; Babigian, Alan; Castiglione, Charles

2017-09-01

Reconstructive surgical care can play a vital role in the resource-poor settings of low- and middle-income countries. Telemedicine platforms can improve the efficiency and effectiveness of surgical care. The purpose of this study is to determine whether remote digital video evaluations are reliable in the context of a short-term plastic surgical intervention. The setting for this study was a district hospital located in Latacunga, Ecuador. Participants were 27 consecutive patients who presented for operative repair of cleft lip and palate. We calculated kappa coefficients for reliability between in-person and remote digital video assessments for the classification of cleft lip and palate between two separate craniofacial surgeons. We hypothesized that the technology would be a reliable method of preoperative assessment for cleft disease. Of the 27 participants, 22 (81.4%) received operative treatment for their cleft disorder. Mean age was 11.1 ± 8.3 years. Patients presented with a spectrum of disorders, including cleft lip (24 of 27, 88.9%), cleft palate (19 of 27, 70.4%), and alveolar cleft (19 of 27, 70.4%). We found a 95.7% agreement between observers for cleft lip with substantial reliability (κ = .78, P < .01). There was an 82.6% agreement between observers for cleft palate, with a moderate interrater reliability (κ = .55, P = .01). We found only a 47.8% agreement between observers for alveolar cleft with a nonsignificant, weak kappa agreement (κ = .06, P = .74). Remote digital assessments are a reliable way to preoperatively diagnose cleft lip and palate in the context of short-term plastic surgical interventions in low- and middle-income countries. Future work will evaluate the potential for real-time, telemedicine assessments to reduce cost and improve clinical effectiveness in global plastic surgery.
Test-retest reliability of a computer-assisted self-administered questionnaire on early life exposure in a nasopharyngeal carcinoma case-control study.

PubMed

Mai, Zhi-Ming; Lin, Jia-Huang; Chiang, Shing-Chun; Ngan, Roger Kai-Cheong; Kwong, Dora Lai-Wan; Ng, Wai-Tong; Ng, Alice Wan-Ying; Yuen, Kam-Tong; Ip, Kai-Ming; Chan, Yap-Hang; Lee, Anne Wing-Mui; Ho, Sai-Yin; Lung, Maria Li; Lam, Tai-Hing

2018-05-04

We evaluated the reliability of early life nasopharyngeal carcinoma (NPC) aetiology factors in the questionnaire of an NPC case-control study in Hong Kong during 2014-2017. 140 subjects aged 18+ completed the same computer-assisted questionnaire twice, separated by at least 2 weeks. The questionnaire included most known NPC aetiology factors and the present analysis focused on early life exposure. Test-retest reliability of all the 285 questionnaire items was assessed in all subjects and in 5 subgroups defined by cases/controls, sex, time between 1st and 2nd questionnaire (2-29/≥30 weeks), education (secondary or less/postsecondary), and age (25-44/45-59/60+ years) at the first questionnaire. The reliability of items on dietary habits, body figure, skin tone and sun exposure in early life periods (age 6-12 and 13-18) was moderate-to-almost perfect, and most other items had fair-to-substantial reliability in all life periods (age 6-12, 13-18 and 19-30, and 10 years ago). Differences in reliability by strata of the 5 subgroups were only observed in a few items. This study is the first to report the reliability of an NPC questionnaire, and make the questionnaire available online. Overall, our questionnaire had acceptable reliability, suggesting that previous NPC study results on the same risk factors would have similar reliability.
Reliability of laser Doppler flowmetry curve reading for measurement of toe and ankle pressures: intra- and inter-observer variation.

PubMed

Høyer, C; Paludan, J P D; Pavar, S; Biurrun Manresa, J A; Petersen, L J

2014-03-01

To assess the intra- and inter-observer variation in laser Doppler flowmetry curve reading for measurement of toe and ankle pressures. A prospective single blinded diagnostic accuracy study was conducted on 200 patients with known or suspected peripheral arterial disease (PAD), with a total of 760 curve sets produced. The first curve reading for this study was performed by laboratory technologists blinded to clinical clues and previous readings at least 3 months after the primary data sampling. The pressure curves were later reassessed following another period of at least 3 months. Observer agreement in diagnostic classification according to TASC-II criteria was quantified using Cohen's kappa. Reliability was quantified using intra-class correlation coefficients, coefficients of variance, and Bland-Altman analysis. The overall agreement in diagnostic classification (PAD/not PAD) was 173/200 (87%) for intra-observer (κ = .858) and 175/200 (88%) for inter-observer data (κ = .787). Reliability analysis confirmed excellent correlation for both intra- and inter-observer data (ICC all ≥.931). The coefficients of variance ranged from 2.27% to 6.44% for intra-observer and 2.39% to 8.42% for inter-observer data. Subgroup analysis showed lower observer-variation for reading of toe pressures in patients with diabetes and/or chronic kidney disease than patients not diagnosed with these conditions. Bland-Altman plots showed higher variation in toe pressure readings than ankle pressure readings. This study shows substantial intra- and inter-observer agreement in diagnostic classification and reading of absolute pressures when using laboratory technologists as observers. The study emphasises that observer variation for curve reading is an important factor concerning the overall reproducibility of the method. Our data suggest diabetes and chronic kidney disease have an influence on toe pressure reproducibility. Copyright © 2013 European Society for Vascular Surgery. 
Published by Elsevier Ltd. All rights reserved.
What Works Clearinghouse Quick Review: "Gathering Feedback for Teachers: Combining High-Quality Observations with Student Surveys and Achievement Gains"

ERIC Educational Resources Information Center

What Works Clearinghouse, 2012

2012-01-01

This study examined five instruments used to assess the effectiveness of teacher practices based on classroom observations. The study first examined whether observers could reliably assess teachers with each instrument, and then examined how well each instrument, along with other information, predicted student achievement. The study reported that,…

Are photographic records reliable for orthodontic screening?

PubMed

Mandall, N A

2002-06-01

The aim of the study was to evaluate the reliability of a panel of orthodontists for accepting new patient referrals based on clinical photographs. Eight orthodontists from Greater Manchester, Lancashire, Chester, and Derbyshire observed clinical photographs of 40 consecutive new patients attending the orthodontic department, Hope Hospital, Salford. They recorded whether or not they would accept the patient, as a new patient referral, in their department. Each consultant was asked to take into account factors, such as oral hygiene, dental development, and severity of the malocclusion. Kappa statistic for multiple-rater agreement and kappa statistic for intra-observer reliability were calculated. Inter-observer panel agreement for accepting new patient referrals based on photographic information was low (multiple rater kappa score 0.37). Intra-examiner agreement was better (kappa range 0.34-0.90). Clinician agreement for screening and accepting orthodontic referrals based on clinical photographs is comparable to that previously reported for other clinical decision making.
Research Review: Test-retest reliability of standardized diagnostic interviews to assess child and adolescent psychiatric disorders: a systematic review and meta-analysis.

PubMed

Duncan, Laura; Comeau, Jinette; Wang, Li; Vitoroulis, Irene; Boyle, Michael H; Bennett, Kathryn

2018-02-19

A better understanding of factors contributing to the observed variability in estimates of test-retest reliability in published studies on standardized diagnostic interviews (SDI) is needed. The objectives of this systematic review and meta-analysis were to estimate the pooled test-retest reliability for parent and youth assessments of seven common disorders, and to examine sources of between-study heterogeneity in reliability. Following a systematic review of the literature, multilevel random effects meta-analyses were used to analyse 202 reliability estimates (Cohen's kappa = κ) from 31 eligible studies and 5,369 assessments of 3,344 children and youth. Pooled reliability was moderate at κ = .58 (95% CI 0.53-0.63) and between-study heterogeneity was substantial (Q = 2,063 (df = 201), p < .001 and I² = 79%). In subgroup analysis, reliability varied across informants for specific types of psychiatric disorder (κ = .53-.69 for parent vs. κ = .39-.68 for youth) with estimates significantly higher for parents on attention deficit hyperactivity disorder, oppositional defiant disorder and the broad groupings of externalizing and any disorder. Reliability was also significantly higher in studies with indicators of poor or fair study methodology quality (sample size <50, retest interval <7 days). Our findings raise important questions about the meaningfulness of published evidence on the test-retest reliability of SDIs and the usefulness of these tools in both clinical and research contexts. Potential remedies include the introduction of standardized study and reporting requirements for reliability studies, and exploration of other approaches to assessing and classifying child and adolescent psychiatric disorder. © 2018 Association for Child and Adolescent Mental Health.
Validation of Clinical Observations of Mastication in Persons with ALS.

PubMed

Simione, Meg; Wilson, Erin M; Yunusova, Yana; Green, Jordan R

2016-06-01

Amyotrophic lateral sclerosis (ALS) is a progressive neurological disease that can result in difficulties with mastication leading to malnutrition, choking or aspiration, and reduced quality of life. When evaluating mastication, clinicians primarily observe spatial and temporal aspects of jaw motion. The reliability and validity of clinical observations for detecting jaw movement abnormalities is unknown. The purpose of this study is to determine the reliability and validity of clinician-based ratings of chewing performance in neuro-typical controls and persons with varying degrees of chewing impairments due to ALS. Adults chewed a solid food consistency while full-face video were recorded along with jaw kinematic data using a 3D optical motion capture system. Five experienced speech-language pathologists watched the videos and rated the spatial and temporal aspects of chewing performance. The jaw kinematic data served as the gold-standard for validating the clinicians' ratings. Results showed that the clinician-based rating of temporal aspects of chewing performance had strong inter-rater reliability and correlated well with comparable kinematic measures. In contrast, the reliability of rating the spatial and spatiotemporal aspects of chewing (i.e., range of motion of the jaw, consistency of the chewing pattern) was mixed. Specifically, ratings of range of motion were at best only moderately reliable. Ratings of chewing movement consistency were reliable but only weakly correlated with comparable measures of jaw kinematics. These findings suggest that clinician ratings of temporal aspects of chewing are appropriate for clinical use, whereas ratings of the spatial and spatiotemporal aspects of chewing may not be reliable or valid.
Intra- and interobserver reliability of quantitative ultrasound measurement of the plantar fascia.

PubMed

Rathleff, Michael Skovdal; Moelgaard, Carsten; Lykkegaard Olesen, Jens

2011-01-01

To determine intra- and interobserver reliability and measurement precision of sonographic assessment of plantar fascia thickness when using one, the mean of two, or the mean of three measurements. Two experienced observers scanned 20 healthy subjects twice with 60 minutes between test and retest. A GE LOGIQe ultrasound scanner was used in the study. The built-in software in the scanner was used to measure the thickness of the plantar fascia (PF). Reliability was calculated using intraclass correlation coefficient (ICC) and limits of agreement (LOA). Intraobserver reliability (ICC) using one measurement was 0.50 for one observer and 0.52 for the other, and using the mean of three measurements intraobserver reliability increased up to 0.77 and 0.67, respectively. Interobserver reliability (ICC) when using one measurement was 0.62 and increased to 0.82 when using the average of three measurements. LOA showed that when using the average of three measurements, LOA decreased to 0.6 mm, corresponding to 17.5% of the mean thickness of the PF. The results showed that reliability increases when using the mean of three measurements compared with one. Limits of agreement based on intratester reliability show that changes in thickness that are larger than 0.6 mm can be considered actual changes in thickness and not a result of measurement error. Copyright © 2011 Wiley Periodicals, Inc.
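The gain in reliability from averaging repeated measurements, as reported in the record above, can be illustrated with the Spearman-Brown prophecy formula. The abstract does not say that the authors used this formula, so the sketch below is only a rough consistency check under the assumption that the repeated measurements behave like parallel measurements; the function name spearman_brown is chosen here for illustration.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel measurements.

    single_reliability: reliability (e.g. ICC) of one measurement.
    k: number of measurements averaged.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Single-measurement ICCs taken from the abstract above.
single_iccs = {
    "interobserver": 0.62,
    "intraobserver, observer A": 0.50,
    "intraobserver, observer B": 0.52,
}

for label, icc in single_iccs.items():
    predicted = spearman_brown(icc, 3)
    print(f"{label}: single = {icc:.2f}, predicted mean-of-3 = {predicted:.2f}")
```

For the interobserver case the prediction (about 0.83) is close to the reported 0.82. The intraobserver predictions need not match the observed values, since real repeated scans are not guaranteed to be parallel measurements.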
Reliability Concerns in Measuring Respondent Skin Tone by Interviewer Observation

PubMed Central

Hannon, Lance; DeFina, Robert

2016-01-01

The current study assesses the intercoder reliability of one of the most important skin tone measurement instruments—the Massey–Martin scale. This scale is used in several high-profile social surveys, but has not yet been psychometrically evaluated. The current evaluation is only possible because, for the first time, the General Social Survey’s 2010–2014 panel used the instrument to guide interviewers’ skin tone observation of the same respondents in two different years (2012 and 2014). Despite the widespread use of the Massey–Martin scale to investigate potential effects of skin tone on social attitudes and outcomes, the data suggest that the measure has low intercoder reliability. Implications for researchers and survey practitioners are discussed. PMID:27274576
Comparison of two- and three-dimensional assessment methods of nasolabial appearance in cleft lip and palate patients: Do the assessment methods measure the same outcome?

PubMed

Mosmuller, David G M; Maal, Thomas J; Prahl, Charlotte; Tan, Robin A; Mulder, Frans J; Schwirtz, Roderic M F; de Vet, Henrica C W; Bergé, Stefaan J; Don Griot, J P W

2017-08-01

For the assessment of the nasolabial appearance in cleft patients, a widely accepted, reliable scoring system is not available. In this study four different methods of assessment are compared, including 2D and 3D asymmetry and aesthetic assessments. The data and ratings from an earlier study using the Asher-McDade aesthetic index on 3D photographs and the outcomes of 3D facial distance mapping were compared to a 2D aesthetic assessment, the Cleft Aesthetic Rating Scale, and to SymNose, a computerized 2D asymmetry assessment technique. The reliability and correlation between the four assessment techniques were tested using a sample of 79 patients. The 3D asymmetry assessment had the highest reliability and could be performed by just one observer (Intraclass correlation coefficient (ICC): 0.99). The 2D asymmetry assessment of the nose was highly reliable when performed by just one observer (ICC: 0.89). However, for the 2D asymmetry assessment of the lip more observers were needed. For the 2D aesthetic assessments 3 observers were needed. The 3D aesthetic assessment had the lowest single-observer reliability (ICC: 0.38-0.56) of all four techniques. The agreement between the different assessment methods is poor to very poor. The highest correlation (R: 0.48) was found between 2D and 3D aesthetic assessments. Remarkably, the lowest correlations were found between 2D and 3D asymmetry assessments (0.08-0.17). Different assessment methods are not in agreement and seem to measure different nasolabial aspects. More research is needed to establish exactly what each assessment technique measures and which measurements or outcomes are relevant for the patients. Copyright © 2017 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Reliability of pelvic floor measurements on three- and four-dimensional ultrasound during and after first pregnancy: implications for training.

PubMed

van Veelen, G A; Schweitzer, K J; van der Vaart, C H

2013-11-01

To evaluate the reliability of measurements of the levator hiatus and levator-urethra gap (LUG) using three/four-dimensional (3D/4D) transperineal ultrasound in women during their first pregnancy and 6 months postpartum, and to assess the learning process for these measurements. An inexperienced observer was taught to perform measurements of the levator hiatus and LUG by an experienced observer. After training, 3D/4D ultrasound volume datasets of 40 women in the first trimester were analyzed by these two observers. Another training session then took place and both observers repeated the analyses of the same volume datasets. Finally, analyses of 40 volume datasets of the women 6 months postpartum were performed by both observers. Intra- and interobserver reliability were determined by intraclass correlation coefficients (ICC) with 95% CIs. For levator hiatal measurements, in the women during their first pregnancy the interobserver reliability was substantial to almost perfect after both the first and second training session (ICC, 0.62-0.83 and 0.71-0.89, respectively, for anteroposterior diameter, transverse diameter and area at rest, on contraction and on Valsalva) and the intraobserver reliability was substantial to almost perfect for both observers. For these measurements performed once the women had delivered, interobserver reliability was moderate to almost perfect. For LUG measurements performed during pregnancy, interobserver reliability was slight to moderate after the first training session (ICC, 0.14-0.54), but improved after the second training session (ICC, 0.38-0.71), and intraobserver reliability was moderate to substantial for the experienced observer and slight to moderate for the inexperienced observer. For these measurements performed when the women had delivered, interobserver reliability was fair to moderate. The levator hiatus and LUG can be measured reliably using 3D/4D ultrasound in primigravid and primiparous women. 
The technique to measure dimensions of the levator hiatus requires limited teaching, but LUG measurements are more difficult and require more extensive training. Copyright © 2013 ISUOG. Published by John Wiley & Sons Ltd.
Development and evaluation of the OHCITIES instrument: assessing alcohol urban environments in the Heart Healthy Hoods project

PubMed Central

Sureda, Xisca; Espelt, Albert; Villalbí, Joan R; Cebrecos, Alba; Baranda, Lucía; Pearce, Jamie; Franco, Manuel

2017-01-01

Objectives: To describe the development and test–retest reliability of OHCITIES, an instrument characterising alcohol urban environment in terms of availability, promotion and signs of consumption. Design: This study involved: (1) developing the conceptual framework for alcohol urban environment by means of literature reviewing and previous alcohol environment research experience; (2) pilot testing and redesigning the instrument; (3) instrument digitalisation; (4) instrument evaluation using test–retest reliability. Setting: Data for testing the reliability of the instrument were collected in seven census sections in Madrid in 2016 by two observers. Primary and secondary outcome measures: We computed per cent agreement and Cohen’s kappa coefficients to estimate inter-rater and test–retest reliability for alcohol outlet environment measures. We calculated intraclass correlation coefficients and their 95% CIs to provide a measure of inter-rater reliability for signs of alcohol consumption measures. Results: We collected information on 92 on-premise and 24 off-premise alcohol outlets identified in the studied areas about availability, accessibility and promotion of alcohol. Most per cent-agreement values for alcohol measures in on-premise and off-premise alcohol outlets were greater than 80%, and inter-rater and test–retest reliability values were generally above 0.80. Observers identified 26 streets and 3 public squares with signs of alcohol consumption. Intraclass correlation coefficient between observers for any type of signs of alcohol consumption was 0.50 (95% CI −0.09 to 0.77). Few items promoting alcohol unrelated to alcohol outlets were found on public spaces. Conclusions: The OHCITIES instrument is a reliable instrument to characterise alcohol urban environment.
This instrument might be used to understand how alcohol environment associates with alcohol behaviours and its related health outcomes, and can help in the design and evaluation of policies to reduce the harm caused by alcohol. PMID:28982829
Development and evaluation of the OHCITIES instrument: assessing alcohol urban environments in the Heart Healthy Hoods project.

PubMed

Sureda, Xisca; Espelt, Albert; Villalbí, Joan R; Cebrecos, Alba; Baranda, Lucía; Pearce, Jamie; Franco, Manuel

2017-10-05

To describe the development and test-retest reliability of OHCITIES, an instrument characterising alcohol urban environment in terms of availability, promotion and signs of consumption. This study involved: (1) developing the conceptual framework for alcohol urban environment by means of literature reviewing and previous alcohol environment research experience; (2) pilot testing and redesigning the instrument; (3) instrument digitalisation; (4) instrument evaluation using test-retest reliability. Data for testing the reliability of the instrument were collected in seven census sections in Madrid in 2016 by two observers. We computed per cent agreement and Cohen's kappa coefficients to estimate inter-rater and test-retest reliability for alcohol outlet environment measures. We calculated intraclass correlation coefficients and their 95% CIs to provide a measure of inter-rater reliability for signs of alcohol consumption measures. We collected information on 92 on-premise and 24 off-premise alcohol outlets identified in the studied areas about availability, accessibility and promotion of alcohol. Most per cent-agreement values for alcohol measures in on-premise and off-premise alcohol outlets were greater than 80%, and inter-rater and test-retest reliability values were generally above 0.80. Observers identified 26 streets and 3 public squares with signs of alcohol consumption. Intraclass correlation coefficient between observers for any type of signs of alcohol consumption was 0.50 (95% CI -0.09 to 0.77). Few items promoting alcohol unrelated to alcohol outlets were found on public spaces. The OHCITIES instrument is a reliable instrument to characterise alcohol urban environment. This instrument might be used to understand how alcohol environment associates with alcohol behaviours and its related health outcomes, and can help in the design and evaluation of policies to reduce the harm caused by alcohol.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Development and assessment of a digital X-ray software tool to determine vertebral rotation in adolescent idiopathic scoliosis.

PubMed

Eijgenraam, Susanne M; Boselie, Toon F M; Sieben, Judith M; Bastiaenen, Caroline H G; Willems, Paul C; Arts, Jacobus J; Lataster, Arno

2017-02-01

The amount of vertebral rotation in the axial plane is of key importance in the prognosis and treatment of adolescent idiopathic scoliosis (AIS). Current methods to determine vertebral rotation are either designed for use in analogue plain radiographs and not useful in digital images, or lack measurement precision and are therefore less suitable for the follow-up of rotation in AIS patients. This study aimed to develop a digital X-ray software tool with high measurement precision to determine vertebral rotation in AIS, and to assess its (concurrent) validity and reliability. In this study a combination of basic science and reliability methodology applied in both laboratory and clinical settings was used. Software was developed using the algorithm of the Perdriolle torsion meter for analogue AP plain radiographs of the spine. Software was then assessed for (1) concurrent validity and (2) intra- and interobserver reliability. Plain radiographs of both human cadaver vertebrae and outpatient AIS patients were used. Concurrent validity was measured by two independent observers, both experienced in the assessment of plain radiographs. Reliability-measurements were performed by three independent spine surgeons. Pearson correlation of the software compared with the analogue Perdriolle torsion meter for mid-thoracic vertebrae was 0.98, for low-thoracic vertebrae 0.97 and for lumbar vertebrae 0.97. Measurement exactness of the software was within 5° in 62% of cases and within 10° in 97% of cases. Intraclass correlation coefficient (ICC) for inter-observer reliability was 0.92 (0.91-0.95), ICC for intra-observer reliability was 0.96 (0.94-0.97). We developed a digital X-ray software tool to determine vertebral rotation in AIS with a substantial concurrent validity and reliability, which may be useful for the follow-up of vertebral rotation in AIS patients. Copyright © 2015 Elsevier Inc. All rights reserved.
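Several records in this listing, including the one above, report an intraclass correlation coefficient (ICC) without stating which ICC form was used. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)), the coefficient can be computed from a subjects-by-observers table as follows. The function name icc_2_1 and the ratings table are illustrative and are not data from any study listed here.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one row per subject, one column per observer.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects (rows) rated by 4 observers (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

In this illustrative table the observers rank subjects similarly but differ systematically in level, which an absolute-agreement ICC penalises. A consistency-type ICC on the same table would be higher, which is one reason the ICC form matters when comparing values across studies.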
Reliability and validity of a tool to measure the severity of tongue thrust in children: the Tongue Thrust Rating Scale.

PubMed

Serel Arslan, S; Demir, N; Karaduman, A A

2017-02-01

This study aimed to develop a scale called Tongue Thrust Rating Scale (TTRS), which categorised tongue thrust in children in terms of its severity during swallowing, and to investigate its validity and reliability. The study describes the developmental phase of the TTRS and presented its content and criterion-based validity and interobserver and intra-observer reliability. For content validation, seven experts assessed the steps in the scale over two Delphi rounds. Two physical therapists evaluated videos of 50 children with cerebral palsy (mean age, 57·9 ± 16·8 months), using the TTRS to test criterion-based validity, interobserver and intra-observer reliability. The Karaduman Chewing Performance Scale (KCPS) and Drooling Severity and Frequency Scale (DSFS) were used for criterion-based validity. All the TTRS steps were deemed necessary. The content validity index was 0·857. A very strong positive correlation was found between two examinations by one physical therapist, which indicated intra-observer reliability (r = 0·938, P < 0·001). A very strong positive correlation was also found between the TTRS scores of two physical therapists, indicating interobserver reliability (r = 0·892, P < 0·001). There was also a strong positive correlation between the TTRS and KCPS (r = 0·724, P < 0·001) and a very strong positive correlation between the TTRS scores and DSFS (r = 0·822 and r = 0·755; P < 0·001). These results demonstrated the criterion-based validity of the TTRS. The TTRS is a valid, reliable and clinically easy-to-use functional instrument to document the severity of tongue thrust in children. © 2016 John Wiley & Sons Ltd.
Establishing the reliability of rhesus macaque social network assessment from video observations

PubMed Central

Feczko, Eric; Mitchell, Thomas A. J.; Walum, Hasse; Brooks, Jenna M.; Heitz, Thomas R.; Young, Larry J.; Parr, Lisa A.

2015-01-01

Understanding the properties of a social environment is important for understanding the dynamics of social relationships. Understanding such dynamics is relevant for multiple fields, ranging from animal behaviour to social and cognitive neuroscience. To quantify social environment properties, recent studies have incorporated social network analysis. Social network analysis quantifies both the global and local properties of a social environment, such as social network efficiency and the roles played by specific individuals, respectively. Despite the plethora of studies incorporating social network analysis, methods to determine the amount of data necessary to derive reliable social networks are still being developed. Determining the amount of data necessary for a reliable network is critical for measuring changes in the social environment, for example following an experimental manipulation, and therefore may be critical for using social network analysis to statistically assess social behaviour. In this paper, we extend methods for measuring error in acquired data and for determining the amount of data necessary to generate reliable social networks. We derived social networks from a group of 10 male rhesus macaques, Macaca mulatta, for three behaviours: spatial proximity, grooming and mounting. Behaviours were coded using a video observation technique, where video cameras recorded the compound where the 10 macaques resided. We collected, coded and used 10 h of video data to construct these networks. Using the methods described here, we found in our data that 1 h of spatial proximity observations produced reliable social networks. However, this may not be true for other studies due to differences in data acquisition. Our results have broad implications for measuring and predicting the amount of error in any social network, regardless of species. PMID:26392632
Measuring disability: a systematic review of the validity and reliability of the Global Activity Limitations Indicator (GALI).

PubMed

Van Oyen, Herman; Bogaert, Petronille; Yokota, Renata T C; Berger, Nicolas

2018-01-01

GALI or Global Activity Limitation Indicator is a global survey instrument measuring participation restriction. GALI is the measure underlying the European indicator Healthy Life Years (HLY). GALI has substantial policy use within the EU and its Member States. The objective of the current paper is to bring together what is known from published manuscripts on the validity and the reliability of GALI. Following the PRISMA guidelines, two search strategies (PUBMED, Google Scholar) were combined to identify manuscripts published in English with publication date 2000 or beyond. Articles were classified as reliability studies, concurrent or predictive validity studies, in national or international populations. Four cross-sectional studies (of which 2 international) studied how GALI relates to other health measures (concurrent validity). A dose-response effect by GALI severity level on the association with the other health status measures was observed in the national studies. The 2 international studies (SHARE, EHIS) concluded that the odds of reporting participation restriction were higher in subjects with self-reported or observed functional limitations. In SHARE, the size of the odds ratios (ORs) in the different countries was homogeneous, while in EHIS the size of the ORs varied more strongly. For the predictive validity, subjects were followed over time (4 studies of which one international). GALI proved, both in national and international data, to be a consistent predictor of future health outcomes both in terms of mortality and health care expenditure. As predictors of mortality, the two distinct health concepts, self-rated health and GALI, acted independently of and complementary to each other. The one reliability study identified reported a sufficient reliability of GALI. GALI, as an inclusive one-question instrument, fits all conceptual characteristics specified for a global measure on participation restriction.
None of the studies included in the review showed evidence of failing validity. The review shows that GALI has good and sufficient concurrent and predictive validity, and reliability.
Reliability of a store observation tool in measuring availability of alcohol and selected foods.

PubMed

Cohen, Deborah A; Schoeff, Diane; Farley, Thomas A; Bluthenthal, Ricky; Scribner, Richard; Overton, Adrian

2007-11-01

Alcohol and food items can compromise or contribute to health, depending on the quantity and frequency with which they are consumed. How much people consume may be influenced by product availability and promotion in local retail stores. We developed and tested an observational tool to objectively measure in-store availability and promotion of alcoholic beverages and selected food items that have an impact on health. Trained observers visited 51 alcohol outlets in Los Angeles and southeastern Louisiana. Using a standardized instrument, two independent observations were conducted documenting the type of outlet, the availability and shelf space for alcoholic beverages and selected food items, the purchase price of standard brands, the placement of beer and malt liquor, and the amount of in-store alcohol advertising. Reliability of the instrument was excellent for measures of item availability, shelf space, and placement of malt liquor. Reliability was lower for alcohol advertising, beer placement, and items that measured the "least price" of apples and oranges. The average kappa was 0.87 for categorical items and the average intraclass correlation coefficient was 0.83 for continuous items. Overall, systematic observation of the availability and promotion of alcoholic beverages and food items was feasible, acceptable, and reliable. Measurement tools such as the one we evaluated should be useful in studies of the impact of availability of food and beverages on consumption and on health outcomes.
Concurrent validity and reliability of the Alberta Infant Motor Scale in premature infants.

PubMed

Almeida, Kênnea Martins; Dutra, Maria Virginia Peixoto; Mello, Rosane Reis de; Reis, Ana Beatriz Rodrigues; Martins, Priscila Silveira

2008-01-01

To verify the concurrent validity and interobserver reliability of the Alberta Infant Motor Scale (AIMS) in premature infants followed-up at the outpatient clinic of Instituto Fernandes Figueira, Fundação Oswaldo Cruz (IFF/Fiocruz), in Rio de Janeiro, Brazil. A total of 88 premature infants were enrolled at the follow-up clinic at IFF/Fiocruz, between February and December of 2006. For the concurrent validity study, 46 infants were assessed at either 6 (n = 26) or 12 (n = 20) months' corrected age using the AIMS and the second edition of the Bayley Scales of Infant Development, by two different observers, and applying Pearson's correlation coefficient to analyze the results. For the reliability study, 42 infants between 0 and 18 months were assessed using the Alberta Infant Motor Scale, by two different observers and the results analyzed using the intraclass correlation coefficient. The concurrent validity study found a high level of correlation between the two scales (r = 0.95) and one that was statistically significant (p < 0.01) for the entire population of infants, with higher values at 12 months (r = 0.89) than at 6 months (r = 0.74). The interobserver reliability study found satisfactory intraclass correlation coefficients at all ages tested, varying from 0.76 to 0.99. The AIMS is a valid and reliable instrument for the evaluation of motor development in high-risk infants within the Brazilian public health system.
Analysis Testing of Sociocultural Factors Influence on Human Reliability within Sociotechnical Systems: The Algerian Oil Companies.

PubMed

Laidoune, Abdelbaki; Rahal Gharbi, Med El Hadi

2016-09-01

The influence of sociocultural factors on human reliability within open sociotechnical systems is highlighted. The design of such systems is enhanced by experience feedback. The study was based on a survey involving the observation of working cases, the processing of incident/accident statistics and, in the qualitative part, semistructured interviews. In order to consolidate the study approach, we used a schedule for the purpose of standard statistical measurements. We tried to be unbiased by covering an exhaustive list of worker categories, including age, sex, educational level, prescribed task, accountability level, etc. The survey was reinforced by a schedule distributed to 300 workers belonging to two oil companies. This schedule comprises 30 items related to six main factors that influence human reliability. Qualitative observations and schedule data processing showed that sociocultural factors can negatively and positively influence operator behaviors. The explored sociocultural factors influence human reliability in both qualitative and quantitative ways. The proposed model shows how reliability can be enhanced by measures such as experience feedback based on, for example, safety improvements, training, and information. Added to this are continuous system improvements to improve sociocultural reality and to reduce negative behaviors.
Representing Geospatial Environment Observation Capability Information: A Case Study of Managing Flood Monitoring Sensors in the Jinsha River Basin

PubMed Central

Hu, Chuli; Guan, Qingfeng; Li, Jie; Wang, Ke; Chen, Nengcheng

2016-01-01

Sensor inquirers cannot understand comprehensive or accurate observation capability information because current observation capability modeling does not consider the union of multiple sensors nor the effect of geospatial environmental features on the observation capability of sensors. These limitations result in a failure to discover credible sensors or plan for their collaboration for environmental monitoring. The Geospatial Environmental Observation Capability (GEOC) is proposed in this study and can be used as an information basis for the reliable discovery and collaborative planning of multiple environmental sensors. A field-based GEOC (GEOCF) information representation model is built. Quintuple GEOCF feature components and two GEOCF operations are formulated based on the geospatial field conceptual framework. The proposed GEOCF markup language is used to formalize the proposed GEOCF. A prototype system called GEOCapabilityManager is developed, and a case study is conducted for flood observation in the lower reaches of the Jinsha River Basin. The applicability of the GEOCF is verified through the reliable discovery of flood monitoring sensors and planning for the collaboration of these sensors. PMID:27999247
Representing Geospatial Environment Observation Capability Information: A Case Study of Managing Flood Monitoring Sensors in the Jinsha River Basin.

PubMed

Hu, Chuli; Guan, Qingfeng; Li, Jie; Wang, Ke; Chen, Nengcheng

2016-12-16

Sensor inquirers cannot understand comprehensive or accurate observation capability information because current observation capability modeling does not consider the union of multiple sensors nor the effect of geospatial environmental features on the observation capability of sensors. These limitations result in a failure to discover credible sensors or plan for their collaboration for environmental monitoring. The Geospatial Environmental Observation Capability (GEOC) is proposed in this study and can be used as an information basis for the reliable discovery and collaborative planning of multiple environmental sensors. A field-based GEOC (GEOCF) information representation model is built. Quintuple GEOCF feature components and two GEOCF operations are formulated based on the geospatial field conceptual framework. The proposed GEOCF markup language is used to formalize the proposed GEOCF. A prototype system called GEOCapabilityManager is developed, and a case study is conducted for flood observation in the lower reaches of the Jinsha River Basin. The applicability of the GEOCF is verified through the reliable discovery of flood monitoring sensors and planning for the collaboration of these sensors.
Inter-observer reliability of DSM-5 substance use disorders.

PubMed

Denis, Cécile M; Gelernter, Joel; Hart, Amy B; Kranzler, Henry R

2015-08-01

Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence concerning the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
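Percent agreement and Cohen's kappa, the two statistics named in the record above, can be computed from paired categorical ratings as in the following minimal sketch. The two interviewer lists are hypothetical (1 = diagnosis present, 0 = absent) and are not data from the study; the function names are chosen here for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of subjects on which the two raters give the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical lifetime-diagnosis calls by two interviewers for 10 subjects.
interviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
interviewer_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

print(f"percent agreement = {percent_agreement(interviewer_1, interviewer_2):.2f}")
print(f"Cohen's kappa     = {cohens_kappa(interviewer_1, interviewer_2):.2f}")
```

With these hypothetical ratings, agreement is 80% but kappa is lower because half of the agreement would be expected by chance alone. This is why reliability studies in this listing tend to report kappa alongside, or instead of, raw agreement.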
Reliability of Two Smartphone Applications for Radiographic Measurements of Hallux Valgus Angles.

PubMed

Mattos E Dinato, Mauro Cesar; Freitas, Marcio de Faria; Milano, Cristiano; Valloto, Elcio; Ninomiya, André Felipe; Pagnano, Rodrigo Gonçalves

The objective of the present study was to assess the reliability of 2 smartphone applications compared with the traditional goniometer technique for measurement of radiographic angles in hallux valgus and the time required for analysis with the different methods. The radiographs of 31 patients (52 feet) with a diagnosis of hallux valgus were analyzed. Four observers, 2 with >10 years' experience in foot and ankle surgery and 2 in-training surgeons, measured the hallux valgus angle and intermetatarsal angle using a manual goniometer technique and 2 smartphone applications (Hallux Angles and iPinPoint). The interobserver and intermethod reliability were estimated using intraclass correlation coefficients (ICCs), and the time required for measurement of the angles among the 3 methods was compared using the Friedman test. A very good or good interobserver reliability was found among the 4 observers measuring the hallux valgus angle and intermetatarsal angle using the goniometer (ICC 0.913 and 0.821, respectively) and iPinPoint (ICC 0.866 and 0.638, respectively). Using the Hallux Angles application, a very good interobserver reliability was found for measurements of the hallux valgus angle (ICC 0.962) and intermetatarsal angle (ICC 0.935) only among the more experienced observers. The time required for the measurements was significantly shorter for the measurements using both smartphone applications compared with the goniometer method. One smartphone application (iPinPoint) was reliable for measurements of the hallux valgus angles by either experienced or nonexperienced observers. The use of these tools might save time in the evaluation of radiographic angles in the hallux valgus. Copyright © 2016 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

Application of objective clinical human reliability analysis (OCHRA) in assessment of technical performance in laparoscopic rectal cancer surgery.

PubMed

Foster, J D; Miskovic, D; Allison, A S; Conti, J A; Ockrim, J; Cooper, E J; Hanna, G B; Francis, N K

2016-06-01

Laparoscopic rectal resection is technically challenging, with outcomes dependent upon technical performance. No robust objective assessment tool exists for laparoscopic rectal resection surgery. This study aimed to investigate the application of the objective clinical human reliability analysis (OCHRA) technique for assessing technical performance of laparoscopic rectal surgery and explore the validity and reliability of this technique. Laparoscopic rectal cancer resection operations were described in the format of a hierarchical task analysis. Potential technical errors were defined. The OCHRA technique was used to identify technical errors enacted in videos of twenty consecutive laparoscopic rectal cancer resection operations from a single site. The procedural task, spatial location, and circumstances of all identified errors were logged. Clinical validity was assessed through correlation with clinical outcomes; reliability was assessed by test-retest. A total of 335 execution errors were identified, with a median of 15 per operation. More errors were observed during pelvic tasks compared with abdominal tasks (p < 0.001). Within the pelvis, more errors were observed during dissection on the right side than the left (p = 0.03). Test-retest confirmed reliability (r = 0.97, p < 0.001). A significant correlation was observed between error frequency and mesorectal specimen quality (r_s = 0.52, p = 0.02) and with blood loss (r_s = 0.609, p = 0.004). OCHRA offers a valid and reliable method for evaluating technical performance of laparoscopic rectal surgery.
Reliability and Construct Validity of Limits of Stability Test in Adolescents Using a Portable Forceplate System.

PubMed

Alsalaheen, Bara; Haines, Jamie; Yorke, Amy; Broglio, Steven P

2015-12-01

To examine the reliability, convergent, and discriminant validity of the limits of stability (LOS) test to assess dynamic postural stability in adolescents using a portable forceplate system. Cross-sectional reliability observational study. School setting. Adolescents (N=36) completed all measures during the first session. To examine the reliability of the LOS test, a subset of 15 participants repeated the LOS test after 1 week. Not applicable. Outcome measurements included the LOS test, Balance Error Scoring System, Instrumented Balance Error Scoring System, and Modified Clinical Test for Sensory Interaction on Balance. A significant relation was observed among LOS composite scores (r=.36-.87, P<.05). However, no relation was observed between LOS and static balance outcome measurements. The reliability of the LOS composite scores ranged from moderate to good (intraclass correlation coefficient model 2,1=.73-.96). The results suggest that the LOS composite scores provide unique information about dynamic postural stability, and the LOS test completed at 100% of the theoretical limit appeared to be a reliable test of dynamic postural stability in adolescents. Clinicians should use dynamic balance measurement as part of their balance assessment and should not use static balance testing (eg, Balance Error Scoring System) to make inferences about dynamic balance, especially when balance assessment is used to determine rehabilitation outcomes, or when making return to play decisions after injury. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Observer reliability of the Gross Motor Performance Measure and the Quality of Upper Extremity Skills Test, based on video recordings.

PubMed

Sorsdahl, Anne Brit; Moe-Nilssen, Rolf; Strand, Liv Inger

2008-02-01

The aim of this study was to examine observer reliability of the Gross Motor Performance Measure (GMPM) and the Quality of Upper Extremity Skills Test (QUEST) based on video clips. The tests were administered to 26 children with cerebral palsy (CP; 14 males, 12 females; range 2-13y, mean 7y 6mo), 24 with spastic CP, and two with dyskinesia. Respectively, five, six, five, four, and six children were classified in Gross Motor Function Classification System Levels I to V; and four, nine, five, five, and three children were classified in Manual Ability Classification System levels I to V. The children's performances were recorded and edited. Two experienced paediatric physical therapists assessed the children from watching the video clips. Intraobserver and interobserver reliability values of the total scores were mostly high, intraclass correlation coefficient (ICC)(1,1) varying from 0.69 to 0.97 with only one coefficient below 0.89. The ICCs of subscores varied from 0.36 to 0.95, finding 'Alignment' and 'Weight shift' in GMPM and 'Protective extension' in QUEST highly reliable. The subscores 'Dissociated movements' in GMPM and QUEST, and 'Grasp' in QUEST were the least reliable, and recommendations are made to increase reliability of these subscores. Video scoring was time consuming, but was found to offer many advantages: the possibility to review performance, to use specially trained observers for scoring, and less demanding assessment for the children.
The long-term reliability of static and dynamic quantitative sensory testing in healthy individuals.

PubMed

Marcuzzi, Anna; Wrigley, Paul J; Dean, Catherine M; Adams, Roger; Hush, Julia M

2017-07-01

Quantitative sensory tests (QSTs) have been increasingly used to investigate alterations in somatosensory function in a wide range of painful conditions. The interpretation of these findings is based on the assumption that the measures are stable and reproducible. To date, reliability of QST has been investigated for short test-retest intervals. The aim of this study was to investigate the long-term reliability of a multimodal QST assessment in healthy people, with testing conducted on 3 occasions over 4 months. Forty-two healthy people were enrolled in the study. Static and dynamic tests were performed, including cold and heat pain threshold (CPT, HPT), mechanical wind-up [wind-up ratio (WUR)], pressure pain threshold (PPT), 2-point discrimination (TPD), and conditioned pain modulation (CPM). Systematic bias, relative reliability and agreement were analysed using repeated measure analysis of variance, intraclass correlation coefficients (ICC(3,1)) and SE of the measurement (SEM), respectively. Static QST (CPT, HPT, PPT, and TPD) showed good-to-excellent reliability (ICCs: 0.68-0.90). Dynamic QST (WUR and CPM) showed poor-to-good reliability (ICCs: 0.35-0.61). A significant linear decrease over time was observed for mechanical QST at the back (PPT and TPD) and for CPM (P < 0.01). Static QST were stable over a period of 4 months; however, a small systematic decrease over time has been observed for mechanical QST. Dynamic QST showed considerable variability over time; in particular, CPM using PPT as the test stimulus did not show adequate reliability, suggesting that this test paradigm may be less useful for monitoring individuals over time.
Accuracy of remote burn scar evaluation via live video-conferencing technology.

PubMed

Cai, Lawrence Z; Caceres, Maria; Dangol, Mohan Krishna; Nakarmi, Kiran; Rai, Shankar Man; Chang, James; Gibran, Nicole S; Pham, Tam N

2016-12-05

Telemedicine in outpatient burn care, particularly in burn scar management, may provide cost-effective care and comes highly rated by patients. However, an effective scar scale using both video and photographic elements has not been validated. The purpose of this study is to test the reliability of the Patient and Observer Scar Assessment Scale (POSAS) using live video-conferencing. A prospective study was conducted with individuals with healed burn scars in Kathmandu, Nepal. Three independent observers assessed 85 burn scars from 17 subjects, using the Observer portion to evaluate vascularity, pigmentation, thickness, relief, pliability, surface area, and overall opinion. The on-site observer was physically present with the subjects and used a live videoconferencing application to show the scars to two remote observers in the United States. Subjects used the Patient portion to evaluate the scar that they believed had the worst appearance and the greatest impact on function. The single-rater reliability of the Observer scale was acceptable (ICC>0.70) in overall opinion, thickness, pliability, and surface area. The average-rater reliability for three observers was acceptable (ICC>0.70) for all parameters except for vascularity. When comparing Patients' and Observers' overall opinion scores, patients consistently reported worse opinion. Evaluation of burn scars using the Patient and Observer Scar Assessment Scale can be accurately performed via live videoconferencing and presents an opportunity to expand access to burn care to rural communities, particularly in low- and middle-income countries, where patients face significant access barriers to appropriate follow-up care. Copyright © 2016 Elsevier Ltd and ISBI. All rights reserved.
Precision of lumbar intervertebral measurements: does a computer-assisted technique improve reliability?

PubMed

Pearson, Adam M; Spratt, Kevin F; Genuario, James; McGough, William; Kosman, Katherine; Lurie, Jon; Sengupta, Dilip K

2011-04-01

Comparison of intra- and interobserver reliability of digitized manual and computer-assisted intervertebral motion measurements and classification of "instability." To determine if computer-assisted measurement of lumbar intervertebral motion on flexion-extension radiographs improves reliability compared with digitized manual measurements. Many studies have questioned the reliability of manual intervertebral measurements, although few have compared the reliability of computer-assisted and manual measurements on lumbar flexion-extension radiographs. Intervertebral rotation, anterior-posterior (AP) translation, and change in anterior and posterior disc height were measured with a digitized manual technique by three physicians and by three other observers using computer-assisted quantitative motion analysis (QMA) software. Each observer measured 30 sets of digital flexion-extension radiographs (L1-S1) twice. Shrout-Fleiss intraclass correlation coefficients for intra- and interobserver reliabilities were computed. The stability of each level was also classified (instability defined as >4 mm AP translation or 10° rotation), and the intra- and interobserver reliabilities of the two methods were compared using adjusted percent agreement (APA). Intraobserver reliability intraclass correlation coefficients were substantially higher for the QMA technique than the digitized manual technique across all measurements: rotation 0.997 versus 0.870, AP translation 0.959 versus 0.557, change in anterior disc height 0.962 versus 0.770, and change in posterior disc height 0.951 versus 0.283. The same pattern was observed for interobserver reliability (rotation 0.962 vs. 0.693, AP translation 0.862 vs. 0.151, change in anterior disc height 0.862 vs. 0.373, and change in posterior disc height 0.730 vs. 0.300). The QMA technique was also more reliable for the classification of "instability."
Intraobserver APAs ranged from 87 to 97% for QMA versus 60% to 73% for digitized manual measurements, while interobserver APAs ranged from 91% to 96% for QMA versus 57% to 63% for digitized manual measurements. The use of QMA software substantially improved the reliability of lumbar intervertebral measurements and the classification of instability based on flexion-extension radiographs.
A method for recording verbal behavior in free-play settings

PubMed Central

Nordquist, Vey M.

1971-01-01

The present study attempted to test the reliability of a new method of recording verbal behavior in a free-play preschool setting. Six children, three normal and three speech impaired, served as subjects. Videotaped records of verbal behavior were scored by two experimentally naive observers. The results suggest that the system provides a means of obtaining reliable records of both normal and impaired speech, even when the subjects exhibit nonverbal behaviors (such as hyperactivity) that interfere with direct observation techniques. PMID:16795310
The FLIR ONE thermal imager for the assessment of burn wounds: Reliability and validity study.

PubMed

Jaspers, M E H; Carrière, M E; Meij-de Vries, A; Klaessens, J H G M; van Zuijlen, P P M

2017-11-01

Objective measurement tools may be of great value to provide early and reliable burn wound assessment. Thermal imaging is an easy, accessible and objective technique, which measures skin temperature as an indicator of tissue perfusion. These thermal images might be helpful in the assessment of burn wounds. However, before implementation of a novel measurement tool into clinical practice is considered, it is appropriate to test its clinimetric properties (i.e. reliability and validity). The objective of this study was to assess the reliability and validity of the recently introduced FLIR ONE thermal imager. Two observers obtained thermal images of burn wounds in adult patients at day 1-3, 4-7 and 8-10 after burn. Subsequently, temperature differences between the burn wound and healthy skin (ΔT) were calculated on an iPad mini containing the FLIR Tools app. To assess reliability, ΔT values of both observers were compared by calculating the intraclass correlation coefficient (ICC) and measurement error parameters. To assess validity, the ΔT values of the first observer were compared to the registered healing time of the burn wounds, which was specified into three categories: (I) ≤14 days, (II) 15-21 days and (III) >21 days. The ability of the FLIR ONE to discriminate between healing ≤21 days and >21 days was evaluated by means of a receiver operating characteristic curve and an optimal ΔT cut-off value. Reliability: ICCs were 0.99 for each time point, indicating excellent reliability up to 10 days after burn. The standard error of measurement varied between 0.17-0.22°C. Validity: the area under the curve was calculated at 0.69 (95% CI 0.54-0.84). A cut-off value of -1.15°C shows a moderate discrimination between burn wound healing ≤21 days and >21 days (46% sensitivity; 82% specificity). Our results show that the FLIR ONE thermal imager is highly reliable, but the moderate validity calls for additional research.
However, the FLIR ONE is pre-eminently feasible, allowing easy and fast measurements in clinical burn practice. Copyright © 2017 Elsevier Ltd and ISBI. All rights reserved.
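The sensitivity and specificity quoted for the ΔT cut-off can be illustrated with a minimal Python sketch. It assumes that a ΔT below the cut-off predicts healing in more than 21 days; the abstract does not state the direction of the rule, and the function name and data below are hypothetical, not taken from the study.

```python
def sensitivity_specificity(delta_t, slow_healing, cutoff=-1.15):
    """Return (sensitivity, specificity) for the rule 'delta_t < cutoff'.

    delta_t: temperature differences in degrees C (burn minus healthy skin).
    slow_healing: True where the wound took more than 21 days to heal.
    Assumption: a value below the cut-off predicts slow healing.
    """
    tp = fn = tn = fp = 0
    for dt, slow in zip(delta_t, slow_healing):
        predicted_slow = dt < cutoff
        if slow and predicted_slow:
            tp += 1  # slow healer correctly flagged
        elif slow:
            fn += 1  # slow healer missed
        elif predicted_slow:
            fp += 1  # fast healer wrongly flagged
        else:
            tn += 1  # fast healer correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one slow and one fast healer")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical measurements for eight wounds (not study data).
example_delta_t = [-2.0, -0.5, -1.5, 0.3, -1.3, 0.8, -0.2, -3.0]
example_slow = [True, True, False, False, True, False, False, True]
sens, spec = sensitivity_specificity(example_delta_t, example_slow)
```

Sweeping the cut-off over all observed ΔT values and plotting sensitivity against one minus specificity would trace the receiver operating characteristic curve the abstract refers to.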
Disease severity assessment in epidemiological studies: accuracy and reliability of visual estimates of Septoria leaf blotch (SLB) in winter wheat

USDA-ARS's Scientific Manuscript database

The accuracy and reliability of visual assessments of SLB severity by raters (i.e. one plant pathologist with extensive experience and three other raters trained prior to field observations using standard area diagrams and DISTRAIN) was determined by comparison with assumed actual values obtained by...
Influences of Response Rate and Distribution on the Calculation of Interobserver Reliability Scores

ERIC Educational Resources Information Center

Rolider, Natalie U.; Iwata, Brian A.; Bullock, Christopher E.

2012-01-01

We examined the effects of several variations in response rate on the calculation of total, interval, exact-agreement, and proportional reliability indices. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates during separate sessions so that…
A Comprehensive Observational Coding Scheme for Analyzing Instrumental, Affective, and Relational Communication in Health Care Contexts

PubMed Central

Siminoff, Laura A.; Step, Mary M.

2011-01-01

Many observational coding schemes have been offered to measure communication in health care settings. These schemes fall short of capturing multiple functions of communication among providers, patients, and other participants. After a brief review of observational communication coding, the authors present a comprehensive scheme for coding communication that is (a) grounded in communication theory, (b) accounts for instrumental and relational communication, and (c) captures important contextual features with tailored coding templates: the Siminoff Communication Content & Affect Program (SCCAP). To test SCCAP reliability and validity, the authors coded data from two communication studies. The SCCAP provided reliable measurement of communication variables including tailored content areas and observer ratings of speaker immediacy, affiliation, confirmation, and disconfirmation behaviors. PMID:21213170
Reliability of the Robinson classification for displaced comminuted midshaft clavicular fractures.

PubMed

Stegeman, Sylvia A; Fernandes, Nicole C; Krijnen, Pieta; Schipper, Inger B

2015-01-01

This study aimed to assess the reliability of the Robinson classification for displaced comminuted midshaft fractures. A total of 102 surgeons and 52 radiologists classified 15 displaced comminuted midshaft clavicular fractures on anteroposterior (AP) and 30-degree caudocephalad radiographs twice. For both surgeons and radiologists, inter-observer and intra-observer agreement significantly improved after showing the 30-degree caudocephalad view in addition to the AP view. Radiologists had significantly higher inter- and intra-observer agreement than surgeons after judging both radiographs (multirater κ of 0.81 vs. 0.56; intra-observer κ of 0.73 vs. 0.44). We advise using two-plane radiography and routinely incorporating the Robinson classification in radiology reports. Copyright © 2015 Elsevier Inc. All rights reserved.
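As a rough illustration of a chance-corrected agreement statistic, the Python sketch below computes Cohen's kappa for two sets of categorical readings, such as one reader classifying the same radiographs twice. This is the two-reading form only, not the multirater kappa the study reports; the function name, category labels, and data are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of cases where both readings match.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each reading's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both readings use one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: one reader classifying the same 10 fractures twice.
first_read = ["2B1", "2B1", "2B2", "2B2", "2B1",
              "2B2", "2B1", "2B1", "2B2", "2B2"]
second_read = ["2B1", "2B1", "2B2", "2B1", "2B1",
               "2B2", "2B1", "2B2", "2B2", "2B2"]
kappa = cohens_kappa(first_read, second_read)
```

In this made-up example the readings match in 8 of 10 cases while chance agreement is 0.5, so kappa is about 0.6, noticeably lower than the raw 80% agreement.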
Validation of the NOSCA - nurses' observation scale of cognitive abilities.

PubMed

Persoon, Anke; Schoonhoven, Lisette; Melis, Rene J F; van Achterberg, Theo; Kessels, Roy P C; Rikkert, Marcel G M Olde

2012-11-01

To examine the psychometric properties of the Nurses' Observation Scale for Cognitive Abilities. Nurses' Observation Scale for Cognitive Abilities is a behavioural rating scale comprising eight subscales that represent different cognitive domains. It is based on observations during contact between nurse and patient. Observational study. A total of 50 patients from two geriatric wards in acute care hospitals participated in this study. Reliability was examined via internal consistency and inter-rater reliability. Construct validity of the Nurses' Observation Scale for Cognitive Abilities and its subscales were explored by means of convergent and divergent validity and post hoc analyses for group differences. Cronbach's αs of the total Nurses' Observation Scale for Cognitive Abilities and its subscales were 0·98 and 0·66-0·93, respectively. The item-total correlations were satisfactory (overall > 0·4). The intra-class coefficients were good (37 of 39 items > 0·4). The convergent validity of the Nurses' Observation Scale for Cognitive Abilities against cognitive ratings (MMSE, NOSGER) and severity of dementia (Clinical Dementia Rating) demonstrated satisfactory correlations (0·59-0·70, p < 0·01), except for IQCODE (0·30, p > 0·05). The divergent validity of the Nurses' Observation Scale for Cognitive Abilities against depressive symptoms was low (0·12, p > 0·05). The construct validity of the Nurses' Observation Scale for Cognitive Abilities subscales against 13 specific neuropsychological tests showed correlations varying from poor to fair (0·18-0·74; 10 of 13 correlations p < 0·05). Validity and reliability of the total Nurses' Observation Scale for Cognitive Abilities are excellent. The correlations between the Nurses' Observation Scale for Cognitive Abilities subscales and standard neuropsychological tests were moderate. 
More conclusive results may be found if the Nurses' Observation Scale for Cognitive Abilities subscales were to be validated using more ecologically valid tests and in a patient population with less cognitive impairment. Use of the Nurses' Observation Scale for Cognitive Abilities yields standardised, reliable and valid information about patient's cognitive behaviour in daily practice. The Nurses' Observation Scale for Cognitive Abilities aids in tailoring nursing interventions to patients' specific cognitive needs. We advocate the implementation of the Nurses' Observation Scale for Cognitive Abilities both in research and at geriatric units in acute care hospitals. © 2012 Blackwell Publishing Ltd.
Visual judgements of steadiness in one-legged stance: reliability and validity.

PubMed

Haupstein, T; Goldie, P

2000-01-01

There is a paucity of information about the validity and reliability of clinicians' visual judgements of steadiness in one-legged stance. Such judgements are used frequently in clinical practice to support decisions about treatment in the fields of neurology, sports medicine, paediatrics and orthopaedics. The aim of the present study was to address the validity and reliability of visual judgements of steadiness in one-legged stance in a group of physiotherapists. A videotape of 20 five-second performances was shown to 14 physiotherapists with median clinical experience of 6.75 years. Validity of visual judgement was established by correlating scores obtained from an 11-point rating scale with criterion scores obtained from a force platform. In addition, partial correlations were used to control for the potential influence of body weight on the relationship between the visual judgements and criterion scores. Inter-observer reliability was quantified between the physiotherapists; intra-observer reliability was quantified between two tests four weeks apart. Mean criterion-related validity was high, regardless of whether body weight was controlled for statistically (Pearson's r = 0.84, 0.83, respectively). The standard error of estimating the criterion score was 3.3 newtons. Inter-observer reliability was high (ICC (2,1) = 0.81 at Test 1 and 0.82 at Test 2). Intra-observer reliability was high (on average ICC (2,1) = 0.88; Pearson's r = 0.90). The standard error of measurement for the 11-point scale was one unit. The finding of higher accuracy of making visual judgements than previously reported may be due to several aspects of design: use of a criterion score derived from the variability of the force signal which is more discriminating than variability of centre of pressure; use of a discriminating visual rating scale; specificity and clear definition of the phenomenon to be rated.
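For readers unfamiliar with the ICC(2,1) notation, the following Python sketch computes the Shrout-Fleiss two-way random-effects, single-rater, absolute-agreement coefficient from a subjects-by-raters table. It is a minimal illustration under stated assumptions: the function name and ratings are invented, and it is not the authors' analysis code.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares (subjects, raters, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("no variance in the ratings; ICC is undefined")
    return (ms_rows - ms_error) / denominator


# Illustrative ratings: 6 performances scored by 4 raters (not study data).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
```

Because ICC(2,1) penalises systematic differences between raters, a table in which raters rank subjects similarly but use different parts of the scale, as in this made-up example, gives a low coefficient of roughly 0.29.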
Seeking high reliability in primary care: Leadership, tools, and organization.

PubMed

Weaver, Robert R

2015-01-01

Leaders in health care increasingly recognize that improving health care quality and safety requires developing an organizational culture that fosters high reliability and continuous process improvement. For various reasons, a reliability-seeking culture is lacking in most health care settings. Developing a reliability-seeking culture requires leaders' sustained commitment to reliability principles using key mechanisms to embed those principles widely in the organization. The aim of this study was to examine how key mechanisms used by a primary care practice (PCP) might foster a reliability-seeking, system-oriented organizational culture. A case study approach was used to investigate the PCP's reliability culture. The study examined four cultural artifacts used to embed reliability-seeking principles across the organization: leadership statements, decision support tools, and two organizational processes. To decipher their effects on reliability, the study relied on observations of work patterns and the tools' use, interactions during morning huddles and process improvement meetings, interviews with clinical and office staff, and a "collective mindfulness" questionnaire. The five reliability principles framed the data analysis. Leadership statements articulated principles that oriented the PCP toward a reliability-seeking culture of care. Reliability principles became embedded in the everyday discourse and actions through the use of "problem knowledge coupler" decision support tools and daily "huddles." Practitioners and staff were encouraged to report unexpected events or close calls that arose and which often initiated a formal "process change" used to adjust routines and prevent adverse events from recurring. Activities that foster reliable patient care became part of the taken-for-granted routine at the PCP. 
The analysis illustrates the role leadership, tools, and organizational processes play in developing and embedding a reliability-seeking culture across an organization. Progress toward a reliability-seeking, system-oriented approach to care remains ongoing, and movement in that direction requires deliberate and sustained effort by committed leaders in health care.
A systematic review of reliable and valid tools for the measurement of patient participation in healthcare.

PubMed

Phillips, Nicole Margaret; Street, Maryann; Haesler, Emily

2016-02-01

Patient participation in healthcare is recognised internationally as essential for consumer-centric, high-quality healthcare delivery. Its measurement as part of continuous quality improvement requires development of agreed standards and measurable indicators. This systematic review sought to identify strategies to measure patient participation in healthcare and to report their reliability and validity. In the context of this review, patient participation was constructed as shared decision-making, acknowledging the patient as having critical knowledge regarding their own health and care needs and promoting self-care/autonomy. Following a comprehensive search, studies reporting reliability or validity of an instrument used in a healthcare setting to measure patient participation, published in English between January 2004 and March 2014 were eligible for inclusion. From an initial search, which identified 1582 studies, 156 studies were retrieved and screened against inclusion criteria. Thirty-three studies reporting 24 patient participation measurement tools met inclusion criteria, and were critically appraised. The majority of studies were descriptive psychometric studies using prospective, cross-sectional designs. Almost all the tools completed by patients, family caregivers, observers or more than one stakeholder focused on aspects of patient-professional communication. Few tools designed for completion by patients or family caregivers provided valid and reliable measures of patient participation. There was low correlation between many of the tools and other measures of patient satisfaction. Few reliable and valid tools for measurement of patient participation in healthcare have been recently developed. Of those reported in this review, the dyadic Observing Patient Involvement in Decision Making (dyadic-OPTION) tool presents the most promise for measuring core components of patient participation. 
There remains a need for further study into valid, reliable and feasible strategies for measuring patient participation as part of continuous quality improvement. Published by the BMJ Publishing Group Limited.
The development and reliability of a simple field based screening tool to assess core stability in athletes.

PubMed

O'Connor, S; McCaffrey, N; Whyte, E; Moran, K

2016-07-01

To adapt the trunk stability test to facilitate further sub-classification of higher levels of core stability in athletes for use as a screening tool. To establish the inter-tester and intra-tester reliability of this adapted core stability test. Reliability study. Collegiate athletic therapy facilities. Fifteen physically active male subjects (19.46 ± 0.63) free from any orthopaedic or neurological disorders were recruited from a convenience sample of collegiate students. The intraclass correlation coefficients (ICC) and 95% Confidence Intervals (CI) were computed to establish inter-tester and intra-tester reliability. Excellent ICC values were observed in the adapted core stability test for inter-tester reliability (0.97) and good to excellent intra-tester reliability (0.73-0.90). While the 95% CI were narrow for inter-tester reliability, Tester A and C 95% CI's were widely distributed compared to Tester B. The adapted core stability test developed in this study is a quick and simple field based test to administer that can further subdivide athletes with high levels of core stability. The test demonstrated high inter-tester and intra-tester reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Validation of an Instructional Observation Instrument for Teaching English as a Foreign Language in Spain

ERIC Educational Resources Information Center

Gomez-Garcia, Maria

2011-01-01

The design and validation of a classroom observation instrument to provide formative feedback for teachers of EFL in Spain is the overarching purpose of this study. This study proposes that a valid and reliable classroom observation instrument, based on effective practice in teaching EFL, can be developed and used in Spain to enable teachers to…
Intra- and inter-observer reliability of nailfold videocapillaroscopy - A possible outcome measure for systemic sclerosis-related microangiopathy.

PubMed

Dinsdale, Graham; Moore, Tonia; O'Leary, Neil; Tresadern, Philip; Berks, Michael; Roberts, Christopher; Manning, Joanne; Allen, John; Anderson, Marina; Cutolo, Maurizio; Hesselstrand, Roger; Howell, Kevin; Pizzorni, Carmen; Smith, Vanessa; Sulli, Alberto; Wildt, Marie; Taylor, Christopher; Murray, Andrea; Herrick, Ariane L

2017-07-01

Our aim was to assess the reliability of nailfold capillary assessment in terms of image evaluability, image severity grade ('normal', 'early', 'active', 'late'), capillary density, capillary (apex) width, and presence of giant capillaries, and also to gain further insight into differences in these parameters between patients with systemic sclerosis (SSc), patients with primary Raynaud's phenomenon (PRP) and healthy control subjects. Videocapillaroscopy images (magnification 300×) were acquired from all 10 digits from 173 participants: 101 patients with SSc, 22 with PRP and 50 healthy controls. Ten capillaroscopy experts from 7 European centres evaluated the images. Custom image mark-up software allowed extraction of the following outcome measures: overall grade ('normal', 'early', 'active', 'late', 'non-specific', or 'ungradeable'), capillary density (vessels/mm), mean vessel apical width, and presence of giant capillaries. Observers analysed a median of 129 images each. Evaluability (i.e. the availability of measures) varied across outcome measures (e.g. 73.0% for density and 46.2% for overall grade in patients with SSc). Intra-observer reliability for evaluability was consistently higher than inter- (e.g. for density, intra-class correlation coefficient [ICC] was 0.71 within and 0.14 between observers). Conditional on evaluability, both intra- and inter-observer reliability were high for grade (ICC 0.93 and 0.78 respectively), density (0.91 and 0.64) and width (0.91 and 0.85). Evaluability is one of the major challenges in assessing nailfold capillaries. However, when images are evaluable, the high intra- and inter-reliabilities suggest that overall image grade, capillary density and apex width have potential as outcome measures in longitudinal studies. Copyright © 2017 Elsevier Inc. All rights reserved.
A Reliable, Feasible Method to Observe Neighborhoods at High Spatial Resolution

PubMed Central

Kepper, Maura M.; Sothern, Melinda S.; Theall, Katherine P.; Griffiths, Lauren A.; Scribner, Richard; Tseng, Tung-Sung; Schaettle, Paul; Cwik, Jessica M.; Felker-Kantor, Erica; Broyles, Stephanie T.

2016-01-01

Introduction Systematic social observation (SSO) methods traditionally measure neighborhoods at street level and have been performed reliably using virtual applications to increase feasibility. Research indicates that collection at even higher spatial resolution may better elucidate the health impact of neighborhood factors, but whether virtual applications can reliably capture social determinants of health at the smallest geographic resolution (parcel level) remains uncertain. This paper presents a novel, parcel-level SSO methodology and assesses whether this new method can be collected reliably using Google Street View and is feasible. Methods Multiple raters (N=5) observed 42 neighborhoods. In 2016, inter-rater reliability (observed agreement and kappa coefficient) was compared for four SSO methods: (1) street-level in person; (2) street-level virtual; (3) parcel-level in person; and (4) parcel-level virtual. Intra-rater reliability (observed agreement and kappa coefficient) was calculated to determine whether parcel-level methods produce results comparable to traditional street-level observation. Results Substantial levels of inter-rater agreement were documented across all four methods; all methods had >70% of items with at least substantial agreement. Only physical decay showed higher levels of agreement (83% of items with >75% agreement) for direct versus virtual rating source. Intra-rater agreement comparing street- versus parcel-level methods resulted in observed agreement >75% for all but one item (90%). Conclusions Results support the use of Google Street View as a reliable, feasible tool for performing SSO at the smallest geographic resolution. Validation of a new parcel-level method collected virtually may improve the assessment of social determinants contributing to disparities in health behaviors and outcomes. PMID:27989289
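The two agreement statistics named in this abstract, observed agreement and the kappa coefficient, can be sketched for a pair of rating sources as follows. This is a minimal illustration and not the authors' analysis code; the function names and the ten parcel codes are hypothetical, and the two-rater form of Cohen's kappa is assumed.

```python
from collections import Counter


def observed_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: (po - pe) / (1 - pe)."""
    n = len(rater_a)
    po = observed_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every category either rater used.
    categories = freq_a.keys() | freq_b.keys()
    pe = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if pe == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (po - pe) / (1 - pe)


# Hypothetical codes for ten parcels (1 = feature present, 0 = absent).
in_person = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
virtual = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(observed_agreement(in_person, virtual))
print(cohens_kappa(in_person, virtual))
```

Observed agreement alone can look high when one category dominates, which is why abstracts such as this one report it alongside a chance-corrected coefficient.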

Development and reliability of an observation method to assess food intake of young children in child care.

PubMed

Ball, Sarah C; Benjamin, Sara E; Ward, Dianne S

2007-04-01

To our knowledge, a direct observation protocol for assessing dietary intake among young children in child care has not been published. This article reviews the development and testing of a diet observation system for child care facilities that occurred during a larger intervention trial. Development of this system was divided into five phases, done in conjunction with a larger intervention study: (a) protocol development, (b) training of field staff, (c) certification of field staff in a laboratory setting, (d) implementation in a child-care setting, and (e) certification of field staff in a child-care setting. During the certification phases, methods were used to assess the accuracy and reliability of all observers at estimating types and amounts of food and beverages commonly served in child care. Tests of agreement show strong agreement among five observers, as well as strong accuracy between the observers and 20 measured portions of foods and beverages with a mean intraclass correlation coefficient value of 0.99. This structured observation system shows promise as a valid and reliable approach for assessing dietary intake of children in child care and makes a valuable contribution to the growing body of literature on the dietary assessment of young children.
Gait Deviation Index, Gait Profile Score and Gait Variable Score in children with spastic cerebral palsy: Intra-rater reliability and agreement across two repeated sessions.

PubMed

Rasmussen, Helle Mätzke; Nielsen, Dennis Brandborg; Pedersen, Niels Wisbech; Overgaard, Søren; Holsgaard-Larsen, Anders

2015-07-01

The Gait Deviation Index (GDI) and Gait Profile Score (GPS) are the most used summary measures of gait in children with cerebral palsy (CP). However, the reliability and agreement of these indices have not been investigated, limiting their clinimetric quality for research and clinical practice. The aim of this study was to investigate the intra-rater reliability and agreement of summary measures of gait (GDI; GPS; and the Gait Variable Score (GVS) derived from the GPS). The intra-rater reliability and agreement were investigated across two repeated sessions in 18 children aged 5-12 years diagnosed with spastic CP. No systematic bias was observed between the sessions and no heteroscedasticity was observed in Bland-Altman plots. For the GDI and GPS, excellent reliability with intraclass correlation coefficient (ICC) values of 0.8-0.9 was found, while the GVS was found to have fair to good reliability with ICCs of 0.4-0.7. The agreement for the GDI and the logarithmically transformed GPS, in terms of the standard error of measurement as a percentage of the grand mean (SEM%) varied from 4.1 to 6.7%, whilst the smallest detectable change in percent (SDC%) ranged from 11.3 to 18.5%. For the logarithmically transformed GVS, we found a fair to large variation in SEM% from 7 to 29% and in SDC% from 18 to 81%. The GDI and GPS demonstrated excellent reliability and acceptable agreement proving that they can both be used in research and clinical practice. However, the observed large variability for some of the GVS requires cautious consideration when selecting outcome measures. Copyright © 2015 Elsevier B.V. All rights reserved.
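The relation between the SEM% and SDC% figures above can be sketched under two common conventions that the abstract itself does not state: SEM = SD × √(1 − ICC), and SDC = 1.96 × √2 × SEM. The function names are hypothetical, and the calculation is only a plausibility check on the reported ranges.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def sdc_from_sem(sem, z=1.96):
    """Smallest detectable change between two sessions at roughly 95% confidence.

    The sqrt(2) factor reflects that a change score carries the
    measurement error of both sessions.
    """
    return z * math.sqrt(2.0) * sem


# SEM% endpoints reported in the abstract for the GDI and log-transformed GPS.
for sem_pct in (4.1, 6.7):
    print(sem_pct, sdc_from_sem(sem_pct))
```

Under these assumptions the multiplier is about 2.77, so SEM% values of 4.1 and 6.7 give SDC% values close to the reported 11.3 to 18.5%; the small differences are consistent with rounding of the published SEM%.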
Evaluation of the reliability of maize reference assays for GMO quantification.

PubMed

Papazova, Nina; Zhang, David; Gruden, Kristina; Vojvoda, Jana; Yang, Litao; Buh Gasparic, Meti; Blejec, Andrej; Fouilloux, Stephane; De Loose, Marc; Taverniers, Isabel

2010-03-01

A reliable PCR reference assay for relative genetically modified organism (GMO) quantification must be specific for the target taxon and amplify uniformly along the commercialised varieties within the considered taxon. Different reference assays for maize (Zea mays L.) are used in official methods for GMO quantification. In this study, we evaluated the reliability of eight existing maize reference assays, four of which are used in combination with an event-specific polymerase chain reaction (PCR) assay validated and published by the Community Reference Laboratory (CRL). We analysed the nucleotide sequence variation in the target genomic regions in a broad range of transgenic and conventional varieties and lines: MON 810 varieties cultivated in Spain and conventional varieties from various geographical origins and breeding history. In addition, the reliability of the assays was evaluated based on their PCR amplification performance. A single base pair substitution, corresponding to a single nucleotide polymorphism (SNP) reported in an earlier study, was observed in the forward primer of one of the studied alcohol dehydrogenase 1 (Adh1) (70) assays in a large number of varieties. The SNP presence is consistent with a poor PCR performance observed for this assay along the tested varieties. The obtained data show that the Adh1 (70) assay used in the official CRL NK603 assay is unreliable. Based on our results from both the nucleotide stability study and the PCR performance test, we can conclude that the Adh1 (136) reference assay (T25 and Bt11 assays) as well as the tested high mobility group protein gene assay, which also form parts of CRL methods for quantification, are highly reliable. Despite the observed uniformity in the nucleotide sequence of the invertase gene assay, the PCR performance test reveals that this target sequence might occur in more than one copy. Finally, although currently not forming a part of official quantification methods, zein and SSIIb assays are found to be highly reliable in terms of nucleotide stability and PCR performance and are proposed as good alternative targets for a reference assay for maize.
Psychometric Characteristics of Process Evaluation Measures for a Rural School-based Childhood Obesity Prevention Study: Louisiana Health

PubMed Central

Newton, R. L.; Thomson, J. L.; Rau, K.; Duhe’, S.; Sample, A.; Singleton, N.; Anton, S. D.; Webber, L. S.; Williamson, D. A.

2011-01-01

Purpose To evaluate the implementation of intervention components of the Louisiana Health study, which was a multi-component childhood obesity prevention program conducted in rural schools. Design Content analysis. Setting Process evaluation assessed implementation in the classrooms, gym classes, and cafeterias. Subjects Classroom teachers (n = 232), physical education teachers (n = 53), food service managers (n = 33), and trained observers (n = 9). Measures Five process evaluation measures were created: Physical Education Questionnaire (PEQ), Intervention Questionnaire (IQ), Food Service Manager Questionnaire (FSMQ), Classroom Observation (CO) and School Nutrition Environment Observation (SNEO). Analysis Inter-rater reliability and internal consistency were conducted on all measures. ANOVA and Chi-square were used to compare differences across study groups on questionnaires and observations. Results The PEQ and one sub-scale from the FSMQ were eliminated because their reliability coefficients fell below acceptable standards. The sub-scale internal consistencies for the IQ, FSMQ, CO, and SNEO (all Cronbach’s α > .60) were acceptable. Conclusions After the initial 4 months of intervention, there was evidence that the Louisiana Health intervention was being implemented as it was designed. In summary, four process evaluation measures were found to be sufficiently reliable and valid for assessing the delivery of various aspects of a school-based obesity prevention program. These process measures could be modified to evaluate the delivery of other similar school-based interventions. PMID:21721969
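The internal-consistency criterion cited above (Cronbach's α > .60) rests on a short formula: α = k/(k − 1) × (1 − Σ item variances / variance of total scores). A minimal sketch follows; the function name and the three-respondent score table are hypothetical and are not data from the Louisiana Health study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a complete score table.

    rows: one list of item scores per respondent, same items in the same order.
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item sub-scale answered by three respondents.
scores = [[1, 1], [2, 3], [3, 2]]
print(cronbach_alpha(scores))
```

Because α rises with the number of items as well as with their intercorrelation, a short sub-scale can fall below a threshold such as .60 even when its items are reasonably related.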
Psychometric characteristics of process evaluation measures for a rural school-based childhood obesity prevention study: Louisiana Health.

PubMed

Newton, Robert L; Thomson, Jessica L; Rau, Kristi K; Ragusa, Shelly A; Sample, Alicia D; Singleton, Nakisha N; Anton, Stephen D; Webber, Larry S; Williamson, Donald A

2011-01-01

To evaluate the implementation of intervention components of the Louisiana Health study, which was a multicomponent childhood obesity prevention program conducted in rural schools. Content analysis. Process evaluation assessed implementation in classrooms, gym classes, and cafeterias. Classroom teachers (n = 232), physical education teachers (n = 53), food service managers (n = 33), and trained observers (n = 9). Five process evaluation measures were created: Physical Education Questionnaire (PEQ), Intervention Questionnaire (IQ), Food Service Manager Questionnaire (FSMQ), Classroom Observation (CO), and School Nutrition Environment Observation (SNEO). Interrater reliability and internal consistency were assessed on all measures. Analysis of variance and χ² were used to compare differences across study groups on questionnaires and observations. The PEQ and one subscale from the FSMQ were eliminated because their reliability coefficients fell below acceptable standards. The subscale internal consistencies for the IQ, FSMQ, CO, and SNEO (all Cronbach α > .60) were acceptable. After the initial 4 months of intervention, there was evidence that the Louisiana Health intervention was being implemented as it was designed. In summary, four process evaluation measures were found to be sufficiently reliable and valid for assessing the delivery of various aspects of a school-based obesity prevention program. These process measures could be modified to evaluate the delivery of other similar school-based interventions.
Learning about Teachers' Literacy Instruction from Classroom Observations

ERIC Educational Resources Information Center

Kelcey, Ben; Carlisle, Joanne F.

2013-01-01

The purpose of this study is to contribute to efforts to improve methods for gathering and analyzing data from classroom observations in early literacy. The methodological approach addresses current problems of reliability and validity of classroom observations by taking into account differences in teachers' uses of instructional actions (e.g.,…
The use of video clips in teleconsultation for preschool children with movement disorders.

PubMed

Gorter, Hetty; Lucas, Cees; Groothuis-Oudshoorn, Karin; Maathuis, Carel; van Wijlen-Hempel, Rietje; Elvers, Hans

2013-01-01

To investigate the reliability and validity of video clips in assessing movement disorders in preschool children. The study group included 27 children with neuromotor concerns. The explorative validity group included children with motor problems (n = 21) or with typical development (n = 9). Hempel screening was used for live observation of the child, full recording, and short video clips. The explorative study tested the validity of the clinical classifications "typical" or "suspect." Agreement between live observation and the full recording was almost perfect; Agreement for the clinical classification "typical" or "suspect" was substantial. Agreement between the full recording and short video clips was substantial to moderate. The explorative validity study, based on short video clips and the presence of a neuromotor developmental disorder, showed substantial agreement. Hempel screening enables reliable and valid observation of video clips, but further research is necessary to demonstrate the predictive value.
Validity and Reliability of the 8-Item Work Limitations Questionnaire.

PubMed

Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

2017-12-01

Purpose To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.
Reliability of mercury-in-silastic strain gauge plethysmography curve reading: influence of clinical clues and observer variation.

PubMed

Høyer, Christian; Pavar, Susanne; Pedersen, Begitte H; Biurrun Manresa, José A; Petersen, Lars J

2013-08-01

Mercury-in-silastic strain gauge plethysmography (SGP) is a well-established technique for blood flow and blood pressure measurements. The aim of this study was to examine (i) the possible influence of clinical clues, e.g. the presence of wounds and color changes during blood pressure measurements, and (ii) intra- and inter-observer variation of curve interpretation for segmental blood pressure measurements. A total of 204 patients with known or suspected peripheral arterial disease (PAD) were included in a diagnostic accuracy trial. Toe and ankle pressures were measured in both limbs, and primary observers analyzed a total of 804 pressure curve sets. The SGP curves were later reanalyzed separately by two observers blinded to clinical clues. Intra- and inter-observer agreement was quantified using Cohen's kappa and reliability was quantified using intra-class correlation coefficients, coefficients of variance, and Bland-Altman analysis. There was an overall agreement regarding patient diagnostic classification (PAD/not PAD) in 202/204 (99.0%) for intra-observer (κ = 0.969, p < 0.001), and 201/204 (98.5%) for inter-observer readings (κ = 0.953, p < 0.001). Reliability analysis showed excellent correlation between blinded versus non-blinded and inter-observer readings for determination of absolute segmental pressures (all intraclass correlation coefficients ≥ 0.984). The coefficient of variance for determination of absolute segmental blood pressure ranged from 2.9-3.4% for blinded/non-blinded data and from 3.8-5.0% for inter-observer data. This study shows a low inter-observer variation among experienced laboratory technicians for reading strain gauge curves. The low variation between blinded/non-blinded readings indicates that SGP measurements are minimally biased by clinical clues.
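The Bland-Altman analysis mentioned in this abstract summarizes paired readings by their mean difference (bias) and 95% limits of agreement, taken here as bias ± 1.96 × SD of the differences. The sketch below assumes that standard form; the function name and the three pressure pairs are hypothetical, not values from the study.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of first - second; the limits are bias +/- 1.96 SD
    of the paired differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical toe pressures (mmHg) read by two observers from the same curves.
observer_1 = [10, 12, 14]
observer_2 = [9, 12, 15]
print(bland_altman(observer_1, observer_2))
```

A bias near zero with narrow limits indicates that the two sets of readings can be used interchangeably, which is the kind of evidence the abstract draws on for blinded versus non-blinded readings.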
Interrater reliability of videotaped observational gait-analysis assessments.

PubMed

Eastlack, M E; Arvidson, J; Snyder-Mackler, L; Danoff, J V; McGarvey, C L

1991-06-01

The purpose of this study was to determine the interrater reliability of videotaped observational gait-analysis (VOGA) assessments. Fifty-four licensed physical therapists with varying amounts of clinical experience served as raters. Three patients with rheumatoid arthritis who demonstrated an abnormal gait pattern served as subjects for the videotape. The raters analyzed each patient's most severely involved knee during the four subphases of stance for the kinematic variables of knee flexion and genu valgum. Raters were asked to determine whether these variables were inadequate, normal, or excessive. The temporospatial variables analyzed throughout the entire gait cycle were cadence, step length, stride length, stance time, and step width. Generalized kappa coefficients ranged from .11 to .52. Intraclass correlation coefficients (2,1) and (3,1) were slightly higher. Our results indicate that physical therapists' VOGA assessments are only slightly to moderately reliable and that improved interrater reliability of the assessments of physical therapists utilizing this technique is needed. Our data suggest that there is a need for greater standardization of gait-analysis training.
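The two coefficients named above, ICC(2,1) and ICC(3,1), can both be computed from the mean squares of a two-way subjects-by-raters layout. The sketch below assumes the usual single-rating forms, with raters treated as random for (2,1) and fixed for (3,1); the function name is hypothetical, and the small ratings table is illustrative, not data from this study.

```python
def icc_2_1_and_3_1(ratings):
    """Single-rating ICC(2,1) and ICC(3,1).

    ratings: n subjects (rows) by k raters (columns), fully crossed,
    with no missing cells.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    # (2,1) counts systematic rater differences as error; (3,1) does not.
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


# Illustrative table: six subjects each scored by the same four raters.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(icc_2_1_and_3_1(table))
```

When raters differ systematically in how high they score, ICC(3,1) exceeds ICC(2,1) because only the latter penalizes those rater offsets.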
Validity and reliability of the Greek version of the xerostomia questionnaire in head and neck cancer patients.

PubMed

Memtsa, Pinelopi Theopisti; Tolia, Maria; Tzitzikas, Ioannis; Bizakis, Ioannis; Pistevou-Gombaki, Kyriaki; Charalambidou, Martha; Iliopoulou, Chrysoula; Kyrgias, George

2017-03-01

Xerostomia after radiation therapy for head and neck (H&N) cancer has serious effects on patients' quality of life. The purpose of this study was to validate the Greek version of the self-reported eight-item xerostomia questionnaire (XQ) in patients treated with radiotherapy for H&N cancer. The XQ was translated into Greek and administered to 100 XQ patients. An exploratory factor analysis was performed. Reliability measures were calculated. Several types of validity were evaluated. The observer-rated scoring system was also used. The mean XQ value was 41.92 (SD 22.71). Factor analysis revealed the unidimensional nature of the questionnaire. High reliability measures (ICC, Cronbach's α, Pearson coefficients) were obtained. Patients differed statistically significantly in terms of XQ score, depending on the RTOG/EORTC classification. The Greek version of XQ is valid and reliable. Its score is well related to observer's findings and it can be used to evaluate the impact of radiation therapy on the subjective feeling of xerostomia.
Anatomical landmark position--can we trust what we see? Results from an online reliability and validity study of osteopaths.

PubMed

Pattyn, Elise; Rajendran, Dévan

2014-04-01

Practitioners traditionally use observation to classify the position of patients' anatomical landmarks. This information may contribute to diagnosis and patient management. To calculate a) Inter-rater reliability of categorising the sagittal plane position of four anatomical landmarks (lateral femoral epicondyle, greater trochanter, mastoid process and acromion) on side-view photographs (with landmarks highlighted and not-highlighted) of anonymised subjects; b) Intra-rater reliability; c) Individual landmark inter-rater reliability; d) Validity against a 'gold standard' photograph. Online inter- and intra-rater reliability study. Photographed subjects: convenience sample of asymptomatic students; raters: randomly selected UK registered osteopaths. 40 photographs of 30 subjects were used, a priori clinically acceptable reliability was ≥0.4. Inter-rater arm: 20 photographs without landmark highlights plus 10 with highlights; Intra-rater arm: 10 duplicate photographs (non-highlighted landmarks). Validity arm: highlighted landmark scores versus 'gold standard' photographs with vertical line. Research ethics approval obtained. Osteopaths (n = 48) categorised landmark position relative to imagined vertical-line; Gwet's Agreement Coefficient 1 (AC1) calculated and chance-corrected coefficient benchmarked against Landis and Koch's scale; Validity calculation used Kendall's tau-B. Inter-rater reliability was 'fair' (AC1 = 0.342; 95% confidence interval (CI) = 0.279-0.404) for non-highlighted landmarks and 'moderate' (AC1 = 0.700; 95% CI = 0.596-0.805) for highlighted landmarks. Intra-rater reliability was 'fair' (AC1 = 0.522); range was 'poor' (AC1 = 0.160) to 'substantial' (AC1 = 0.896). No differences were found between individual landmarks. Validity was 'low' (TB = 0.327; p = 0.104). Both inter- and intra-rater reliability was 'fair' but below clinically acceptable levels, validity was 'low'. Together these results challenge the clinical practice of using observation to categorise anterio-posterior landmark position. Copyright © 2014 Elsevier Ltd. All rights reserved.
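Gwet's AC1, the chance-corrected coefficient used in this study, can be sketched in its two-rater form as follows. The study itself pooled 48 raters, which requires the multi-rater generalisation, so this is a simplified illustration only; the function name, category labels and ratings are hypothetical.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 for two raters over a fixed list of categories."""
    n, q = len(rater_a), len(categories)
    if n == 0 or n != len(rater_b) or q < 2:
        raise ValueError("need equal-length ratings and at least two categories")
    # Observed agreement.
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Average share of each category across the two raters.
    pi = [(freq_a[c] + freq_b[c]) / (2 * n) for c in categories]
    # Chance agreement as defined for AC1.
    pe = sum(p * (1 - p) for p in pi) / (q - 1)
    return (pa - pe) / (1 - pe)


# Hypothetical landmark positions for five photographs.
positions = ["anterior", "neutral", "posterior"]
rater_1 = ["anterior", "anterior", "anterior", "anterior", "posterior"]
rater_2 = ["anterior", "anterior", "anterior", "posterior", "posterior"]
print(gwet_ac1(rater_1, rater_2, positions))
```

Unlike Cohen's kappa, the chance term here shrinks when one category dominates, which makes AC1 less sensitive to skewed category prevalence.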
Laterality judgments in people with low back pain--A cross-sectional observational and test-retest reliability study.

PubMed

Linder, Martin; Michaelson, Peter; Röijezon, Ulrik

2016-02-01

Disruption of cortical representation, or body schema, has been indicated as a factor in the persistence and recurrence of low back pain (LBP). This has been observed through impaired laterality judgment ability and it has been suggested that this ability is affected in a spatial rather than anatomical manner. We compared laterality judgment performance of foot and trunk movements between people with LBP with or without leg pain and healthy controls, and investigated associations between test performance and pain. We also assessed the test-retest reliability of the Recognise Online™ software when used in a clinical and a home setting. Cross-sectional observational and test-retest study. Thirty individuals with LBP and 30 healthy controls performed judgment tests of foot and trunk laterality once supervised in a clinic and twice at home. No statistically significant group differences were found. LBP intensity was negatively related to trunk laterality accuracy (p = 0.019). Intraclass correlation values ranged from 0.51 to 0.91. Reaction time improved significantly between test occasions while accuracy did not. Laterality judgments were not impaired in subjects with LBP compared to controls. Further research may clarify the relationship between pain mechanisms in LBP and laterality judgment ability. Reliability values were mostly acceptable, with wide and low confidence intervals, suggesting test-retest reliability for Recognise Online™ could be questioned in this trial. A significant learning effect was observed which should be considered in clinical and research application of the test. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cardiac valve calcifications on low-dose unenhanced ungated chest computed tomography: inter-observer and inter-examination reliability, agreement and variability.

PubMed

van Hamersvelt, Robbert W; Willemink, Martin J; Takx, Richard A P; Eikendal, Anouk L M; Budde, Ricardo P J; Leiner, Tim; Mol, Christian P; Isgum, Ivana; de Jong, Pim A

2014-07-01

To determine inter-observer and inter-examination variability for aortic valve calcification (AVC) and mitral valve and annulus calcification (MC) in low-dose unenhanced ungated lung cancer screening chest computed tomography (CT). We included 578 lung cancer screening trial participants who were examined by CT twice within 3 months to follow indeterminate pulmonary nodules. On these CTs, AVC and MC were measured in cubic millimetres. One hundred CTs were examined by five observers to determine the inter-observer variability. Reliability was assessed by kappa statistics (κ) and intra-class correlation coefficients (ICCs). Variability was expressed as the mean difference ± standard deviation (SD). Inter-examination reliability was excellent for AVC (κ = 0.94, ICC = 0.96) and MC (κ = 0.95, ICC = 0.90). Inter-examination variability was 12.7 ± 118.2 mm³ for AVC and 31.5 ± 219.2 mm³ for MC. Inter-observer reliability ranged from κ = 0.68 to κ = 0.92 for AVC and from κ = 0.20 to κ = 0.66 for MC. Inter-observer ICC was 0.94 for AVC and ranged from 0.56 to 0.97 for MC. Inter-observer variability ranged from -30.5 ± 252.0 mm³ to 84.0 ± 240.5 mm³ for AVC and from -95.2 ± 210.0 mm³ to 303.7 ± 501.6 mm³ for MC. AVC can be quantified with excellent reliability on ungated unenhanced low-dose chest CT, but manual detection of MC can be subject to substantial inter-observer variability. Lung cancer screening CT may be used for detection and quantification of cardiac valve calcifications. • Low-dose unenhanced ungated chest computed tomography can detect cardiac valve calcifications. • However, calcified cardiac valves are not reported by most radiologists. • Inter-observer and inter-examination variability of aortic valve calcifications is sufficient for longitudinal studies. • Volumetric measurement variability of mitral valve and annulus calcifications is substantial.
Spanish version validation of the Marihuana Motives Measure in a drug-consuming adolescent sample.

PubMed

Matali Costa, Josep; Simons, J; Pardo, M; Lleras, M; Pérez, A; Andión, O

2018-01-15

Cannabis is the illicit drug most widely consumed by adolescents in Spain. The understanding of consumption motives is an important factor for intervention. In Spain, there are no available instruments for their evaluation, hence, the goal of this paper is to study the psychometric properties of the Marihuana Motives Measure (MMM) in a sample of adolescent consumers. Firstly, translation and back-translation was performed. A total of 228 adolescent consumers of cannabis were evaluated. Factorial analysis was conducted, and the reliability of the total scores and of each scale of the questionnaire was studied through Cronbach's alpha. Test-retest reliability was analyzed through interclass correlations. Validity evidence of the MMM was examined through correlations between current cannabis use, subjective consumption effects measured with the Addiction Research Center Inventory (ARCI), and personality measured with the Millon Adolescent Clinical Inventory (MACI). High reliability was observed in total score of the MMM (Cronbach α = .86), and high and moderate reliability for each of the five factors obtained in the factorial analysis of the MMM, Social = .82, Enhancement = .72, Coping = .83, Expansion = .74, and Conformity = .64. Significant correlations were also observed between cannabis consumption motives and subjective effects, and between consumption motives and personality. The Spanish version of the MMM shows a similar factorial structure as the one obtained by the original author, and its measures are reliable and valid for the study of cannabis consumption motives in adolescent consumer population.
Live versus Video Observations: Comparing the Reliability and Validity of Two Methods of Assessing Classroom Quality

ERIC Educational Resources Information Center

Curby, Timothy W.; Johnson, Price; Mashburn, Andrew J.; Carlis, Lydia

2016-01-01

When conducting classroom observations, researchers are often confronted with the decision of whether to conduct observations live or by using pre-recorded video. The present study focuses on comparing and contrasting observations of live and video administrations of the Classroom Assessment Scoring System-PreK (CLASS-PreK). Associations between…
Development and reliability of the rating of compensatory movements in upper limb prosthesis wearers during work-related tasks.

PubMed

van der Laan, Tallie M J; Postema, Sietke G; Reneman, Michiel F; Bongers, Raoul M; van der Sluis, Corry K

2018-02-10

Reliability study. Quantifying compensatory movements during work-related tasks may help to prevent musculoskeletal complaints in individuals with upper limb absence. (1) To develop a qualitative scoring system for rating compensatory shoulder and trunk movements in upper limb prosthesis wearers during the performance of functional capacity evaluation tests adjusted for use by 1-handed individuals (functional capacity evaluation-one handed [FCE-OH]); (2) to examine the interrater and intrarater reliability of the scoring system; and (3) to assess its feasibility. Movement patterns of 12 videotaped upper limb prosthesis wearers and 20 controls were analyzed. Compensatory movements were defined for each FCE-OH test, and a scoring system was developed, pilot tested, and adjusted. During reliability testing, 18 raters (12 FCE experts and 6 physiotherapists/gait analysts) scored videotapes of upper limb prosthesis wearers performing 4 FCE-OH tests 2 times (2 weeks apart). Agreement was expressed in % and kappa value. Feasibility (focus areas "acceptability", "demand," and "implementation") was determined by using a questionnaire. After 2 rounds of pilot testing and adjusting, reliability of a third version was tested. The interrater reliability for the first and second rating sessions were κ = 0.54 (confidence interval [CI]: 0.52-0.57) and κ = 0.64 (CI: 0.61-0.66), respectively. The intrarater reliability was κ = 0.77 (CI: 0.72-0.82). The feasibility was good but could be improved by a training program. It seems possible to identify compensatory movements in upper limb prosthesis wearers during the performance of FCE-OH tests reliably by observation using the developed observational scoring system. Interrater reliability was satisfactory in most instances; intrarater reliability was good. Feasibility was established. Copyright © 2018 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Reliability and validity of the Korean version of the community balance and mobility scale in patients with hemiplegia after stroke

PubMed Central

Lee, Kyoung-bo; Lee, Paul; Yoo, Sang-won; Kim, Young-dong

2016-01-01

[Purpose] The aim of this study was to translate and adapt the Community Balance and Mobility Scale (CB&M) into Korean (K-CB&M) and to verify the reliability and validity of scores obtained with Korean patients. [Subjects and Methods] A total of 16 subjects were recruited from St. Vincent’s Hospital in South Korea. At each testing session, subjects completed the K-CB&M, Berg balance scale (BBS), timed up and go test (TUG), and functional reaching test. All tests were administered by a physical therapist, and subjects completed the tests in an identical standardized order during all testing sessions. [Results] The inter- and intra-rater reliability coefficients were high for most subscores, while moderate inter-rater reliability was observed for the items “walking and looking” and “walk, look, and carry”, and moderate intra-rater reliability was observed for “forward to backward walking”. There was a positive correlation between the K-CB&M and BBS and a negative correlation between the K-CB&M and TUG in the convergent validity assessments. [Conclusion] The reliability and validity of the K-CB&M were high, suggesting that clinical practitioners treating Korean patients with hemiplegia can use this material for assessing static and dynamic balance. PMID:27630420
Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

PubMed Central

Hallgren, Kevin A.

2012-01-01

Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
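As an illustration of the kind of statistic this tutorial covers, Cohen's kappa for two coders can be sketched in a few lines of Python. This is a minimal sketch of the standard textbook formula, kappa = (p_o - p_e) / (1 - p_e), not the SPSS or R syntax given in the paper; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label proportions.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers for four observation intervals.
coder_1 = ["on-task", "on-task", "off-task", "off-task"]
coder_2 = ["on-task", "off-task", "off-task", "off-task"]
print(cohens_kappa(coder_1, coder_2))
```

Kappa corrects raw percent agreement for the agreement expected by chance alone, which is why it can be much lower than percent agreement when one category dominates.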
Reducing random measurement error in assessing postural load on the back in epidemiologic surveys.

PubMed

Burdorf, A

1995-02-01

The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations have been presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.
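The abstract does not reproduce its equations. As a rough illustration of how the number of repeated measurements per worker trades off against reliability, the sketch below uses the standard Spearman-Brown relation, which is an assumption here and not necessarily the formulation used in the paper. The function names are hypothetical.

```python
import math


def reliability_of_mean(single_reliability, k):
    """Spearman-Brown: reliability of the mean of k repeated measurements."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


def measurements_needed(single_reliability, target):
    """Smallest number of repeated measurements whose mean reaches `target`."""
    r = single_reliability
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = target * (1 - r) / (r * (1 - target))
    # Small tolerance so floating-point noise does not push k past an integer.
    return max(1, math.ceil(k - 1e-9))


# Hypothetical case: a single measurement has reliability 0.5.
print(reliability_of_mean(0.5, 3))
print(measurements_needed(0.5, 0.8))
```

Under this relation, a method with low single-measurement reliability can still give a dependable exposure estimate if enough repeated measurements are averaged per worker, at the cost of studying fewer workers for a fixed budget.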

Development and Testing of the Observational System for Recording Physical Activity in Children: Elementary School

PubMed Central

McIver, Kerry L.; Brown, William H.; Pfeiffer, Karin A.; Dowda, Marsha; Pate, Russell R.

2016-01-01

Purpose This study describes the development and pilot testing of the Observational System for Recording Physical Activity-Elementary School (OSRAC-E) version. Methods This system was developed to observe and document the levels and types of physical activity and physical and social contexts of physical activity in elementary school students during the school day. Inter-observer agreement scores and summary data were calculated. Results All categories had Kappa statistics above 0.80, with the exception of the activity initiator category. Inter-observer agreement scores were 96% or greater. The OSRAC-E was shown to be a reliable observation system that allows researchers to assess physical activity behaviors, the contexts of those behaviors, and the effectiveness of physical activity interventions in the school environment. Conclusion The OSRAC-E can yield data with high interobserver reliability and provide relatively extensive contextual information about physical activity of students in elementary schools. PMID:26889587
Assessing fidelity in individual and family therapy for adolescent substance abuse.

PubMed

Hogue, Aaron; Dauber, Sarah; Chinchilla, Priscilla; Fried, Adam; Henderson, Craig; Inclan, Jaime; Reiner, Robert H; Liddle, Howard A

2008-09-01

This study introduces an observational measure of fidelity in evidence-based practices for adolescent substance abuse treatment. The Therapist Behavior Rating Scale-Competence (TBRS-C) measures adherence and competence in individual cognitive-behavioral therapy and multidimensional family therapy for adolescent substance abuse. The TBRS-C assesses fidelity to the core therapeutic goals of each approach and also contains global ratings of therapist competence. Study participants were 136 clinically referred adolescents and their families observed in 437 treatment sessions. The TBRS-C demonstrated strong interrater reliability for goal-specific ratings of treatment adherence, and modest reliability for goal-specific and global ratings of therapist competence, evidence of construct validity, and discriminant validity with an observational measure of therapeutic alliance. The utility of the TBRS-C for evaluating treatment fidelity in field settings is discussed.
Reliability and validity of the Turkish version of the Berg Balance Scale.

PubMed

Sahin, Fusun; Yilmaz, Figen; Ozmaden, Asli; Kotevolu, Nurdan; Sahin, Tulay; Kuran, Banu

2008-01-01

The purpose of this study was to develop a Turkish version of the Berg Balance Scale (BBS) and assess its reliability and validity. Sixty healthy volunteers older than 65 years were included in the study. Subjects who had lower extremity amputation, or were armchair-bound or bedridden were excluded. After the translation process, the Turkish version of the scale was administered to each participant twice with an interval of 2 weeks. The intraclass correlation coefficient (ICC) was calculated to assess intra- and inter-observer reliability. Cronbach's alpha was calculated to evaluate internal consistency of the total BBS score. The intraclass correlation coefficient was calculated to examine test-retest reliability. Convergent validity was assessed by correlating the scale with the Modified Barthel Index (MBI) and Timed Up and Go Test (TUG). Construct validity was assessed with factor analysis. The mean age of the participants was 77.00 ± 5.67 years (range: 67-92 years). The ICC for intra- and inter-observer reliability was 0.98 (p<0.0001) and 0.97 (p<0.0001), respectively. Cronbach's alpha of the Turkish version of the BBS was 0.98. The test-retest reliability (ICC) of the Turkish version of the BBS was determined as 0.98 for the total score, and ranged from 0.86-0.99 for individual items. In terms of validity, the Turkish version of the BBS was correlated with the MBI (in positive direction) and TUG (in negative direction) (r=0.67 p<0.0001; r=-0.75 p<0.0001, respectively). The Turkish version of the BBS is a reliable and valid scale to be used in balance assessment of Turkish older adults.
Generalizing from Observations of Mathematics Teachers' Instructional Practice Using the Instructional Quality Assessment

ERIC Educational Resources Information Center

Wilhelm, Anne Garrison; Kim, Sungyeun

2015-01-01

One crucial question for researchers who study teachers' classroom practice is how to maximize information about what is happening in classrooms while minimizing costs. This report extends prior studies of the reliability of the Instructional Quality Assessment (IQA), a widely used classroom observation toolkit, and offers insight into the often…
Comparison of Two Methods for Estimating Adjustable One-Point Cane Length in Community-Dwelling Older Adults.

PubMed

Camara, Camila Thais Pinto; de Freitas, Sandra Maria Sbeghen Ferreira; de Lima, Waléria Paixão; Lima, Camila Astolphi; Amorim, César Ferreira; Perracini, Monica Rodrigues

2017-01-01

Our aim is to estimate inter-observer reliability, test-retest reliability, anthropometric and biomechanical adequacy and minimal detectable change when measuring the length of single-point adjustable canes in community-dwelling older adults. There are 112 participants in the study. They are men and women, aged 60 years and over, who were attending an outpatient community health centre. An exploratory study design was used. Participants underwent two assessments within the same day by two independent observers and by the same observer at an interval of 15-45 days. Two measures were used to establish the length of a single-point adjustable cane: the distance from the distal wrist crease to the floor (WF) and the distance from the top of the greater trochanter of the femur to the floor (TF). Each individual was fitted according to these two measures, and elbow flexion angle was measured. Inter-observer reliability and the test-retest reliability were high in both TF (ICC 3.1 = 0.918 and ICC 2.1 = 0.935) and WF measures (ICC 3.1 = 0.967 and ICC 2.1 = 0.960). Only 1% of the individuals kept an elbow flexion angle within the standard recommendation of 30° ± 10° when the cane length was determined by the TF measure, and 30% of the participants when the cane was determined by the WF measure. The minimal detectable cane length change was 2.2 cm. Our results suggest that, even though both measures are reliable, cane length determined by WF distance is more appropriate to keep the elbow flexion angle within the standard recommendation. The minimal detectable change corresponds to approximately a hole in the cane adjustment. Copyright © 2015 John Wiley & Sons, Ltd.
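The abstract does not state how the minimal detectable change was derived. A common convention, assumed here, computes the standard error of measurement as SEM = SD × sqrt(1 − ICC) and the 95% minimal detectable change as MDC95 = 1.96 × sqrt(2) × SEM. The sketch below applies that convention to hypothetical inputs; the standard deviation is not taken from the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability ICC."""
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1 - icc)


def minimal_detectable_change_95(sd, icc):
    """MDC at the 95% level: 1.96 * sqrt(2) * SEM (a common convention)."""
    return 1.96 * math.sqrt(2) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: SD of 2.0 cm and a test-retest ICC of 0.75.
print(minimal_detectable_change_95(2.0, 0.75))
```

The sqrt(2) factor reflects that a change score combines the measurement error of two separate assessments.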
Reliability of conditioned pain modulation: a systematic review

PubMed Central

Kennedy, Donna L.; Kemp, Harriet I.; Ridout, Deborah; Yarnitsky, David; Rice, Andrew S.C.

2016-01-01

Abstract A systematic literature review was undertaken to determine if conditioned pain modulation (CPM) is reliable. Longitudinal, English language observational studies of the repeatability of a CPM test paradigm in adult humans were included. Two independent reviewers assessed the risk of bias in 6 domains; study participation; study attrition; prognostic factor measurement; outcome measurement; confounding and analysis using the Quality in Prognosis Studies (QUIPS) critical assessment tool. Intraclass correlation coefficients (ICCs) less than 0.4 were considered to be poor; 0.4 and 0.59 to be fair; 0.6 and 0.75 good and greater than 0.75 excellent. Ten studies were included in the final review. Meta-analysis was not appropriate because of differences between studies. The intersession reliability of the CPM effect was investigated in 8 studies and reported as good (ICC = 0.6-0.75) in 3 studies and excellent (ICC > 0.75) in subgroups in 2 of those 3. The assessment of risk of bias demonstrated that reporting is not comprehensive for the description of sample demographics, recruitment strategy, and study attrition. The absence of blinding, a lack of control for confounding factors, and lack of standardisation in statistical analysis are common. Conditioned pain modulation is a reliable measure; however, the degree of reliability is heavily dependent on stimulation parameters and study methodology and this warrants consideration for investigators. The validation of CPM as a robust prognostic factor in experimental and clinical pain studies may be facilitated by improvements in the reporting of CPM reliability studies. PMID:27559835
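The ICC bands used in this review can be written as a small lookup. This is a sketch only: the abstract leaves the interval between 0.59 and 0.6 unspecified, so the code assumes that values below 0.6 count as fair, and the function name is hypothetical.

```python
def classify_icc(icc):
    """Map an ICC to the review's bands: poor, fair, good, or excellent."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:  # assumption: the unstated gap between 0.59 and 0.6 is "fair"
        return "fair"
    if icc <= 0.75:
        return "good"
    return "excellent"


for value in (0.30, 0.50, 0.64, 0.80):
    print(value, classify_icc(value))
```

Other sources draw these cut-offs differently, so a label produced this way is only comparable across studies that adopt the same bands.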
Test-retest reliability of the scale of participation in organized activities among adolescents in the Czech Republic and Slovakia.

PubMed

Bosakova, Lucia; Kolarcik, Peter; Bobakova, Daniela; Sulcova, Martina; Van Dijk, Jitse P; Reijneveld, Sijmen A; Geckova, Andrea Madarasova

2016-04-01

Participation in organized activities is related to a range of positive outcomes, but the way such participation is measured has not been scrutinized. Test-retest reliability, an important indicator of a scale's reliability, has rarely been assessed, and for "The scale of participation in organized activities" it is completely lacking. This test-retest study is based on the Health Behaviour in School-aged Children study and is consistent with its methodology. We obtained data from 353 Czech (51.9 % boys) and 227 Slovak (52.9 % boys) primary school pupils, grades five and nine, who participated in this study in 2013. We used Cohen's kappa statistic and single measures of the intraclass correlation coefficient to estimate the test-retest reliability of all selected items in the sample, stratified by gender, age and country. We mostly observed a large correlation between the test and retest in all of the examined variables (κ ranged from 0.46 to 0.68). Test-retest reliability of the sum score of individual items showed substantial agreement (ICC = 0.64). The scale of participation in organized activities has an acceptable level of agreement, indicating good reliability.
Evaluation of a modified Karnofsky score to assess physical and psychological wellbeing of cats in a hospital setting.

PubMed

Taffin, Elien Rl; Paepe, Dominique; Campos, Miguel; Duchateau, Luc; Goris, Nesya; De Roover, Katrien; Daminet, Sylvie

2016-11-01

Objectives The Karnofsky score (KS) modified for cats, a scoring system to rate health and quality of life (QOL) in cats, is used in clinical trials, but its reliability and validity are yet to be determined. The present study aims to evaluate the scientific robustness of the KS when adapted for use in a hospital setting. Methods A list of variables to consider during the physical examination, which informs the clinician's score (CS) part of the KS, was added and clinicians were allowed to choose a score anywhere between 0 and 50. The Karnofsky QOL questionnaire was adapted for use in a hospital setting. F-tests with Bonferroni correction and Spearman rank correlation coefficients were used to evaluate reliability and validity of the KS to assess the health and wellbeing of cats in a hospital setting. The records of 54 feline immunodeficiency virus-positive cats, which were recruited for a clinical trial and hospitalised for 6 weeks, were reviewed. Four veterinarians scored the CS, and one veterinarian and a veterinary nurse assessed the QOL score. Results Mean absolute difference between observers was significantly larger for the CS than for the QOL score ( P <0.001) and two veterinarians scored significantly higher than the remaining two veterinarians ( P <0.001). Inter-observer correlation ranged from 0.45-0.75 for the CS. For the QOL score, the absolute difference between observers was small, no significant difference was found between observers and a high degree of inter-observer correlation was noted (r = 0.91). Conclusions and relevance The results indicate low inter-observer reliability for the CS, requiring additional modifications to this part of the KS. The QOL score seems more reliable, and the questionnaire may serve as a reliable tool in the assessment of QOL in cats in a hospital setting. Consequently, further adaptation of the KS is mandatory when simultaneous assessment of both the cat's clinical health and perceived wellbeing is required.
Constructing the 'Best' Reliability Data for the Job - Developing Generic Reliability Data from Alternative Sources Early in a Product's Development Phase

NASA Technical Reports Server (NTRS)

Kleinhammer, Roger K.; Graber, Robert R.; DeMott, D. L.

2016-01-01

Reliability practitioners advocate getting reliability involved early in a product development process. However, when assigned to estimate or assess the (potential) reliability of a product or system early in the design and development phase, they are faced with lack of reasonable models or methods for useful reliability estimation. Developing specific data is costly and time consuming. Instead, analysts rely on available data to assess reliability. Finding data relevant to the specific use and environment for any project is difficult, if not impossible. Instead, analysts attempt to develop the "best" or composite analog data to support the assessments. Industries, consortia and vendors across many areas have spent decades collecting, analyzing and tabulating fielded item and component reliability performance in terms of observed failures and operational use. This data resource provides a huge compendium of information for potential use, but can also be compartmented by industry, difficult to find out about, access, or manipulate. One method used incorporates processes for reviewing these existing data sources and identifying the available information based on similar equipment, then using that generic data to derive an analog composite. Dissimilarities in equipment descriptions, environment of intended use, quality and even failure modes impact the "best" data incorporated in an analog composite. Once developed, this composite analog data provides a "better" representation of the reliability of the equipment or component. It can be used to support early risk or reliability trade studies, or analytical models to establish the predicted reliability data points. It also establishes a baseline prior that may be updated based on test data or observed operational constraints and failures, i.e., using Bayesian techniques.
This tutorial presents a descriptive compilation of historical data sources across numerous industries and disciplines, along with examples of contents and data characteristics. It then presents methods for combining failure information from different sources and mathematical use of this data in early reliability estimation and analyses.
Tapering Practices of Strongman Athletes: Test-Retest Reliability Study

PubMed Central

Pritchard, Hayden J; Keogh, Justin WL

2017-01-01

Background Little is currently known about the tapering practices of strongman athletes. We have developed an Internet-based comprehensive self-report questionnaire examining the training and tapering practices of strongman athletes. Objective The objective of this study was to document the test-retest reliability of questions associated with the Internet-based comprehensive self-report questionnaire on the tapering practices of strongman athletes. The information will provide insight on the reliability and usefulness of the online questionnaire for use with strongman athletes. Methods Invitations to complete an Internet questionnaire were sent via Facebook Messenger to identified strongman athletes. The survey consisted of four main areas of inquiry, including demographics and background information, training practices, tapering, and tapering practices. Of the 454 athletes that completed the survey over the 8-week period, 130 athletes responded on Facebook Messenger indicating that they intended to complete, or had completed, the survey. These participants were asked if they could complete the online questionnaire a second time for a test-retest reliability analysis. Sixty-four athletes (mean age 33.3 years, standard deviation [SD] 7.7; mean height 178.2 cm, SD 11.0; mean body mass 103.7 kg, SD 24.8) accepted this invitation and completed the survey for the second time after a minimum 7-day period from the date of their first completion. Agreement between athlete responses was measured using intraclass correlation coefficients (ICCs) and kappa statistics. Confidence intervals (at 95%) were reported for all measures and significance was set at P<.05. Results Test-retest reliability for demographic and training practices items were significant (P<.001) and showed excellent (ICC range=.84 to .98) and fair to almost perfect agreement (κ range=.37-.85). 
Moderate to excellent agreement (ICC range=.56-.84; P<.01) was observed for all tapering practice measures except for the number of days athletes started their usual taper before a strongman competition (ICC=.30). When the number of days was categorized in additional analyses, moderate reliability was observed (κ=.43; P<.001). Fair to substantial agreement was observed for the majority of tapering practices measures (κ range=.38-.73; P<.001) except for how training frequency (κ=.26) and the percentage and type of resistance training performed (κ=.20) changed in the taper. Good to excellent agreement (ICC=.62-.93; P<.05) was observed for items relating to strongman events and traditional exercises performed during the taper. Only the time at which the Farmer’s Walk was last performed before competition showed poor reliability (ICC=.27). Conclusions We have developed a low cost, self-reported, online retrospective questionnaire, which provided stable and reliable answers for most of the demographic, training, and tapering practice questions. The results of this study support the inferences drawn from the Tapering Practices of Strongman Athletes Study. PMID:29089292
Inter-rater Reliability of Sustained Aberrant Movement Patterns as a Clinical Assessment of Muscular Fatigue

PubMed Central

Aerts, Frank; Carrier, Kathy; Alwood, Becky

2016-01-01

Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the inter-rater reliability for a single rater scoring, ICC (2,1), was .65 (95% CI: .60, .71). This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
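For readers unfamiliar with the ICC (2,1) reported above, the sketch below computes the two-way random-effects, absolute-agreement, single-rater ICC from a complete subjects-by-raters table using the usual mean-square formula. It is an illustrative implementation, not the software used in the study; the function name and sample scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores: four participants, each rated by three therapists.
scores = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [5, 5, 4],
]
print(icc_2_1(scores))
```

Because this form treats raters as a random sample and penalizes systematic differences between them, it is typically lower than a consistency-type ICC computed on the same data.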
Evidence-based dentistry: analysis of dental anxiety scales for children.

PubMed

Al-Namankany, A; de Souza, M; Ashley, P

2012-03-09

To review paediatric dental anxiety measures (DAMs) and assess the statistical methods used for validation and their clinical implications. A search of four computerised databases between 1960 and January 2011 associated with DAMs, using pre-specified search terms, to assess the method of validation including the reliability as intra-observer agreement 'repeatability or stability' and inter-observer agreement 'reproducibility' and all types of validity. Fourteen paediatric DAMs were predominantly validated in schools and not in the clinical setting while five of the DAMs were not validated at all. The DAMs that were validated were done so against other paediatric DAMs which may not have been validated previously. Reliability was not assessed in four of the DAMs. However, all of the validated studies assessed reliability which was usually 'good' or 'acceptable'. None of the current DAMs used a formal sample size technique. Diversity was seen between the studies ranging from a few simple pictograms to lists of questions reported by either the individual or an observer. To date there is no scale that can be considered as a gold standard, and there is a need to further develop an anxiety scale with a cognitive component for children and adolescents.
Test-Retest Reliability of Brain Activation Using the Face-Name Paired-Associates fMRI Task in Patients with Schizophrenia and Healthy Controls

NASA Astrophysics Data System (ADS)

Louis, Chelsey N.

Schizophrenia is a neurological disorder associated with cognitive impairments, and clinical symptoms of hallucinations and delusions. Recent imaging and behavioral studies have repeatedly shown aberrant brain activity in the hippocampal regions in relation to episodic memory impairments associated with schizophrenia. These findings have warranted further research to elucidate the neural processes associated with episodic memory. Therefore, the current study examined activity in a priori brain regions associated with episodic memory using the face-name paired-associates fMRI task to determine whether there was reliable activation patterns observed in healthy subjects and patients with self-reported schizophrenia. This was evaluated by using ROI analysis and whole brain analysis to examine activity between subjects during a session, and by using Pearson's R correlation coefficients to examine test-retest reliability over time. 30 schizophrenic (SZ) patients and 31 healthy control (HC) volunteers underwent a series of assessments including the fMRI behavioral task, face-name paired-associates task. The tests were conducted twice with a 14-day interval for the subjects. The results indicated no reliable brain activation in the hippocampus between scanning sessions for either the SZ or HC groups. However, distinct activation patterns were observed within sessions for both groups. These patterns were observed in the hippocampus, and regions of the frontal lobe and occipital lobe. Future studies should further explore these brain activity patterns across sessions in SZ patients compared to HC subjects to determine whether these patterns are due to pathological mechanisms associated with schizophrenia.
[Design and validation of a scale to measure caregiving dedication in caregivers of dependent older people].

PubMed

Serrano-Ortega, Natalia; Frías-Osuna, Antonio; Recio-Gómez, Juan M; Del-Pino-Casado, Rafael

2015-11-01

To develop and validate a scale to measure caregiving dedication regarding activities of daily living in caregivers of dependent older people. Cross-sectional study. Primary Health Care (Andalusia, Spain). A probabilistic sample of 200 caregivers of older relatives from Córdoba, Spain. Content validation by experts, construct validity (by exploratory factor analysis), divergent validity and reliability (internal consistency, test-retest reliability and inter-observer reliability). Cronbach's alpha was 0.86. The intraclass correlation coefficient was 0.96 for test-retest reliability and 0.88 for inter-observer reliability. When the sample was divided into two groups according to perceived burden level (presence and absence), the perceived burden was significantly different in each group (P=.001). The factor analysis revealed only one factor, which explained 64% of the variance. The scale allows a suitable measure of caregiving dedication regarding activities of daily living in caregivers of older people, because it allows quick, easy administration, is well accepted by caregivers, has acceptable psychometric results and includes the frequency of caregiving, the kind of need attended to and the dependence level in each need. Copyright © 2014 Elsevier España, S.L.U. All rights reserved.
Diagnosing paratonia in the demented elderly: reliability and validity of the Paratonia Assessment Instrument (PAI).

PubMed

Hobbelen, Johannes S M; Koopmans, Raymond T C M; Verhey, Frans R J; Habraken, Kitty M; de Bie, Rob A

2008-08-01

Paratonia is one of the associated movement disorders characteristic of dementia. The aim of this study was to develop an assessment tool (the Paratonia Assessment Instrument, PAI), based on the new consensus definition of paratonia. An additional aim was to investigate the reliability and validity of the PAI. A three-phase cross-sectional survey was conducted. In the first two phases, the PAI was developed and validated. In the third phase, the inter-observer reliability and feasibility of the instrument was tested. The original PAI consisted of five criteria that all needed to be met in order to make the diagnosis. On the basis of a qualitative analysis, one criterion was reformulated and another was removed. Following this, inter-observer reliability between the two assessors resulted in an improvement of Cohen's kappa from 0.532 in the initial phase to 0.677 in the second phase. This improvement was substantiated in the third phase by two independent assessors with Cohen's kappa ranging from 0.625 to 1. The PAI is a reliable and valid assessment tool for diagnosing paratonia in elderly people with dementia that can be applied easily in daily practice.
Observed Emotional and Behavioral Indicators of Motivation Predict School Readiness in Head Start Graduates

ERIC Educational Resources Information Center

Berhenke, Amanda; Miller, Alison L.; Brown, Eleanor; Seifer, Ronald; Dickstein, Susan

2011-01-01

Emotions and behaviors observed during challenging tasks are hypothesized to be valuable indicators of young children's motivation, the assessment of which may be particularly important for children at risk for school failure. The current study demonstrated reliability and concurrent validity of a new observational assessment of motivation in…
Evaluating the use of in-store measures in retail food stores and restaurants in Brazil.

PubMed

Duran, Ana Clara; Lock, Karen; Latorre, Maria do Rosario D O; Jaime, Patricia Constante

2015-01-01

To assess inter-rater reliability, test-retest reliability, and construct validity of retail food store, open-air food market, and restaurant observation tools adapted to the Brazilian urban context. This study is part of a cross-sectional observation survey conducted in 13 districts across the city of Sao Paulo, Brazil in 2010-2011. Food store and restaurant observational tools were developed based on previously available tools, and then tested. They included measures on the availability, variety, quality, pricing, and promotion of fruits and vegetables and ultra-processed foods. We used Kappa statistics and intra-class correlation coefficients to assess inter-rater and test-retest reliabilities in samples of 142 restaurants, 97 retail food stores (including open-air food markets), and of 62 restaurants and 45 retail food stores (including open-air food markets), respectively. Construct validity, defined as the tools' ability to discriminate based on store types and different income contexts, was assessed in the entire sample: 305 retail food stores, 8 fruits and vegetable markets, and 472 restaurants. Inter-rater and test-retest reliability were generally high, with most Kappa values greater than 0.70 (range 0.49-1.00). Both tools discriminated between store types and neighborhoods with different median income. Fruits and vegetables were more likely to be found in middle to higher-income neighborhoods, while soda, fruit-flavored drink mixes, cookies, and chips were cheaper and more likely to be found in lower-income neighborhoods. The measures were reliable and able to reveal significant differences across store types and different contexts. Although some items may require revision, results suggest that the tools may be used to reliably measure the food stores and restaurant food environment in urban settings of middle-income countries. Such studies can help inform health promotion interventions and policies in these contexts.
Evaluating the use of in-store measures in retail food stores and restaurants in Brazil

PubMed Central

Duran, Ana Clara; Lock, Karen; Latorre, Maria do Rosario D O; Jaime, Patricia Constante

2015-01-01

ABSTRACT OBJECTIVE To assess inter-rater reliability, test-retest reliability, and construct validity of retail food store, open-air food market, and restaurant observation tools adapted to the Brazilian urban context. METHODS This study is part of a cross-sectional observation survey conducted in 13 districts across the city of Sao Paulo, Brazil in 2010-2011. Food store and restaurant observational tools were developed based on previously available tools, and then tested. They included measures on the availability, variety, quality, pricing, and promotion of fruits and vegetables and ultra-processed foods. We used Kappa statistics and intra-class correlation coefficients to assess inter-rater and test-retest reliabilities in samples of 142 restaurants, 97 retail food stores (including open-air food markets), and of 62 restaurants and 45 retail food stores (including open-air food markets), respectively. Construct validity, defined as the tools’ ability to discriminate based on store types and different income contexts, was assessed in the entire sample: 305 retail food stores, 8 fruits and vegetable markets, and 472 restaurants. RESULTS Inter-rater and test-retest reliability were generally high, with most Kappa values greater than 0.70 (range 0.49-1.00). Both tools discriminated between store types and neighborhoods with different median income. Fruits and vegetables were more likely to be found in middle to higher-income neighborhoods, while soda, fruit-flavored drink mixes, cookies, and chips were cheaper and more likely to be found in lower-income neighborhoods. CONCLUSIONS The measures were reliable and able to reveal significant differences across store types and different contexts. Although some items may require revision, results suggest that the tools may be used to reliably measure the food stores and restaurant food environment in urban settings of middle-income countries.
Such studies can help inform health promotion interventions and policies in these contexts. PMID:26538101
Inter-rater reliability of a modified version of Delitto et al.’s classification-based system for low back pain: a pilot study

PubMed Central

Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.

2016-01-01

Study design Observational inter-rater reliability study. Objectives To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others’ classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0·34 [95% confidence interval (CI): 0·11–0·57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0·29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0·33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279
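The two agreement statistics named in this abstract, percentage of agreement and Cohen’s Kappa, can be illustrated with a minimal Python sketch. The function names and the toy ratings below are hypothetical and are not data from the study; the sketch assumes two raters who each assign exactly one category to every patient.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every category either rater used.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical classifications of four patients into groups 0 and 1.
ratings_a = [1, 1, 0, 0]
ratings_b = [1, 0, 0, 0]
print(percent_agreement(ratings_a, ratings_b))
print(cohens_kappa(ratings_a, ratings_b))
```

Because Kappa subtracts the agreement expected by chance, it is always lower than raw percentage agreement unless agreement is perfect, which is consistent with the pattern of the paired values reported above (for example, 53% agreement alongside a Kappa of 0·34).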
Validity and reliability of sleep time questionnaires in children and adolescents: A systematic review and meta-analysis.

PubMed

Nascimento-Ferreira, Marcus V; Collese, Tatiana S; de Moraes, Augusto César F; Rendo-Urteaga, Tara; Moreno, Luis A; Carvalho, Heráclito B

2016-12-01

Sleep duration has been associated with several health outcomes in children and adolescents. As an extensive number of questionnaires are currently used to investigate sleep schedule or sleep time, we performed a systematic review of criterion validation of sleep time questionnaires for children and adolescents, considering accelerometers as the reference method. We found a strong correlation between questionnaires and accelerometers for weeknights and a moderate correlation for weekend nights. When considering only studies performing a reliability assessment of the used questionnaires, a significant increase in the correlations for both weeknights and weekend nights was observed. In conclusion, moderate to strong criterion validity of sleep time questionnaires was observed; however, the reliability assessment of the questionnaires showed strong validation performance. Copyright © 2015 Elsevier Ltd. All rights reserved.

Understanding and Visualizing Multitasking and Task Switching Activities: A Time Motion Study to Capture Nursing Workflow

PubMed Central

Yen, Po-Yin; Kelley, Marjorie; Lopetegui, Marcelo; Rosado, Amber L.; Migliore, Elaina M.; Chipps, Esther M.; Buck, Jacalyn

2016-01-01

A fundamental understanding of multitasking within nursing workflow is important in today’s dynamic and complex healthcare environment. We conducted a time motion study to understand nursing workflow, specifically multitasking and task switching activities. We used TimeCaT, a comprehensive electronic time capture tool, to capture observational data. We established inter-observer reliability prior to data collection. We completed 56 hours of observation of 10 registered nurses. We found, on average, nurses had 124 communications and 208 hands-on tasks per 4-hour block of time. They multitasked (having communication and hands-on tasks simultaneously) 131 times, representing 39.48% of all times; the total multitasking duration ranges from 14.6 minutes to 109 minutes, 44.98 minutes (18.63%) on average. We also reviewed workflow visualization to uncover the multitasking events. Our study design and methods provide a practical and reliable approach to conducting and analyzing time motion studies from both quantitative and qualitative perspectives. PMID:28269924
Identification of reliable gridded reference data for statistical downscaling methods in Alberta

NASA Astrophysics Data System (ADS)

Eum, H. I.; Gupta, A.

2017-12-01

Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have been applied to prepare climate model data for various applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data mainly depend on the reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables which are main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths and data accuracy varying with physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in the public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), an inverse distance method (Alberta Townships), a numerical climate model (North American Regional Reanalysis), and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing them with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation, minimum and maximum temperature over the province of Alberta. Based on the performance of climate products at AHCCD stations, we ranked the reliability of these publicly available climate products corresponding to the elevations of stations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system was developed to allow users to easily select the most reliable reference climate data at each target point based on the elevation of the grid cell. By constructing the best combination of reference data for the study domain, the accuracy and reliability of statistically downscaled climate projections could be significantly improved.
Three-Dimensional Photography for Quantitative Assessment of Penile Volume-Loss Deformities in Peyronie's Disease.

PubMed

Margolin, Ezra J; Mlynarczyk, Carrie M; Mulhall, John P; Stember, Doron S; Stahl, Peter J

2017-06-01

Non-curvature penile deformities are prevalent and bothersome manifestations of Peyronie's disease (PD), but the quantitative metrics that are currently used to describe these deformities are inadequate and non-standardized, presenting a barrier to clinical research and patient care. To introduce erect penile volume (EPV) and percentage of erect penile volume loss (percent EPVL) as novel metrics that provide detailed quantitative information about non-curvature penile deformities and to study the feasibility and reliability of three-dimensional (3D) photography for measurement of quantitative penile parameters. We constructed seven penis models simulating deformities found in PD. The 3D photographs of each model were captured in triplicate by four observers using a 3D camera. Computer software was used to generate automated measurements of EPV, percent EPVL, penile length, minimum circumference, maximum circumference, and angle of curvature. The automated measurements were statistically compared with measurements obtained using water-displacement experiments, a tape measure, and a goniometer. Accuracy of 3D photography for average measurements of all parameters compared with manual measurements; inter-test, intra-observer, and inter-observer reliabilities of EPV and percent EPVL measurements as assessed by the intraclass correlation coefficient. The 3D images were captured in a median of 52 seconds (interquartile range = 45-61). On average, 3D photography was accurate to within 0.3% for measurement of penile length. It overestimated maximum and minimum circumferences by averages of 4.2% and 1.6%, respectively; overestimated EPV by an average of 7.1%; and underestimated percent EPVL by an average of 1.9%. All inter-test, inter-observer, and intra-observer intraclass correlation coefficients for EPV and percent EPVL measurements were greater than 0.75, reflective of excellent methodologic reliability. 
By providing highly descriptive and reliable measurements of penile parameters, 3D photography can empower researchers to better study volume-loss deformities in PD and enable clinicians to offer improved clinical assessment, communication, and documentation. This is the first study to apply 3D photography to the assessment of PD and to accurately measure the novel parameters of EPV and percent EPVL. This proof-of-concept study is limited by the lack of data in human subjects, which could present additional challenges in obtaining reliable measurements. EPV and percent EPVL are novel metrics that can be quickly, accurately, and reliably measured using computational analysis of 3D photographs and can be useful in describing non-curvature volume-loss deformities resulting from PD. Margolin EJ, Mlynarczyk CM, Mulhall JP, et al. Three-Dimensional Photography for Quantitative Assessment of Penile Volume-Loss Deformities in Peyronie's Disease. J Sex Med 2017;14:829-833. Copyright © 2017 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
International FItness Scale (IFIS): Construct Validity and Reliability in Women With Fibromyalgia: The al-Ándalus Project.

PubMed

Álvarez-Gallardo, Inmaculada C; Soriano-Maldonado, Alberto; Segura-Jiménez, Víctor; Carbonell-Baeza, Ana; Estévez-López, Fernando; McVeigh, Joseph G; Delgado-Fernández, Manuel; Ortega, Francisco B

2016-03-01

To examine the construct validity of the International FItness Scale (IFIS) (ie, self-reported fitness) against objectively measured physical fitness in women with fibromyalgia and in healthy women; and to study the test-retest reliability of the IFIS in women with fibromyalgia. Cross-sectional study. Fibromyalgia patient support groups. Women with fibromyalgia (n=413) and healthy women (controls) (n=195) for validity purposes and women with fibromyalgia (n=101) for the reliability study. The total sample was N=709. Not applicable. Fitness level was both self-reported (IFIS) and measured using performance-based fitness tests. For the reliability study the IFIS was completed on 2 occasions, 1 week apart. Women with fibromyalgia who reported average fitness had better measured fitness than those reporting very poor fitness (all P<.001, except 6-minute walk test where P<.05), with similar trends observed in healthy control women. The test-retest reliability of the IFIS, as measured by the average weighted κ, was .45. The IFIS was able to identify women with fibromyalgia who had very low fitness and distinguish them from those with higher fitness levels. Furthermore, the IFIS was moderately reliable in women with fibromyalgia. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
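For an ordinal scale such as the five-level IFIS, test-retest agreement is commonly summarized with a weighted κ, which penalizes large disagreements more than near misses. The sketch below is a hedged illustration only: the abstract does not state which weighting scheme was used, so the linear default, the optional quadratic weights, the function name, and the 0-based category coding are all assumptions.

```python
def weighted_kappa(first, second, n_categories, weights="linear"):
    """Weighted kappa for two sets of ordinal ratings coded 0..n_categories-1.

    Computed as 1 - (observed weighted disagreement) / (expected weighted
    disagreement), with disagreement weights based on category distance.
    """
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    n, k = len(first), n_categories

    # Cross-tabulate occasion 1 (rows) against occasion 2 (columns).
    table = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        table[a][b] += 1
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            distance = abs(i - j) / (k - 1)
            if weights == "quadratic":
                distance = distance ** 2
            observed += distance * table[i][j]
            expected += distance * row_totals[i] * col_totals[j] / n
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1 - observed / expected


# Hypothetical self-ratings (0 = very poor ... 4 = very good), one week apart.
week_1 = [0, 1, 2, 3, 4, 2, 1]
week_2 = [0, 2, 2, 3, 3, 2, 1]
print(weighted_kappa(week_1, week_2, n_categories=5))
```

With only two categories the linear and quadratic weights coincide and the result reduces to the unweighted Cohen's kappa.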
The quadrant method measuring four points is as a reliable and accurate as the quadrant method in the evaluation after anatomical double-bundle ACL reconstruction.

PubMed

Mochizuki, Yuta; Kaneko, Takao; Kawahara, Keisuke; Toyoda, Shinya; Kono, Norihiko; Hada, Masaru; Ikegami, Hiroyasu; Musha, Yoshiro

2017-11-20

The quadrant method was described by Bernard et al. and it has been widely used for postoperative evaluation of anterior cruciate ligament (ACL) reconstruction. The purpose of this research is to further develop the quadrant method measuring four points, which we named four-point quadrant method, and to compare with the quadrant method. Three-dimensional computed tomography (3D-CT) analyses were performed in 25 patients who underwent double-bundle ACL reconstruction using the outside-in technique. The four points in this study's quadrant method were defined as point1-highest, point2-deepest, point3-lowest, and point4-shallowest, in femoral tunnel position. Value of depth and height in each point was measured. Antero-medial (AM) tunnel is (depth1, height2) and postero-lateral (PL) tunnel is (depth3, height4) in this four-point quadrant method. The 3D-CT images were evaluated independently by 2 orthopaedic surgeons. A second measurement was performed by both observers after a 4-week interval. Intra- and inter-observer reliability was calculated by means of intra-class correlation coefficient (ICC). Also, the accuracy of the method was evaluated against the quadrant method. Intra-observer reliability was almost perfect for both AM and PL tunnel (ICC > 0.81). Inter-observer reliability of AM tunnel was substantial (ICC > 0.61) and that of PL tunnel was almost perfect (ICC > 0.81). The AM tunnel position was 0.13% deep, 0.58% high and PL tunnel position was 0.01% shallow, 0.13% low compared to quadrant method. The four-point quadrant method was found to have high intra- and inter-observer reliability and accuracy. This method can evaluate the tunnel position regardless of the shape and morphology of the bone tunnel aperture for use of comparison and can provide measurement that can be compared with various reconstruction methods. 
The four-point quadrant method of this study is considered to have clinical relevance in that it is a detailed and accurate tool for evaluating femoral tunnel position after ACL reconstruction. Case series, Level IV.
Modeling of unit operating considerations in generating-capacity reliability evaluation. Volume 1. Mathematical models, computing methods, and results. Final report. [GENESIS, OPCON and OPPLAN

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patton, A.D.; Ayoub, A.K.; Singh, C.

1982-07-01

Existing methods for generating capacity reliability evaluation do not explicitly recognize a number of operating considerations which may have important effects in system reliability performance. Thus, current methods may yield estimates of system reliability which differ appreciably from actual observed reliability. Further, current methods offer no means of accurately studying or evaluating alternatives which may differ in one or more operating considerations. Operating considerations which are considered to be important in generating capacity reliability evaluation include: unit duty cycles as influenced by load cycle shape, reliability performance of other units, unit commitment policy, and operating reserve policy; unit start-up failures distinct from unit running failures; unit start-up times; and unit outage postponability and the management of postponable outages. A detailed Monte Carlo simulation computer model called GENESIS and two analytical models called OPCON and OPPLAN have been developed which are capable of incorporating the effects of many operating considerations including those noted above. These computer models have been used to study a variety of actual and synthetic systems and are available from EPRI. The new models are shown to produce system reliability indices which differ appreciably from index values computed using traditional models which do not recognize operating considerations.
A newly developed tool for classifying study designs in systematic reviews of interventions and exposures showed substantial reliability and validity.

PubMed

Seo, Hyun-Ju; Kim, Soo Young; Lee, Yoon Jae; Jang, Bo-Hyoung; Park, Ji-Eun; Sheen, Seung-Soo; Hahn, Seo Kyung

2016-02-01

To develop a study Design Algorithm for Medical Literature on Intervention (DAMI) and test its interrater reliability, construct validity, and ease of use. We developed and then revised the DAMI to include detailed instructions. To test the DAMI's reliability, we used a purposive sample of 134 primary, mainly nonrandomized studies. We then compared the study designs as classified by the original authors and through the DAMI. Unweighted kappa statistics were computed to test interrater reliability and construct validity based on the level of agreement between the original and DAMI classifications. Assessment time was also recorded to evaluate ease of use. The DAMI includes 13 study designs, including experimental and observational studies of interventions and exposure. Both the interrater reliability (unweighted kappa = 0.67; 95% CI [0.64-0.75]) and construct validity (unweighted kappa = 0.63, 95% CI [0.52-0.67]) were substantial. Mean classification time using the DAMI was 4.08 ± 2.44 minutes (range, 0.51-10.92). The DAMI showed substantial interrater reliability and construct validity. Furthermore, given its ease of use, it could be used to accurately classify medical literature for systematic reviews of interventions while minimizing disagreement between authors of such reviews. Copyright © 2016 Elsevier Inc. All rights reserved.
Application of the Modified Erikson Psychosocial Stage Inventory: 25 Years in Review.

PubMed

Darling-Fisher, Cynthia S

2018-04-01

The Modified Erikson Psychosocial Stage Inventory (MEPSI) is an 80-item, comprehensive measure of psychosocial development based on Erikson's theory with published reliability and validity data. Although designed as a comprehensive measure, some researchers have used individual subscales for specific developmental stages as a measure; however, these subscale reliability scores have not been generally shared. This article reviewed the literature to evaluate the use of the MEPSI: the major research questions, samples/populations studied, and individual subscale and total reliability and validity data. In total, 16 research articles (1990-2011) and 28 Dissertations/Theses (1991-2016) from nursing, social work, psychology, criminal justice, and religious studies met criteria. Results support the MEPSI's global reliability (aggregate scores ranged .89-.99) and validity in terms of consistent patterns of changes observed in the predicted direction. Reliability and validity data for individual subscales were more variable. Limitations of the tool and recommendations for possible revision and future research are addressed.
The Development, Validation, and Reliability of SAM: A Tool for Measurement of Moderate to Vigorous Physical Activity in School Physical Education

ERIC Educational Resources Information Center

Surapiboonchai, Kampol

2010-01-01

There is a lack of valid and reliable low cost observational instruments to measure moderate to vigorous physical activity (MVPA) in school physical education (PE). The participants in this study were third to tenth grade boys and girls from a south Texas school district. The SAM (Simple Activity Measurement) activity levels were compared with…
Reliability and Validity of a New Physical Activity Self-Report Measure for Younger Children

ERIC Educational Resources Information Center

Belton, Sarahjane; Mac Donncha, Ciaran

2010-01-01

The purpose of this study was to assess the test-retest reliability and validity of a new Youth Physical Activity Self-Report measure. Heart rate and direct observation were employed as criterion measures with a sample of 79 children (aged 7-9 years). Spearman's rho correlation between self reported activity intensity and heart rate was 0.87 for…
A multicentre observational study to evaluate a new tool to assess emergency physicians' non-technical skills.

PubMed

Flowerdew, Lynsey; Gaunt, Arran; Spedding, Jessica; Bhargava, Ajay; Brown, Ruth; Vincent, Charles; Woloshynowych, Maria

2013-06-01

To evaluate a new tool to assess emergency physicians' non-technical skills. This was a multicentre observational study using data collected at four emergency departments in England. A proportion of observations used paired observers to obtain data for inter-rater reliability. Data were also collected for test-retest reliability, observability of skills, mean ratings and dispersion of ratings for each skill, as well as a comparison of skill level between hospitals. Qualitative data described the range of non-technical skills exhibited by trainees and identified sources of rater error. 96 assessments of 43 senior trainees were completed. At a scale level, intra-class coefficients were 0.575, 0.532 and 0.419 and using mean scores were 0.824, 0.702 and 0.519. Spearman's ρ for calculating test-retest reliability was 0.70 using mean scores. All skills were observed more than 60% of the time. The skill Maintenance of Standards received the lowest mean rating (4.8 on a nine-point scale) and the highest mean was calculated for Team Building (6.0). Two skills, Supervision & Feedback and Situational Awareness-Gathering Information, had significantly different distributions of ratings across the four hospitals (p<0.04 and 0.007, respectively), and this appeared to be related to the leadership roles of trainees. This study shows the performance of the assessment tool is acceptable and provides valuable information to structure the assessment and training of non-technical skills, especially in relation to leadership. The framework of skills may be used to identify areas for development in individual trainees, as well as guide other patient safety interventions.
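Spearman's ρ, used above as the test-retest statistic, is the Pearson correlation of the rank-transformed scores. The following Python sketch shows that computation under stated assumptions: tied values receive their average rank, and the function names and example scores are hypothetical rather than taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rank_x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in rank_y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical mean ratings (nine-point scale) for five trainees on two occasions.
first_assessment = [4.8, 5.5, 6.0, 5.1, 6.4]
second_assessment = [5.0, 5.2, 6.1, 5.6, 6.3]
print(spearman_rho(first_assessment, second_assessment))
```

Because only ranks enter the calculation, ρ reflects whether trainees keep their relative ordering between occasions, not whether their absolute ratings match.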
Development of a novel observational measure for anxiety in young children: The Anxiety Dimensional Observation Scale

PubMed Central

Mian, Nicholas D.; Carter, Alice S.; Pine, Daniel S.; Wakschlag, Lauren S.; Briggs-Gowan, Margaret J.

2015-01-01

Background Identifying anxiety disorders in preschool-age children represents an important clinical challenge. Observation is essential to clinical assessment and can help differentiate normative variation from clinically significant anxiety. Yet, most anxiety assessment methods for young children rely on parent-reports. The goal of this article is to present and preliminarily test the reliability and validity of a novel observational paradigm for assessing a range of fearful and anxious behaviors in young children, the Anxiety Dimensional Observation Schedule (Anx-DOS). Methods A diverse sample of 403 children, aged 3 to 6 years, and their mothers was studied. Reliability and validity in relation to parent reports (Preschool Age Psychiatric Assessment) and known risk factors, including indicators of behavioral inhibition (latency to touch novel objects) and attention bias to threat (in the dot-probe task) were investigated. Results The Anx-DOS demonstrated good inter-rater reliability and internal consistency. Evidence for convergent validity was demonstrated relative to mother-reported separation anxiety, social anxiety, phobic avoidance, trauma symptoms, and past service use. Finally, fearfulness was associated with observed latency and attention bias toward threat. Conclusions Findings support the Anx-DOS as a method for capturing early manifestations of fearfulness and anxiety in young children. Multimethod assessments incorporating standardized methods for assessing discrete, observable manifestations of anxiety may be beneficial for early identification and clinical intervention efforts. PMID:25773515
A GIS-based assessment of the suitability of SCIAMACHY satellite sensor measurements for estimating reliable CO concentrations in a low-latitude climate.

PubMed

Fagbeja, Mofoluso A; Hill, Jennifer L; Chatterton, Tim J; Longhurst, James W S

2015-02-01

An assessment of the reliability of the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY) satellite sensor measurements to interpolate tropospheric concentrations of carbon monoxide considering the low-latitude climate of the Niger Delta region in Nigeria was conducted. Monthly SCIAMACHY carbon monoxide (CO) column measurements from January 2003 to December 2005 were interpolated using the ordinary kriging technique. The spatio-temporal variations observed in the reliability were based on proximity to the Atlantic Ocean, seasonal variations in the intensities of rainfall and relative humidity, the presence of dust particles from the Sahara desert, industrialization in Southwest Nigeria and biomass burning during the dry season in Northern Nigeria. Spatial reliabilities of 74 and 42 % are observed for the inland and coastal areas, respectively. Temporally, average reliability of 61 and 55 % occur during the dry and wet seasons, respectively. Reliability in the inland and coastal areas was 72 and 38 % during the wet season, and 75 and 46 % during the dry season, respectively. Based on the results, the WFM-DOAS SCIAMACHY CO data product used for this study is therefore relevant in the assessment of CO concentrations in developing countries within the low latitudes that cannot afford monitoring infrastructure because of its high cost. Although the SCIAMACHY sensor is no longer available, it provided cost-effective, reliable and accessible data that could support air quality assessment in developing countries.
Reliability of primary caregivers reports on lifestyle behaviours of European pre-school children: the ToyBox-study.

PubMed

González-Gil, E M; Mouratidou, T; Cardon, G; Androutsos, O; De Bourdeaudhuij, I; Góźdź, M; Usheva, N; Birnbaum, J; Manios, Y; Moreno, L A

2014-08-01

Reliable assessments of health-related behaviours are necessary for accurate evaluation on the efficiency of public health interventions. The aim of the current study was to examine the reliability of a self-administered primary caregivers questionnaire (PCQ) used in the ToyBox-intervention. The questionnaire consisted of six sections addressing sociodemographic and perinatal factors, water and beverages consumption, physical activity, snacking and sedentary behaviours. Parents/caregivers from six countries (Belgium, Bulgaria, Germany, Greece, Poland and Spain) were asked to complete the questionnaire twice within a 2-week interval. A total of 93 questionnaires were collected. Test-retest reliability was assessed using intra-class correlation coefficient (ICC). Reliability of the six questionnaire sections was assessed. A stronger agreement was observed in the questions addressing sociodemographic and perinatal factors as opposed to questions addressing behaviours. Findings showed that 92% of the ToyBox PCQ had a moderate-to-excellent test-retest reliability (defined as ICC values from 0.41 to 1) and less than 8% poor test-retest reliability (ICC < 0.40). Out of the total ICC values, 67% showed good-to-excellent reliability (ICC from 0.61 to 1). We conclude that the PCQ is a reliable tool to assess sociodemographic characteristics, perinatal factors and lifestyle behaviours of pre-school children and their families participating in the ToyBox-intervention. © 2014 World Obesity.
An analysis of functional shoulder movements during task performance using Dartfish movement analysis software.

PubMed

Khadilkar, Leenesh; MacDermid, Joy C; Sinden, Kathryn E; Jenkyn, Thomas R; Birmingham, Trevor B; Athwal, George S

2014-01-01

Video-based movement analysis software (Dartfish) has potential for clinical applications for understanding shoulder motion if functional measures can be reliably obtained. The primary purpose of this study was to describe the functional range of motion (ROM) of the shoulder used to perform a subset of functional tasks. A second purpose was to assess the reliability of functional ROM measurements obtained by different raters using Dartfish software. Ten healthy participants, mean age 29 ± 5 years, were videotaped while performing five tasks selected from the Disabilities of the Arm, Shoulder and Hand (DASH). Video cameras and markers were used to obtain video images suitable for analysis in Dartfish software. Three repetitions of each task were performed. Shoulder movements from all three repetitions were analyzed using Dartfish software. The tracking tool of the Dartfish software was used to obtain shoulder joint angles and arcs of motion. Test-retest and inter-rater reliability of the measurements were evaluated using intraclass correlation coefficients (ICC). Maximum (coronal plane) abduction (118° ± 16°) and (sagittal plane) flexion (111° ± 15°) was observed during 'washing one's hair;' maximum extension (-68° ± 9°) was identified during 'washing one's own back.' Minimum shoulder ROM was observed during 'opening a tight jar' (33° ± 13° abduction and 13° ± 19° flexion). Test-retest reliability (ICC = 0.45 to 0.94) suggests high inter-individual task variability, and inter-rater reliability (ICC = 0.68 to 1.00) showed moderate to excellent agreement. KEY FINDINGS INCLUDE: 1) functional shoulder ROM identified in this study compared to similar studies; 2) healthy individuals require less than full ROM when performing five common ADL tasks 3) high participant variability was observed during performance of the five ADL tasks; and 4) Dartfish software provides a clinically relevant tool to analyze shoulder function.
Reliability analysis for radiographic measures of lumbar lordosis in adult scoliosis: a case–control study comparing 6 methods

PubMed Central

Hong, Jae Young; Modi, Hitesh N.; Hur, Chang Yong; Song, Hae Ryong; Park, Jong Hoon

2010-01-01

Several methods are used to measure lumbar lordosis. In adult scoliosis patients, the measurement is difficult due to degenerative changes in the vertebral endplate as well as the coronal and sagittal deformity. We conducted an observational study with three examiners to determine the reliability of six methods for measuring the global lumbar lordosis in adult scoliosis patients. Ninety lateral lumbar radiographs were collected for the study. The radiographs were divided into normal (Cobb < 10°), low-grade (Cobb 10°–19°) and high-grade (Cobb ≥ 20°) groups to determine the reliability of the Cobb L1–S1, Cobb L1–L5, centroid, posterior tangent L1–S1, posterior tangent L1–L5 and TRALL methods in adult scoliosis. The 90 lateral radiographs were measured twice by each of the three examiners using the six measurement methods. The data were analyzed to determine the inter- and intra-observer reliability. In general, for the six radiographic methods, the inter- and intra-class correlation coefficients (ICCs) were all ≥0.82. A comparison of the ICCs and 95% CI for the inter- and intra-observer reliability between the groups with varying degrees of scoliosis showed that the reliability of the lordosis measurement decreased with increasing severity of scoliosis. In the Cobb L1–S1, centroid and posterior tangent L1–S1 methods, the ICCs were relatively lower in the high-grade scoliosis group (≥0.60), and the mean absolute difference (MAD) in these methods was high in the high-grade scoliosis group (≤7.17°). However, in the Cobb L1–L5 and posterior tangent L1–L5 methods, the ICCs were ≥0.86 in all groups, and in the TRALL method, the ICCs were ≥0.76 in all groups. In addition, in the Cobb L1–L5 and posterior tangent L1–L5 methods, the MAD was ≤3.63°, and in the TRALL method, the MAD was ≤3.84° in all groups. We concluded that the Cobb L1–L5 and the posterior tangent L1–L5 methods are reliable methods for measuring the global lumbar lordosis in adult scoliosis, and that the TRALL method is more reliable than the other methods that include the L5–S1 joint in the lordosis measurement. PMID:20437183
Reliability of concentrations of organophosphate pesticide metabolites in serial urine specimens from pregnancy in the Generation R Study.

PubMed

Spaan, Suzanne; Pronk, Anjoeka; Koch, Holger M; Jusko, Todd A; Jaddoe, Vincent W V; Shaw, Pamela A; Tiemeier, Henning M; Hofman, Albert; Pierik, Frank H; Longnecker, Matthew P

2015-05-01

The widespread use of organophosphate (OP) pesticides has resulted in ubiquitous exposure in humans, primarily through their diet. Exposure to OP pesticides may have adverse health effects, including neurobehavioral deficits in children. The optimal design of new studies requires data on the reliability of urinary measures of exposure. In the present study, urinary concentrations of six dialkyl phosphate (DAP) metabolites, the main urinary metabolites of OP pesticides, were determined in 120 pregnant women participating in the Generation R Study in Rotterdam. Intra-class correlation coefficients (ICCs) across serial urine specimens taken at <18, 18-25, and >25 weeks of pregnancy were determined to assess reliability. Geometric mean total DAP metabolite concentrations were 229 (GSD 2.2), 240 (GSD 2.1), and 224 (GSD 2.2) nmol/g creatinine across the three periods of gestation. Metabolite concentrations from the serial urine specimens in general correlated moderately. The ICCs for the six DAP metabolites ranged from 0.14 to 0.38 (0.30 for total DAPs), indicating weak to moderate reliability. Although the DAP metabolite levels observed in this study are slightly higher and slightly more correlated than in previous studies, the low to moderate reliability indicates a high degree of within-person variability, which presents challenges for designing well-powered epidemiological studies.
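As an illustration of how an intra-class correlation across serial specimens can be computed, the following is a minimal sketch. It assumes a one-way random-effects, single-measurement ICC; the abstract does not state which ICC form the authors used, and the function name and any input data are hypothetical.

```python
import numpy as np

def icc_oneway(x):
    """One-way random-effects, single-measurement ICC (an assumed form).

    x: 2D array-like, rows = subjects, columns = repeated measurements
    (for example, three urine specimens per woman).
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    subject_means = x.mean(axis=1)
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

A low value, such as the 0.14 to 0.38 range reported above, means that most of the variance lies within persons rather than between them.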
Further Study of the Choice of Anchor Tests in Equating

ERIC Educational Resources Information Center

Trierweiler, Tammy J.; Lewis, Charles; Smith, Robert L.

2016-01-01

In this study, we describe what factors influence the observed score correlation between an (external) anchor test and a total test. We show that the anchor to full-test observed score correlation is based on two components: the true score correlation between the anchor and total test, and the reliability of the anchor test. Findings using an…
Effect of clinical information and previous exam execution on observer agreement and reliability in the analysis of hysteroscopic video-recordings.

PubMed

Martinho, Margarida Suzel Lopes; da Costa Santos, Cristina Maria Nogueira; Silva Carvalho, João Luís Mendonça; Bernardes, João Francisco Montenegro Andrade Lima

2018-02-01

Inter-observer agreement and reliability in hysteroscopic image assessment remain uncertain and the type of factors that may influence it has only been studied in relation to the experience of hysteroscopists. We aim to assess the effect of clinical information and previous exam execution on observer agreement and reliability in the analysis of hysteroscopic video-recordings. Ninety hysteroscopies were video-recorded and randomized into a group without (Group 1) and with clinical information (Group 2). The videos were independently analyzed by three hysteroscopists, regarding lesion location, dimension, and type, as well as decision to perform a biopsy. One of the hysteroscopists had executed all the exams before. Proportions of agreement (PA) and kappa statistics (κ) with 95% confidence intervals (95% CI) were used. In Group 2, there was a higher proportion of a normal diagnosis (p < 0.001) and a lower proportion of biopsies recommended (p = 0.027). Observer agreement and reliability were better in Group 2, with the PA and κ ranging, respectively, from 0.73 (95% CI 0.62, 0.83) and 0.44 (95% CI 0.26, 0.63), for image quality, to 0.94 (95% CI 0.88, 0.99) and 0.85 (95% CI 0.65, 0.95), for the decision to perform a biopsy. Execution of the exams before the analysis of the video-recordings did not significantly affect the results. With clinical information, agreement and reliability in the overall analysis of hysteroscopic video-recordings may reach almost perfect results and this was not significantly affected by the execution of the exams before the analysis. However, there is still uncertainty in the analysis of specific endometrial cavity abnormalities.
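The two statistics named above, proportion of agreement (PA) and kappa, can be sketched for the simple case of two raters. This is an illustration only: the study involved three hysteroscopists and the abstract does not say which kappa variant was used, so the two-rater Cohen's kappa, the function name and the labels below are assumptions.

```python
from collections import Counter

def agreement_and_kappa(rater1, rater2):
    """Return (proportion of agreement, Cohen's kappa) for two raters."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed agreement: share of cases rated identically
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal category frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    p_exp = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    if p_exp == 1:
        return p_obs, float("nan")  # kappa is undefined in this case
    return p_obs, (p_obs - p_exp) / (1 - p_exp)
```

Because kappa corrects for chance agreement, it is lower than the PA whenever chance agreement is above zero, which is consistent with the paired PA and kappa values reported above.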

Validity and reliability of the Fitbit Zip as a measure of preschool children’s step count

PubMed Central

Sharp, Catherine A; Mackintosh, Kelly A; Erjavec, Mihela; Pascoe, Duncan M; Horne, Pauline J

2017-01-01

Objectives: Validation of physical activity measurement tools is essential to determine the relationship between physical activity and health in preschool children, but research to date has not focused on this priority. The aims of this study were to ascertain inter-rater reliability of observer step count, and interdevice reliability and validity of Fitbit Zip accelerometer step counts in preschool children. Methods: Fifty-six children aged 3–4 years (29 girls) recruited from 10 nurseries in North Wales, UK, wore two Fitbit Zip accelerometers while performing a timed walking task in their childcare settings. Accelerometers were worn in secure pockets inside a custom-made tabard. Video recordings enabled two observers to independently code the number of steps performed in 3 min by each child during the walking task. Intraclass correlations (ICCs), concordance correlation coefficients, Bland-Altman plots and absolute per cent error were calculated to assess the reliability and validity of the consumer-grade device. Results: An excellent ICC was found between the two observer codings (ICC=1.00) and the two Fitbit Zips (ICC=0.91). Concordance between the Fitbit Zips and observer counts was also high (r=0.77), with an acceptable absolute per cent error (6%–7%). Bland-Altman analyses identified a bias for Fitbit 1 of 22.8±19.1 steps with limits of agreement between −14.7 and 60.2 steps, and a bias for Fitbit 2 of 25.2±23.2 steps with limits of agreement between −20.2 and 70.5 steps. Conclusions: Fitbit Zip accelerometers are a reliable and valid method of recording preschool children’s step count in a childcare setting. PMID:29081984
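The Bland-Altman limits of agreement reported above follow from the bias and the standard deviation of the paired differences. The sketch below assumes the conventional bias ± 1.96 × SD definition, which the abstract does not spell out; the function names are hypothetical.

```python
import statistics

def limits_of_agreement(bias, sd, z=1.96):
    """95% limits of agreement from a mean difference (bias) and its SD."""
    half_width = z * sd
    return bias - half_width, bias + half_width

def bland_altman(device, reference, z=1.96):
    """Bias, SD of differences and limits of agreement for paired counts."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, sd, limits_of_agreement(bias, sd, z)
```

Applied to the published summary values for Fitbit 1, limits_of_agreement(22.8, 19.1) gives limits close to the reported −14.7 and 60.2 steps; the small discrepancy presumably reflects rounding of the published bias and SD.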
Three-dimensional facial anthropometry of unilateral cleft lip infants with a structured light scanning system.

PubMed

Li, Guanghui; Wei, Jianhua; Wang, Xi; Wu, Guofeng; Ma, Dandan; Wang, Bo; Liu, Yanpu; Feng, Xinghua

2013-08-01

Cleft lip in the presence or absence of a cleft palate is a major public health problem. However, few studies have been published concerning the soft-tissue morphology of cleft lip infants. Currently, obtaining reliable three-dimensional (3D) surface models of infants remains a challenge. The aim of this study was to investigate a new way of capturing 3D images of cleft lip infants using a structured light scanning system. In addition, the accuracy and precision of the acquired facial 3D data were validated and compared with direct measurements. Ten unilateral cleft lip patients were enrolled in the study. Briefly, 3D facial images of the patients were acquired using a 3D scanner device before and after the surgery. Fourteen items were measured by direct anthropometry and 3D image software. The accuracy and precision of the 3D system were assessed by comparative analysis. The anthropometric data obtained using the 3D method were in agreement with the direct anthropometry measurements. All data calculated by the software were 'highly reliable' or 'reliable', as defined in the literature. The localisation of four landmarks was not consistent in repeated experiments of inter-observer reliability in preoperative images (P<0.05), while the intra-observer reliability in both pre- and postoperative images was good (P>0.05). The structured light scanning system is proven to be a non-invasive, accurate and precise method in cleft lip anthropometry. Copyright © 2013 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.
Proximal humeral fracture classification systems revisited.

PubMed

Majed, Addie; Macleod, Iain; Bull, Anthony M J; Zyto, Karol; Resch, Herbert; Hertel, Ralph; Reilly, Peter; Emery, Roger J H

2011-10-01

This study evaluated several classification systems and expert surgeons' anatomic understanding of these complex injuries based on a consecutive series of patients. We hypothesized that current proximal humeral fracture classification systems, regardless of imaging methods, are not sufficiently reliable to aid clinical management of these injuries. Complex fractures in 96 consecutive patients were investigated by generation of rapid sequence prototyping models from computed tomography Digital Imaging and Communications in Medicine (DICOM) imaging data. Four independent senior observers were asked to classify each model using 4 classification systems: Neer, AO, Codman-Hertel, and a prototype classification system by Resch. Interobserver and intraobserver κ coefficient values were calculated for the overall classification system and for selected classification items. The κ coefficient values for the interobserver reliability were 0.33 for Neer, 0.11 for AO, 0.44 for Codman-Hertel, and 0.15 for Resch. Interobserver reliability κ coefficient values were 0.32 for the number of fragments and 0.30 for the anatomic segment involved using the Neer system, 0.30 for the AO type (A, B, C), and 0.53, 0.48, and 0.08 for the Resch impaction/distraction, varus/valgus and flexion/extension subgroups, respectively. Three-part fractures showed low reliability for the Neer and AO systems. Currently available evidence suggests fracture classifications in use have poor intra- and inter-observer reliability regardless of the imaging modality used, which makes treating these injuries difficult and also affects scientific research. This study was undertaken to evaluate the reliability of several systems using rapid sequence prototype models. Overall interobserver κ values represented slight to moderate agreement. The most reliable interobserver scores were found with the Codman-Hertel classification, followed by elements of Resch's trial system. The AO system had the lowest values. The higher interobserver reliability values for the Codman-Hertel system showed that it is the only comprehensive fracture description studied, whereas the novel classification by Resch showed clear definition with respect to varus/valgus and impaction/distraction angulation. Copyright © 2011 Journal of Shoulder and Elbow Surgery Board of Trustees. All rights reserved.
Reliability of tristimulus colourimetry in the assessment of cutaneous bruise colour.

PubMed

Scafide, Katherine N; Sheridan, Daniel J; Taylor, Laura A; Hayat, Matthew J

2016-06-01

Bruising is one of the most common types of injury clinicians observe among victims of violence and other trauma patients. However, research has shown that the commonly used qualitative description of cutaneous bruise colour via the naked eye is subjective and unreliable. No published work has formally evaluated the reliability of tristimulus colourimetry as an alternative for assessing bruise colour, despite its clinical and research applications in accurately assessing skin colour. The purpose of this study was to systematically evaluate the test-retest and inter-observer reliability of tristimulus colourimetry in the assessment of cutaneous bruise colour. Two researchers obtained repeated tristimulus colourimetry measures of cutaneous bruises with participants of diverse skin colour. Measures were obtained using the Minolta CR-400 Chromameter. Commission Internationale d'Eclairage (CIE) L*a*b* colour space was used. Data were analysed using intraclass correlation coefficients (ICC), Cronbach's alpha, and minimal detectable change (MDC) on all three L*a*b* values. The colorimeter demonstrated excellent test-retest or intra-rater reliability (L* ICC=0.999; a* ICC=0.973; b* ICC=0.892) and inter-rater reliability (L* ICC=0.997; a* ICC=0.976; b* ICC=0.982). With consistent placement, tristimulus colourimetry is reliable for the objective assessment and documentation of cutaneous bruise colour for purposes of clinical practice and research. Recommendations for use in practice/research are provided. Copyright © 2016 Elsevier Ltd. All rights reserved.
Reliability and agreement on embryo assessment: 5 years of an external quality control programme.

PubMed

Martínez-Granados, Luis; Serrano, María; González-Utor, Antonio; Ortiz, Nereyda; Badajoz, Vicente; López-Regalado, María Luisa; Boada, Montserrat; Castilla, Jose A

2018-03-01

An external quality-control programme for morphology-based embryo quality assessment, incorporating a standardized embryo grading scheme, was evaluated over a period of 5 years to determine levels of inter-observer reliability and agreement between practising clinical embryologists at IVF centres and the opinions of a panel of experts. Following Guidelines for Reporting Reliability and Agreement Studies, the Gwet index and proportion of positive (Ppos) and negative agreement were calculated. For embryo morphology assessment, a substantial degree of reliability was measured between the centres and the panel of experts (Gwet index: 0.76; 95% CI 0.70 to 0.84). The agreement was higher for good- versus poor-quality embryos. When multinucleation or vacuoles were observed, low levels of reliability were obtained (Ppos: 0.56 and 0.43, respectively). In blastocysts, the characteristic that presented the largest discrepancy was that related to the inner cell mass. In decisions about the final disposition of the embryo, reliability between centre and the panel of experts was moderate (Gwet index: 0.51; 95% CI 0.41 to 0.60). In conclusion, the ability of clinical embryologists to evaluate the presence of multinucleation and vacuoles in the early cleavage embryo, and to determine the category of the inner cell mass in blastocysts, needs to be improved. Copyright © 2017 Reproductive Healthcare Ltd. All rights reserved.
The reliability of a modified Kalamazoo Consensus Statement Checklist for assessing the communication skills of multidisciplinary clinicians in the simulated environment.

PubMed

Peterson, Eleanor B; Calhoun, Aaron W; Rider, Elizabeth A

2014-09-01

With increased recognition of the importance of sound communication skills and communication skills education, reliable assessment tools are essential. This study reports on the psychometric properties of an assessment tool based on the Kalamazoo Consensus Statement Essential Elements Communication Checklist. The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF), a modified version of an existing communication skills assessment tool, the Kalamazoo Essential Elements Communication Checklist-Adapted, was used to assess learners in a multidisciplinary, simulation-based communication skills educational program using multiple raters. 118 simulated conversations were available for analysis. Internal consistency and inter-rater reliability were determined by calculating a Cronbach's alpha score and intra-class correlation coefficients (ICC), respectively. The GKCSAF demonstrated high internal consistency with a Cronbach's alpha score of 0.844 (faculty raters) and 0.880 (peer observer raters), and high inter-rater reliability with an ICC of 0.830 (faculty raters) and 0.89 (peer observer raters). The Gap-Kalamazoo Communication Skills Assessment Form is a reliable method of assessing the communication skills of multidisciplinary learners using multi-rater methods within the learning environment. The Gap-Kalamazoo Communication Skills Assessment Form can be used by educational programs that wish to implement a reliable assessment and feedback system for a variety of learners. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
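For readers unfamiliar with the internal-consistency statistic reported above, a minimal sketch of Cronbach's alpha follows. It uses the standard formula based on item variances and the variance of total scores; the function name and the data layout (one row per rated conversation, one column per checklist item) are illustrative assumptions, not the authors' procedure.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (cases) by columns (items)."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose to per-item columns
    item_variance_sum = sum(statistics.variance(col) for col in items)
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Alpha approaches 1 when items rise and fall together across cases, which is what values such as 0.844 and 0.880 indicate.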
Validity of the Autism Spectrum Disorder Observation for Children (ASD-OC)

ERIC Educational Resources Information Center

Neal, Daniene; Matson, Johnny L.; Hattier, Megan A.

2014-01-01

The Autism Spectrum Disorder Observation for Children (ASD-OC) is a 45-item observation scale used to assess autistic symptomatology. The reliability of this measure has been established in previous research; therefore, the purpose of this study is to evaluate its validity among a sample of children (1-15 years). The large correlation between the…
Validity and reliability of Persian version of Listening Styles Profile-Revised (LSP-R) in Iranian students.

PubMed

Fatehi, Zahra; Baradaran, Hamid Reza; Asadpour, Mohamad; Rezaeian, Mohsen

2017-01-01

Background: Individuals' listening styles differ based on their characters, professions and situations. This study aimed to assess the validity and reliability of the Listening Styles Profile-Revised (LSP-R) in Iranian students. Methods: After translation into Persian, the LSP-R was administered to a sample of 240 Persian-speaking medical and nursing students in Iran. Statistical analysis was performed to test the reliability and validity of the LSP-R. Results: The study revealed high internal consistency and good test-retest reliability for the Persian version of the questionnaire. The Cronbach's alpha coefficient was 0.72 and the intra-class correlation coefficient was 0.87. The means for the content validity index and the content validity ratio (CVR) were 0.90 and 0.83, respectively. Exploratory factor analysis (EFA) yielded a four-factor solution that accounted for 60.8% of the observed variance. The majority of medical students (73%) and the majority of nursing students (70%) stated that their listening styles were task-oriented. Conclusion: In general, the study findings suggest that the Persian version of the LSP-R is a valid and reliable instrument for assessing listening styles in the studied sample.
Accuracy and reliability of pulp/tooth area ratio in upper canines by peri-apical X-rays.

PubMed

Azevedo, A C; Michel-Crosato, E; Biazevic, M G H; Galić, I; Merelli, V; De Luca, S; Cameriere, R

2014-11-01

Due to the real need for careful staff training in age assessment, in order to improve capacity, consistency and competence, new research on the reliability and repeatability of methods frequently used in age assessment is required. The aim of this study was twofold: first, to test the accuracy of this method for age estimation; second, to obtain data on the reliability of this technique. A sample of 81 peri-apical radiographs of upper canines (44 men and 37 women), aged between 19 and 74 years, was used; the teeth were taken from the osteological collection of Sassari (Sardinia, Italy). Three blinded observers used the technique in order to perform the age estimation. The mean real age of the 81 observations was 37.21 (CI95% 34.37; 40.05), and estimated ages ranged from 36.65 to 38.99 (CI95%-Ex1 35.42; 41.28; CI95%-Ex2 33.89; 39.41; CI95%-Ex3 35.92; 42.06). The absolute differences found by the three observers were 3.43, 4.24 and 4.45, respectively for Ex1×Ex2, Ex1×Ex3 and Ex2×Ex3. The absolute differences observed between real and estimated ages were 2.55 (CI95% 1.90; 3.20), 2.22 (CI95% 1.65; 2.78) and 4.39 (CI95% 3.80; 5.75), respectively for Ex1, Ex2 and Ex3. No differences were observed among measurements. This technique can be reproduced and repeated after proper training, since high reliability and accuracy were found. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Reliability analysis of visual ranking of coronary artery calcification on low-dose CT of the thorax for lung cancer screening: comparison with ECG-gated calcium scoring CT.

PubMed

Kim, Yoon Kyung; Sung, Yon Mi; Cho, So Hyun; Park, Young Nam; Choi, Hye-Young

2014-12-01

Coronary artery calcification (CAC) is frequently detected on low-dose CT (LDCT) of the thorax. Concurrent assessment of CAC and lung cancer screening using LDCT is beneficial in terms of cost and radiation dose reduction. The aim of our study was to evaluate the reliability of visual ranking of positive CAC on LDCT compared to Agatston score (AS) on electrocardiogram (ECG)-gated calcium scoring CT. We studied 576 patients who were consecutively registered for health screening and undergoing both LDCT and ECG-gated calcium scoring CT. We excluded subjects with an AS of zero. The final study cohort included 117 patients with CAC (97 men; mean age, 53.4 ± 8.5). AS was used as the gold standard (mean score 166.0; range 0.4-3,719.3). Two board-certified radiologists and two radiology residents participated in an observer performance study. Visual ranking of CAC was performed according to four categories (1-10, 11-100, 101-400, and 401 or higher) for coronary artery disease risk stratification. Weighted kappa statistics were used to measure the degree of reliability on visual ranking of CAC on LDCT. The degree of reliability on visual ranking of CAC on LDCT compared to ECG-gated calcium scoring CT was excellent for board-certified radiologists and good for radiology residents. A high degree of association was observed with 71.6% of visual rankings in the same category as the Agatston category and 98.9% varying by no more than one category. Visual ranking of positive CAC on LDCT is reliable for predicting AS rank categorization.
Quantitative outcome measures for systemic sclerosis-related Microangiopathy - Reliability of image acquisition in Nailfold Capillaroscopy.

PubMed

Dinsdale, Graham; Moore, Tonia; O'Leary, Neil; Berks, Michael; Roberts, Christopher; Manning, Joanne; Allen, John; Anderson, Marina; Cutolo, Maurizio; Hesselstrand, Roger; Howell, Kevin; Pizzorni, Carmen; Smith, Vanessa; Sulli, Alberto; Wildt, Marie; Taylor, Christopher; Murray, Andrea; Herrick, Ariane L

2017-09-01

Nailfold capillaroscopic parameters hold increasing promise as outcome measures for clinical trials in systemic sclerosis (SSc). Their inclusion as outcomes would often naturally require capillaroscopy images to be captured at several time points during any one study. Our objective was to assess repeatability of image acquisition (which has been little studied), as well as of measurement. 41 patients (26 with SSc, 15 with primary Raynaud's phenomenon) and 10 healthy controls returned for repeat high-magnification (300×) videocapillaroscopy mosaic imaging of 10 digits one week after initial imaging (as part of a larger study of reliability). Images were assessed in a random order by an expert blinded observer and 4 outcome measures extracted: (1) overall image grade and then (where possible) distal vessel locations were marked, allowing (2) vessel density (across the whole nailfold) to be calculated, (3) apex width measurement and (4) giant vessel count. Intra-rater, intra-visit and intra-rater inter-visit (baseline vs. 1 week) reliability were examined in 475 and 392 images respectively. A linear, mixed-effects model was used to estimate variance components, from which intra-class correlation coefficients (ICCs) were determined. Intra-visit and inter-visit reliability estimates (ICCs) were (respectively): overall image grade, 0.97 and 0.90; vessel density, 0.92 and 0.65; mean vessel width, 0.91 and 0.79; presence of giant capillary, 0.68 and 0.56. These estimates were conditional on each parameter being measurable. Within-operator image analysis and acquisition are reproducible. Quantitative nailfold capillaroscopy, at least with a single observer, provides reliable outcome measures for clinical studies including randomised controlled trials. Copyright © 2017 Elsevier Inc. All rights reserved.
Reliability of ultrasound thickness measurement of the abdominal muscles during clinical isometric endurance tests.

PubMed

ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda

2015-07-01

The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscle activity in supine lying and during two isometric endurance tests in subjects with and without low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements in supine lying and during the two isometric endurance tests was assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscle activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
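The absolute-reliability indices named above are commonly derived from the ICC and the between-subject standard deviation. The sketch below assumes the widely used formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; the abstract does not state which formulas the authors applied, and the function name and inputs are hypothetical.

```python
import math

def sem_and_mdc(sd, icc, z=1.96):
    """Standard error of measurement and minimal detectable change.

    sd:  between-subject standard deviation of the measurements
    icc: intra-class correlation coefficient (0 to 1)
    """
    sem = sd * math.sqrt(1 - icc)
    # sqrt(2) accounts for measurement error in both test and retest
    mdc = z * math.sqrt(2) * sem
    return sem, mdc
```

For example, a measurement with SD 2.0 mm and ICC 0.75 would have an SEM of 1.0 mm, so a change smaller than roughly 2.8 mm could not be distinguished from measurement error at the 95% level.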
Behavior and neural basis of near-optimal visual search

PubMed Central

Ma, Wei Ji; Navalpakkam, Vidhya; Beck, Jeffrey M; van den Berg, Ronald; Pouget, Alexandre

2013-01-01

The ability to search efficiently for a target in a cluttered environment is one of the most remarkable functions of the nervous system. This task is difficult under natural circumstances, as the reliability of sensory information can vary greatly across space and time and is typically a priori unknown to the observer. In contrast, visual-search experiments commonly use stimuli of equal and known reliability. In a target detection task, we randomly assigned high or low reliability to each item on a trial-by-trial basis. An optimal observer would weight the observations by their trial-to-trial reliability and combine them using a specific nonlinear integration rule. We found that humans were near-optimal, regardless of whether distractors were homogeneous or heterogeneous and whether reliability was manipulated through contrast or shape. We present a neural-network implementation of near-optimal visual search based on probabilistic population coding. The network matched human performance. PMID:21552276
Test-retest and inter- and intrareliability of the quality of the upper-extremity skills test in preschool-age children with cerebral palsy.

PubMed

Haga, Nienke; van der Heijden-Maessen, Hélène C; van Hoorn, Jessika F; Boonstra, Anne M; Hadders-Algra, Mijna

2007-12-01

Objective: To investigate the test-retest, inter-, and intraobserver reliability of the Quality of Upper Extremity Skills Test (QUEST) in young children with cerebral palsy (CP). Design: For test-retest reliability, a test-retest design was used; for the intra- and interobserver reliability, the videotaped test was scored on 2 occasions by 1 observer and by various observers. Setting: Groups of preschool-age children in 2 general rehabilitation centers. Participants: Twenty-one children with CP (12 boys, 9 girls) aged 2 to 4.5 years (mean, 39 mo). Interventions: Not applicable. Main outcome measure: Spearman correlation coefficient. Results: The data indicated that test-retest reliability was strong (rho range, .85-.94). Intraobserver agreement (rho range, .63-.95) and agreement between various observers (rho range, .72-.90) were moderate to strong. Conclusions: Test-retest and inter- and intraobserver reliability of the QUEST in preschool-age children with CP is good.
Validity and Reliability of Accelerometers in Patients With COPD: A SYSTEMATIC REVIEW.

PubMed

Gore, Shweta; Blackwood, Jennifer; Guyette, Mary; Alsalaheen, Bara

2018-05-01

Reduced physical activity is associated with poor prognosis in chronic obstructive pulmonary disease (COPD). Accelerometers have greatly improved quantification of physical activity by providing information on step counts, body positions, energy expenditure, and magnitude of force. The purpose of this systematic review was to compare the validity and reliability of accelerometers used in patients with COPD. An electronic database search of MEDLINE and CINAHL was performed. Study quality was assessed with the Strengthening the Reporting of Observational Studies in Epidemiology checklist while methodological quality was assessed using the modified Quality Appraisal Tool for Reliability Studies. The search yielded 5392 studies; 25 met inclusion criteria. The SenseWear Pro armband reported high criterion validity under controlled conditions (r = 0.75-0.93) and high reliability (ICC = 0.84-0.86) for step counts. The DynaPort MiniMod demonstrated highest concurrent validity for step count using both video and manual methods. Validity of the SenseWear Pro armband varied between studies especially in free-living conditions, slower walking speeds, and with addition of weights during gait. A high degree of variability was found in the outcomes used and statistical analyses performed between studies, indicating a need for further studies to measure reliability and validity of accelerometers in COPD. The SenseWear Pro armband is the most commonly used accelerometer in COPD, but measurement properties are limited by gait speed variability and assistive device use. DynaPort MiniMod and Stepwatch accelerometers demonstrated high validity in patients with COPD but lack reliability data.
Identifying Creative Activities in Preschool Children.

ERIC Educational Resources Information Center

Keily, Margaret Mary

This study compared the creative self-direction, creative behavior, and creative activities of preschool children to determine if students and teachers trained in the creative process and in observation techniques can, with reliability, observe the creative potential of young children. Creative abilities of 155 children from four preschool centers…
Evaluation of Two Observational Assessment Systems for Children's Development and Learning

ERIC Educational Resources Information Center

Kim, Do-Hong; Smith, JaneDiane

2010-01-01

This study provided preliminary evidence for the reliability and validity of "Teaching Strategies GOLD", a recently developed observational system for assessing young children's development and learning. The measurement properties of "Teaching Strategies GOLD" were compared with those of an older instrument, "The Creative…
Teaching Historical Contextualization: The Construction of a Reliable Observation Instrument

ERIC Educational Resources Information Center

Huijgen, Tim; van de Grift, Wim; van Boxtel, Carla; Holthuis, Paul

2017-01-01

Since the 1970s, many observation instruments have been constructed to map teachers' general pedagogic competencies. However, few of these instruments focus on teachers' subject-specific competencies. This study presents the development of the "Framework for Analyzing the Teaching of Historical Contextualization" (FAT-HC). This…
Reliability and Validity of the Acanthosis Nigricans Screening Tool for Use in Elementary School-Age Children by School Nurses

ERIC Educational Resources Information Center

Scott, Leslie K.; Hall, Lynne M.

2012-01-01

The purpose of this study was to test the reliability and validity of an acanthosis nigricans (AN) screening tool for use in elementary school-age children of different ethnic groups. Cross-sectional data were collected via observation of 288, 5- to 12-year-old school-age children. Three nurse clinicians used a 0-4 grade AN screening tool to rate…
The reliability, validity, and feasibility of physical activity measurement in adults with traumatic brain injury: an observational study.

PubMed

Hassett, Leanne; Moseley, Anne; Harmer, Alison; van der Ploeg, Hidde P

2015-01-01

To determine the reliability and validity of the Physical Activity Scale for Individuals with a Physical Disability (PASIPD) in adults with severe traumatic brain injury (TBI) and estimate the proportion of the sample participants who fail to meet the World Health Organization guidelines for physical activity. A single-center observational study recruited a convenience sample of 30 community-based ambulant adults with severe TBI. Participants completed the PASIPD on 2 occasions, 1 week apart, and wore an accelerometer (ActiGraph GT3X; ActiGraph LLC, Pensacola, Florida) for the 7 days between these 2 assessments. The PASIPD test-retest reliability was substantial (intraclass correlation coefficient = 0.85; 95% confidence interval, 0.70-0.92), and the correlation with the accelerometer ranged from too low to be meaningful (R = 0.09) to moderate (R = 0.57). From device-based measurement of physical activity, 56% of participants failed to meet the World Health Organization physical activity guidelines. The PASIPD is a reliable measure of the type of physical activity people with severe TBI participate in, but it is not a valid measure of the amount of moderate to vigorous physical activity in which they engage. Accelerometers should be used to quantify moderate to vigorous physical activity in people with TBI.

Inter- and intra-operator reliability and repeatability of shear wave elastography in the liver: a study in healthy volunteers.

PubMed

Hudson, John M; Milot, Laurent; Parry, Craig; Williams, Ross; Burns, Peter N

2013-06-01

This study assessed the reproducibility of shear wave elastography (SWE) in the liver of healthy volunteers. Intra- and inter-operator reliability and repeatability were quantified in three different liver segments in a sample of 15 subjects, scanned during four independent sessions (two scans on day 1, two scans 1 wk later) by two operators. A total of 1440 measurements were made. Reproducibility was assessed using the intra-class correlation coefficient (ICC) and a repeated measures analysis of variance. The shear wave speed was measured and used to estimate Young's modulus using the Supersonics Imagine Aixplorer. The median Young's modulus measured through the inter-costal space was 5.55 ± 0.74 kPa. The intra-operator reliability was better for same-day evaluations (ICC = 0.91) than the inter-operator reliability (ICC = 0.78). Intra-observer agreement decreased when scans were repeated on a different day. Inter-session repeatability was between 3.3% and 9.9% for intra-day repeated scans, compared with 6.5%-12% for inter-day repeated scans. No significant difference was observed in subjects with a body mass index greater or less than 25 kg/m². Copyright © 2013 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
Evaluation of the psychometric properties of the phlebitis and infiltration scales for the assessment of complications of peripheral vascular access devices.

PubMed

Groll, Dianne; Davies, Barbara; Mac Donald, Joan; Nelson, Susanne; Virani, Tazim

2010-01-01

To prevent complications from peripheral vascular access device (PVAD) therapy, the Infusion Nurses Society (INS) developed 2 scales to measure the extent and severity of phlebitis and infiltration in PVADs. This study evaluated the psychometric properties of these scales to validate them with respect to their interrater reliability, concurrent validity, feasibility, and acceptability. A total of 182 patients at 2 sites were enrolled, and 416 observations of PVAD sites were made. Two nurses independently rated each PVAD site for the presence or absence of phlebitis and/or infiltration by using the INS scales. The interrater reliability was calculated, as was the agreement of the observed versus charted incidence of phlebitis and infiltration (concurrent validity) and the ease of use of the scales (feasibility, acceptability). Interrater reliability for both the Phlebitis and Infiltration scales and concurrent validity were found to be statistically significant (P < .05). The study nurses reported the scales to be easy to use, taking an average of 1.3 minutes to complete both. The importance of valid measures for use in research cannot be underestimated. The INS Phlebitis and Infiltration scales have been shown to be easy to use, valid, and reliable scales.
TURKISH VERSION QUALITY OF LIFE IN ESSENTIAL TREMOR QUESTIONNAIRE (QUEST): VALIDITY AND RELIABILITY STUDY.

PubMed

Güler, Sibel; Turan, F Nesrin

2015-09-30

Our aim was to translate the Quality of Life in Essential Tremor Questionnaire (QUEST) advanced by Troster (2005) and to analyse the validity and reliability of this questionnaire. Two hundred twelve consecutive patients with essential tremor (ET) and forty-three control subjects were included in the study. Permission for the translation and validation of the QUEST scale was obtained. The translation was performed according to the guidelines provided by the publisher. After the translation, the final version of the scale was administered to both groups to determine its reliability and validity. The QUEST Physical, Psychosocial, Communication, Hobbies/leisure and Work/finance scores were 0.967, 0.968, 0.933, 0.964 and 0.925, respectively. There were good correlations between each of the QUEST scores that were indicative of good internal consistency. Additionally, we observed that all of the QUEST scores were most strongly related to the right and left arms (p=0.0001). However, we observed that all of the QUEST scores were weakly related to the voice, head and right leg (p=0.0001). These findings support the notion that the Turkish version of the Quality of Life in Essential Tremor (QUEST) questionnaire is a valid and reliable tool for the assessment of the quality of life of patients with ET.
Reliability and Validity of the PAQ-C Questionnaire to Assess Physical Activity in Children.

PubMed

Benítez-Porres, Javier; López-Fernández, Iván; Raya, Juan Francisco; Álvarez Carnero, Sabrina; Alvero-Cruz, José Ramón; Álvarez Carnero, Elvis

2016-09-01

Physical activity (PA) assessment by questionnaire is a cornerstone in the field of sport epidemiology studies. The Physical Activity Questionnaire for Children (PAQ-C) has been used widely to assess PA in healthy school populations. The aim of this study was to evaluate the reliability and validity of the PAQ-C questionnaire in Spanish children using triaxial accelerometry as criterion. Eighty-three (N = 46 boys, N = 37 girls) healthy children (age 10.98 ± 1.17 years, body mass index 19.48 ± 3.51 kg/m²) were volunteers and completed the PAQ-C twice and wore an accelerometer for 8 consecutive days. Reliability was analyzed by the intraclass correlation coefficient (ICC) and the internal consistency by the Cronbach's α coefficient. The PAQ-C was compared against total PA and moderate to vigorous PA (MVPA) obtained by accelerometry. Test-retest reliability showed an ICC = 0.96 for the final score of PAQ-C. Small differences between first and second questionnaire administration were detected. Few and low correlations (rho = 0.228-0.278, all ps < .05) were observed between PAQ-C and accelerometry. The highest correlation was observed for item 9 (rho = 0.311, p < .01). PAQ-C had a high reliability but a questionable validity for assessing total PA and MVPA in Spanish children. Therefore, PA measurement in children should not be limited only to self-report measurements. © 2016, American School Health Association.
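As a rough illustration of the internal-consistency statistic named in the record above, the following sketch computes Cronbach's α from a respondents-by-items table using the standard variance-ratio formula. The function name and the example scores are hypothetical and are not PAQ-C data; the study's own software and settings are not stated in the abstract.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of responses (rows = respondents, columns = items)."""
    n_items = len(responses[0])
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need at least two respondents and two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 respondents x 3 items (illustrative only).
example_responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(example_responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Alpha rises when items vary together across respondents, so a high value indicates consistency among items and says nothing about agreement with an external criterion such as accelerometry.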
Reliability of the Cooking Task in adults with acquired brain injury.

PubMed

Poncet, Frédérique; Swaine, Bonnie; Taillefer, Chantal; Lamoureux, Julie; Pradat-Diehl, Pascale; Chevignard, Mathilde

2015-01-01

Acquired brain injury (ABI) often leads to deficits in executive functioning (EF) responsible for severe and long-standing disabilities in daily life activities. The Cooking Task is an ecological and valid test of EF involving multi-tasking in a real environment. Given its complex scoring system, it is important to establish the tool's reliability. The objective of the study was to examine the reliability of the Cooking Task (internal consistency, inter-rater and test-retest reliability). A total of 160 patients with ABI (113 men, mean age 37 years, SD = 14.3) were tested using the Cooking Task. For test-retest reliability, patients were assessed by the same rater on two occasions (mean interval 11 days) while two raters independently and simultaneously observed and scored patients' performances to estimate inter-rater reliability. Internal consistency was high for the global scale (Cronbach α = .74). Inter-rater reliability (n = 66) for total errors was also high (ICC = .93); however, the test-retest reliability (n = 11) was poor (ICC = .36). In general the Cooking Task appears to be a reliable tool. The low test-retest results were expected given the importance of EF in the performance of novel tasks.
Inter-rater and intra-rater reliability of a movement control test in shoulder.

PubMed

Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban

2017-07-01

Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two week intervals. An equal number of subjects with and without shoulder pain were tested by both the assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT) were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 & K = 0.65 respectively in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
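For readers unfamiliar with the kappa statistic reported in the record above, this minimal sketch shows how Cohen's kappa corrects raw agreement between two raters for agreement expected by chance. The function name, the "fault"/"none" labels, and the ratings are hypothetical; the abstract does not state how the KMRT outcomes were coded or which kappa variant was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who assign nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings of 10 subjects by two assessors (illustrative only).
rater_1 = ["fault", "fault", "none", "none", "fault", "none", "none", "fault", "none", "none"]
rater_2 = ["fault", "none", "none", "none", "fault", "none", "fault", "fault", "none", "none"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 subjects (80%), yet kappa is about 0.58 because roughly half of that agreement would be expected by chance given how often each rater uses each label.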
An assessment of the reliability of quantitative genetics estimates in study systems with high rate of extra-pair reproduction and low recruitment.

PubMed

Bourret, A; Garant, D

2017-03-01

Quantitative genetics approaches, and particularly animal models, are widely used to assess the genetic (co)variance of key fitness related traits and infer adaptive potential of wild populations. Despite the importance of precision and accuracy of genetic variance estimates and their potential sensitivity to various ecological and population specific factors, their reliability is rarely tested explicitly. Here, we used simulations and empirical data collected from an 11-year study on tree swallow (Tachycineta bicolor), a species showing a high rate of extra-pair paternity and a low recruitment rate, to assess the importance of identity errors, structure and size of the pedigree on quantitative genetic estimates in our dataset. Our simulations revealed an important lack of precision in heritability and genetic-correlation estimates for most traits, a low power to detect significant effects and important identifiability problems. We also observed a large bias in heritability estimates when using the social pedigree instead of the genetic one (deflated heritabilities) or when not accounting for an important cause of resemblance among individuals (for example, permanent environment or brood effect) in model parameterizations for some traits (inflated heritabilities). We discuss the causes underlying the low reliability observed here and why they are also likely to occur in other study systems. Altogether, our results re-emphasize the difficulties of generalizing quantitative genetic estimates reliably from one study system to another and the importance of reporting simulation analyses to evaluate these important issues.
Validity of an adaptation of the Framingham cardiovascular risk function: the VERIFICA study

PubMed Central

Marrugat, Jaume; Subirana, Isaac; Comín, Eva; Cabezas, Carmen; Vila, Joan; Elosua, Roberto; Nam, Byung‐Ho; Ramos, Rafel; Sala, Joan; Solanas, Pascual; Cordón, Ferran; Gené‐Badia, Joan; D'Agostino, Ralph B

2007-01-01

Background To assess the reliability and accuracy of the Framingham coronary heart disease (CHD) risk function adapted by the Registre Gironí del Cor (REGICOR) investigators in Spain. Methods A 5‐year follow‐up study was completed in 5732 participants aged 35–74 years. The adaptation consisted of using in the function the average population risk factor prevalence and the cumulative incidence observed in Spain instead of those from Framingham in a Cox proportional hazards model. Reliability and accuracy in estimating the observed cumulative incidence were tested with the area under the curve comparison and goodness‐of‐fit test, respectively. Results The Kaplan–Meier CHD cumulative incidence during the follow‐up was 4.0% in men and 1.7% in women. The original Framingham function and the REGICOR adapted estimates were 10.4% and 4.8%, and 3.6% and 2.0%, respectively. The REGICOR‐adapted function's estimate did not differ from the observed cumulated incidence (goodness of fit in men, p = 0.078, in women, p = 0.256), whereas all the original Framingham function estimates differed significantly (p<0.001). Reliabilities of the original Framingham function and of the best Cox model fit with the study data were similar in men (area under the receiver operator characteristic curve 0.68 and 0.69, respectively, p = 0.273), whereas the best Cox model fitted better in women (0.73 and 0.81, respectively, p<0.001). Conclusion The Framingham function adapted to local population characteristics accurately and reliably predicted the 5‐year CHD risk for patients aged 35–74 years, in contrast with the original function, which consistently overestimated the actual risk. PMID:17183014
Analysis of linear measurements on 3D surface models using CBCT data segmentation obtained by automatic standard pre-set thresholds in two segmentation software programs: an in vitro study.

PubMed

Poleti, Marcelo Lupion; Fernandes, Thais Maria Freire; Pagin, Otávio; Moretti, Marcela Rodrigues; Rubira-Bullen, Izabel Regina Fischer

2016-01-01

The aim of this in vitro study was to evaluate the reliability and accuracy of linear measurements on three-dimensional (3D) surface models obtained by standard pre-set thresholds in two segmentation software programs. Ten mandibles with 17 silica markers were scanned for 0.3-mm voxels in the i-CAT Classic (Imaging Sciences International, Hatfield, PA, USA). Twenty linear measurements were carried out by two observers two times on the 3D surface models: the Dolphin Imaging 11.5 (Dolphin Imaging & Management Solutions, Chatsworth, CA, USA), using two filters (Translucent and Solid-1), and in the InVesalius 3.0.0 (Centre for Information Technology Renato Archer, Campinas, SP, Brazil). The physical measurements were made by another observer two times using a digital caliper on the dry mandibles. Excellent intra- and inter-observer reliability for the markers, physical measurements, and 3D surface models were found (intra-class correlation coefficient (ICC) and Pearson's r ≥ 0.91). The linear measurements on 3D surface models by Dolphin and InVesalius software programs were accurate (Dolphin Solid-1 > InVesalius > Dolphin Translucent). The highest absolute and percentage errors were obtained for the variable R1-R1 (1.37 mm) and MF-AC (2.53 %) in the Dolphin Translucent and InVesalius software, respectively. Linear measurements on 3D surface models obtained by standard pre-set thresholds in the Dolphin and InVesalius software programs are reliable and accurate compared with physical measurements. Studies that evaluate the reliability and accuracy of the 3D models are necessary to ensure error predictability and to establish diagnosis, treatment plan, and prognosis in a more realistic way.
Day-to-day reliability of gait characteristics in rats.

PubMed

Raffalt, Peter C; Nielsen, Louise R; Madsen, Stefan; Munk Højberg, Laurits; Pingel, Jessica; Nielsen, Jens Bo; Wienecke, Jacob; Alkjær, Tine

2018-04-27

The purpose of the present study was to determine the day-to-day reliability in stride characteristics in rats during treadmill walking obtained with two-dimensional (2D) motion capture. Kinematics were recorded from 26 adult rats during walking at 8 m/min, 12 m/min and 16 m/min on two separate days. Stride length, stride time, contact time, swing time and hip, knee and ankle joint range of motion were extracted from 15 strides. The relative reliability was assessed using intra-class correlation coefficients (ICC(1,1)) and (ICC(3,1)). The absolute reliability was determined using measurement error (ME). Across walking speeds, the relative reliability ranged from fair to good (ICCs between 0.4 and 0.75). The ME was below 91 mm for stride lengths, below 55 ms for the temporal stride variables and below 6.4° for the joint angle range of motion. In general, the results indicated an acceptable day-to-day reliability of the gait pattern parameters observed in rats during treadmill walking. The results of the present study may serve as a reference material that can help future intervention studies on rat gait characteristics both with respect to the selection of outcome measures and in the interpretation of the results. Copyright © 2018 Elsevier Ltd. All rights reserved.
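The two ICC forms named in the record above differ in how they treat systematic differences between sessions: ICC(1,1) counts them as error, while ICC(3,1) removes them. The sketch below derives both from ANOVA mean squares, assuming the usual single-measurement definitions. The function name and the ratings table are illustrative and are not data from the rat study.

```python
def icc_1_1_and_3_1(scores):
    """Return (ICC(1,1), ICC(3,1)) for a table with rows = subjects, columns = sessions or raters."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of sessions or raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two columns")
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(col) / n for col in zip(*scores)]
    # Partition the total sum of squares.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_columns = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_within = ss_total - ss_subjects   # one-way model: everything inside subjects
    ss_error = ss_within - ss_columns    # two-way model: column effect removed
    ms_subjects = ss_subjects / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    ms_error = ss_error / ((n - 1) * (k - 1))
    icc_1_1 = (ms_subjects - ms_within) / (ms_subjects + (k - 1) * ms_within)
    icc_3_1 = (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
    return icc_1_1, icc_3_1


# Illustrative table: 6 subjects scored in 4 columns with large column offsets.
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc1, icc3 = icc_1_1_and_3_1(example_scores)
print(f"ICC(1,1) = {icc1:.3f}, ICC(3,1) = {icc3:.3f}")
```

With these made-up numbers the columns differ systematically, so ICC(1,1) is much lower than ICC(3,1); reporting both, as the abstract does, shows how much a consistent day-to-day shift contributes to disagreement.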
Measuring the morphological characteristics of thoracolumbar fascia in ultrasound images: an inter-rater reliability study.

PubMed

De Coninck, Kyra; Hambly, Karen; Dickinson, John W; Passfield, Louis

2018-06-01

Chronic lower back pain is still regarded as a poorly understood multifactorial condition. Recently, the thoracolumbar fascia complex has been found to be a contributing factor. Ultrasound imaging has shown that people with chronic lower back pain demonstrate both a significant decrease in shear strain, and a 25% increase in thickness of the thoracolumbar fascia. There is sparse data on whether medical practitioners agree on the level of disorganisation in ultrasound images of thoracolumbar fascia. The purpose of this study was to establish inter-rater reliability of the ranking of architectural disorganisation of thoracolumbar fascia on a scale from 'very disorganised' to 'very organised'. An exploratory analysis was performed using a fully crossed design of inter-rater reliability. Thirty observers were recruited, consisting of 21 medical doctors, 7 physiotherapists and 2 radiologists, with an average of 13.03 ± 9.6 years of clinical experience. All 30 observers independently rated the architectural disorganisation of the thoracolumbar fascia in 30 ultrasound scans, on a Likert-type scale with rankings from 1 = very disorganised to 10 = very organised. Internal consistency was assessed using Cronbach's alpha. Krippendorff's alpha was used to calculate the overall inter-rater reliability. The Krippendorff's alpha was .61, indicating a modest degree of agreement between observers on the different morphologies of thoracolumbar fascia. The Cronbach's alpha (0.98) indicated that there was a high degree of consistency between observers. Experience in ultrasound image analysis did not affect constancy between observers (Cronbach's range between experienced and inexperienced raters: 0.95 and 0.96 respectively). Medical practitioners agree on morphological features such as levels of organisation and disorganisation in ultrasound images of thoracolumbar fascia, regardless of experience. Further analysis by an expert panel is required to develop specific classification criteria for thoracolumbar fascia.
Unique reliability characteristics of fully depleted silicon-on-insulator tunneling FET

NASA Astrophysics Data System (ADS)

Kang, Soo Cheol; Lim, Donghwan; Lim, Sung Kwan; Noh, Jinwoo; Kim, Seung-Mo; Lee, Sang Kyung; Choi, Changhwan; Lee, Byoung Hun

2018-04-01

This study investigated the unique reliability characteristics of tunneling field effect transistors (TFETs) by comparing the effects of positive bias temperature instability (PBTI) and hot carrier injection (HCI) stresses. In case of hot carrier injection (HCI) stress, the interface trap generation near a p/n+ region was the primary degradation mechanism. However, strong recovery after a high-pressure hydrogen annealing and weak degradation at low temperature indicates that the degradation mechanism of TFET under the HCI stress is different from the high-energy carrier stress induced permanent defect generation mechanism observed in MOSFETs. Further study is necessary to identify the exact location and defect species causing TFET degradation; however, a significant difference is evident between the dominant reliability mechanism of TFET and MOSFET.
Psychometric properties of a sign language version of the Mini International Neuropsychiatric Interview (MINI).

PubMed

Øhre, Beate; Saltnes, Hege; von Tetzchner, Stephen; Falkum, Erik

2014-05-22

There is a need for psychiatric assessment instruments that enable reliable diagnoses in persons with hearing loss who have sign language as their primary language. The objective of this study was to assess the validity of the Norwegian Sign Language (NSL) version of the Mini International Neuropsychiatric Interview (MINI). The MINI was translated into NSL. Forty-one signing patients consecutively referred to two specialised psychiatric units were assessed with a diagnostic interview by clinical experts and with the MINI. Inter-rater reliability was assessed with Cohen's kappa and "observed agreement". There was 65% agreement between MINI diagnoses and clinical expert diagnoses. Kappa values indicated fair to moderate agreement, and observed agreement was above 76% for all diagnoses. The MINI diagnosed more co-morbid conditions than did the clinical expert interview (mean diagnoses: 1.9 versus 1.2). Kappa values indicated moderate to substantial agreement, and "observed agreement" was above 88%. The NSL version performs similarly to other MINI versions and demonstrates adequate reliability and validity as a diagnostic instrument for assessing mental disorders in persons who have sign language as their primary and preferred language.
Intra-observer reproducibility and interobserver reliability of the radiographic parameters in the Spinal Deformity Study Group's AIS Radiographic Measurement Manual.

PubMed

Dang, Natasha Radhika; Moreau, Marc J; Hill, Douglas L; Mahood, James K; Raso, James

2005-05-01

Retrospective cross-sectional assessment of the reproducibility and reliability of radiographic parameters. To measure the intra-examiner and interexaminer reproducibility and reliability of salient radiographic features. The management and treatment of adolescent idiopathic scoliosis (AIS) depends on accurate and reproducible radiographic measurements of the deformity. Ten sets of radiographs were randomly selected from a sample of patients with AIS, with initial curves between 20 degrees and 45 degrees. Fourteen measures of the deformity were measured from posteroanterior and lateral radiographs by 2 examiners, and were repeated 5 times at intervals of 3-5 days. Intra-examiner and interexaminer differences were examined. The parameters include measures of curve size, spinal imbalance, sagittal kyphosis and alignment, maximum apical vertebral rotation, T1 tilt, spondylolysis/spondylolisthesis, and skeletal age. Intra-examiner reproducibility was generally excellent for parameters measured from the posteroanterior radiographs but only fair to good for parameters from the lateral radiographs, in which some landmarks were not clearly visible. Of the 13 parameters observed, 7 had excellent interobserver reliability. The measurements from the lateral radiograph were less reproducible and reliable and, thus, may not add value to the assessment of AIS. Taking additional measures encourages a systematic and comprehensive assessment of spinal radiographs.
Blinded evaluation of interrater reliability of an operative competency assessment tool for direct laryngoscopy and rigid bronchoscopy.

PubMed

Ishman, Stacey L; Benke, James R; Johnson, Kaalan Erik; Zur, Karen B; Jacobs, Ian N; Thorne, Marc C; Brown, David J; Lin, Sandra Y; Bhatti, Nasir; Deutsch, Ellen S

2012-10-01

OBJECTIVES To confirm interrater reliability using blinded evaluation of a skills-assessment instrument to assess the surgical performance of resident and fellow trainees performing pediatric direct laryngoscopy and rigid bronchoscopy in simulated models. DESIGN Prospective, paired, blinded observational validation study. SUBJECTS Paired observers from multiple institutions simultaneously evaluated residents and fellows who were performing surgery in an animal laboratory or using high-fidelity manikins. The evaluators had no previous affiliation with the residents and fellows and did not know their year of training. INTERVENTIONS One- and 2-page versions of an objective structured assessment of technical skills (OSATS) assessment instrument composed of global and task-specific surgical items were used to evaluate surgical performance. RESULTS Fifty-two evaluations were completed by 17 attending evaluators. The instrument agreement for the 2-page assessment was 71.4% when measured as a binary variable (ie, competent vs not competent) (κ = 0.38; P = .08). Evaluation as a continuous variable revealed a 42.9% percentage agreement (κ = 0.18; P = .14). The intraclass correlation was 0.53, considered substantial/good interrater reliability (69% reliable). For the 1-page instrument, agreement was 77.4% when measured as a binary variable (κ = 0.53, P = .0015). Agreement when evaluated as a continuous measure was 71.0% (κ = 0.54, P < .001). The intraclass correlation was 0.73, considered high interrater reliability (85% reliable). CONCLUSIONS The OSATS assessment instrument is an effective tool for evaluating surgical performance among trainees with acceptable interrater reliability in a simulator setting. Reliability was good for both the 1- and 2-page OSATS checklists, and both serve as excellent tools to provide immediate formative feedback on operational competency.
Temporal reliability and lateralization of the resting-state language network.

PubMed

Zhu, Linlin; Fan, Yang; Zou, Qihong; Wang, Jue; Gao, Jia-Hong; Niu, Zhendong

2014-01-01

The neural processing loop of language is complex but highly associated with Broca's and Wernicke's areas. The left dominance of these two areas was the earliest observation of brain asymmetry. It was demonstrated that the language network and its functional asymmetry during resting state were reproducible across institutions. However, the temporal reliability of resting-state language network and its functional asymmetry are still short of knowledge. In this study, we established a seed-based resting-state functional connectivity analysis of language network with seed regions located at Broca's and Wernicke's areas, and investigated temporal reliability of language network and its functional asymmetry. The language network was found to be temporally reliable in both short- and long-term. In the aspect of functional asymmetry, the Broca's area was found to be left lateralized, while the Wernicke's area is mainly right lateralized. Functional asymmetry of these two areas revealed high short- and long-term reliability as well. In addition, the impact of global signal regression (GSR) on reliability of the resting-state language network was investigated, and our results demonstrated that GSR had negligible effect on the temporal reliability of the resting-state language network. Our study provided methodology basis for future cross-culture and clinical researches of resting-state language network and suggested priority of adopting seed-based functional connectivity for its high reliability.
Temporal Reliability and Lateralization of the Resting-State Language Network

PubMed Central

Zou, Qihong; Wang, Jue; Gao, Jia-Hong; Niu, Zhendong

2014-01-01

The neural processing loop of language is complex but highly associated with Broca's and Wernicke's areas. The left dominance of these two areas was the earliest observation of brain asymmetry. It was demonstrated that the language network and its functional asymmetry during resting state were reproducible across institutions. However, the temporal reliability of resting-state language network and its functional asymmetry are still short of knowledge. In this study, we established a seed-based resting-state functional connectivity analysis of language network with seed regions located at Broca's and Wernicke's areas, and investigated temporal reliability of language network and its functional asymmetry. The language network was found to be temporally reliable in both short- and long-term. In the aspect of functional asymmetry, the Broca's area was found to be left lateralized, while the Wernicke's area is mainly right lateralized. Functional asymmetry of these two areas revealed high short- and long-term reliability as well. In addition, the impact of global signal regression (GSR) on reliability of the resting-state language network was investigated, and our results demonstrated that GSR had negligible effect on the temporal reliability of the resting-state language network. Our study provided methodology basis for future cross-culture and clinical researches of resting-state language network and suggested priority of adopting seed-based functional connectivity for its high reliability. PMID:24475058
Measuring Afterschool Program Quality Using Setting-Level Observational Approaches

ERIC Educational Resources Information Center

Oh, Yoonkyung; Osgood, D. Wayne; Smith, Emilie P.

2015-01-01

The importance of afterschool hours for youth development is widely acknowledged, and afterschool settings have recently received increasing attention as an important venue for youth interventions, bringing a growing need for reliable and valid measures of afterschool quality. This study examined the extent to which the two observational tools,…
77 FR 54917 - Findings of Research Misconduct

Federal Register 2010, 2011, 2012, 2013, 2014

2012-09-06

... values for inter-observer reliabilities when coding was done by only one observer, in both cases leading... Research Integrity (ORI) has taken final action in the following case: Marc Hauser, Ph.D., Harvard... collaborators that he miscoded some of the trials and that the study failed to provide support for the initial...
Basinwide Estimation of Habitat and Fish Populations in Streams

Treesearch

C. Andrew Dolloff; David G. Hankin; Gordon H. Reeves

1993-01-01

Basinwide visual estimation techniques (BVET) are statistically reliable and cost effective for estimating habitat and fish populations across entire watersheds. Survey teams visit habitats in every reach of the study area to record visual observations. At preselected intervals, teams also record actual measurements. These observations and measurements are used to...

Mobile Functional Reach Test in People Who Suffer Stroke: A Pilot Study

PubMed Central

Merchán-Baeza, Jose Antonio; González-Sánchez, Manuel

2015-01-01

Background Postural instability is one of the major complications found in people who survive a stroke. Parameterizing the Functional Reach Test (FRT) could be useful in clinical practice and basic research, as this test is a clinically accepted tool (for its simplicity, reliability, economy, and portability) to measure the semistatic balance of a subject. Objective The aim of this study is to analyze the reliability in the FRT parameterization using inertial sensor within mobile phones (mobile sensors) for recording kinematic variables in patients who have suffered a stroke. Our hypothesis is that the sensors in mobile phones will be reliable instruments for kinematic study of the FRT. Methods This is a cross-sectional study of 7 subjects over 65 years of age who suffered a stroke. During the execution of FRT, the subjects carried two mobile phones: one placed in the lumbar region and the other one on the trunk. After analyzing the data obtained in the kinematic registration by the mobile sensors, a number of direct and indirect variables were obtained. The variables extracted directly from FRT through the mobile sensors were distance, maximum angular lumbosacral/thoracic displacement, time for maximum angular lumbosacral/thoracic displacement, time of return to the initial position, and total time. Using these data, we calculated speed and acceleration of each. A descriptive analysis of all kinematic outcomes recorded by the two mobile sensors (trunk and lumbar) was developed and the average range achieved in the FRT. Reliability measures were calculated by analyzing the internal consistency of the measures with 95% confidence interval of each outcome variable. We calculated the reliability of mobile sensors in the measurement of the kinematic variables during the execution of the FRT. Results The values in the FRT obtained in this study (2.49 cm, SD 13.15) are similar to those found in other studies with this population and with the same age range. 
Intrasubject reliability values observed with the mobile phones are all at or above 0.831, ranging from 0.831 (time B_C trunk area) to 0.894 (displacement A_B trunk area). Likewise, the observed intersubject values range from 0.835 (time B_C trunk area) to 0.882 (displacement A_C trunk area). On the other hand, the reliability of the FRT was 0.989 (0.981-0.996) and 0.978 (0.970-0.985), intrasubject and intersubject respectively. Conclusions We found that mobile sensors in mobile phones could be reliable tools in the parameterization of the Functional Reach Test in people who have had a stroke. PMID:28582239
Getting It Right Matters: Climate Spectra and Their Estimation

NASA Astrophysics Data System (ADS)

Privalsky, Victor; Yushkov, Vladislav

2018-06-01

In many recent publications, climate spectra estimated with different methods from observed, GCM-simulated, and reconstructed time series contain many peaks at time scales from a few years to many decades and even centuries. However, respective spectral estimates obtained with the autoregressive (AR) and multitapering (MTM) methods showed that spectra of climate time series are smooth and contain no evidence of periodic or quasi-periodic behavior. Four order selection criteria for the autoregressive models were studied and proven sufficiently reliable for 25 time series of climate observations at individual locations or spatially averaged at local-to-global scales. As time series of climate observations are short, an alternative reliable nonparametric approach is Thomson's MTM. These results agree with both the earlier climate spectral analyses and the Markovian stochastic model of climate.
Comparison of 3D computer-aided with manual cerebral aneurysm measurements in different imaging modalities.

PubMed

Groth, M; Forkert, N D; Buhk, J H; Schoenfeld, M; Goebell, E; Fiehler, J

2013-02-01

To compare intra- and inter-observer reliability of aneurysm measurements obtained by a 3D computer-aided technique with standard manual aneurysm measurements in different imaging modalities. A total of 21 patients with 29 cerebral aneurysms were studied. All patients underwent digital subtraction angiography (DSA), contrast-enhanced (CE-MRA) and time-of-flight magnetic resonance angiography (TOF-MRA). Aneurysm neck and depth diameters were manually measured by two observers in each modality. Additionally, semi-automatic computer-aided diameter measurements were performed using 3D vessel surface models derived from CE- (CE-com) and TOF-MRA (TOF-com) datasets. Bland-Altman analysis (BA) and intra-class correlation coefficient (ICC) were used to evaluate intra- and inter-observer agreement. BA revealed the narrowest relative limits of intra- and inter-observer agreement for aneurysm neck and depth diameters obtained by TOF-com (ranging between ±5.3 % and ±28.3 %) and CE-com (ranging between ±23.3 % and ±38.1 %). Direct measurements in DSA, TOF-MRA and CE-MRA showed considerably wider limits of agreement. The highest ICCs were observed for TOF-com and CE-com (ICC values, 0.92 or higher for intra- as well as inter-observer reliability). Computer-aided aneurysm measurement in 3D offers improved intra- and inter-observer reliability and a reproducible parameter extraction, which may be used in clinical routine and as objective surrogate end-points in clinical trials.
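As an illustration of the Bland-Altman analysis mentioned in this abstract, the following minimal Python sketch computes the bias and 95% limits of agreement between two observers. The function name and the sample diameters are hypothetical and are not taken from the study. The study reported relative limits (in %), whereas this sketch works on absolute differences for simplicity.

```python
from statistics import mean, stdev


def bland_altman_limits(obs_a, obs_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(obs_a) != len(obs_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(obs_a, obs_b)]
    bias = mean(diffs)                # mean difference between observers
    spread = 1.96 * stdev(diffs)      # 1.96 x sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical aneurysm neck diameters (mm) from two observers
observer_1 = [5.0, 6.0, 7.0, 8.0]
observer_2 = [4.0, 6.0, 8.0, 8.0]
bias, lower, upper = bland_altman_limits(observer_1, observer_2)
print(f"bias={bias:.2f} mm, limits of agreement=({lower:.2f}, {upper:.2f}) mm")
```

Narrower limits indicate closer agreement, which is the sense in which the abstract compares the computer-aided and manual measurements.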
Marijuana abstinence effects in marijuana smokers maintained in their home environment.

PubMed

Budney, A J; Hughes, J R; Moore, B A; Novy, P L

2001-10-01

Although withdrawal symptoms are commonly reported by persons seeking treatment for marijuana dependence, the validity and clinical significance of a marijuana withdrawal syndrome has not been established. This controlled outpatient study examined the reliability and specificity of the abstinence effects that occur when daily marijuana users abruptly stop smoking marijuana. Twelve daily marijuana smokers were assessed on 16 consecutive days during which they smoked marijuana as usual (days 1-5), abstained from smoking marijuana (days 6-8), returned to smoking marijuana (days 9-13), and again abstained from smoking marijuana (days 14-16). An overall measure of withdrawal discomfort increased significantly during the abstinence phases and returned to baseline when marijuana smoking resumed. Craving for marijuana, decreased appetite, sleep difficulty, and weight loss reliably changed across the smoking and abstinence phases. Aggression, anger, irritability, restlessness, and strange dreams increased significantly during one abstinence phase, but not the other. Collateral observers confirmed participant reports of these symptoms. This study validated several specific effects of marijuana abstinence in heavy marijuana users, and showed they were reliable and clinically significant. These withdrawal effects appear similar in type and magnitude to those observed in studies of nicotine withdrawal.
The Reliability of Galaxy Classifications by Citizen Scientists

NASA Astrophysics Data System (ADS)

Francis, Lennox; Kautsch, Stefan J.; Bizyaev, Dmitry

2017-01-01

Citizen scientists are becoming more and more important in helping professionals working through big data. An example in astronomy is crowdsourced galaxy classification. But how reliable are these classifications for studies of galaxy evolution? We present a tool in order to investigate those morphological classifications and test it on a diverse population on our campus. We observe a slight offset towards earlier Hubble types in the crowdsourced morphologies, when compared to professional classifications.
Multisite Reliability of Cognitive BOLD Data

PubMed Central

Brown, Gregory G.; Mathalon, Daniel H.; Stern, Hal; Ford, Judith; Mueller, Bryon; Greve, Douglas N.; McCarthy, Gregory; Voyvodic, Jim; Glover, Gary; Diaz, Michele; Yetter, Elizabeth; Burak Ozyurt, I.; Jorgensen, Kasper W.; Wible, Cynthia G.; Turner, Jessica A.; Thompson, Wesley K.; Potkin, Steven G.

2010-01-01

Investigators perform multi-site functional magnetic resonance imaging studies to increase statistical power, to enhance generalizability, and to improve the likelihood of sampling relevant subgroups. Yet undesired site variation in imaging methods could off-set these potential advantages. We used variance components analysis to investigate sources of variation in the blood oxygen level dependent (BOLD) signal across four 3T magnets in voxelwise and region of interest (ROI) analyses. Eighteen participants traveled to four magnet sites to complete eight runs of a working memory task involving emotional or neutral distraction. Person variance was more than 10 times larger than site variance for five of six ROIs studied. Person-by-site interactions, however, contributed sizable unwanted variance to the total. Averaging over runs increased between-site reliability, with many voxels showing good to excellent between-site reliability when eight runs were averaged and regions of interest showing fair to good reliability. Between-site reliability depended on the specific functional contrast analyzed in addition to the number of runs averaged. Although median effect size was correlated with between-site reliability, dissociations were observed for many voxels. Brain regions where the pooled effect size was large but between-site reliability was poor were associated with reduced individual differences. Brain regions where the pooled effect size was small but between-site reliability was excellent were associated with a balance of participants who displayed consistently positive or consistently negative BOLD responses. Although between-site reliability of BOLD data can be good to excellent, acquiring highly reliable data requires robust activation paradigms, ongoing quality assurance, and careful experimental control. PMID:20932915
Reliability of an experimental method to analyse the impact point on a golf ball during putting.

PubMed

Richardson, Ashley K; Mitchell, Andrew C S; Hughes, Gerwyn

2015-06-01

This study aimed to examine the reliability of an experimental method identifying the location of the impact point on a golf ball during putting. Forty trials were completed using a mechanical putting robot set to reproduce a putt of 3.2 m, with four different putter-ball combinations. After locating the centre of the dimple pattern (centroid) the following variables were tested: distance of the impact point from the centroid, angle of the impact point from the centroid and distance of the impact point from the centroid derived from the X, Y coordinates. Good to excellent reliability was demonstrated in all impact variables reflected in very strong relative (ICC = 0.98-1.00) and absolute reliability (SEM% = 0.9-4.3%). The highest SEM% observed was 7% for the angle of the impact point from the centroid. In conclusion, the experimental method was shown to be reliable at locating the centroid of a golf ball, therefore allowing for the identification of the point of impact with the putter head, and is suitable for use in subsequent studies.
Reliability of concentrations of organophosphate pesticide metabolites in serial urine specimens from pregnancy in the Generation R study

PubMed Central

Spaan, Suzanne; Pronk, Anjoeka; Koch, Holger M.; Jusko, Todd A.; Jaddoe, Vincent W.V.; Shaw, Pamela A.; Tiemeier, Henning M.; Hofman, Albert; Pierik, Frank H.; Longnecker, Matthew P.

2014-01-01

The widespread use of organophosphate (OP) pesticides has resulted in ubiquitous exposure in humans, primarily through their diet. Exposure to OP pesticides may have adverse health effects, including neurobehavioral deficits in children. The optimal design of new studies requires data on the reliability of urinary measures of exposure. In the present study, urinary concentrations of six dialkyl phosphate (DAP) metabolites, the main urinary metabolites of OP pesticides, were determined in 120 pregnant women participating in the Generation R Study in Rotterdam. Intra-class correlation coefficients (ICCs) across serial urine specimens taken at <18, 18–25, and >25 weeks of pregnancy were determined to assess reliability. Geometric mean total DAP metabolite concentrations were 229 (GSD 2.2), 240 (GSD 2.1), and 224 (GSD 2.2) nmol/g creatinine across the three periods of gestation. Metabolite concentrations from the serial urine specimens in general correlated moderately. The ICCs for the six DAP metabolites ranged from 0.14 to 0.38 (0.30 for total DAPs), indicating weak to moderate reliability. Although the DAP metabolite levels observed in this study are slightly higher and slightly more correlated than in previous studies, the low to moderate reliability indicates a high degree of within-person variability, which presents challenges for designing well-powered epidemiologic studies. PMID:25515376
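To make the intra-class correlation coefficient in this abstract concrete, here is a minimal Python sketch of one common form, the one-way random-effects ICC(1,1), computed from between-subject and within-subject mean squares. The abstract does not state which ICC form the authors used, so this choice is an assumption, and the function name and sample values are hypothetical.

```python
def icc_oneway(rows):
    """One-way random-effects ICC(1,1).

    Each row holds the k repeated measurements of one subject.
    """
    n = len(rows)
    k = len(rows[0])
    if n < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 values")
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    # Between-subject and within-subject mean squares from one-way ANOVA
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(rows, means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical repeated measurements: 3 subjects, 2 specimens each
print(round(icc_oneway([[1, 2], [3, 4], [5, 6]]), 3))
```

A low ICC, as reported in the abstract, means that within-person variation is large relative to between-person variation.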
Validity and reliability of global operative assessment of laparoscopic skills (GOALS) in novice trainees performing a laparoscopic cholecystectomy.

PubMed

Kramp, Kelvin H; van Det, Marc J; Hoff, Christiaan; Lamme, Bas; Veeger, Nic J G M; Pierie, Jean-Pierre E N

2015-01-01

Global Operative Assessment of Laparoscopic Skills (GOALS) assessment has been designed to evaluate skills in laparoscopic surgery. A longitudinal blinded study of randomized video fragments was conducted to estimate the validity and reliability of GOALS in novice trainees. In total, 10 trainees each performed 6 consecutive laparoscopic cholecystectomies. Sixty procedures were recorded on video. Video fragments of (1) opening of the peritoneum; (2) dissection of Calot's triangle and achievement of critical view of safety; and (3) dissection of the gallbladder from the liver bed were blinded, randomized, and rated by 2 consultant surgeons using GOALS. Also, a grade was given for overall competence. The correlation of GOALS with live observation Objective Structured Assessment of Technical Skills (OSATS) scores was calculated. Construct validity was estimated using the Friedman 2-way analysis of variance by ranks and the Wilcoxon signed-rank test. The interrater reliability was calculated using the absolute and consistency agreement 2-way random-effects model intraclass correlation coefficient. A high correlation was found between mean GOALS score (r = 0.879, p = 0.021) and mean OSATS score. The GOALS score increased significantly across the 6 procedures (p = 0.002). The trainees performed significantly better on their sixth when compared with their first cholecystectomy (p = 0.004). The consistency agreement interrater reliability was 0.37 for the mean GOALS score (p = 0.002) and 0.55 for overall competence (p < 0.001) of the 3 video fragments. The validity observed in this randomized blinded longitudinal study supports the existing evidence that GOALS is a valid tool for assessment of novice trainees. A relatively low reliability was found in this study. Copyright © 2014 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
German validation of the Conners Adult ADHD Rating Scales (CAARS) II: reliability, validity, diagnostic sensitivity and specificity.

PubMed

Christiansen, H; Kis, B; Hirsch, O; Matthies, S; Hebebrand, J; Uekermann, J; Abdel-Hamid, M; Kraemer, M; Wiltfang, J; Graf, E; Colla, M; Sobanski, E; Alm, B; Rösler, M; Jacob, C; Jans, T; Huss, M; Schimmelmann, B G; Philipsen, A

2012-07-01

The German version of the Conners Adult ADHD Rating Scales (CAARS) has proven to show very high model fit in confirmative factor analyses with the established factors inattention/memory problems, hyperactivity/restlessness, impulsivity/emotional lability, and problems with self-concept in both large healthy control and ADHD patient samples. This study now presents data on the psychometric properties of the German CAARS-self-report (CAARS-S) and observer-report (CAARS-O) questionnaires. CAARS-S/O and questions on sociodemographic variables were filled out by 466 patients with ADHD, 847 healthy control subjects that already participated in two prior studies, and a total of 896 observer data sets were available. Cronbach's-alpha was calculated to obtain internal reliability coefficients. Pearson correlations were performed to assess test-retest reliability, and concurrent, criterion, and discriminant validity. Receiver Operating Characteristics (ROC-analyses) were used to establish sensitivity and specificity for all subscales. Coefficient alphas ranged from .74 to .95, and test-retest reliability from .85 to .92 for the CAARS-S, and from .65 to .85 for the CAARS-O. All CAARS subscales, except problems with self-concept correlated significantly with the Barrett Impulsiveness Scale (BIS), but not with the Wender Utah Rating Scale (WURS). Criterion validity was established with ADHD subtype and diagnosis based on DSM-IV criteria. Sensitivity and specificity were high for all four subscales. The reported results confirm our previous study and show that the German CAARS-S/O do indeed represent a reliable and cross-culturally valid measure of current ADHD symptoms in adults. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
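For readers unfamiliar with the internal reliability coefficient named in this abstract, the sketch below computes Cronbach's alpha from a respondents-by-items score table using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the scores are hypothetical and are not CAARS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; each row is one respondent, each column one item."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 items and equal-length rows")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 respondents x 3 items
ratings = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
print(round(cronbach_alpha(ratings), 3))
```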
The admissions process of a bachelor of science in nursing program: initial reliability and validity of the personal interview.

PubMed

Carpio, B; Brown, B

1993-01-01

The undergraduate nursing degree program (B.Sc.N.) at McMaster University School of Nursing uses small groups, and is learner-centered and problem-based. A study was conducted during the 1991 admissions cycle to determine the initial reliability and validity of the semi-structured personal interview which constitutes the final component of candidate selection for this program. During the interview, three-member teams assess applicant suitability to the program based on six dimensions: applicant motivation, awareness of the program, problem-solving abilities, ability to relate to others, self-appraisal skills, and career goals. Each interviewer assigns the applicant a global rating using a seven-point scale. For the purposes of this study four interviewer teams were randomly selected from the pool of 31 teams to interview four simulated (preprogrammed) applicants. Using two-factor repeated-measures ANOVA to analyze interview ratings, inter-rater and inter-team intraclass correlation coefficients (ICC) were calculated. Inter-team reliability ranged from .64 to .97 for the individual dimensions, and .66 to .89 on global ratings. Inter-rater ICC for the six dimensions ranged from .81 to .99, and .96 to .99 for the global ratings. The item-to-total correlation coefficients between individual dimensions and global ratings ranged from .8 to 1.0. Pearson correlations between items ranged from .77 to 1.0. The ICC were then calculated for the interview scores of 108 actual applicants to the program. Inter-rater reliability based on global ratings was .79 for the single (1 rater) observation, and .91 for the multiple (3 rater) observation. These findings support the continued use of the interview as a reliable instrument with face validity. Studies of predictive validity will be undertaken.
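The relationship between the single-rater and three-rater reliabilities in this abstract can be illustrated with the Spearman-Brown formula, R_k = k*r / (1 + (k-1)*r). The abstract does not say how the three-rater value was obtained, so applying this formula is an assumption. With r = .79 and k = 3 it gives roughly .92, close to the reported .91. The function names below are hypothetical.

```python
import math


def spearman_brown(single_rater_reliability, n_raters):
    """Predicted reliability of the mean of n_raters ratings."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


def raters_needed(single_rater_reliability, target):
    """Smallest number of raters whose averaged rating reaches the target."""
    r = single_rater_reliability
    return math.ceil(target * (1 - r) / (r * (1 - target)))


print(round(spearman_brown(0.79, 3), 3))   # three-member interview team
print(raters_needed(0.79, 0.90))           # raters needed for reliability >= .90
```

The second function inverts the same formula, which is one simple way to decide how many raters a panel would need for a chosen reliability criterion.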
Greater understanding of normal hip physical function may guide clinicians in providing targeted rehabilitation programmes.

PubMed

Kemp, Joanne L; Schache, Anthony G; Makdissi, Michael; Sims, Kevin J; Crossley, Kay M

2013-07-01

This study investigated tests of hip muscle strength and functional performance. The specific objectives were to: (i) establish intra- and inter-rater reliability; (ii) compare differences between dominant and non-dominant limbs; (iii) compare agonist and antagonist muscle strength ratios; (iv) compare differences between genders; and (v) examine relationships between hip muscle strength, baseline measures and functional performance. Reliability study and cross-sectional analysis of hip strength and functional performance. In healthy adults aged 18-50 years, normalised hip muscle peak torque and functional performance were evaluated to: (i) establish intra-rater and inter-rater reliability; (ii) analyse differences between limbs, between antagonistic muscle groups and genders; and (iii) examine associations between strength and functional performance. Excellent reliability (intra-rater ICC=0.77-0.96; inter-rater ICC=0.82-0.95) was observed. No difference existed between dominant and non-dominant limbs. Differences in strength existed between antagonistic pairs of muscles: hip abduction was greater than adduction (p<0.001) and hip ER was greater than IR (p<0.001). Men had greater ER strength (p=0.006) and hop for distance (p<0.001) than women. Strong associations were observed between measures of hip muscle strength (except hip flexion) and age, height, and functional performance. Deficits in hip muscle strength or functional performance may influence hip pain. In order to provide targeted rehabilitation programmes to address patient-specific impairments, and determine when individuals are ready to return to physical activity, clinicians are increasingly utilising tests of hip strength and functional performance. This study provides a battery of reliable, clinically applicable tests which can be used for these purposes. Copyright © 2012 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Measurement of the center edge angle and determination of the Severin classification using digital radiography, computer-assisted measurement tools, and a Severin algorithm: intraobserver and interobserver reliability revisited.

PubMed

Carroll, Kristen L; Murray, Kathleen A; MacLeod, Lynne M; Hennessey, Theresa A; Woiczik, Marcella R; Roach, James W

2011-06-01

Numerous studies underscore the poor intraobserver and interobserver reliability of both the center edge angle (CEA) and the Severin classification using plain film measurements. In this study, experienced observers applied a computer-assisted measurement program to determine the CEA in digital pelvic radiographs of adults who had been previously treated for dysplasia of the hip (DDH). Using a teaching aid/algorithm of the Severin classification, the observers then assigned a Severin rating to these hips. Intraobserver and interobserver errors were then calculated on both the CEA measurements and the Severin classifications. Four pediatric orthopaedic surgeons and 1 pediatric radiologist calculated the CEAs using the OrthoView™ planning system and then determined the Severin classification on 41 blinded digital pelvic radiographs. The radiographs were evaluated by each examiner twice, with evaluations separated by 2 months. All examiners reviewed a Severin classification algorithm before making their Severin assignments. The intraobserver and interobserver reliability for both the CEA and the Severin classification were calculated using the interclass correlation coefficients and Cohen and Fleiss κ scores, respectively. The intraobserver and interobserver reliability for CEA measurement was moderate to almost perfect. When we separated the Severin classification into 3 clinically relevant groups of good (Severin I and II), dysplastic (Severin III), and poor (Severin IV and above), our interobserver reliability neared almost perfect. The Severin classification is an extremely useful and oft-used radiographic measure for the success of DDH treatment. Our research found digital radiography, computer-aided measurement tools, the use of a Severin algorithm, and separating the Severin classification into 3 clinically relevant groups significantly increased the intraobserver and interobserver reliability of both the CEA and Severin classification.
This finding will assist future studies using the CEA and Severin classification in the radiographic assessment of DDH treatment outcomes.
Evaluative frailty index for physical activity (EFIP): a reliable and valid instrument to measure changes in level of frailty.

PubMed

de Vries, Nienke M; Staal, J Bart; Olde Rikkert, Marcel G M; Nijhuis-van der Sanden, Maria W G

2013-04-01

Physical activity is assumed to be important in the prevention and treatment of frailty. It is unclear, however, to what extent frailty can be influenced because instruments designed to assess frailty have not been validated as evaluative outcome instruments in clinical practice. The aims of this study were: (1) to develop a frailty index (i.e., the evaluative frailty index for physical activity [EFIP]) based on the method of deficit accumulation and (2) to test the clinimetric properties of the EFIP. The content of the EFIP was determined using a written Delphi procedure. Intrarater reliability, interrater reliability, and construct validity were determined in an observational study (n=24). Intrarater reliability and interrater reliability were calculated using Cohen kappa and intraclass correlation coefficients (ICCs). Construct validity was determined by correlating the score on the EFIP with those on the timed "up & go" test (TUG), the performance-oriented mobility assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Fifty items were included in the EFIP. Interrater reliability (Cohen kappa=0.72, ICC=.96) and intrarater reliability (Cohen kappa=0.77 and 0.80, ICC=.93 and .98) were good. As expected, a fair to moderate correlation with the TUG, POMA, and CIRS-G was found (.61, -.70, and .66, respectively). Reliability and validity of the EFIP have been tested in a small sample. These and other clinimetric properties, such as responsiveness, will be assessed or reassessed in a larger study population. The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice.
Development of a direct observation Measure of Environmental Qualities of Activity Settings.

PubMed

King, Gillian; Rigby, Patty; Batorowicz, Beata; McMain-Klein, Margot; Petrenchik, Theresa; Thompson, Laura; Gibson, Michelle

2014-08-01

The aim of this study was to develop an observer-rated measure of aesthetic, physical, social, and opportunity-related qualities of leisure activity settings for young people (with or without disabilities). Eighty questionnaires were completed by sets of raters who independently rated 22 community/home activity settings. The scales of the 32-item Measure of Environmental Qualities of Activity Settings (MEQAS; Opportunities for Social Activities, Opportunities for Physical Activities, Pleasant Physical Environment, Opportunities for Choice, Opportunities for Personal Growth, and Opportunities to Interact with Adults) were determined using principal components analyses. Test-retest reliability was determined for eight activity settings, rated twice (4-6 wk interval) by a trained rater. The factor structure accounted for 80% of the variance. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy was 0.73. Cronbach's alphas for the scales ranged from 0.76 to 0.96, and interrater reliabilities (ICCs) ranged from 0.60 to 0.93. Test-retest reliabilities ranged from 0.70 to 0.90. Results suggest that the MEQAS has a sound factor structure and preliminary evidence of internal consistency, interrater, and test-retest reliability. The MEQAS is the first observer-completed measure of environmental qualities of activity settings. The MEQAS allows researchers to assess comprehensively qualities and affordances of activity settings, and can be used to design and assess environmental qualities of programs for young people. © 2014 Mac Keith Press.
Environmental Profile of a Community's Health (EPOCH): An Instrument to Measure Environmental Determinants of Cardiovascular Health in Five Countries

PubMed Central

Chow, Clara K.; Lock, Karen; Madhavan, Manisha; Corsi, Daniel J.; Gilmore, Anna B.; Subramanian, S. V.; Li, Wei; Swaminathan, Sumathi; Lopez-Jaramillo, Patricio; Avezum, Alvaro; Lear, Scott A.; Dagenais, Gilles; Teo, Koon; McKee, Martin; Yusuf, Salim

2010-01-01

Background The environment in which people live is known to be important in influencing diet, physical activity, smoking, psychosocial and other risk factors for cardiovascular (CV) disease. However no instrument exists that evaluates communities for these multiple environmental factors and is suitable for use across different communities, regions and countries. This report describes the design and reliability of an instrument to measure environmental determinants of CV risk factors. Method/Principal Findings The Environmental Profile of Community Health (EPOCH) instrument comprises two parts: (I) an assessment of the physical environment, and (II) an interviewer-administered questionnaire to collect residents' perceptions of their community. We examined the inter-rater reliability amongst 3 observers from each region of the direct observation component of the instrument (EPOCH I) in 93 rural and urban communities in 5 countries (Canada, Colombia, Brazil, China and India). Data collection using the EPOCH instrument was feasible in all communities. Reliability of the instrument was excellent (Intraclass Correlation Coefficient - ICC>0.75) for 24 of 38 items and fair to good (ICC 0.4–0.75) for 14 of 38 items. Conclusion This report shows data collection with the EPOCH instrument is feasible and direct observation of community measures reliable. The EPOCH instrument will enable further research on environmental determinants of health for population studies from a broad range of settings. PMID:21170320
Inter-rater reliability of an observation-based ergonomics assessment checklist for office workers.

PubMed

Pereira, Michelle Jessica; Straker, Leon Melville; Comans, Tracy Anne; Johnston, Venerina

2016-12-01

To establish the inter-rater reliability of an observation-based ergonomics assessment checklist for computer workers. A 37-item (38-item if a laptop was part of the workstation) comprehensive observational ergonomics assessment checklist comparable to government guidelines and up to date with empirical evidence was developed. Two trained practitioners assessed full-time office workers performing their usual computer-based work and evaluated the suitability of workstations used. Practitioners assessed each participant consecutively. The order of assessors was randomised, and the second assessor was blinded to the findings of the first. Unadjusted kappa coefficients between the raters were obtained for the overall checklist and subsections that were formed from question-items relevant to specific workstation equipment. Twenty-seven office workers were recruited. The inter-rater reliability between two trained practitioners achieved moderate to good reliability for all except one checklist component. This checklist has mostly moderate to good reliability between two trained practitioners. Practitioner Summary: This reliable ergonomics assessment checklist for computer workers was designed using accessible government guidelines and supplemented with up-to-date evidence. Employers in Queensland (Australia) can fulfil legislative requirements by using this reliable checklist to identify and subsequently address potential risk factors for work-related injury to provide a safe working environment.
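The unadjusted kappa coefficient referred to in this abstract can be sketched as follows: Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's marginal frequencies. The function name and the pass/fail ratings are hypothetical and do not come from the checklist study.

```python
from collections import Counter


def cohen_kappa(rater_1, rater_2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Chance agreement from the product of the raters' marginal proportions
    expected = sum(
        counts_1[c] * counts_2[c] for c in counts_1.keys() | counts_2.keys()
    ) / n ** 2
    # Note: undefined (division by zero) if expected agreement equals 1
    return (observed - expected) / (1 - expected)


# Hypothetical checklist items scored 1 (suitable) or 0 (unsuitable)
first = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
second = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
print(round(cohen_kappa(first, second), 3))
```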
Performance of Lung Ultrasound in Detecting Peri-Operative Atelectasis after General Anesthesia.

PubMed

Yu, Xin; Zhai, Zhenping; Zhao, Yongfeng; Zhu, Zhiming; Tong, Jianbin; Yan, Jianqin; Ouyang, Wen

2016-12-01

The aim of this prospective observational study was to evaluate the performance of lung ultrasound (LUS) in detecting post-operative atelectasis in adult patients under general anesthesia. Forty-six patients without pulmonary comorbidities who were scheduled for elective neurosurgery were enrolled in the study. A total of 552 pairs of LUS clips and thoracic computed tomography (CT) images were ultimately analyzed to determine the presence of atelectasis in 12 prescribed lung regions. The accuracy of LUS in detecting peri-operative atelectasis was evaluated with thoracic CT as gold standard. Levels of agreement between the two observers for LUS and the two observers for thoracic CT were analyzed using the κ reliability test. The quantitative correlation between LUS scores of aeration and the volumetric data of atelectasis in thoracic CT were further evaluated. LUS had reliable performance in post-operative atelectasis, with a sensitivity of 87.7%, specificity of 92.1% and diagnostic accuracy of 90.8%. The levels of agreement between the two observers for LUS and for thoracic CT were both satisfactory, with κ coefficients of 0.87 (p < 0.0001) and 0.93 (p < 0.0001), respectively. In patients in the supine position, LUS scores were highly correlated with the atelectasis volume of CT (r = 0.58, p < 0.0001). Thus, LUS provides a fast, reliable and radiation-free method to identify peri-operative atelectasis in adults. Copyright © 2016. Published by Elsevier Inc.
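The sensitivity, specificity and diagnostic accuracy quoted in this abstract are all derived from a 2x2 table of index test (LUS) against reference standard (CT). The sketch below shows the arithmetic. The abstract does not report the underlying cell counts, so the counts used here are hypothetical and chosen only to illustrate the calculation.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from 2x2 table counts.

    tp/fn: reference-positive regions called positive/negative by the test
    tn/fp: reference-negative regions called negative/positive by the test
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts, not the study's data
metrics = diagnostic_metrics(tp=8, fp=2, fn=2, tn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```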
An OSSE on Mesoscale Model Assimilation of Simulated HIRAD-Observed Hurricane Surface Winds

NASA Technical Reports Server (NTRS)

Albers, Cerese; Miller, Timothy; Uhlhorn, Eric; Krishnamurti, T. N.

2012-01-01

The hazards of landfalling hurricanes are well known, but progress on improving the intensity forecasts of these deadly storms at landfall has been slow. Many cite a lack of high-resolution data sets taken inside the core of a hurricane, and the lack of reliable measurements in extreme conditions near the surface of hurricanes, as possible reasons why even the most state-of-the-art forecasting models cannot seem to forecast intensity changes better. The Hurricane Imaging Radiometer (HIRAD) is a new airborne microwave remote sensor for observing hurricanes, and is operated and researched by NASA Marshall Space Flight Center in partnership with the NOAA Atlantic Oceanographic and Meteorological Laboratory/Hurricane Research Division, the University of Central Florida, the University of Michigan, and the University of Alabama in Huntsville. This instrument's purpose is to study the wind field of a hurricane, specifically observing surface wind speeds and rain rates, in what has traditionally been the most difficult areas for other instruments to study; the high wind and heavy rain regions. Dr. T. N. Krishnamurti has studied various data assimilation techniques for hurricane and monsoon rain rates, and this study builds off of results obtained from utilizing his style of physical initializations of rainfall observations, but obtaining reliable observations in heavy rain regions has always presented trouble to our research of high-resolution rainfall forecasting. Reliable data from these regions at such a high resolution and wide swath as HIRAD provides is potentially very valuable to mesoscale forecasting of hurricane intensity. This study shows how the data assimilation technique of Ensemble Kalman Filtering (EnKF) in the Weather Research and Forecasting (WRF) model can be used to incorporate wind, and later rain rate, data into a mesoscale model forecast of hurricane intensity.
The study makes use of an Observing System Simulation Experiment (OSSE) with a simulated HIRAD dataset sampled during a hurricane and uses EnKF to forecast the track and intensity prediction of the hurricane. Comparisons to truth and error metrics are used to assess the model's forecast performance.
Test-retest reliability of the proposed DSM-5 eating disorder diagnostic criteria

PubMed Central

Sysko, Robyn; Roberto, Christina A.; Barnes, Rachel D.; Grilo, Carlos M.; Attia, Evelyn; Walsh, B. Timothy

2012-01-01

The proposed DSM-5 classification scheme for eating disorders includes both major and minor changes to the existing DSM-IV diagnostic criteria. It is not known what effect these modifications will have on the ability to make reliable diagnoses. Two studies were conducted to evaluate the short-term test-retest reliability of the proposed DSM-5 eating disorder diagnoses: anorexia nervosa, bulimia nervosa, binge eating disorder, and feeding and eating conditions not elsewhere classified. Participants completed two independent telephone interviews with research assessors (n=70 Study 1; n=55 Study 2). Fair to substantial agreements (κ= 0.80 and 0.54) were observed across eating disorder diagnoses in Study 1 and Study 2, respectively. Acceptable rates of agreement were identified for the individual eating disorder diagnoses, including DSM-5 anorexia nervosa (κ’s of 0.81 to 0.97), bulimia nervosa (κ=0.84), binge eating disorder (κ’s of 0.75 and 0.61), and feeding and eating disorders not elsewhere classified (κ’s of 0.70 and 0.46). Further, improved short-term test-retest reliability was noted when using the DSM-5, in comparison to DSM-IV, criteria for binge eating disorder. Thus, these studies found that trained interviewers can reliably diagnose eating disorders using the proposed DSM-5 criteria; however, additional data from general practice settings and community samples are needed. PMID:22401974
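
As a rough illustration of the kappa statistic reported above, the sketch below computes Cohen's kappa for two interviews of the same participants. The function name and the diagnosis labels are hypothetical and are not data from the study; the abstract does not state how kappa was computed, so the standard unweighted form is assumed.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(first)
    # Observed agreement: share of cases given the same label both times.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal label frequencies of each interview.
    counts_1, counts_2 = Counter(first), Counter(second)
    p_chance = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical diagnoses from two interviews of the same 10 participants.
interview_1 = ["AN", "BN", "BED", "BN", "AN", "BED", "BED", "BN", "AN", "BED"]
interview_2 = ["AN", "BN", "BED", "BED", "AN", "BED", "BN", "BN", "AN", "BED"]
kappa = cohen_kappa(interview_1, interview_2)
print(f"kappa = {kappa:.2f}")
```

With 8 of 10 matching diagnoses, kappa falls below the raw 80% agreement because part of that agreement is expected by chance.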

Various methods for assessing static lower extremity alignment: implications for prospective risk-factor screenings.

PubMed

Nguyen, Anh-Dung; Boling, Michelle C; Slye, Carrie A; Hartley, Emily M; Parisi, Gina L

2013-01-01

Accurate, efficient, and reliable measurement methods are essential to prospectively identify risk factors for knee injuries in large cohorts. To determine tester reliability using digital photographs for the measurement of static lower extremity alignment (LEA) and whether values quantified with an electromagnetic motion-tracking system are in agreement with those quantified with clinical methods and digital photographs. Descriptive laboratory study. Laboratory. Thirty-three individuals participated and included 17 (10 women, 7 men; age = 21.7 ± 2.7 years, height = 163.4 ± 6.4 cm, mass = 59.7 ± 7.8 kg, body mass index = 23.7 ± 2.6 kg/m2) in study 1, in which we examined the reliability between clinical measures and digital photographs in 1 trained and 1 novice investigator, and 16 (11 women, 5 men; age = 22.3 ± 1.6 years, height = 170.3 ± 6.9 cm, mass = 72.9 ± 16.4 kg, body mass index = 25.2 ± 5.4 kg/m2) in study 2, in which we examined the agreement among clinical measures, digital photographs, and an electromagnetic tracking system. We evaluated measures of pelvic angle, quadriceps angle, tibiofemoral angle, genu recurvatum, femur length, and tibia length. Clinical measures were assessed using clinically accepted methods. Frontal- and sagittal-plane digital images were captured and imported into a computer software program. Anatomic landmarks were digitized using an electromagnetic tracking system to calculate static LEA. Intraclass correlation coefficients and standard errors of measurement were calculated to examine tester reliability. We calculated 95% limits of agreement and used Bland-Altman plots to examine agreement among clinical measures, digital photographs, and an electromagnetic tracking system. 
Using digital photographs, fair to excellent intratester (intraclass correlation coefficient range = 0.70-0.99) and intertester (intraclass correlation coefficient range = 0.75-0.97) reliability were observed for static knee alignment and limb-length measures. An acceptable level of agreement was observed between clinical measures and digital pictures for limb-length measures. When comparing clinical measures and digital photographs with the electromagnetic tracking system, an acceptable level of agreement was observed in measures of static knee angles and limb-length measures. The use of digital photographs and an electromagnetic tracking system appears to be an efficient and reliable method to assess static knee alignment and limb-length measurements.
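
The 95% limits of agreement mentioned above can be sketched as follows, assuming the conventional Bland-Altman form (mean difference plus or minus 1.96 standard deviations of the paired differences). The function name and the angle values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower, upper) for paired measurements from two methods."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    # Paired differences between the two methods for each participant.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    # 1.96 sample standard deviations covers about 95% of the differences
    # if they are roughly normally distributed.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical quadriceps-angle values (degrees) for five participants.
clinical = [12.0, 14.5, 10.0, 13.0, 15.5]
photo = [11.5, 15.0, 10.5, 12.0, 15.0]
bias, lower, upper = limits_of_agreement(clinical, photo)
print(f"bias = {bias:.2f}, limits = ({lower:.2f}, {upper:.2f})")
```

Whether limits of a given width are "acceptable" is a clinical judgment that the calculation itself does not supply.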
Statistical models for causation: what inferential leverage do they provide?

PubMed

Freedman, David A

2006-12-01

Experiments offer more reliable evidence on causation than observational studies, which is not to gainsay the contribution to knowledge from observation. Experiments should be analyzed as experiments, not as observational studies. A simple comparison of rates might be just the right tool, with little value added by "sophisticated" models. This article discusses current models for causation, as applied to experimental and observational data. The intention-to-treat principle and the effect of treatment on the treated will also be discussed. Flaws in per-protocol and treatment-received estimates will be demonstrated.
Leveraging Observation Tools for Instructional Improvement: Exploring Variability in Uptake of Ambitious Instructional Practices

ERIC Educational Resources Information Center

Cohen, Julie; Schuldt, Lorien Chambers; Brown, Lindsay; Grossman, Pamela

2016-01-01

Background/Context: Current efforts to build rigorous teacher evaluation systems have increased interest in standardized classroom observation tools as reliable measures for assessing teaching. However, many argue these instruments can also be used to effect change in classroom practice. This study investigates a model of professional development…
Development of the System for Observing Student Movement in Academic Routines and Transitions (SOSMART)

ERIC Educational Resources Information Center

Russ, Laura B.; Webster, Collin A.; Beets, Michael W.; Egan, Catherine; Weaver, Robert Glenn; Harvey, Rachel; Phillips, David S.

2017-01-01

National attention on whole-of-school approaches to decrease children's sedentary behavior and increase physical activity includes movement integration (MI) in classrooms. The purpose of this study was to describe instrument development, reliability, and validity of the System for Observing Student Movement in Academic Routines and Transitions…
Assessing Psychodynamic Conflict.

PubMed

Simmonds, Joshua; Constantinides, Prometheas; Perry, J Christopher; Drapeau, Martin; Sheptycki, Amanda R

2015-09-01

Psychodynamic psychotherapies suggest that symptomatic relief is provided, in part, with the resolution of psychic conflicts. Clinical researchers have used innovative methods to investigate such phenomena. This article aims to review the literature on quantitative psychodynamic conflict rating scales. An electronic search of the literature was conducted to retrieve quantitative observer-rated scales used to assess conflict, noting each measure's theoretical model, information source, and training and clinical experience required. Scales were also examined for levels of reliability and validity. Five quantitative observer-rated conflict scales were identified. Reliability varied from poor to excellent, with each measure demonstrating good validity. However, a small number of studies and limited links to current conflict theory suggest further clinical research is needed.
Noninvasive measurement of burn wound depth applying infrared thermal imaging (Conference Presentation)

NASA Astrophysics Data System (ADS)

Jaspers, Mariëlle E.; Maltha, Ilse M.; Klaessens, John H.; Vet, Henrica C.; Verdaasdonk, Rudolf M.; Zuijlen, Paul P.

2016-02-01

In burn wounds, early discrimination between the different depths plays an important role in the treatment strategy. The remaining vasculature in the wound determines its healing potential. Non-invasive measurement tools that can identify the vascularization are therefore considered to be of high diagnostic importance. Thermography is a non-invasive technique that can accurately measure the temperature distribution over a large skin or tissue area; the temperature is a measure of the perfusion of that area. The aim of this study was to investigate the clinimetric properties (i.e. reliability and validity) of thermography for measuring burn wound depth. In a cross-sectional study with 50 burn wounds of 35 patients, the inter-observer reliability and the validity between thermography and Laser Doppler Imaging were studied. With ROC curve analyses, the ΔT cut-off points for different burn wound depths were determined. The inter-observer reliability, expressed by an intra-class correlation coefficient of 0.99, was found to be excellent. In terms of validity, a ΔT cut-off point of 0.96°C (sensitivity 71%; specificity 79%) differentiates between a superficial partial-thickness and deep partial-thickness burn. A ΔT cut-off point of -0.80°C (sensitivity 70%; specificity 74%) could differentiate between a deep partial-thickness and a full-thickness burn wound. This study demonstrates that thermography is a reliable method in the assessment of burn wound depths. In addition, thermography was reasonably able to discriminate among different burn wound depths, indicating its potential use as a diagnostic tool in clinical burn practice.
Fluorescent tag is not a reliable marker for small RNA transfection in the presence of serum.

PubMed

Han, Jing; Wang, Qi-Wei; Wang, Shi-Qiang

2013-09-01

Chemically synthesized siRNA and miRNA have become powerful tools to study gene function in the past decade. Fluorescent dyes covalently attached to the 5' or 3' ends of synthetic small RNAs are widely used for fluorescent imaging and detection of these RNAs. However, the reliability of fluorescent tags as small RNA markers in different conditions has not attracted enough attention. We used Cy3-labelled small RNAs to explore the reliability of fluorescent tags as small RNA markers in cell cultures involving serum. A strong Cy3-fluorescence signal was observed in the cytoplasm of the cells transfected with Cy3-miR24 in the culture medium containing fetal bovine serum (FBS), but qRT-PCR results showed that little miR24 was detected in these cells. Further study demonstrated that small RNAs were degraded in the presence of FBS, suggesting that it was Cy3-RNA fragments, rather than the original Cy3-miR24, that diffused into cells. These phenomena disappeared when FBS was replaced by boiled FBS, further supporting that the Cy3-fluorescence we observed in cells in the presence of FBS could not represent the presence of intact small RNAs. These findings indicate that fluorescent tags are not reliable markers for small RNA transfection in the presence of serum in culture.
Reliability of length measurements collected by community nurses and health volunteers in rural growth monitoring and promotion services.

PubMed

Laar, Matilda E; Marquis, Grace S; Lartey, Anna; Gray-Donald, Katherine

2018-02-17

Length measurements are important in growth monitoring and promotion (GMP) for the surveillance of a child's weight-for-length and length-for-age. These two indices provide an indication of a child's risk of becoming wasted or stunted, and are more informative about a child's growth than the widely used weight-for-age index (underweight). Although the introduction of length measurements in GMP is recommended by the World Health Organization, concerns about the reliability of length measurements collected in rural outreach settings have been expressed by stakeholders. Our aim was to describe the reliability and challenges associated with community health personnel measuring length for rural outreach GMP activities. Two reliability studies (A and B), each using 10 children younger than 24 months, were conducted in the GMP services of a rural district in Ghana. Fifteen nurses and 15 health volunteers (HV) with no prior experience in length measurements were trained. Intra- and inter-observer technical error of measurement (TEM), average bias from expert anthropometrist, and coefficient of reliability (R) of length measurements were assessed and compared across sessions. Observations and interviews were used to understand the ability and experiences of health personnel with measuring length at outreach GMP. Inter-observer TEM was larger than intra-observer TEM for both nurses and HV at both sessions and was unacceptably (compared to error standards) high in both groups at both time points. Average biases from the expert's measurements were within acceptable limits; however, both groups tended to underestimate length measurements. The R for lengths collected by nurses (92.3%) was higher at session B compared to that of HV (87.5%). Length measurements taken by nurses and HV, and those taken by an experienced anthropometrist at GMP sessions were of moderate agreement (kappa = 0.53, p < 0.0001).
The reliability of length measurements improved after two refresher trainings for nurses but not for HV. In addition, length measurements taken during GMP sessions may be susceptible to errors due to overburdened health personnel and crowded GMP clinics. There is need for both pre- and in-service training of nurses and HV on length measurements and procedures to improve reliability of length measurements.
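
For readers unfamiliar with the TEM and R statistics named above, the following is a minimal sketch. It assumes the conventional anthropometric definitions for two repeated measurements per child, TEM = sqrt(sum of squared differences / 2N) and R = 1 - TEM^2 / SD^2, where SD^2 is the variance of all measurements; the abstract does not give its formulas, and the function names and length values are hypothetical.

```python
from math import sqrt
from statistics import variance


def technical_error_of_measurement(first, second):
    """TEM for two repeated measurements of the same N subjects."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    return sqrt(sum_sq_diff / (2 * len(first)))


def coefficient_of_reliability(first, second):
    """Share of total variance not attributable to measurement error."""
    tem = technical_error_of_measurement(first, second)
    # Assumption: total variance is taken over all pooled measurements.
    total_variance = variance(list(first) + list(second))
    return 1 - tem ** 2 / total_variance


# Hypothetical lengths (cm) of five children, each measured twice.
session_1 = [65.0, 70.2, 58.4, 74.1, 61.3]
session_2 = [65.4, 70.0, 58.9, 73.8, 61.0]
tem = technical_error_of_measurement(session_1, session_2)
r = coefficient_of_reliability(session_1, session_2)
print(f"TEM = {tem:.3f} cm, R = {r:.4f}")
```

R is reported as a percentage in the abstract; multiplying the value returned here by 100 gives the same scale.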
fMRI reliability: influences of task and experimental design.

PubMed

Bennett, Craig M; Miller, Michael B

2013-12-01

As scientists, it is imperative that we understand not only the power of our research tools to yield results, but also their ability to obtain similar results over time. This study is an investigation into how common decisions made during the design and analysis of a functional magnetic resonance imaging (fMRI) study can influence the reliability of the statistical results. To that end, we gathered back-to-back test-retest fMRI data during an experiment involving multiple cognitive tasks (episodic recognition and two-back working memory) and multiple fMRI experimental designs (block, event-related genetic sequence, and event-related m-sequence). Using these data, we were able to investigate the relative influences of task, design, statistical contrast (task vs. rest, target vs. nontarget), and statistical thresholding (unthresholded, thresholded) on fMRI reliability, as measured by the intraclass correlation (ICC) coefficient. We also utilized data from a second study to investigate test-retest reliability after an extended, six-month interval. We found that all of the factors above were statistically significant, but that they had varying levels of influence on the observed ICC values. We also found that these factors could interact, increasing or decreasing the relative reliability of certain Task × Design combinations. The results suggest that fMRI reliability is a complex construct whose value may be increased or decreased by specific combinations of factors.
A Structured Clinical Interview for Kleptomania (SCI-K): preliminary validity and reliability testing.

PubMed

Grant, Jon E; Kim, Suck Won; McCabe, James S

2006-06-01

Kleptomania presents difficulties in diagnosis for clinicians. This study aimed to develop and test a DSM-IV-based diagnostic instrument for kleptomania. To assess for current kleptomania the Structured Clinical Interview for Kleptomania (SCI-K) was administered to 112 consecutive subjects requesting psychiatric outpatient treatment for a variety of disorders. Reliability and validity were determined. Classification accuracy was examined using the longitudinal course of illness. The SCI-K demonstrated excellent test-retest (Phi coefficient = 0.956 (95% CI = 0.937, 0.970)) and inter-rater reliability (phi coefficient = 0.718 (95% CI = 0.506, 0.848)) in the diagnosis of kleptomania. Concurrent validity was observed with a self-report measure using DSM-IV kleptomania criteria (phi coefficient = 0.769 (95% CI = 0.653, 0.850)). Discriminant validity was observed with a measure of depression (point biserial coefficient = -0.020 (95% CI = -0.205, 0.166)). The SCI-K demonstrated both high sensitivity and specificity based on longitudinal assessment. The SCI-K demonstrated excellent reliability and validity in diagnosing kleptomania in subjects presenting with various psychiatric problems. These findings require replication in larger groups, including non-psychiatric populations, to examine their generalizability. Copyright (c) 2006 John Wiley & Sons, Ltd.
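
The phi coefficient reported above is the correlation between two binary variables. The sketch below shows the usual 2x2-table formula; the function name and the cell counts are hypothetical and do not come from the SCI-K study.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as [[a, b], [c, d]].

    a = both positive, b = first positive only,
    c = second positive only, d = both negative.
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is 0")
    return (a * d - b * c) / denominator


# Hypothetical test-retest diagnoses for 100 subjects.
phi = phi_coefficient(a=20, b=2, c=3, d=75)
print(f"phi = {phi:.3f}")
```

Phi equals 1 when the two assessments never disagree and 0 when they are statistically independent.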
Reliability and accuracy of three imaging software packages used for 3D analysis of the upper airway on cone beam computed tomography images.

PubMed

Chen, Hui; van Eijnatten, Maureen; Wolff, Jan; de Lange, Jan; van der Stelt, Paul F; Lobbezoo, Frank; Aarab, Ghizlane

2017-08-01

The aim of this study was to assess the reliability and accuracy of three different imaging software packages for three-dimensional analysis of the upper airway using CBCT images. To assess the reliability of the software packages, 15 NewTom 5G ® (QR Systems, Verona, Italy) CBCT data sets were randomly and retrospectively selected. Two observers measured the volume, minimum cross-sectional area and the length of the upper airway using Amira ® (Visage Imaging Inc., Carlsbad, CA), 3Diagnosys ® (3diemme, Cantu, Italy) and OnDemand3D ® (CyberMed, Seoul, Republic of Korea) software packages. The intra- and inter-observer reliability of the upper airway measurements were determined using intraclass correlation coefficients and Bland & Altman agreement tests. To assess the accuracy of the software packages, one NewTom 5G ® CBCT data set was used to print a three-dimensional anthropomorphic phantom with known dimensions to be used as the "gold standard". This phantom was subsequently scanned using a NewTom 5G ® scanner. Based on the CBCT data set of the phantom, one observer measured the volume, minimum cross-sectional area, and length of the upper airway using Amira ® , 3Diagnosys ® , and OnDemand3D ® , and compared these measurements with the gold standard. The intra- and inter-observer reliability of the measurements of the upper airway using the different software packages were excellent (intraclass correlation coefficient ≥0.75). There was excellent agreement between all three software packages in volume, minimum cross-sectional area and length measurements. All software packages underestimated the upper airway volume by -8.8% to -12.3%, the minimum cross-sectional area by -6.2% to -14.6%, and the length by -1.6% to -2.9%. All three software packages offered reliable volume, minimum cross-sectional area and length measurements of the upper airway. The length measurements of the upper airway were the most accurate results in all software packages. 
All software packages underestimated the upper airway dimensions of the anthropomorphic phantom.
Intra- and interobserver reliability of the Eaton classification for trapeziometacarpal arthritis: a systematic review.

PubMed

Berger, Aaron J; Momeni, Arash; Ladd, Amy L

2014-04-01

Trapeziometacarpal, or thumb carpometacarpal (CMC), arthritis is a common problem with a variety of treatment options. Although widely used, the Eaton radiographic staging system for CMC arthritis is of questionable clinical utility, as disease severity does not predictably correlate with symptoms or treatment recommendations. A possible reason for this is that the classification itself may not be reliable, but the literature on this has not, to our knowledge, been systematically reviewed. We therefore performed a systematic review to determine the intra- and interobserver reliability of the Eaton staging system. We systematically reviewed English-language studies published between 1973 and 2013 to assess the degree of intra- and interobserver reliability of the Eaton classification for determining the stage of trapeziometacarpal joint arthritis and pantrapezial arthritis based on plain radiographic imaging. Search engines included: PubMed, Scopus(®), and CINAHL. Four studies, which included a total of 163 patients, met our inclusion criteria and were evaluated. The level of evidence of the studies included in this analysis was determined using the Oxford Centre for Evidence Based Medicine Levels of Evidence Classification by two independent observers. A limited number of studies have been performed to assess intra- and interobserver reliability of the Eaton classification system. The four studies included were determined to be Level 3b. These studies collectively indicate that the Eaton classification demonstrates poor to fair interobserver reliability (kappa values: 0.11-0.56) and fair to moderate intraobserver reliability (kappa values: 0.54-0.657). Review of the literature demonstrates that radiographs assist in the assessment of CMC joint disease, but there is not a reliable system for classification of disease severity. 
Currently, diagnosis and treatment of thumb CMC arthritis are based on the surgeon's qualitative assessment combining history, physical examination, and radiographic evaluation. Inconsistent agreement using the current common radiographic classification system suggests a need for better radiographic tools to quantify disease severity.
Responsiveness to child feeding cues: an observational scale

USDA-ARS's Scientific Manuscript database

Mismatched caregiver responsiveness to child hunger and satiety cues is thought to contribute to obesity in infancy and beyond. Assessment of this proposition, however, has been limited by a lack of reliable and valid measures. This research evaluated the interrater reliability of a new observation...
Temporal eye movement strategies during naturalistic viewing

PubMed Central

Wang, Helena X.; Freeman, Jeremy; Merriam, Elisha P.; Hasson, Uri; Heeger, David J.

2011-01-01

The deployment of eye movements to complex spatiotemporal stimuli likely involves a variety of cognitive factors. However, eye movements to movies are surprisingly reliable both within and across observers. We exploited and manipulated that reliability to characterize observers’ temporal viewing strategies. Introducing cuts and scrambling the temporal order of the resulting clips systematically changed eye movement reliability. We developed a computational model that exhibited this behavior and provided an excellent fit to the measured eye movement reliability. The model assumed that observers searched for, found, and tracked a point-of-interest, and that this process reset when there was a cut. The model did not require that eye movements depend on temporal context in any other way, and it managed to describe eye movements consistently across different observers and two movie sequences. Thus, we found no evidence for the integration of information over long time scales (greater than a second). The results are consistent with the idea that observers employ a simple tracking strategy even while viewing complex, engaging naturalistic stimuli. PMID:22262911
Comparison of Mean Climate Trends in the Northern Hemisphere Between N.C.E.P. and Two Atmosphere-Ocean Model Forced Runs

NASA Technical Reports Server (NTRS)

Lucarini, Valerio; Russell, Gary L.; Hansen, James E. (Technical Monitor)

2002-01-01

Results are presented for two greenhouse gas experiments of the Goddard Institute for Space Studies Atmosphere-Ocean Model (AOM). The computed trends of surface pressure, surface temperature, 850, 500 and 200 mb geopotential heights and related temperatures of the model for the time frame 1960-2000 are compared to those obtained from the National Centers for Environmental Prediction observations. A spatial correlation analysis and mean value comparison are performed, showing good agreement. A brief general discussion about the statistics of trend detection is presented. The domain of interest is the Northern Hemisphere (NH) because of the higher reliability of both the model results and the observations. The accuracy that this AOM has in describing the observed regional and NH climate trends makes it reliable in forecasting future climate changes.
Enhanced Component Performance Study: Air-Operated Valves 1998-2014

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schroeder, John Alton

2015-11-01

This report presents a performance evaluation of air-operated valves (AOVs) at U.S. commercial nuclear power plants. The data used in this study are based on the operating experience failure reports from fiscal year 1998 through 2014 for the component reliability as reported in the Institute of Nuclear Power Operations (INPO) Consolidated Events Database (ICES). The AOV failure modes considered are failure-to-open/close, failure to operate or control, and spurious operation. The component reliability estimates and the reliability data are trended for the most recent 10-year period, while yearly estimates for reliability are provided for the entire active period. One statistically significant trend was observed in the AOV data: The frequency of demands per reactor year for valves recording the fail-to-open or fail-to-close failure modes, for high-demand valves (those with greater than twenty demands per year), was found to be decreasing. The decrease was about three percent over the ten year period trended.
[Development and reliability evaluation of an instrument to measure health-related quality of life in independent elderly].

PubMed

Lima, Maria José Barbosa de; Portela, Margareth Crisóstomo

2010-08-01

This study presents an instrument, the health-related quality of life (HRQOL) profile for independent elderly, to measure the health-related quality of life of the functionally independent elderly assisted in the outpatient setting, based on the adaptation of four validated scales: Short-Form Health Survey (SF-36), Duke-UNC Health Profile (DUHP), Sickness Impact Profile (SIP), and Nottingham Health Profile (NHP). The study also evaluates the instrument's reliability based on its use by two different observers with a 15-day interval. The instrument includes five dimensions (health perception, symptoms, physical function, psychological function, and social function) and 45 items. Reliability evaluation of the QUASI instrument was based on interviews with 142 elderly outpatients in the city of Rio de Janeiro, Brazil. Prevalence-adjusted kappa statistic was used to assess all 45 items. Correlation was also calculated between overall scores and scores on individual dimensions. In the reliability evaluation, 39 of the 45 items showed prevalence-adjusted kappa greater than 0.60.
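
As an illustration of the prevalence-adjusted kappa mentioned above, the sketch below assumes the common prevalence-adjusted, bias-adjusted form, (k * Po - 1) / (k - 1), where Po is the observed agreement and k the number of response categories. The abstract does not specify the exact variant used, and the function name and responses are hypothetical.

```python
def prevalence_adjusted_kappa(first, second, n_categories=2):
    """Prevalence-adjusted kappa, assuming chance agreement of 1/k."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty response lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two response categories")
    # Observed agreement between the two interviews.
    p_observed = sum(a == b for a, b in zip(first, second)) / len(first)
    return (n_categories * p_observed - 1) / (n_categories - 1)


# Hypothetical yes/no (1/0) answers to one item from two observers.
observer_1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
observer_2 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
value = prevalence_adjusted_kappa(observer_1, observer_2)
print(f"prevalence-adjusted kappa = {value:.2f}")
```

When almost every respondent gives the same answer, as in this example, the ordinary kappa can be near zero or negative despite high raw agreement, which is the usual reason for preferring the adjusted statistic.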
Morphosyntactic Development of Bangla-Speaking Preschool Children

ERIC Educational Resources Information Center

Sultana, Asifa; Stokes, Stephanie; Klee, Thomas; Fletcher, Paul

2016-01-01

This study examines the morphosyntactic development, specifically verb morphology, of typically-developing Bangla-speaking children between the ages of two and four. Three verb forms were studied: the Present Simple, the Present Progressive and the Past Progressive. The study was motivated by the observations that reliable language-specific…
Reliability of Lactation Assessment Tools Applied to Overweight and Obese Women.

PubMed

Chapman, Donna J; Doughty, Katherine; Mullin, Elizabeth M; Pérez-Escamilla, Rafael

2016-05-01

The interrater reliability of lactation assessment tools has not been evaluated in overweight/obese women. This study aimed to compare the interrater reliability of 4 lactation assessment tools in this population. A convenience sample of 45 women (body mass index > 27.0) was videotaped while breastfeeding (twice daily on days 2, 4, and 7 postpartum). Three International Board Certified Lactation Consultants independently rated each videotaped session using 4 tools (Infant Breastfeeding Assessment Tool [IBFAT], modified LATCH [mLATCH], modified Via Christi [mVC], and Riordan's Tool [RT]). For each day and tool, we evaluated interrater reliability with 1-way repeated-measures analyses of variance, intraclass correlation coefficients (ICCs), and percentage absolute agreement between raters. Analyses of variance showed significant differences between raters' scores on day 2 (all scales) and day 7 (RT). Intraclass correlation coefficient values reflected good (mLATCH) to excellent reliability (IBFAT, mVC, and RT) on days 2 and 7. All day 4 ICCs reflected good reliability. The ICC for mLATCH was significantly lower than all others on day 2 and was significantly lower than IBFAT (day 7). Percentage absolute interrater agreement for scale components ranged from 31% (day 2: observable swallowing, RT) to 92% (day 7: IBFAT, fixing; and mVC, latch time). Swallowing scores on all scales had the lowest levels of interrater agreement (31%-64%). We demonstrated differences in the interrater reliability of 4 lactation assessment tools when applied to overweight/obese women, with the lowest values observed on day 4. Swallowing assessment was particularly unreliable. Researchers and clinicians using these scales should be aware of the differences in their psychometric behavior. © The Author(s) 2015.
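
The intraclass correlation coefficient used above can be sketched in its simplest one-way random-effects, single-rater form, (MSB - MSW) / (MSB + (k - 1) * MSW). This is an assumption: the abstract does not say which ICC form was computed, and the function name and scores below are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for a subjects-by-raters table.

    ratings: list of rows, one row per rated session, one value per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 sessions and 2 raters, no gaps")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Mean square between sessions.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Mean square within sessions (disagreement among raters).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical tool scores: four sessions, each rated by three consultants.
scores = [[9, 8, 9], [6, 5, 7], [10, 10, 9], [4, 6, 5]]
icc = icc_oneway(scores)
print(f"ICC = {icc:.2f}")
```

Two-way ICC forms, which separate systematic rater differences from random error, would give different values for the same table.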
An Experimental Study of Procedures to Enhance Ratings of Fidelity to an Evidence-Based Family Intervention.

PubMed

Smith, Justin D; Dishion, Thomas J; Brown, Kimbree; Ramos, Karina; Knoble, Naomi B; Shaw, Daniel S; Wilson, Melvin N

2016-01-01

The valid and reliable assessment of fidelity is critical at all stages of intervention research and is particularly germane to interpreting the results of efficacy and implementation trials. Ratings of protocol adherence typically are reliable, but ratings of therapist competence are plagued by low reliability. Because family context and case conceptualization guide the therapist's delivery of interventions, the reliability of fidelity ratings might be improved if the coder is privy to client context in the form of an ecological assessment. We conducted a randomized experiment to test this hypothesis. A subsample of 46 families with 5-year-old children from a multisite randomized trial who participated in the feedback session of the Family Check-Up (FCU) intervention were selected. We randomly assigned FCU feedback sessions to be rated for fidelity to the protocol using the COACH rating system either after the coder reviewed the results of a recent ecological assessment or had not. Inter-rater reliability estimates of fidelity ratings were meaningfully higher for the assessment information condition compared to the no-information condition. Importantly, the reliability of the COACH mean score was found to be statistically significantly higher in the information condition. These findings suggest that the reliability of observational ratings of fidelity, particularly when the competence or quality of delivery is considered, could be improved by providing assessment data to the coders. Our findings might be most applicable to assessment-driven interventions, where assessment data explicitly guides therapist's selection of intervention strategies tailored to the family's context and needs, but they could also apply to other intervention programs and observational coding of context-dependent therapy processes, such as the working alliance.

An Experimental Study of Procedures to Enhance Ratings of Fidelity to an Evidence-Based Family Intervention

PubMed Central

Smith, Justin D.; Dishion, Thomas J.; Brown, Kimbree; Ramos, Karina; Knoble, Naomi B.; Shaw, Daniel S.; Wilson, Melvin N.

2015-01-01

The valid and reliable assessment of fidelity is critical at all stages of intervention research and is particularly germane to interpreting the results of efficacy and implementation trials. Ratings of protocol adherence typically are reliable, but ratings of therapist competence are plagued by low reliability. Because family context and case conceptualization guide the therapist's delivery of interventions, the reliability of fidelity ratings might be improved if the coder is privy to client context in the form of an ecological assessment. We conducted a randomized experiment to test this hypothesis. A subsample of 46 families with 5-year-old children from a multisite randomized trial who participated in the feedback session of the Family Check-Up (FCU) intervention were selected. We randomly assigned FCU feedback sessions to be rated for fidelity to the protocol using the COACH rating system either after the coder reviewed the results of a recent ecological assessment or had not. Inter-rater reliability estimates of fidelity ratings were meaningfully higher for the assessment information condition compared to the no-information condition. Importantly, the reliability of the COACH mean score was found to be statistically significantly higher in the information condition. These findings suggest that the reliability of observational ratings of fidelity, particularly when the competence or quality of delivery is considered, could be improved by providing assessment data to the coders. Our findings might be most applicable to assessment-driven interventions, where assessment data explicitly guides therapist's selection of intervention strategies tailored to the family's context and needs, but they could also apply to other intervention programs and observational coding of context-dependent therapy processes, such as the working alliance. PMID:26271300
Reliable and valid assessment of competence in endoscopic ultrasonography and fine-needle aspiration for mediastinal staging of non-small cell lung cancer.

PubMed

Konge, L; Vilmann, P; Clementsen, P; Annema, J T; Ringsted, C

2012-10-01

Fine-needle aspiration (FNA) guided by endoscopic ultrasonography (EUS) is important in mediastinal staging of non-small cell lung cancer (NSCLC). Training standards and implementation strategies of this technique are currently under discussion. The aim of this study was to explore the reliability and validity of a newly developed EUS Assessment Tool (EUSAT) designed to measure competence in EUS - FNA for mediastinal staging of NSCLC. A total of 30 patients with proven or suspected NSCLC underwent EUS - FNA for mediastinal staging by three trainees and three experienced physicians. Their performances were assessed prospectively by three experts in EUS under direct observation and again 2 months later in a blinded fashion using digital video-recordings. Based on the assessments, intra-rater reliability, inter-rater reliability, and construct validity were explored. The intra-rater reliability was good (Cronbach's α = 0.80), but comparison of results based on direct observations and blinded video-recordings indicated a significant bias favoring consultants (P = 0.022). Inter-rater reliability was very good (Cronbach's α = 0.93). However, one rater assessing five procedures or two raters each assessing four procedures were necessary to secure a generalizability coefficient of 0.80. The assessment tool demonstrated construct validity by discriminating between trainees and experienced physicians (P = 0.034). Competency in mediastinal staging of NSCLC using EUS and EUS - FNA can be assessed in a reliable and valid way using the EUSAT assessment tool. Measuring and defining competency and training requirements could improve EUS quality and benefit patient care. © Georg Thieme Verlag KG Stuttgart · New York.
The Development of the Cleft Aesthetic Rating Scale: A New Rating Scale for the Assessment of Nasolabial Appearance in Complete Unilateral Cleft Lip and Palate Patients.

PubMed

Mosmuller, David G M; Mennes, Lisette M; Prahl, Charlotte; Kramer, Gem J C; Disse, Melissa A; van Couwelaar, Gijs M; Niessen, Frank B; Griot, J P W Don

2017-09-01

The development of the Cleft Aesthetic Rating Scale, a simple and reliable photographic reference scale for the assessment of nasolabial appearance in complete unilateral cleft lip and palate patients. A blind retrospective analysis of photographs of cleft lip and palate patients was performed with this new rating scale. VU Medical Center Amsterdam and the Academic Center for Dentistry of Amsterdam. Complete unilateral cleft lip and palate patients at the age of 6 years. Photographs that showed the highest interobserver agreement in earlier assessments were selected for the photographic reference scale. Rules were attached to the rating scale to provide a guideline for the assessment and improve interobserver reliability. Cropped photographs revealing only the nasolabial area were assessed by six observers using this new Cleft Aesthetic Rating Scale in two different sessions. Photographs of 62 children (6 years of age, 44 boys and 18 girls) were assessed. The interobserver reliability for the nose and lip together was 0.62, obtained with the intraclass correlation coefficient. To measure the internal consistency, a Cronbach alpha of .91 was calculated. The estimated reliability for three observers was .84, obtained with the Spearman Brown formula. A new, easy to use, and reliable scoring system with a photographic reference scale is presented in this study.
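The Spearman-Brown step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' computation: the function names are hypothetical, and the only inputs taken from the abstract are the single-observer ICC of 0.62 and the panel size of three. With the rounded value 0.62 the projection comes out near 0.83, slightly below the reported .84; the gap presumably reflects rounding of the published ICC, which is an assumption.

```python
def spearman_brown(single_rater_reliability: float, n_raters: float) -> float:
    """Projected reliability of the mean of n_raters parallel ratings."""
    r = single_rater_reliability
    return n_raters * r / (1.0 + (n_raters - 1.0) * r)


def raters_needed(single_rater_reliability: float, target: float) -> float:
    """Inverse of the formula: raters required to reach a target reliability."""
    r = single_rater_reliability
    return target * (1.0 - r) / (r * (1.0 - target))


# Single-observer ICC reported in the abstract (rounded to two decimals).
projected = spearman_brown(0.62, 3)
print(round(projected, 3))
```

The inverse function is included because the same formula answers the planning question of how many observers a study would need for a chosen reliability target.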
Further examination of the temporal stability of alcohol demand.

PubMed

Acuff, Samuel F; Murphy, James G

2017-08-01

Demand, or the amount of a substance consumed as a function of price, is a central dependent measure in behavioral economic research and represents the relative valuation of a substance. Although demand is often utilized as an index of substance use severity and is assumed to be relatively stable, recent experimental and clinical research has identified conditions in which demand can be manipulated, such as through craving and stress inductions, and treatment. Our study examines the 1-month reliability of the alcohol purchase task in a sample of heavy drinking college students. We also analyzed reliability in subgroups of individuals whose consumption decreased, increased, or stayed the same over the 1-month period, and in individuals with moderate/severe Alcohol Use Disorder (AUD) vs. those with no/mild AUD. Reliability was moderate in the full sample, high in the group with stable consumption, and did not differ appreciably between AUD groups. Observed indices and indices derived from an exponentiated equation (Koffarnus et al., 2015) were generally comparable, although observed Pmax had very low reliability. Area under the curve, derived Omax, and essential value showed the greatest reliability in the full sample (rs=0.75-0.77). These results provide evidence for the relative stability of demand over time and across AUD groups, particularly in those whose consumption remains stable. Copyright © 2017 Elsevier B.V. All rights reserved.
Spanish version of the Kidney Disease Knowledge Survey (KiKS) in Peru: cross-cultural adaptation and validation.

PubMed

Mota-Anaya, Evelin; Yumpo-Cárdenas, Daniel; Alva-Bravo, Edmundo; Wright-Nunes, Julie; Mayta-Tristán, Percy

2016-08-08

Chronic kidney disease (CKD) affects 50 million people globally. Several studies show the importance of implementing interventions that enhance patients' knowledge about their disease. In 2011 the Kidney Disease Knowledge Survey (KiKS) was developed: a questionnaire that assesses the specific knowledge about chronic kidney disease in pre-dialysis patients. The objective was to translate into Spanish, culturally adapt, and validate the Kidney Disease Knowledge Survey questionnaire in a population of patients with pre-dialysis chronic kidney disease. We carried out a Spanish translation and cross-cultural adaptation of the Kidney Disease Knowledge Survey questionnaire. Subsequently, we determined its validity and reliability. We determined validity through construct validity, and reliability by evaluating internal consistency and intra-observer reliability (test-retest). We found good internal consistency (Kuder-Richardson = 0.85). The intra-observer reliability was measured by the intra-class correlation coefficient, which yielded a value of 0.78 (95% CI: 0.5-1.0). This value indicated good reproducibility; the mean test-retest difference of -1.1 (SD 6.0; p = 0.369) supports this finding. The translated Spanish version of the Kidney Disease Knowledge Survey is acceptable and equivalent to the original version; it also has good reliability, validity and reproducibility. Therefore, it can be used in a population of patients with pre-dialysis chronic kidney disease.
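For readers unfamiliar with the Kuder-Richardson coefficient cited above, the sketch below shows one common form, KR-20, for right/wrong (0/1) items. It is illustrative only: the abstract does not say which Kuder-Richardson variant or variance convention was used, so the population variance here is an assumption, and the response matrix is invented.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """KR-20 for dichotomous items; rows are respondents, columns are items."""
    n_people = len(responses)
    n_items = len(responses[0])
    # Proportion of respondents answering each item correctly.
    p = [sum(row[j] for row in responses) / n_people for j in range(n_items)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)
    # Variance of total scores (population form; an assumed convention).
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_var)


# Hypothetical answers from five respondents to four items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example), 3))
```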
Analysis of the psychometric properties of the American Orthopaedic Foot and Ankle Society Score (AOFAS) in rheumatoid arthritis patients: application of the Rasch model.

PubMed

Conceição, Cristiano Sena da; Neto, Mansueto Gomes; Neto, Anolino Costa; Mendes, Selena M D; Baptista, Abrahão Fontes; Sá, Kátia Nunes

2016-01-01

To test the reliability and validity of the AOFAS in a sample of rheumatoid arthritis patients. The scale was applied to rheumatoid arthritis patients twice by interviewer 1 and once by interviewer 2. The AOFAS was subjected to test-retest reliability analysis (with 20 rheumatoid arthritis subjects). The psychometric properties were investigated using Rasch analysis on 33 rheumatoid arthritis patients. Intra-Class Correlation Coefficients (ICC) were (0.90
How Many Sleep Diary Entries Are Needed to Reliably Estimate Adolescent Sleep?

PubMed Central

Arora, Teresa; Gradisar, Michael; Taheri, Shahrad; Carskadon, Mary A.

2017-01-01

Study Objectives: To investigate (1) how many nights of sleep diary entries are required for reliable estimates of five sleep-related outcomes (bedtime, wake time, sleep onset latency [SOL], sleep duration, and wake after sleep onset [WASO]) and (2) the test–retest reliability of sleep diary estimates of school night sleep across 12 weeks. Methods: Data were drawn from four adolescent samples (Australia [n = 385], Qatar [n = 245], United Kingdom [n = 770], and United States [n = 366]), who provided 1766 eligible sleep diary weeks for reliability analyses. We performed reliability analyses for each cohort using complete data (7 days), one to five school nights, and one to two weekend nights. We also performed test–retest reliability analyses on 12-week sleep diary data available from a subgroup of 55 US adolescents. Results: Intraclass correlation coefficients for bedtime, SOL, and sleep duration indicated good-to-excellent reliability from five weekday nights of sleep diary entries across all adolescent cohorts. Four school nights were sufficient for wake times in the Australian and UK samples, but not the US or Qatari samples. Only Australian adolescents showed good reliability for two weekend nights of bedtime reports; estimates of SOL were adequate for UK adolescents based on two weekend nights. WASO was not reliably estimated using 1 week of sleep diaries. We observed excellent test–retest reliability across 12 weeks of sleep diary data in a subsample of US adolescents. Conclusion: We recommend that at least five weekday nights of sleep diary entries be collected when studying adolescent bedtimes, SOL, and sleep duration. Adolescent sleep patterns were stable across 12 consecutive school weeks. PMID:28199718
Fatigue in children: reliability and validity of the Dutch PedsQL™ Multidimensional Fatigue Scale.

PubMed

Gordijn, M Suzanne; Cremers, Eline M P; Kaspers, Gertjan J L; Gemke, Reinoud J B J

2011-09-01

The aim of the study is to report on the feasibility, reliability, validity, and the norm-references of the Dutch version of the PedsQL™ Multidimensional Fatigue Scale. The study participants are four hundred and ninety-seven parents of children aged 2-18 years and 366 children aged 5-18 years from various day care facilities, elementary schools, and a high school who completed the Dutch version of the PedsQL™ Multidimensional Fatigue Scale. The number of missing items was minimal. All scales showed satisfactory internal consistency reliability, with Cronbach's coefficient alpha exceeding 0.70. Test-retest reliability was good to excellent (ICCs 0.68-0.84) and inter-observer reliability varied from moderate to excellent (ICCs 0.56-0.93) for total scores. Parent/child concordance for total scores was poor to good (ICCs 0.25-0.68). The PedsQL™ Multidimensional Fatigue Scale was able to distinguish between healthy children and children with an impaired health condition. The Dutch version of the PedsQL™ Multidimensional Fatigue Scale demonstrates an adequate feasibility, reliability, and validity in another sociocultural context. With the obtained norm-references, it can be utilized as a tool in the evaluation of fatigue in healthy and chronically ill children aged 2-18 years.
21 CFR 201.57 - Specific requirements on content and format of labeling for human prescription drug and...

Code of Federal Regulations, 2013 CFR

2013-04-01

... comparative rates of occurrence cannot be reliably determined (e.g., adverse reactions were observed only in... in vivo study designs or results (e.g., drug interaction studies), may be included in this section if...
21 CFR 201.57 - Specific requirements on content and format of labeling for human prescription drug and...

Code of Federal Regulations, 2012 CFR

2012-04-01

... comparative rates of occurrence cannot be reliably determined (e.g., adverse reactions were observed only in... in vivo study designs or results (e.g., drug interaction studies), may be included in this section if...
21 CFR 201.57 - Specific requirements on content and format of labeling for human prescription drug and...

Code of Federal Regulations, 2014 CFR

2014-04-01

... comparative rates of occurrence cannot be reliably determined (e.g., adverse reactions were observed only in... in vivo study designs or results (e.g., drug interaction studies), may be included in this section if...
Evaluation of patients with painful total hip arthroplasty using combined single photon emission tomography and conventional computerized tomography (SPECT/CT) - a comparison of semi-quantitative versus 3D volumetric quantitative measurements.

PubMed

Barthassat, Emilienne; Afifi, Faik; Konala, Praveen; Rasch, Helmut; Hirschmann, Michael T

2017-05-08

It was the primary purpose of our study to evaluate the inter- and intra-observer reliability of a standardized SPECT/CT algorithm for evaluating patients with painful primary total hip arthroplasty (THA). The secondary purpose was a comparison of semi-quantitative and 3D volumetric quantification methods for assessment of bone tracer uptake (BTU) in those patients. A novel SPECT/CT localization scheme consisting of 14 femoral and 4 acetabular regions on standardized axial and coronal slices was introduced and evaluated in terms of inter- and intra-observer reliability in 37 consecutive patients with hip pain after THA. BTU for each anatomical region was assessed semi-quantitatively using a color-coded Likert-type scale (0-10) and volumetrically quantified using validated software. Two observers interpreted the SPECT/CT findings in all patients twice, in random order, with a six-week interval between interpretations. Semi-quantitative and quantitative measurements were compared in terms of reliability. In addition, the values were correlated using Pearson's correlation. A factorial cluster analysis of BTU was performed to identify clinically relevant regions, which should be grouped and analysed together. The localization scheme showed high inter- and intra-observer reliabilities for all femoral and acetabular regions independent of the measurement method used (semi-quantitative versus 3D volumetric quantitative measurements). A high to moderate correlation between both measurement methods was shown for the distal femur, the proximal femur and the acetabular cup. The factorial cluster analysis showed that the anatomical regions might be summarized into three distinct anatomical regions. These were the proximal femur, the distal femur and the acetabular cup region. The SPECT/CT algorithm for assessment of patients with pain after THA is highly reliable independent of the measurement method used. Three clinically relevant anatomical regions (proximal femoral, distal femoral, acetabular) were identified.
A Tool for Measuring Active Learning in the Classroom

PubMed Central

Devlin, John W.; Kirwin, Jennifer L.; Qualters, Donna M.

2007-01-01

Objectives To develop a valid and reliable active-learning inventory tool for use in large classrooms and compare faculty perceptions of active-learning using the Active-Learning Inventory Tool. Methods The Active-Learning Inventory Tool was developed using published literature and validated by national experts in educational research. Reliability was established by trained faculty members who used the Active-Learning Inventory Tool to observe 9 pharmacy lectures. Instructors were then interviewed to elicit perceptions regarding active learning and asked to share their perceptions. Results Per lecture, 13 (range: 4-34) episodes of active learning encompassing 3 (range: 2-5) different types of active learning occurred over 2.2 minutes (0.6-16) per episode. Both interobserver (≥87%) and observer-instructor agreement (≥68%) were high for these outcomes. Conclusions The Active-Learning Inventory Tool is a valid and reliable tool to measure active learning in the classroom. Future studies are needed to determine the impact of the Active-Learning Inventory Tool on teaching and its usefulness in other disciplines. PMID:17998982
Point-Connecting Measurements of the Hallux Valgus Deformity: A New Measurement and Its Clinical Application

PubMed Central

Seo, Jeong-Ho; Boedijono, Dimas

2016-01-01

Purpose The aim of this study was to investigate new point-connecting measurements for the hallux valgus angle (HVA) and the first intermetatarsal angle (IMA), which can reflect the degree of subluxation of the first metatarsophalangeal joint (MTPJ). Also, this study attempted to compare the validity of midline measurements and the new point-connecting measurements for the determination of HVA and IMA values. Materials and Methods Sixty feet of hallux valgus patients who underwent surgery between 2007 and 2011 were classified in terms of the severity of HVA, congruency of the first MTPJ, and type of chevron metatarsal osteotomy. On weight-bearing dorsal-plantar radiographs, HVA and IMA values were measured and compared preoperatively and postoperatively using both the conventional and new methods. Results Compared with midline measurements, point-connecting measurements showed higher inter- and intra-observer reliability for preoperative HVA/IMA and similar or higher inter- and intra-observer reliability for postoperative HVA/IMA. Patients who underwent distal chevron metatarsal osteotomy (DCMO) had higher intraclass correlation coefficient for inter- and intra-observer reliability for pre- and post-operative HVA and IMA measured by the point-connecting method compared with the midline method. All differences in the preoperative HVAs and IMAs determined by both the midline method and point-connecting methods were significant between the deviated group and subluxated groups (p=0.001). Conclusion The point-connecting method for measuring HVA and IMA in the subluxated first MTPJ may better reflect the severity of a HV deformity with higher reliability than the midline method, and is more useful in patients with DCMO than in patients with proximal chevron metatarsal osteotomy. PMID:26996576
What do we gain with Probabilistic Flood Loss Models?

NASA Astrophysics Data System (ADS)

Schroeter, K.; Kreibich, H.; Vogel, K.; Merz, B.; Lüdtke, S.

2015-12-01

The reliability of flood loss models is a prerequisite for their practical usefulness. Oftentimes, traditional uni-variate damage models as for instance depth-damage curves fail to reproduce the variability of observed flood damage. Innovative multi-variate probabilistic modelling approaches are promising to capture and quantify the uncertainty involved and thus to improve the basis for decision making. In this study we compare the predictive capability of two probabilistic modelling approaches, namely Bagging Decision Trees and Bayesian Networks and traditional stage damage functions which are cast in a probabilistic framework. For model evaluation we use empirical damage data which are available from computer aided telephone interviews that were respectively compiled after the floods in 2002, 2005, 2006 and 2013 in the Elbe and Danube catchments in Germany. We carry out a split sample test by sub-setting the damage records. One sub-set is used to derive the models and the remaining records are used to evaluate the predictive performance of the model. Further we stratify the sample according to catchments which allows studying model performance in a spatial transfer context. Flood damage estimation is carried out on the scale of the individual buildings in terms of relative damage. The predictive performance of the models is assessed in terms of systematic deviations (mean bias), precision (mean absolute error) as well as in terms of reliability which is represented by the proportion of the number of observations that fall within the 95-quantile and 5-quantile predictive interval. The reliability of the probabilistic predictions within validation runs decreases only slightly and achieves a very good coverage of observations within the predictive interval. Probabilistic models provide quantitative information about prediction uncertainty which is crucial to assess the reliability of model predictions and improves the usefulness of model results.
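The three evaluation measures named in the abstract above (mean bias, mean absolute error, and reliability as interval coverage) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name, the use of the predictive median as the point estimate, the sign convention (predicted minus observed), and all numbers are hypothetical.

```python
def evaluate_predictions(observed, predicted_median, lower_q05, upper_q95):
    """Mean bias, mean absolute error, and coverage of the 5%-95% interval."""
    n = len(observed)
    errors = [p - o for p, o in zip(predicted_median, observed)]
    mean_bias = sum(errors) / n
    mean_abs_error = sum(abs(e) for e in errors) / n
    # Reliability: share of observations inside the predictive interval.
    hits = sum(lo <= o <= hi for o, lo, hi in zip(observed, lower_q05, upper_q95))
    return {"mean_bias": mean_bias, "mae": mean_abs_error, "coverage": hits / n}


# Hypothetical relative building damage (0 = none, 1 = total loss).
observed = [0.10, 0.25, 0.40, 0.05]
predicted_median = [0.12, 0.20, 0.45, 0.05]
lower_q05 = [0.02, 0.10, 0.42, 0.00]
upper_q95 = [0.30, 0.35, 0.70, 0.20]

result = evaluate_predictions(observed, predicted_median, lower_q05, upper_q95)
print(result)
```

For a well-calibrated model, the coverage of a 5%-95% interval should sit near 0.90 on held-out records; values far below that suggest the predictive intervals are too narrow.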
Confronting uncertainty in flood damage predictions

NASA Astrophysics Data System (ADS)

Schröter, Kai; Kreibich, Heidi; Vogel, Kristin; Merz, Bruno

2015-04-01

Reliable flood damage models are a prerequisite for the practical usefulness of the model results. Oftentimes, traditional uni-variate damage models as for instance depth-damage curves fail to reproduce the variability of observed flood damage. Innovative multi-variate probabilistic modelling approaches are promising to capture and quantify the uncertainty involved and thus to improve the basis for decision making. In this study we compare the predictive capability of two probabilistic modelling approaches, namely Bagging Decision Trees and Bayesian Networks. For model evaluation we use empirical damage data which are available from computer aided telephone interviews that were respectively compiled after the floods in 2002, 2005 and 2006, in the Elbe and Danube catchments in Germany. We carry out a split sample test by sub-setting the damage records. One sub-set is used to derive the models and the remaining records are used to evaluate the predictive performance of the model. Further we stratify the sample according to catchments which allows studying model performance in a spatial transfer context. Flood damage estimation is carried out on the scale of the individual buildings in terms of relative damage. The predictive performance of the models is assessed in terms of systematic deviations (mean bias), precision (mean absolute error) as well as in terms of reliability which is represented by the proportion of the number of observations that fall within the 95-quantile and 5-quantile predictive interval. The reliability of the probabilistic predictions within validation runs decreases only slightly and achieves a very good coverage of observations within the predictive interval. Probabilistic models provide quantitative information about prediction uncertainty which is crucial to assess the reliability of model predictions and improves the usefulness of model results.
A clinician-administered observation and corresponding caregiver interview capturing DSM-5 sensory reactivity symptoms in children with ASD.

PubMed

Siper, Paige M; Kolevzon, Alexander; Wang, A Ting; Buxbaum, Joseph D; Tavassoli, Teresa

2017-06-01

Sensory reactivity is a new criterion for autism spectrum disorder (ASD) in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). However, there is no consensus on how to reliably measure sensory reactivity, particularly in minimally verbal individuals. The current study is an initial validation of the Sensory Assessment for Neurodevelopmental Disorders (SAND), a novel clinician-administered observation and corresponding caregiver interview that captures sensory symptoms based on DSM-5 criteria for ASD. Eighty children between the ages of 2 and 12 participated in this study; 44 children with ASD and 36 typically developing (TD) children. Sensory reactivity symptoms were measured using the SAND and the already validated Short Sensory Profile (SSP). Initial psychometric properties of the SAND were examined including reliability, validity, sensitivity and specificity. Children with ASD showed significantly more sensory reactivity symptoms compared to TD children across sensory domains (visual, tactile, and auditory) and within sensory subtypes (hyperreactivity, hyporeactivity and seeking). The SAND showed strong internal consistency, inter-rater reliability and test-retest reliability, high sensitivity (95.5%) and specificity (91.7%), and strong convergent validity with the SSP. The SAND provides a novel method to characterize sensory reactivity symptoms based on DSM-5 criteria for ASD. This is the first known sensory assessment that combines a clinician-administered observation and caregiver interview to optimally capture sensory phenotypes characteristic of individuals with neurodevelopmental disorders. The SAND offers a beneficial new tool for both research and clinical purposes and has the potential to meaningfully enhance gold-standard assessment of ASD. Autism Res 2017, 10: 1133-1140. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
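The sensitivity and specificity figures above follow from a simple two-by-two table. In the sketch below the cell counts are not taken from the abstract: they are back-calculated from the reported percentages and group sizes (44 ASD, 36 TD), on the assumption that every child entered the classification analysis. The function name is hypothetical.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts: 42 of 44 children with ASD flagged, 33 of 36 TD children cleared.
sens, spec = sensitivity_specificity(tp=42, fn=2, tn=33, fp=3)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
```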
Can we improve accuracy and reliability of MRI interpretation in children with optic pathway glioma? Proposal for a reproducible imaging classification.

PubMed

Lambron, Julien; Rakotonjanahary, Josué; Loisel, Didier; Frampas, Eric; De Carli, Emilie; Delion, Matthieu; Rialland, Xavier; Toulgoat, Frédérique

2016-02-01

Magnetic resonance (MR) images from children with optic pathway glioma (OPG) are complex. We initiated this study to evaluate the accuracy of MR imaging (MRI) interpretation and to propose a simple and reproducible imaging classification for MRI. We randomly selected 140 MRIs from among 510 MRIs performed on 104 children diagnosed with OPG in France from 1990 to 2004. These images were reviewed independently by three radiologists (F.T., 15 years of experience in neuroradiology; D.L., 25 years of experience in pediatric radiology; and J.L., 3 years of experience in radiology) using a classification derived from the Dodge and modified Dodge classifications. Intra- and interobserver reliabilities were assessed using the Bland-Altman method and the kappa coefficient. These reviews allowed the definition of reliable criteria for MRI interpretation. The reviews showed intraobserver variability and large discrepancies among the three radiologists (kappa coefficient varying from 0.11 to 1). These variabilities were too large for the interpretation to be considered reproducible over time or among observers. A consensual analysis, taking into account all observed variabilities, allowed the development of a definitive interpretation protocol. Using this revised protocol, we observed consistent intra- and interobserver results (kappa coefficient varying from 0.56 to 1). The mean interobserver difference for the solid portion of the tumor with contrast enhancement was 0.8 cm³ (limits of agreement = -16 to 17). We propose simple and precise rules for improving the accuracy and reliability of MRI interpretation for children with OPG. Further studies will be necessary to investigate the possible prognostic value of this approach.
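The Bland-Altman limits of agreement quoted above are conventionally the mean paired difference plus or minus 1.96 standard deviations of the differences. The sketch below shows that calculation on invented volume readings; the function name and data are hypothetical, and the abstract does not state whether the authors used exactly this multiplier.

```python
from statistics import mean, stdev


def bland_altman_limits(reader_a, reader_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical tumor volumes (cm^3) measured by two readers on four scans.
reader_a = [10.0, 22.0, 35.0, 48.0]
reader_b = [12.0, 20.0, 36.0, 44.0]

bias, lower, upper = bland_altman_limits(reader_a, reader_b)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

Read this way, the reported limits of -16 to 17 cm³ around a mean difference of 0.8 cm³ would correspond to a standard deviation of the differences of roughly 8.4 cm³.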
Two-year Test-Retest Reliability in High School Athletes Using the Four- and Two-Factor ImPACT Composite Structures: The Effects of Learning Disorders and Headache/Migraine Treatment History.

PubMed

Brett, Benjamin L; Solomon, Gary S; Hill, Jennifer; Schatz, Philip

2018-03-01

This study examined the test-retest reliability of the four- and two-factor structures (i.e., Memory and Speed) of ImPACT over a 2-year interval across multiple groups with premorbid conditions, including those with a history of special education or learning disorders (LD; n = 114), treatment history for headache/migraine (n = 81), and a control group (n = 792). Nine hundred eighty-seven high school athletes completed baseline testing using online ImPACT across a 2-year interval. Paired-samples t-tests documented improvement from initial to follow-up assessments. Test stability was examined using regression-based measures (RBM) and reliable change indices (RCI). Reliability was examined using intraclass correlation coefficients (ICC). Significant improvements on all four composites were observed for the control group over a 2-year interval, whereas significant differences were observed only on Visual Motor Speed for the LD and headache/migraine treatment history groups. ICC ranges were similar across groups, and greater or comparable reliability was observed for the two-factor structure on Memory (0.67-0.73) and Speed (0.76-0.78) composites. RCIs and RBMs demonstrated stability for the four- and two-factor structures, with few cases falling outside the range of expected change within a healthy sample at the 90% and 95% CIs. Typical practices of obtaining new baselines every 2 years in the high school population can be applied to athletes with a history of special education or LD and headache/migraine treatment. The two-factor structure has potential to increase test-retest reliability. Further research regarding clinical utility is needed. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
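A reliable change index of the kind mentioned above can be sketched with the classic Jacobson-Truax formulation, in which the change score is divided by the standard error of the difference. The abstract does not say which RCI variant was applied (some variants also adjust for practice effects), so this form is an assumption, and the function names and scores are hypothetical.

```python
import math


def reliable_change_index(baseline, retest, sd_baseline, test_retest_r):
    """Change score divided by the standard error of the difference."""
    sem = sd_baseline * math.sqrt(1.0 - test_retest_r)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem
    return (retest - baseline) / se_diff


def is_reliable_change(rci, z_crit=1.645):
    """True if the change exceeds the two-sided 90% band (use 1.96 for 95%)."""
    return abs(rci) > z_crit


# Hypothetical composite scores; r = 0.77 sits inside the reported Speed range.
rci = reliable_change_index(baseline=35.0, retest=40.0, sd_baseline=7.0, test_retest_r=0.77)
print(round(rci, 2), is_reliable_change(rci))
```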
Reliability of a survey tool for measuring consumer nutrition environment in urban food stores.

PubMed

Hosler, Akiko S; Dharssi, Aliza

2011-01-01

Despite the increase in the volume and importance of food environment research, there is a general lack of reliable measurement tools. This study presents the development and reliability assessment of a tool for measuring consumer nutrition environment in urban food stores. Cross-sectional design. A racially diverse downtown portion (6 ZIP code areas) in Albany, New York. A sample of 39 food stores was visited by our research team in 2009 to 2010. These stores were randomly selected from 123 eligible food stores identified through multiple government lists and ground-truthing. The Food Retail Outlet Survey Tool was developed to assess the presence of selected food and nonfood items, placement, milk prices, physical characteristics of the store, policy implementation, and advertisements on outside windows. For in-store items, agreement of observations between experienced and lightly trained surveyors was assessed. For window advertisement assessments, inter-method agreement (on-site sketch vs digital photo), and inter-rater agreement (both on-site) among lightly trained surveyors were evaluated. Percent agreement, kappa, and prevalence-adjusted bias-adjusted kappa were calculated for in-store observations. Intraclass correlation coefficients were calculated for window observations. Twenty-seven of the 47 in-store items had 100% agreement. The prevalence-adjusted bias-adjusted kappa indicated excellent agreement (≥0.90) on all items, except aisle width (0.74) and dark-green/orange colored fresh vegetables (0.85). The store type (nonconvenience store), the order of visits (first half), and the time to complete survey (>10 minutes) were associated with lower reliability in these 2 items. Both the inter-method and inter-rater agreements for window advertisements were uniformly high (intraclass correlation coefficients ranged from 0.94 to 1.00), indicating high reliability. The Food Retail Outlet Survey Tool is a reliable tool for quickly measuring consumer nutrition environment. It can be effectively used by an individual who attended a 30-minute group briefing and practiced with 3 to 4 stores.
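The abstract above reports percent agreement, kappa, and prevalence-adjusted bias-adjusted kappa (PABAK) side by side. The sketch below, using invented present/absent ratings, shows why the three can diverge: when an item is present in nearly every store, Cohen's kappa can fall to around zero even though the raters agree 90% of the time, while PABAK (2 x observed agreement - 1 for two categories) stays high. The function name and data are hypothetical.

```python
def agreement_stats(rater_a, rater_b):
    """Percent agreement, Cohen's kappa, and PABAK for binary (0/1) ratings."""
    n = len(rater_a)
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pa1 = sum(rater_a) / n  # share of "present" ratings by rater A
    pb1 = sum(rater_b) / n  # share of "present" ratings by rater B
    pe = pa1 * pb1 + (1.0 - pa1) * (1.0 - pb1)  # chance agreement
    kappa = (po - pe) / (1.0 - pe) if pe < 1.0 else float("nan")
    pabak = 2.0 * po - 1.0
    return po, kappa, pabak


# Hypothetical item seen in almost all of 20 stores; raters disagree twice.
rater_a = [1] * 18 + [0, 1]
rater_b = [1] * 18 + [1, 0]

po, kappa, pabak = agreement_stats(rater_a, rater_b)
print(round(po, 2), round(kappa, 3), round(pabak, 2))
```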

ASSOCIATIONS BETWEEN THREE CLINICAL ASSESSMENT TOOLS FOR POSTURAL STABILITY

PubMed Central

Saxion, Casie E.; Cameron, Kenneth L.; Gerber, J. Parry

2010-01-01

Study Design: Clinical Measurement, Correlation, Reliability. Objectives: To assess the relationship between the Single Leg Balance (SLB), modified Balance Error Scoring System (mBESS), and modified Star Excursion Balance (mSEBT) tests and secondarily to assess inter-rater and test-retest reliability of these tests. Background: Ankle sprains often result in chronic instability and dysfunction. Several clinical tests assess postural deficits as a potential cause of this dysfunction; however, limited information exists pertaining to the relationship that these tests have with one another. Methods: Two independent examiners measured the performance of 34 healthy participants completing the SLB Test, mBESS test, and mSEBT at two different time periods. The relationship between tests was assessed using the Pearson Correlation and Fisher's Exact Tests. Inter-rater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Kappa statistics. Results: A significant correlation (r = -0.35) was observed between the mSEBT and the mBESS. Fisher's Exact Test showed a significant association between the SLB Test and mBESS (P = .048), but no association between the SLB and mSEBT (P = 1.000). Inter-rater reliability was excellent for the mSEBT and fair for the mBESS (ICCs of .91 and .61 respectively). Excellent agreement was observed between raters for the SLB test (k = 1.00). Test-retest reliability was excellent for the mSEBT (ICC = 0.98) and fair for the mBESS (ICC = 0.74). There was poor test-retest agreement for the SLB test (k = .211). Conclusion: There was a significant relationship observed between the SLB Test, mBESS test, and mSEBT; however, strength of association measures showed limited overlap between these tests. This suggests that these tests are interrelated but may not assess equal components of postural stability. PMID:21589668
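Many of the records in this listing report intraclass correlation coefficients without naming the form used. As one possible reading, the sketch below computes ICC(2,1) in the Shrout and Fleiss sense (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table. The choice of this form is an assumption, and the function name and scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a complete table: rows are subjects, columns are raters."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical balance scores for three participants from two examiners,
# where the second examiner scores one point higher throughout.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))
```

Because this form penalizes systematic offsets between raters, the example yields a value below 1 even though the two examiners rank the participants identically; a consistency-type ICC would not.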
Measuring standing balance in multiple sclerosis: Further progress towards an automatic and reliable method in clinical practice.

PubMed

Keune, Philipp M; Young, William R; Paraskevopoulos, Ioannis T; Hansen, Sascha; Muenssinger, Jana; Oschmann, Patrick; Müller, Roy

2017-08-15

Balance deficits in multiple sclerosis (MS) are often monitored by means of observer-rated tests. These may provide reliable data, but may also be time-consuming, subject to inter-rater variability, and potentially insensitive to mild fluctuations throughout the clinical course. On the other hand, laboratory assessments are often not available. The Nintendo Wii Balance Board (WBB) may represent a low-cost solution. The purpose of the current study was to examine the methodological quality of WBB data in MS (internal consistency, test-retest reliability), convergent validity with observer-rated tests (Berg Balance Scale, BBS; Timed-Up and Go Test, TUG), and discriminative validity concerning clinical status (Expanded Disability Status Scale, EDSS). Standing balance was assessed with the WBB for 4min in 63 MS patients at two assessment points, four months apart. Additionally, patients were examined with the BBS, TUG and the EDSS. A period of 4min on the WBB provided data characterized by excellent internal consistency and test-retest reliability. Significant correlations between WBB data and results of the BBS and TUG were obtained after merely 2min on the board. An EDSS median-split revealed that higher EDSS values (>3) were associated with significantly increased postural sway on the WBB. WBB measures reflecting postural sway are methodologically robust in MS, involving excellent internal consistency and test-retest reliability. They are also characterized by convergent validity with other considerably lengthier observer-rated balance measures (BBS) and sensitive to broader clinical characteristics (EDSS). The WBB may hence represent an effective, easy-to-use monitoring tool for MS patients in clinical practice. Copyright © 2017 Elsevier B.V. All rights reserved.
Examiner Training and Reliability in Two Randomized Clinical Trials of Adult Dental Caries

PubMed Central

Banting, David W.; Amaechi, Bennett T.; Bader, James D.; Blanchard, Peter; Gilbert, Gregg H.; Gullion, Christina M.; Holland, Jan Carlton; Makhija, Sonia K.; Papas, Athena; Ritter, André V.; Singh, Mabi L.; Vollmer, William M.

2013-01-01

Objectives This report describes the training of dental examiners participating in two dental caries clinical trials and reports the inter- and intra-examiner reliability scores from the initial standardization sessions. Methods Study examiners were trained to use a modified ICDAS-II system to detect the visual signs of non-cavitated and cavitated dental caries in adult subjects. Dental caries was classified as no caries (S), non-cavitated caries (D1), enamel caries (D2) and dentine caries (D3). Three standardization sessions involving 60 subjects and 3604 tooth surface calls were used to calculate several measures of examiner reliability. Results The prevalence of dental caries observed in the standardization sessions ranged from 1.4% to 13.5% of the coronal tooth surfaces examined. Overall agreement between pairs of examiners ranged from 0.88 to 0.99. An intra-class coefficient threshold of 0.60 was surpassed for all but one examiner. Inter-examiner unweighted kappa values were low (0.23–0.35) but weighted kappas and the ratio of observed to maximum kappas were more encouraging (0.42–0.83). The highest kappa values occurred for the S/D1 vs. D2/D3 two-level classification of dental caries, for which seven of the eight examiners achieved observed to maximum kappa values over 0.90. Intra-examiner reliability was notably higher than inter-examiner reliability for all measures and dental caries classification systems employed. Conclusion The methods and results for the initial examiner training and standardization sessions for two large clinical trials are reported. Recommendations for others planning examiner training and standardization sessions are offered. PMID:22320292
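The "ratio of observed to maximum kappa" reported above can be illustrated with a minimal sketch. It assumes the usual definition, in which the maximum kappa is the largest value attainable given each examiner's own category frequencies; the abstract does not state the exact formula used, and the function name and example calls below are hypothetical.

```python
from collections import Counter


def kappa_with_maximum(ratings_a, ratings_b):
    """Return (kappa, kappa_max, kappa / kappa_max) for two examiners.

    Assumption: kappa_max is computed from the examiners' marginal
    category shares, as the highest agreement those marginals allow.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    share_a = {c: k / n for c, k in Counter(ratings_a).items()}
    share_b = {c: k / n for c, k in Counter(ratings_b).items()}
    categories = set(share_a) | set(share_b)

    # Observed agreement, chance agreement, and maximum possible agreement.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    p_chance = sum(share_a.get(c, 0.0) * share_b.get(c, 0.0) for c in categories)
    p_max = sum(min(share_a.get(c, 0.0), share_b.get(c, 0.0)) for c in categories)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both examiners use one category")
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    kappa_max = (p_max - p_chance) / (1.0 - p_chance)
    ratio = kappa / kappa_max if kappa_max > 0 else float("nan")
    return kappa, kappa_max, ratio


# Hypothetical two-level calls (S/D1 vs. D2/D3) on four tooth surfaces.
examiner_1 = ["S/D1", "S/D1", "D2/D3", "D2/D3"]
examiner_2 = ["S/D1", "D2/D3", "D2/D3", "D2/D3"]
print(kappa_with_maximum(examiner_1, examiner_2))
```

In this made-up example the plain kappa is only 0.5, yet the ratio is 1.0, because the examiners' differing category frequencies cap the attainable kappa at 0.5. This is one reason the ratio can look more encouraging than the unweighted kappa when caries prevalence is low.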
Intra- and interrater reliability of the Chicago Classification of achalasia subtypes in pediatric high-resolution esophageal manometry (HRM) recordings.

PubMed

Singendonk, M M J; Rosen, R; Oors, J; Rommel, N; van Wijk, M P; Benninga, M A; Nurko, S; Omari, T I

2017-11-01

Subtyping achalasia by high-resolution manometry (HRM) is clinically relevant as response to therapy and prognosis have been shown to vary accordingly. The aim of this study was to assess inter- and intrarater reliability of diagnosing achalasia and achalasia subtyping in children using the Chicago Classification (CC) V3.0. Six observers analyzed 40 pediatric HRM recordings (22 achalasia and 18 non-achalasia) twice by using dedicated analysis software (ManoView 3.0, Given Imaging, Los Angeles, CA, USA). Integrated relaxation pressure (IRP4s), distal contractile integral (DCI), intrabolus pressurization pattern (IBP), and distal latency (DL) were extracted and analyzed hierarchically. Cohen's κ (2 raters) and Fleiss' κ (>2 raters) and the intraclass correlation coefficient (ICC) were used for categorical and ordinal data, respectively. Based on the results of dedicated analysis software only, intra- and interrater reliability was excellent and moderate (κ=0.89 and κ=0.52, respectively) for differentiating achalasia from non-achalasia. For subtyping achalasia, reliability decreased to substantial and fair (κ=0.72 and κ=0.28, respectively). When observers were allowed to change the software-driven diagnosis according to their own interpretation of the manometric patterns, intra- and interrater reliability increased for diagnosing achalasia (κ=0.98 and κ=0.92, respectively) and for subtyping achalasia (κ=0.79 and κ=0.58, respectively). Intra- and interrater agreement for diagnosing achalasia when using HRM and the CC was very good to excellent when results of automated analysis software were interpreted by experienced observers. More variability was seen when relying solely on the software-driven diagnosis and for subtyping achalasia. Therefore, diagnosing and subtyping achalasia should be performed in pediatric motility centers with significant expertise. © 2017 John Wiley & Sons Ltd.
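For readers unfamiliar with the statistic, Cohen's κ for two raters can be sketched as below. This is an illustrative implementation only, not the study's analysis code; the function name and the four example diagnoses are hypothetical, and Fleiss' κ for more than two raters is not covered.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)

    # Proportion of recordings on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance from each rater's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every recording.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses of four recordings by two observers.
rater_1 = ["achalasia", "achalasia", "non-achalasia", "non-achalasia"]
rater_2 = ["achalasia", "non-achalasia", "non-achalasia", "non-achalasia"]
print(cohen_kappa(rater_1, rater_2))  # 0.5
```

Here the raters agree on 3 of 4 recordings (0.75), but half of that agreement is expected by chance, so κ is 0.5.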
Examiner training and reliability in two randomized clinical trials of adult dental caries.

PubMed

Banting, David W; Amaechi, Bennett T; Bader, James D; Blanchard, Peter; Gilbert, Gregg H; Gullion, Christina M; Holland, Jan Carlton; Makhija, Sonia K; Papas, Athena; Ritter, André V; Singh, Mabi L; Vollmer, William M

2011-01-01

This report describes the training of dental examiners participating in two dental caries clinical trials and reports the inter- and intra-examiner reliability scores from the initial standardization sessions. Study examiners were trained to use a modified International Caries Detection and Assessment System II system to detect the visual signs of non-cavitated and cavitated dental caries in adult subjects. Dental caries was classified as no caries (S), non-cavitated caries (D1), enamel caries (D2), and dentine caries (D3). Three standardization sessions involving 60 subjects and 3,604 tooth surface calls were used to calculate several measures of examiner reliability. The prevalence of dental caries observed in the standardization sessions ranged from 1.4 percent to 13.5 percent of the coronal tooth surfaces examined. Overall agreement between pairs of examiners ranged from 0.88 to 0.99. An intra-class coefficient threshold of 0.60 was surpassed for all but one examiner. Inter-examiner unweighted kappa values were low (0.23-0.35), but weighted kappas and the ratio of observed to maximum kappas were more encouraging (0.42-0.83). The highest kappa values occurred for the S/D1 versus D2/D3 two-level classification of dental caries, for which seven of the eight examiners achieved observed to maximum kappa values over 0.90. Intra-examiner reliability was notably higher than inter-examiner reliability for all measures and dental caries classifications employed. The methods and results for the initial examiner training and standardization sessions for two large clinical trials are reported. Recommendations for others planning examiner training and standardization sessions are offered. © 2011 American Association of Public Health Dentistry.
The reliability of cause-of-death coding in The Netherlands.

PubMed

Harteloh, Peter; de Bruin, Kim; Kardaun, Jan

2010-08-01

Cause-of-death statistics are a major source of information for epidemiological research or policy decisions. Information on the reliability of these statistics is important for interpreting trends in time or differences between populations. Variations in coding the underlying cause of death could hinder the attribution of observed differences to determinants of health. Therefore we studied the reliability of cause-of-death statistics in The Netherlands. We performed a double coding study. Death certificates from the month of May 2005 were coded again in 2007. Each death certificate was coded manually by four coders. Reliability was measured by calculating agreement between coders (intercoder agreement) and by calculating the consistency of each individual coder in time (intracoder agreement). Our analysis covered 10,833 death certificates. The intercoder agreement of four coders on the underlying cause of death was 78%. In 2.2% of the cases coders agreed on a change of the code assigned in 2005. The (mean) intracoder agreement of four coders was 89%. Agreement was associated with the specificity of the ICD-10 code (chapter, three digits, four digits), the age of the deceased, the number of coders and the number of diseases reported on the death certificate. The reliability of cause-of-death statistics turned out to be high (>90%) for major causes of death such as cancers and acute myocardial infarction. For chronic diseases, such as diabetes and renal insufficiency, reliability was low (<70%). The reliability of cause-of-death statistics varies by ICD-10 code/chapter. A statistical office should provide coders with (additional) rules for coding diseases with a low reliability and evaluate these rules regularly. Users of cause-of-death statistics should exercise caution when interpreting causes of death with a low reliability. Studies of reliability should take into account the number of coders involved and the number of codes on a death certificate.
A Fresh Pair of Eyes: A Blind Observation Method for Evaluating Social Skills of Children with ASD in a Naturalistic Peer Situation in School

ERIC Educational Resources Information Center

Dekker, Vera; Nauta, Maaike H.; Mulder, Erik J.; Sytema, Sjoerd; de Bildt, Annelies

2016-01-01

The Social skills Observation Measure (SOM) is a direct observation method for social skills used in naturalistic everyday situations in school. This study describes the development of the SOM and investigates its psychometric properties in 86 children with Autism spectrum disorder, aged 9.8-13.1 years. The interrater reliability was found to be…
The Reliability and Validity of the Persian Version of Three-Factor Eating Questionnaire-R18 (TFEQ-R18) in Overweight and Obese Females

PubMed Central

Mostafavi, Seyed-Ali; Akhondzadeh, Shahin; Mohammadi, Mohammad Reza; Eshraghian, Mohammad Reza; Hosseini, Saeed; Chamari, Maryam; Keshavarz, Seyed Ali

2017-01-01

Objective: The Three-Factor Eating Questionnaire Reduced (TFEQ-R18) is one of the most widely used instruments for assessing eating behavior worldwide. The present study aimed at confirming the reliability and validity of the Persian version of TFEQ-R18 among overweight and obese females in Iran. Method: In the present study, 168 overweight and obese females consented to participate. We estimated the anthropometric indices and asked the participants to complete the TFEQ-R18. Beck Depression Inventory (BDI), Spielberger Anxiety Scale, Appetite Visual Analogue Rating Scale, Food Craving Questionnaire (FCQ), Compulsive Eating Scale (CES), and Restraint Eating Visual Analogue Rating Scale were performed simultaneously to assess concurrent validity. Two weeks later, TFEQ-R18 was repeated for 126 participants to assess test-retest reliability. Moreover, we reported the internal consistency and factor analysis of this questionnaire. Results: Using the results of the reliability analysis and exploratory factor analysis of the principal component by varimax rotation, we extracted 3 factors: hunger, cognitive restraint, and emotional eating. After removing Items 16 and 18, Cronbach’s alpha increased to 0.73 (Cronbach’s alpha for the three factors was 0.84, 0.64, and 0.7, respectively). The results of the Pearson correlation revealed a consistency of 0.87 between the test and retest administrations (p = 0.001). Significant positive correlations were observed between TFEQ-R18 and BDI, Spielberger Anxiety Scale, FCQ, CES, appetite, body weight, fat percentage, and calorie intake. Moreover, a negative correlation was observed with Restraint Eating Visual Analogue Rating Scale and muscle percentage. Conclusion: This study aimed at presenting preliminary support for the reliability and validity of the Persian version of TFEQ-R18 and its psychometric characteristics.
This instrument may be helpful in clinical practice and research studies of obesity, appetite, and eating behavior. PMID:28659982
Validation of an automatic system (DoubleCage) for detecting the location of animals during preference tests.

PubMed

Tsai, P P; Nagelschmidt, N; Kirchner, J; Stelzer, H D; Hackbarth, H

2012-01-01

Preference tests have often been performed for collecting information about animals' acceptance of environmental refinement objects. In numerous published studies animals were individually tested during preference experiments, as it is difficult to observe group-housed animals with an automatic system. Thus, videotaping is still the most favoured method for observing preferences of socially-housed animals. To reduce the observation workload and to be able to carry out preference testing of socially-housed animals, an automatic recording system (DoubleCage) was developed for determining the location of group-housed animals in a preference test set-up. This system is able to distinguish the transition of individual animals between two cages and to record up to 16 animals at the same time (four animals per cage). The present study evaluated the reliability of the DoubleCage system. The data recorded by the DoubleCage program and the data obtained by human observation were compared. The measurements of the DoubleCage system and manual observation of the videotapes are comparable and significantly correlated (P < 0.0001) with good agreement. Using the DoubleCage system enables precise and reliable recording of the preferences of group-housed animals and a considerable reduction of animal observation time.
Observational Research Rigor Alone Does Not Justify Causal Inference

PubMed Central

Ejima, Keisuke; Li, Peng; Smith, Daniel L.; Nagy, Tim R.; Kadish, Inga; van Groen, Thomas; Dawson, John A.; Yang, Yongbin; Patki, Amit; Allison, David B.

2016-01-01

Background Differing opinions exist on whether associations obtained in observational studies can be reliable indicators of a causal effect if the observational study is sufficiently well controlled and executed. Materials and methods To test this, we conducted two animal observational studies that were rigorously controlled and executed beyond what is achieved in studies of humans. In study 1, we randomized 332 genetically identical C57BL/6J mice into three diet groups with differing food energy allotments and recorded individual self-selected daily energy intake and lifespan. In study 2, 60 male mice (CD1) were paired and divided into two groups for a 2-week feeding regimen. We evaluated the association between weight gain and food consumption. Within each pair, one animal was randomly assigned to an S group in which the animals had free access to food. The second paired animal (R group) was provided exactly the same diet that their S partner ate the day before. Results In study 1, across all three groups, we found a significant negative effect of energy intake on lifespan. However, we found a positive association between food intake and lifespan among the ad libitum feeding group: 29.99 (95% CI: 8.2 to 51.7) days per daily kcal. In study 2, we found a significant (P=0.003) group (randomized vs self-selected)-by-food consumption interaction effect on weight gain. Conclusions At least in nutrition research, associations derived from observational studies may not be reliable indicators of causal effects, even with the most rigorous study designs achievable. PMID:27711975
Reliability of drivers in urban intersections.

PubMed

Gstalter, Herbert; Fastenmeier, Wolfgang

2010-01-01

The concept of human reliability has been widely used in industrial settings by human factors experts to optimise the person-task fit. Reliability is estimated by the probability that a task will successfully be completed by personnel in a given stage of system operation. Human Reliability Analysis (HRA) is a technique used to calculate human error probabilities as the ratio of errors committed to the number of opportunities for that error. To transfer this notion to the measurement of car driver reliability the following components are necessary: a taxonomy of driving tasks, a definition of correct behaviour in each of these tasks, a list of errors as deviations from the correct actions and an adequate observation method to register errors and opportunities for these errors. Use of the SAFE-task analysis procedure recently made it possible to derive driver errors directly from the normative analysis of behavioural requirements. Driver reliability estimates could be used to compare groups of tasks (e.g. different types of intersections with their respective regulations) as well as groups of drivers' or individual drivers' aptitudes. This approach was tested in a field study with 62 drivers of different age groups. The subjects drove an instrumented car and had to complete an urban test route, the main features of which were 18 intersections representing six different driving tasks. The subjects were accompanied by two trained observers who recorded driver errors using standardized observation sheets. Results indicate that error indices often vary between both the age group of drivers and the type of driving task. The highest error indices occurred in the non-signalised intersection tasks and the roundabout, which exactly equals the corresponding ratings of task complexity from the SAFE analysis. A comparison of age groups clearly shows the disadvantage of older drivers, whose error indices in nearly all tasks are significantly higher than those of the other groups. 
The vast majority of these errors could be explained by high task load in the intersections, as they represent difficult tasks. The discussion shows how reliability estimates can be used in a constructive way to propose changes in car design, intersection layout and regulation as well as driver training.
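The error-probability definition used in this abstract (errors committed divided by opportunities for that error) can be sketched as follows. The function name and the counts are hypothetical and are not taken from the field study; the check that errors cannot exceed opportunities is an added assumption.

```python
def human_error_probability(errors, opportunities):
    """Ratio of errors committed to opportunities for that error."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    # Assumption: at most one error is counted per opportunity.
    if errors < 0 or errors > opportunities:
        raise ValueError("errors must lie between 0 and opportunities")
    return errors / opportunities


# Hypothetical observer tallies per driving task: (errors, opportunities).
tallies = {
    "signalised intersection": (3, 60),
    "non-signalised intersection": (12, 60),
    "roundabout": (9, 60),
}
for task, (errors, opportunities) in tallies.items():
    hep = human_error_probability(errors, opportunities)
    # Reliability is taken here as the complement of the error probability.
    print(f"{task}: error index {hep:.2f}, reliability {1 - hep:.2f}")
```

Computing the index per task type, as here, is what allows the kind of comparison between intersection types and driver groups that the abstract describes.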
Stability of Child Behavioral Style in the First 30 Months of Life: Single Timepoint and Aggregated Measures

ERIC Educational Resources Information Center

Parade, Stephanie H.; Dickstein, Susan; Schiller, Masha; Hayden, Lisa; Seifer, Ronald

2015-01-01

The current study examined the stability of temperament over time. Observers and mothers rated child behavior at eight timepoints across three assessment waves (8, 15, and 30 months of age). Internal consistency reliability of aggregates of the eight observer reports and eight mother reports were high. When considering single timepoint…
Video Analysis of Mother-Child Interactions: Does the Role of Experience Affect the Accuracy and Reliability of Clinical Observations?

ERIC Educational Resources Information Center

Choo, Dawn; Dettman, Shani J.

2016-01-01

During the pre- and post-implant habilitation process, mothers of children using cochlear implants may be coached by clinicians to use appropriate communicative strategies during play according to the family's choice of communication approach. The present study compared observations made by experienced and inexperienced individuals in the analysis…
A Validation of the Classroom Assessment Scoring System in Finnish Kindergartens

ERIC Educational Resources Information Center

Pakarinen, Eija; Lerkkanen, Marja-Kristiina; Poikkeus, Anna-Maija; Kiuru, Noona; Siekkinen, Martti; Rasku-Puttonen, Helena; Nurmi, Jari-Erik

2010-01-01

Research Findings: This study examined the validity and reliability of the Classroom Assessment Scoring System (CLASS; R. C. Pianta, K. M. La Paro, & B. K. Hamre, 2008) in Finnish kindergartens. A pair of trained observers used the CLASS to observe 49 kindergarten teachers (47 female, 2 male) on two different days. Questionnaires measuring…
Mobile Functional Reach Test in People Who Suffer Stroke: A Pilot Study.

PubMed

Merchán-Baeza, Jose Antonio; González-Sánchez, Manuel; Cuesta-Vargas, Antonio

2015-06-11

Postural instability is one of the major complications found in people who survive a stroke. Parameterizing the Functional Reach Test (FRT) could be useful in clinical practice and basic research, as this test is a clinically accepted tool (for its simplicity, reliability, economy, and portability) to measure the semistatic balance of a subject. The aim of this study is to analyze the reliability in the FRT parameterization using inertial sensor within mobile phones (mobile sensors) for recording kinematic variables in patients who have suffered a stroke. Our hypothesis is that the sensors in mobile phones will be reliable instruments for kinematic study of the FRT. This is a cross-sectional study of 7 subjects over 65 years of age who suffered a stroke. During the execution of FRT, the subjects carried two mobile phones: one placed in the lumbar region and the other one on the trunk. After analyzing the data obtained in the kinematic registration by the mobile sensors, a number of direct and indirect variables were obtained. The variables extracted directly from FRT through the mobile sensors were distance, maximum angular lumbosacral/thoracic displacement, time for maximum angular lumbosacral/thoracic displacement, time of return to the initial position, and total time. Using these data, we calculated speed and acceleration of each. A descriptive analysis of all kinematic outcomes recorded by the two mobile sensors (trunk and lumbar) was developed and the average range achieved in the FRT. Reliability measures were calculated by analyzing the internal consistency of the measures with 95% confidence interval of each outcome variable. We calculated the reliability of mobile sensors in the measurement of the kinematic variables during the execution of the FRT. The values in the FRT obtained in this study (2.49 cm, SD 13.15) are similar to those found in other studies with this population and with the same age range. 
Intrasubject reliability values observed in the use of mobile phones are all at or above 0.831, ranging from 0.831 (time B_C trunk area) to 0.894 (displacement A_B trunk area). Likewise, the observed intersubject values range from 0.835 (time B_C trunk area) to 0.882 (displacement A_C trunk area). On the other hand, the reliability of the FRT was 0.989 (0.981-0.996) and 0.978 (0.970-0.985), intrasubject and intersubject respectively. We found that mobile sensors in mobile phones could be reliable tools in the parameterization of the Functional Reach Test in people who have had a stroke. ©Jose Antonio Merchán-Baeza, Manuel González-Sánchez, Antonio Cuesta-Vargas. Originally published in JMIR Rehabilitation and Assistive Technology (http://rehab.jmir.org), 11.06.2015.
Indices of Paraspinal Muscles Degeneration: Reliability and Association With Facet Joint Osteoarthritis: Feasibility Study.

PubMed

Kalichman, Leonid; Klindukhov, Alexander; Li, Ling; Linov, Lina

2016-11-01

A reliability and cross-sectional observational study. To introduce a scoring system for visible fat infiltration in paraspinal muscles; to evaluate intertester and intratester reliability of this system and its relationship with indices of muscle density; to evaluate the association between indices of paraspinal muscle degeneration and facet joint osteoarthritis. Current evidence suggests that the paraspinal muscles degeneration is associated with low back pain, facet joint osteoarthritis, spondylolisthesis, and degenerative disc disease. However, the evaluation of paraspinal muscles on computed tomography is not radiological routine, probably because of absence of simple and reliable indices of paraspinal degeneration. One hundred fifty consecutive computed tomography scans of the lower back (N=75) or abdomen (N=75) were evaluated. Mean radiographic density (in Hounsfield units) and SD of the density of multifidus and erector spinae were evaluated at the L4-L5 spinal level. A new index of muscle degeneration, radiographic density ratio=muscle density/SD of density, was calculated. To evaluate the visible fat infiltration in paraspinal muscles, we proposed a 3-graded scoring system. The prevalence of facet joint osteoarthritis was also evaluated. Intraclass correlation and κ statistics were used to evaluate inter-rater and intra-rater reliability. Logistic regression examined the association between paraspinal muscle indices and facet joint osteoarthritis. Intra-rater reliability for fat infiltration score (κ) ranged between 0.87 and 0.92; inter-rater reliability between 0.70 and 0.81. Intra-rater reliability (intraclass correlation) for mean density of paraspinal muscles ranged between 0.96 and 0.99, inter-rater reliability between 0.95 and 0.99; SD intra-rater reliability ranged between 0.82 and 0.91, inter-rater reliability between 0.80 and 0.89. 
Significant associations (P<0.01) were found between facet joint osteoarthritis, fat infiltration score, and radiographic density ratio. Two suggested indices of paraspinal muscle degeneration showed excellent reliability and were significantly associated with facet joint osteoarthritis. Additional studies are needed to evaluate the associations with other spinal degeneration features and low back pain.
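The radiographic density ratio defined in this abstract (muscle density divided by the SD of density) can be sketched in a few lines. The abstract does not say whether a sample or population SD was used, so the sample SD below is an assumption, as are the function name and the Hounsfield-unit values.

```python
import statistics


def radiographic_density_ratio(hounsfield_values):
    """Mean density divided by the SD of density within a muscle region."""
    mean_density = statistics.fmean(hounsfield_values)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd_density = statistics.stdev(hounsfield_values)
    if sd_density == 0:
        raise ValueError("ratio is undefined when all values are identical")
    return mean_density / sd_density


# Hypothetical Hounsfield-unit samples from one muscle region.
region = [40.0, 50.0, 60.0]
print(radiographic_density_ratio(region))  # mean 50, SD 10, ratio 5.0
```

Under this definition, a lower mean density or a more heterogeneous region (larger SD) both reduce the ratio.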
A Comparison of Three Methods for the Analysis of Skin Flap Viability: Reliability and Validity.

PubMed

Tim, Carla Roberta; Martignago, Cintia Cristina Santi; da Silva, Viviane Ribeiro; Dos Santos, Estefany Camila Bonfim; Vieira, Fabiana Nascimento; Parizotto, Nivaldo Antonio; Liebano, Richard Eloin

2018-05-01

Objective: Technological advances have provided new alternatives to the analysis of skin flap viability in animal models; however, the interrater validity and reliability of these techniques have yet to be analyzed. The present study aimed to evaluate the interrater validity and reliability of three different methods: weight of paper template (WPT), paper template area (PTA), and photographic analysis. Approach: Sixteen male Wistar rats had their cranially based dorsal skin flap elevated. On the seventh postoperative day, the viable tissue area and the necrotic area of the skin flap were recorded using the paper template method and photo image. The evaluation of the percentage of viable tissue was performed using three methods, simultaneously and independently by two raters. The analysis of interrater reliability and viability was performed using the intraclass correlation coefficient and Bland Altman Plot Analysis was used to visualize the presence or absence of systematic bias in the evaluations of data validity. Results: The results showed that interrater reliability for WPT, measurement of PTA, and photographic analysis were 0.995, 0.990, and 0.982, respectively. For data validity, a correlation >0.90 was observed for all comparisons made between the three methods. In addition, Bland Altman Plot Analysis showed agreement between the comparisons of the methods and the presence of systematic bias was not observed. Innovation: Digital methods are an excellent choice for assessing skin flap viability; moreover, they make data use and storage easier. Conclusion: Independently from the method used, the interrater reliability and validity proved to be excellent for the analysis of skin flaps' viability.
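The Bland-Altman check for systematic bias mentioned above can be sketched as follows. This assumes the conventional 95% limits of agreement (mean difference ± 1.96 SD of the differences); the abstract does not report which limits were used, and the function name and percentages are hypothetical.

```python
import statistics


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.fmean(differences)    # systematic difference
    spread = statistics.stdev(differences)  # sample SD of the differences
    # Assumption: conventional 95% limits of agreement.
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical percentages of viable flap tissue from two methods.
paper_template = [61.0, 72.0, 58.0, 80.0]
photograph = [60.0, 72.0, 57.0, 80.0]
bias, lower, upper = bland_altman_limits(paper_template, photograph)
print(f"bias {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
```

A bias close to zero with narrow limits is what "no systematic bias" corresponds to in such an analysis; in practice the differences are also plotted against the pairwise means to look for trends.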
Inter- and intratester reliability values of ultrasound imaging measurements of diaphragm movement in the thoracic and thoracolumbar curves in adolescent idiopathic scoliosis.

PubMed

Noh, Dong Koog; Koh, Jae-Hyun; You, Joshua Sung-H

2016-01-01

The purpose of this study was to determine intertester and intratester reliability of ultrasound measurements of bilateral diaphragm excursions in the thoracic and thoracolumbar spinal curves of 31 females with adolescent idiopathic scoliosis (AIS) (mean age = 14.1 ± 1.8 years). Subjects were tested during tidal breathing using real-time ultrasound imaging with a 3.5 MHz curvilinear transducer. There were no significant differences in intratester and intertester reliability values in bilateral diaphragmatic excursions measured at the thoracolumbar spinal curve, whereas significant differences were observed in measurements taken at the thoracic spinal curve (p < 0.05). Overall, the intertester and intratester reliabilities of the thoracic and thoracolumbar curves in AIS ranged from 0.764 to 0.998. These findings suggest that ultrasound imaging is highly reliable between and within testers and is useful to precisely discriminate pathological diaphragm movement in idiopathic thoracic scoliosis and idiopathic thoracolumbar scoliosis.
Towards a new protocol of scoliosis assessments and monitoring in clinical practice: A pilot study.

PubMed

Lukovic, Tanja; Cukovic, Sasa; Lukovic, Vanja; Devedzic, Goran; Djordjevic, Dusica

2015-01-01

Although intensively investigated, the procedures for assessment and monitoring of scoliosis are still a subject of controversy. The aim of this study was to assess the validity and reliability of a number of physiotherapeutic measurements that could be used for clinical monitoring of scoliosis. Fifteen healthy (symmetric) subjects underwent a set of measurements two times, performed by two experienced and two inexperienced physiotherapists. Intra-observer and inter-observer reliability of measurements were determined. The following measurements were performed: body height and weight, chest girth in inspirium and expirium, the length of legs, the spine translation, the lateral pelvic tilt, the equality of the shoulders, position of scapulas, the equality of stature triangles, the rib hump, the existence of m. iliopsoas contracture, Fröhner index, the size of lumbar lordosis and the angle of trunk rotation. The intraclass correlation coefficient was high (> 0.8) for the majority of measurements when experienced physiotherapists performed them, while inexperienced physiotherapists performed only basic, easy measurements precisely. We showed in this pilot study on healthy subjects that the majority of basic physiotherapeutic measurements are valid and reliable when performed by a specialized physiotherapist, and it can be expected that this protocol will gain high value when measurements on subjects with scoliosis are performed.
Haptic fMRI: Reliability and performance of electromagnetic haptic interfaces for motion and force neuroimaging experiments.

PubMed

Menon, Samir; Zhu, Jack; Goyal, Deeksha; Khatib, Oussama

2017-07-01

Haptic interfaces compatible with functional magnetic resonance imaging (Haptic fMRI) promise to enable rich motor neuroscience experiments that study how humans perform complex manipulation tasks. Here, we present a large-scale study (176 scan runs, 33 scan sessions) that characterizes the reliability and performance of one such electromagnetically actuated device, Haptic fMRI Interface 3 (HFI-3). We outline engineering advances that ensured HFI-3 did not interfere with fMRI measurements. Observed fMRI temporal noise levels with HFI-3 operating were at the fMRI baseline (0.8% noise to signal). We also present results from HFI-3 experiments demonstrating that high resolution fMRI can be used to study spatio-temporal patterns of fMRI blood oxygenation dependent (BOLD) activation. These experiments include motor planning, goal-directed reaching, and visually-guided force control. Observed fMRI responses are consistent with existing literature, which supports Haptic fMRI's effectiveness at studying the brain's motor regions.

Three-column classification and Schatzker classification: a three- and two-dimensional computed tomography characterisation and analysis of tibial plateau fractures.

PubMed

Patange Subba Rao, Sheethal Prasad; Lewis, James; Haddad, Ziad; Paringe, Vishal; Mohanty, Khitish

2014-10-01

The aim of the study was to evaluate inter-observer reliability and intra-observer reproducibility between the three-column classification and Schatzker classification systems using 2D and 3D CT models. Fifty-two consecutive patients with tibial plateau fractures were evaluated by five orthopaedic surgeons. All patients were classified into Schatzker and three-column classification systems using x-rays and 2D and 3D CT images. The inter-observer reliability was evaluated in the first round and the intra-observer reliability was determined during the second round 2 weeks later. The average intra-observer reproducibility for the three-column classification was from substantial to excellent in all sub classifications, as compared with Schatzker classification. The inter-observer kappa values increased from substantial to excellent in three-column classification and to moderate in Schatzker classification. The average values for three-column classification for all the categories are as follows: (I-III) k2D = 0.718, 95% CI 0.554-0.864, p < 0.0001 and average k3D = 0.874, 95% CI 0.754-0.890, p < 0.0001. For Schatzker classification system, the average values for all six categories are as follows: (I-VI) k2D = 0.536, 95% CI 0.365-0.685, p < 0.0001 and average k3D = 0.552, 95% CI 0.405-0.700, p < 0.0001. The values are statistically significant. Statistically significant inter-observer values in both rounds were noted with the three-column classification, making it statistically an excellent agreement. The intra-observer reproducibility for the three-column classification improved as compared with the Schatzker classification. The three-column classification seems to be an effective way to characterise and classify fractures of tibial plateau.
COMFORT scale: a reliable and valid method to measure the amount of stress of ventilated preterm infants.

PubMed

Wielenga, J M; De Vos, R; de Leeuw, R; De Haan, R J

2004-01-01

Assessment of clinimetric properties and diagnostic quality of a stress measurement scale (COMFORT scale). Sample of an open population. Neonatology department (Neonatal Intensive Care Unit), Academic Medical Centre/Emma Children's Hospital, Amsterdam, The Netherlands. One clinical expert and 9 observers observed ventilated premature born babies simultaneously. Criterion validity was assessed by correlating the COMFORT scale with the clinical judgment regarding the amount of stress. Interobserver reliability was assessed on the clinical judgment as well as on the COMFORT scale. Diagnostic qualities were evaluated with a ROC curve. On 19 ventilated prematurely born babies (mean gestational age 30 weeks, mean birth weight 1385 gm), one clinical expert and 9 observers made 30 paired observations. The criterion validity of the COMFORT scale was good (Pearson's r of 0.84). The interobserver reliability of the clinical judgment was very good (weighted Kappa 0.84). The interobserver reliability of each item varied from good to almost perfect (weighted Kappa of 0.64 for muscle tone to 1.00 on heart rate). The reliability of the total COMFORT scale score was satisfying (intra-class correlation coefficient of 0.94). The diagnostic quality of the COMFORT scale was excellent, at a cut-off point of 20 the sensitivity was 100 percent, the specificity was 77 percent, and the area under the curve (AUC) of 0.95. In this first evaluation, the COMFORT scale appears to be a valid and reliable measurement tool to assess the stress of ventilated prematurely born babies.
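The sensitivity and specificity reported at a cut-off point can be illustrated with a minimal sketch. The scores and stress labels below are hypothetical, not data from the study, and the function name `sensitivity_specificity` is introduced here for illustration. The sketch also assumes that a score at or above the cut-off counts as a positive ("stressed") result, which the abstract does not state.

```python
def sensitivity_specificity(scores, stressed, cutoff):
    """Sensitivity and specificity of the rule `score >= cutoff`.

    Assumption: scores at or above the cut-off are treated as positive.
    """
    tp = sum(1 for s, y in zip(scores, stressed) if s >= cutoff and y)
    fn = sum(1 for s, y in zip(scores, stressed) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, stressed) if s < cutoff and not y)
    fp = sum(1 for s, y in zip(scores, stressed) if s >= cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scale scores and expert judgments (True = judged stressed).
scores = [14, 17, 19, 21, 22, 24, 26, 18, 23, 20]
stressed = [False, False, False, False, True, True, True, False, True, True]

sens, spec = sensitivity_specificity(scores, stressed, cutoff=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Repeating the calculation over a range of cut-offs yields the points of a ROC curve.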
Development and reliability testing of a food store observation form.

PubMed

Rimkus, Leah; Powell, Lisa M; Zenk, Shannon N; Han, Euna; Ohri-Vachaspati, Punam; Pugach, Oksana; Barker, Dianne C; Resnick, Elissa A; Quinn, Christopher M; Myllyluoma, Jaana; Chaloupka, Frank J

2013-01-01

To develop a reliable food store observational data collection instrument to be used for measuring product availability, pricing, and promotion. Observational data collection. A total of 120 food stores (26 supermarkets, 34 grocery stores, 54 gas/convenience stores, and 6 mass merchandise stores) in the Chicago metropolitan statistical area. Inter-rater reliability for product availability, pricing, and promotion measures on a food store observational data collection instrument. Cohen's kappa coefficient and proportion of overall agreement for dichotomous variables and intra-class correlation coefficient for continuous variables. Inter-rater reliability, as measured by average kappa coefficient, was 0.84 for food and beverage product availability measures, 0.80 for interior store characteristics, and 0.70 for exterior store characteristics. For continuous measures, average intra-class correlation coefficient was 0.82 for product pricing measures; 0.90 for counts of fresh, frozen, and canned fruit and vegetable options; and 0.85 for counts of advertisements on the store exterior and property. The vast majority of measures demonstrated substantial or almost perfect agreement. Although some items may require revision, results suggest that the instrument may be used to reliably measure the food store environment. Copyright © 2013 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
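For readers unfamiliar with the two agreement statistics named for dichotomous variables, the following minimal sketch computes the proportion of overall agreement and Cohen's kappa for two raters. The ratings are hypothetical (1 = product available, 0 = not available) and the function name `cohens_kappa` is chosen here for illustration; it is not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten items by two store auditors.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
kappa = cohens_kappa(rater_1, rater_2)
print(f"overall agreement = {agreement:.2f}, kappa = {kappa:.3f}")
```

Kappa is lower than raw agreement because it discounts the agreement two raters would reach by chance given how often each uses each category.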
The reliability of Cavalier's principle of stereological method in determining volumes of enchondromas using the computerized tomography tools.

PubMed

Acar, Nihat; Karakasli, Ahmet; Karaarslan, Ahmet; Mas, Nermin Ng; Hapa, Onur

2017-01-01

Volumetric measurements of benign tumors enable surgeons to trace volume changes during follow-up periods. For a volumetric measurement technique to be applicable, it should be easy, rapid, and inexpensive and should carry a high interobserver reliability. We aimed to assess the interobserver reliability of a volumetric measurement technique using the Cavalier's principle of stereological methods. The computerized tomography (CT) of 15 patients with a histopathologically confirmed diagnosis of enchondroma with variant tumor sizes and localizations was retrospectively reviewed for interobserver reliability evaluation of the volumetric stereological measurement with the Cavalier's principle, $V = t \times \left[ (SU \times d) / SL \right]^{2} \times \sum P$. The volumes of the 15 tumors collected by the observers are demonstrated in Table 1. There was no statistical significance between the first and second observers ( p = 0.000 and intraclass correlation coefficient = 0.970) and between the first and third observers ( p = 0.000 and intraclass correlation coefficient = 0.981). No statistical significance was detected between the second and third observers ( p = 0.000 and intraclass correlation coefficient = 0.976). The Cavalier's principle with the stereological technique using the CT scans is an easy, rapid, and inexpensive technique in volumetric evaluation of enchondromas with a trustable interobserver reliability.
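The volume formula in this abstract can be evaluated with a few lines of code. The abstract does not define its symbols, so the sketch below assumes the reading common in the point-counting literature: t is the section thickness, SU the real length represented by the scale bar, d the spacing between grid points on the image, SL the measured length of the scale bar on the image, and ΣP the total number of grid points falling on the tumor across all sections. All input values are hypothetical.

```python
def cavalieri_volume(t, su, d, sl, point_counts):
    """Point-counting volume estimate: V = t * ((su * d) / sl)**2 * sum(P).

    Symbol meanings are assumptions (see the lead-in), not definitions
    taken from the abstract.
    """
    if sl <= 0:
        raise ValueError("scale length must be positive")
    # Real-world area represented by one grid point on the image.
    area_per_point = ((su * d) / sl) ** 2
    return t * area_per_point * sum(point_counts)


# Hypothetical input: 0.3 cm slices, a 1 cm scale bar drawn 2 cm long on
# the image, 0.4 cm grid spacing on the image, and points counted on five
# consecutive sections.
volume = cavalieri_volume(t=0.3, su=1.0, d=0.4, sl=2.0,
                          point_counts=[12, 20, 25, 18, 9])
print(f"estimated volume = {volume:.3f} cm^3")
```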
Reconciling Streamflow Uncertainty Estimation and River Bed Morphology Dynamics. Insights from a Probabilistic Assessment of Streamflow Uncertainties Using a Reliability Diagram

NASA Astrophysics Data System (ADS)

Morlot, T.; Mathevet, T.; Perret, C.; Favre Pugin, A. C.

2014-12-01

Streamflow uncertainty estimation has recently received considerable attention in the literature. A dynamic rating curve assessment method has been introduced (Morlot et al., 2014). This dynamic method makes it possible to compute a rating curve for each gauging and a continuous streamflow time series, while calculating streamflow uncertainties. Streamflow uncertainty takes into account many sources of uncertainty (water level, rating curve interpolation and extrapolation, gauging aging, etc.) and produces an estimated distribution of streamflow for each day. In order to characterise streamflow uncertainty, a probabilistic framework has been applied to a large sample of hydrometric stations of the Division Technique Générale (DTG) of Électricité de France (EDF) hydrometric network (>250 stations) in France. A reliability diagram (Wilks, 1995) has been constructed for some stations, based on the streamflow distribution estimated for a given day and compared to a real streamflow observation estimated via a gauging. To build a reliability diagram, we computed the probability of an observed streamflow (gauging), given the streamflow distribution. The reliability diagram then makes it possible to check that the distribution of probabilities of non-exceedance of the gaugings follows a uniform law (i.e., quantiles should be equiprobable). Given the shape of the reliability diagram, the probabilistic calibration is characterised (underdispersion, overdispersion, bias) (Thyer et al., 2009). In this paper, we present case studies where reliability diagrams have different statistical properties for different periods. By comparison with our knowledge of the river bed morphology dynamics of these hydrometric stations, we show how the reliability diagram gives us invaluable information on river bed movements, such as continuous digging or backfilling of the hydraulic control due to erosion or sedimentation processes. Hence, the careful analysis of reliability diagrams makes it possible to reconcile statistics and long-term river bed morphology processes. This knowledge improves our real-time management of hydrometric stations, given a better characterisation of erosion/sedimentation processes and the stability of hydrometric station hydraulic control.
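The reliability-diagram check described above (non-exceedance probabilities of the observations should be uniformly distributed) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the daily streamflow distribution is represented by a set of random samples, the data are synthetic, and the function names `non_exceedance_probabilities` and `reliability_curve` are introduced here.

```python
import numpy as np


def non_exceedance_probabilities(samples, observations):
    """Fraction of each day's sampled distribution at or below the observation.

    samples: array (n_days, n_samples); observations: array (n_days,).
    """
    samples = np.asarray(samples, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return (samples <= observations[:, None]).mean(axis=1)


def reliability_curve(probabilities, n_points=11):
    """Observed frequency of probabilities at or below each nominal level.

    A calibrated distribution gives points close to the 1:1 diagonal.
    """
    nominal = np.linspace(0.0, 1.0, n_points)
    observed = np.array([(probabilities <= p).mean() for p in nominal])
    return nominal, observed


# Synthetic example: the "true" uncertainty has a standard deviation of 2.
rng = np.random.default_rng(0)
n_days, n_samples = 500, 200
centre = rng.uniform(10.0, 50.0, n_days)
observations = rng.normal(centre, 2.0)

calibrated = rng.normal(centre[:, None], 2.0, (n_days, n_samples))
underdispersed = rng.normal(centre[:, None], 0.8, (n_days, n_samples))

for name, samples in [("calibrated", calibrated),
                      ("underdispersed", underdispersed)]:
    nominal, observed = reliability_curve(
        non_exceedance_probabilities(samples, observations))
    print(name, "max deviation from diagonal:",
          round(float(np.abs(observed - nominal).max()), 3))
```

An underdispersed distribution places too many observations in its tails, so its curve departs from the diagonal in an S-shape, whereas a bias shifts the whole curve to one side.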
Applying signal-detection theory to the study of observer accuracy and bias in behavioral assessment.

PubMed

Lerman, Dorothea C; Tetreault, Allison; Hovanetz, Alyson; Bellaci, Emily; Miller, Jonathan; Karp, Hilary; Mahmood, Angela; Strobel, Maggie; Mullen, Shelley; Keyl, Alice; Toupard, Alexis

2010-01-01

We evaluated the feasibility and utility of a laboratory model for examining observer accuracy within the framework of signal-detection theory (SDT). Sixty-one individuals collected data on aggression while viewing videotaped segments of simulated teacher-child interactions. The purpose of Experiment 1 was to determine if brief feedback and contingencies for scoring accurately would bias responding reliably. Experiment 2 focused on one variable (specificity of the operational definition) that we hypothesized might decrease the likelihood of bias. The effects of social consequences and information about expected behavior change were examined in Experiment 3. Results indicated that feedback and contingencies reliably biased responding and that the clarity of the definition only moderately affected this outcome.
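Within signal-detection theory, observer accuracy and bias are commonly summarized by the sensitivity index d' and the criterion c. The sketch below shows one standard way to compute them from an observer's hits, misses, false alarms and correct rejections. The abstract does not say which indices the authors used, so this is an assumption, and the counts are hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    A log-linear correction (add 0.5 to each cell) keeps the rates away
    from 0 and 1, where the z-transform would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical observers scoring 50 intervals with aggression and 50 without.
neutral = sdt_indices(hits=40, misses=10, false_alarms=10, correct_rejections=40)
liberal = sdt_indices(hits=48, misses=2, false_alarms=25, correct_rejections=25)
print("neutral observer: d' = %.2f, c = %.2f" % neutral)
print("liberal observer: d' = %.2f, c = %.2f" % liberal)
```

A negative criterion indicates a bias toward scoring the behavior as present; a positive criterion indicates a bias toward scoring it as absent.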
Long-term Behavioral Tracking of Freely Swimming Weakly Electric Fish

PubMed Central

Jun, James J.; Longtin, André; Maler, Leonard

2014-01-01

Long-term behavioral tracking can capture and quantify natural animal behaviors, including those occurring infrequently. Behaviors such as exploration and social interactions can be best studied by observing unrestrained, freely behaving animals. Weakly electric fish (WEF) display readily observable exploratory and social behaviors by emitting electric organ discharge (EOD). Here, we describe three effective techniques to synchronously measure the EOD, body position, and posture of a free-swimming WEF for an extended period of time. First, we describe the construction of an experimental tank inside of an isolation chamber designed to block external sources of sensory stimuli such as light, sound, and vibration. The aquarium was partitioned to accommodate four test specimens, and automated gates remotely control the animals' access to the central arena. Second, we describe a precise and reliable real-time EOD timing measurement method from freely swimming WEF. Signal distortions caused by the animal's body movements are corrected by spatial averaging and temporal processing stages. Third, we describe an underwater near-infrared imaging setup to observe unperturbed nocturnal animal behaviors. Infrared light pulses were used to synchronize the timing between the video and the physiological signal over a long recording duration. Our automated tracking software measures the animal's body position and posture reliably in an aquatic scene. In combination, these techniques enable long term observation of spontaneous behavior of freely swimming weakly electric fish in a reliable and precise manner. We believe our method can be similarly applied to the study of other aquatic animals by relating their physiological signals with exploratory or social behaviors. PMID:24637642
Osteochondritis dissecans of the humeral capitellum: reliability of four classification systems using radiographs and computed tomography.

PubMed

Claessen, Femke M A P; van den Ende, Kimberly I M; Doornberg, Job N; Guitton, Thierry G; Eygendaal, Denise; van den Bekerom, Michel P J

2015-10-01

The radiographic appearance of osteochondritis dissecans (OCD) of the humeral capitellum varies according to the stage of the lesion. It is important to evaluate the stage of OCD lesion carefully to guide treatment. We compared the interobserver reliability of currently used classification systems for OCD of the humeral capitellum to identify the most reliable classification system. Thirty-two musculoskeletal radiologists and orthopaedic surgeons specialized in elbow surgery from several countries evaluated anteroposterior and lateral radiographs and corresponding computed tomography (CT) scans of 22 patients to classify the stage of OCD of the humeral capitellum according to the classification systems developed by (1) Minami, (2) Berndt and Harty, (3) Ferkel and Sgaglione, and (4) Anderson on a Web-based study platform including a Digital Imaging and Communications in Medicine viewer. Magnetic resonance imaging was not evaluated as part of this study. We measured agreement among observers using the Siegel and Castellan multirater κ. All OCD classification systems, except for Berndt and Harty, which had poor agreement among observers (κ = 0.20), had fair interobserver agreement: κ was 0.27 for the Minami, 0.23 for Anderson, and 0.22 for Ferkel and Sgaglione classifications. The Minami Classification was significantly more reliable than the other classifications (P < .001). The Minami Classification was the most reliable for classifying different stages of OCD of the humeral capitellum. However, it is unclear whether radiographic evidence of OCD of the humeral capitellum, as categorized by the Minami Classification, guides treatment in clinical practice as a result of this fair agreement. Copyright © 2015 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
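Agreement among many observers assigning cases to nominal categories can be illustrated with a Fleiss-type multirater kappa. This is a closely related statistic used here as a stand-in; the exact Siegel and Castellan estimator used in the study may differ in detail. The rating counts are hypothetical and the function name `fleiss_kappa` is chosen for this sketch.

```python
import numpy as np


def fleiss_kappa(counts):
    """Fleiss-type multirater kappa.

    counts[i, j] is the number of raters who assigned case i to category j;
    every case must be rated by the same number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n_cases = counts.shape[0]
    n_raters = counts[0].sum()
    if not np.all(counts.sum(axis=1) == n_raters):
        raise ValueError("each case needs the same number of ratings")
    # Overall proportion of ratings falling in each category.
    p_cat = counts.sum(axis=0) / (n_cases * n_raters)
    # Per-case agreement: proportion of rater pairs that agree.
    p_case = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_observed = p_case.mean()
    p_expected = np.sum(p_cat**2)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 4 cases, 4 raters, 3 stages.
mixed = [[2, 2, 0], [1, 2, 1], [0, 3, 1], [3, 1, 0]]
perfect = [[4, 0, 0], [0, 4, 0], [0, 0, 4]]
print(f"mixed ratings:   kappa = {fleiss_kappa(mixed):.3f}")
print(f"perfect ratings: kappa = {fleiss_kappa(perfect):.3f}")
```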
Ensemble assimilation of ARGO temperature profile, sea surface temperature and Altimetric satellite data into an eddy permitting primitive equation model of the North Atlantic ocean

NASA Astrophysics Data System (ADS)

Yan, Yajing; Barth, Alexander; Beckers, Jean-Marie; Candille, Guillem; Brankart, Jean-Michel; Brasseur, Pierre

2015-04-01

Sea surface height, sea surface temperature and temperature profiles at depth collected between January and December 2005 are assimilated into a realistic eddy permitting primitive equation model of the North Atlantic Ocean using the Ensemble Kalman Filter. 60 ensemble members are generated by adding realistic noise to the forcing parameters related to the temperature. The ensemble is diagnosed and validated by comparison between the ensemble spread and the model/observation difference, as well as by rank histogram before the assimilation experiments. An incremental analysis update scheme is applied in order to reduce spurious oscillations due to the model state correction. The results of the assimilation are assessed according to both deterministic and probabilistic metrics with observations used in the assimilation experiments and independent observations, which goes further than most previous studies and constitutes one of the original points of this paper. Regarding the deterministic validation, the ensemble means, together with the ensemble spreads, are compared to the observations in order to diagnose the ensemble distribution properties in a deterministic way. Regarding the probabilistic validation, the continuous ranked probability score (CRPS) is used to evaluate the ensemble forecast system according to reliability and resolution. The reliability is further decomposed into bias and dispersion by the reduced centred random variable (RCRV) score in order to investigate the reliability properties of the ensemble forecast system. The improvement of the assimilation is demonstrated using these validation metrics. Finally, the deterministic validation and the probabilistic validation are analysed jointly. The consistency and complementarity between both validations are highlighted. Highly reliable situations, in which the RMS error and the CRPS give the same information, are identified for the first time in this paper.
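The two probabilistic scores named in this abstract can be sketched for a generic ensemble. The code below uses the standard sample-based CRPS estimator and the usual RCRV definition, (observation minus ensemble mean) divided by the square root of the summed observation-error and ensemble variances. It does not reproduce the reliability/resolution decomposition or the authors' code, and the data are synthetic; only the ensemble size of 60 is taken from the abstract.

```python
import numpy as np


def ensemble_crps(ensemble, obs):
    """Mean CRPS of an ensemble, shape (n_cases, n_members), against obs."""
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Mean absolute member-observation distance per case.
    term_obs = np.abs(ensemble - obs[:, None]).mean(axis=1)
    # Mean absolute distance between all member pairs per case.
    term_pairs = np.abs(ensemble[:, :, None] - ensemble[:, None, :]).mean(axis=(1, 2))
    return float((term_obs - 0.5 * term_pairs).mean())


def rcrv(ensemble, obs, obs_error_std=0.0):
    """Return (bias, dispersion) of the reduced centred random variable.

    A reliable ensemble has bias close to 0 and dispersion close to 1.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mean = ensemble.mean(axis=1)
    spread = ensemble.std(axis=1, ddof=1)
    y = (obs - mean) / np.sqrt(obs_error_std**2 + spread**2)
    return float(y.mean()), float(y.std(ddof=1))


# Synthetic reliable ensemble: members and observations share a distribution.
rng = np.random.default_rng(1)
n_cases, n_members = 400, 60
centre = rng.normal(15.0, 3.0, n_cases)
members = rng.normal(centre[:, None], 1.0, (n_cases, n_members))
observed = rng.normal(centre, 1.0)

crps = ensemble_crps(members, observed)
bias, dispersion = rcrv(members, observed)
print(f"CRPS = {crps:.3f}, RCRV bias = {bias:.3f}, dispersion = {dispersion:.3f}")
```

A dispersion well above 1 would indicate an underdispersive ensemble (spread too small), and a dispersion well below 1 an overdispersive one.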
10 CFR 712.12 - HRP implementation.

Code of Federal Regulations, 2012 CFR

2012-01-01

... DEPARTMENT OF ENERGY HUMAN RELIABILITY PROGRAM Establishment of and Procedures for the Human Reliability...) Report any observed or reported behavior or condition of another HRP-certified individual that could indicate a reliability concern, including those behaviors and conditions listed in § 712.13(c), to a...
A Topology Control Strategy with Reliability Assurance for Satellite Cluster Networks in Earth Observation

PubMed Central

Chen, Qing; Zhang, Jinxiu; Hu, Ze

2017-01-01

This article investigates the dynamic topology control problem of satellite cluster networks (SCNs) in Earth observation (EO) missions by applying a novel metric of stability for inter-satellite links (ISLs). The properties of the periodicity and predictability of satellites’ relative position are involved in the link cost metric which is to give a selection criterion for choosing the most reliable data routing paths. Also, a cooperative work model with reliability is proposed for the situation of emergency EO missions. Based on the link cost metric and the proposed reliability model, a reliability assurance topology control algorithm and its corresponding dynamic topology control (RAT) strategy are established to maximize the stability of data transmission in the SCNs. The SCNs scenario is tested through some numeric simulations of the topology stability of average topology lifetime and average packet loss rate. Simulation results show that the proposed reliable strategy applied in SCNs significantly improves the data transmission performance and prolongs the average topology lifetime. PMID:28241474
A Topology Control Strategy with Reliability Assurance for Satellite Cluster Networks in Earth Observation.

PubMed

Chen, Qing; Zhang, Jinxiu; Hu, Ze

2017-02-23

This article investigates the dynamic topology control problem of satellite cluster networks (SCNs) in Earth observation (EO) missions by applying a novel metric of stability for inter-satellite links (ISLs). The properties of the periodicity and predictability of satellites' relative position are involved in the link cost metric which is to give a selection criterion for choosing the most reliable data routing paths. Also, a cooperative work model with reliability is proposed for the situation of emergency EO missions. Based on the link cost metric and the proposed reliability model, a reliability assurance topology control algorithm and its corresponding dynamic topology control (RAT) strategy are established to maximize the stability of data transmission in the SCNs. The SCNs scenario is tested through some numeric simulations of the topology stability of average topology lifetime and average packet loss rate. Simulation results show that the proposed reliable strategy applied in SCNs significantly improves the data transmission performance and prolongs the average topology lifetime.
How reliable is apparent age at death on cadavers?

PubMed

Amadasi, Alberto; Merusi, Nicolò; Cattaneo, Cristina

2015-07-01

The assessment of age at death for identification purposes is a frequent and tough challenge for forensic pathologists and anthropologists. Too frequently, visual assessment of age is performed on well-preserved corpses, a method considered subjective and full of pitfalls, but whose level of inadequacy no one has yet tested or proven. This study consisted in the visual estimation of the age of 100 cadavers performed by a total of 37 observers among those usually attending the dissection room. Cadavers were of Caucasian ethnicity, well preserved, belonging to individuals who died of natural death. All the evaluations were performed prior to autopsy. Observers assessed the age with ranges of 5 and 10 years, indicating also the body part they mainly observed for each case. Globally, the 5-year range had an accuracy of 35%, increasing to 69% with the 10-year range. The highest accuracy was in the 31-60 age category (74.7% with the 10-year range), and the skin seemed to be the most reliable age parameter (71.5% of accuracy when observed), while the face was considered most frequently, in 92.4% of cases. A simple formula with the general "mean of averages" in the range given by the observers and related standard deviations was then developed; the average values with standard deviations of 4.62 lead to age estimation with ranges of some 20 years that seem to be fairly reliable and suitable, sometimes in alignment with classic anthropological methods, in the age estimation of well-preserved corpses.
The Best of Both Worlds: Building on the COPUS and RTOP Observation Protocols to Easily and Reliably Measure Various Levels of Reformed Instructional Practice

PubMed Central

Lund, Travis J.; Pilarz, Matthew; Velasco, Jonathan B.; Chakraverty, Devasmita; Rosploch, Kaitlyn; Undersander, Molly; Stains, Marilyne

2015-01-01

Researchers, university administrators, and faculty members are increasingly interested in measuring and describing instructional practices provided in science, technology, engineering, and mathematics (STEM) courses at the college level. Specifically, there is keen interest in comparing instructional practices between courses, monitoring changes over time, and mapping observed practices to research-based teaching. While increasingly common observation protocols (Reformed Teaching Observation Protocol [RTOP] and Classroom Observation Protocol in Undergraduate STEM [COPUS]) at the postsecondary level help achieve some of these goals, they also suffer from weaknesses that limit their applicability. In this study, we leverage the strengths of these protocols to provide an easy method that enables the reliable and valid characterization of instructional practices. This method was developed empirically via a cluster analysis using observations of 269 individual class periods, corresponding to 73 different faculty members, 28 different research-intensive institutions, and various STEM disciplines. Ten clusters, called COPUS profiles, emerged from this analysis; they represent the most common types of instructional practices enacted in the classrooms observed for this study. RTOP scores were used to validate the alignment of the 10 COPUS profiles with reformed teaching. Herein, we present a detailed description of the cluster analysis method, the COPUS profiles, and the distribution of the COPUS profiles across various STEM courses at research-intensive universities. PMID:25976654
Translation, adaptation and inter-rater reliability of the administration manual for the Fugl-Meyer assessment.

PubMed

Michaelsen, Stella M; Rocha, André S; Knabben, Rodrigo J; Rodrigues, Luciano P; Fernandes, Claudia G C

2011-01-01

Recently, the reliability of the Brazilian version of the Fugl-Meyer Assessment (FMA) was assessed through the scoring given according to observations made by a single evaluator who applied the test. When different raters apply the scale, the reliability may depend on the interpretation given to the assessment sheet. In such cases, a clear administration manual is essential for ensuring homogeneity of application. To translate and adapt the French Canadian version of the FMA administration manual into Brazilian Portuguese and to evaluate the inter-rater reliability when different evaluators apply the FMA on the basis of the information contained in the manual. Eighteen adults (59±10 years) with chronic hemiparesis (38±35 months after a stroke) took part in this study. Eight patients participated in the first part of the study and 10 in the second part. Based on analyzing the results from part 1, an adapted version was developed, in which information and photos were added to illustrate the positions of the patient and evaluator. The inter-rater reliability was assessed using the intraclass correlation coefficient (ICC). The reliability of the FMA based on the adapted version of the manual was excellent for the total motor scores for the upper limbs (ICC=0.98) and lower limbs (ICC=0.90), as well as for movement sense (ICC=0.98) and upper and lower-limb passive range of motion (ICC=0.84 and 0.90, respectively). The reliability was moderate for tactile sensitivity (0.75). The joint pain assessment presented low reliability. The results showed that, except for pain assessment, application of the FMA based on the adapted version of the application manual for Brazilian Portuguese presented adequate inter-rater reliability.
Reliability and criterion validity of two applications of the iPhone™ to measure cervical range of motion in healthy participants

PubMed Central

2013-01-01

Summary of background data: Recent smartphones, such as the iPhone, are often equipped with an accelerometer and magnetometer, which, through software applications, can perform various inclinometric functions. Although these applications are intended for recreational use, they have the potential to measure and quantify range of motion. The purpose of this study was to estimate the intra- and inter-rater reliability as well as the criterion validity of the clinometer and compass applications of the iPhone in the assessment of cervical range of motion in healthy participants. Methods: The sample consisted of 28 healthy participants. Two examiners measured cervical range of motion of each participant twice using the iPhone (for the estimation of intra- and inter-rater reliability) and once with the CROM (for the estimation of criterion validity). Estimates of reliability and validity were then established using the intraclass correlation coefficient (ICC). Results: We observed a moderate intra-rater reliability for each movement (ICC = 0.65-0.85) but a poor inter-rater reliability (ICC < 0.60). For the criterion validity, the ICCs are moderate (>0.50) to good (>0.65) for movements of flexion, extension, lateral flexions and right rotation, but poor (<0.50) for the movement of left rotation. Conclusion: We found good intra-rater reliability and lower inter-rater reliability. When compared to the gold standard, these applications showed moderate to good validity. However, before using the iPhone as an outcome measure in clinical settings, studies should be done on patients presenting with cervical problems. PMID:23829201
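Since several records in this list report intraclass correlation coefficients, a short sketch of one common form may help. The code computes ICC(2,1), the two-way random-effects, absolute-agreement, single-measurement coefficient, from the mean squares of a two-way ANOVA. The abstract does not state which ICC form was used, so this choice is an assumption, and the ratings are a small illustrative data set, not study data.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for a complete (n_subjects, k_raters) table of ratings."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), raters (columns) and residual.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(f"ICC(2,1) = {icc:.2f}")
```

Because ICC(2,1) penalizes systematic differences between raters, a rater who scores consistently higher or lower than the others lowers the coefficient even when the rank order of subjects is preserved.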
A Preliminary Analysis of the Reliability and Validity of the Leader Observation System.

DTIC Science & Technology

1982-08-01

financial institution, a state agency, a medium sized manufacturing plant, a campus police department, and the Navy and Army R.O.T.C. units of a...specifics of the study. The outside observers (N=8) used in the study were graduate students in management. Three were assigned to the financial ... managing interpersonal conflict etc. between subordinates or others d. routine financial reporting and b. appealing to higher authority to bookkeeping
Reliability and factorial validity of flexibility tests for team sports.

PubMed

Sporis, Goran; Vucetic, Vlatko; Jovanovic, Mario; Jukic, Igor; Omrcen, Darija

2011-04-01

The main goal of this method paper was to evaluate the reliability and factorial validity of flexibility tests used in soccer, and to conduct a cross-validation study on 2 other team sports using handball and basketball players. The second aim was to compare the validity of the different tests and evaluate the flexibility of soccer players; the third was to determine the positional differences between attackers, defenders, and midfielders in all flexibility tests. One hundred and fifty (n = 150) elite male junior soccer players, members of the First Croatian Junior League Teams, volunteered to participate in the study; 60 (n = 60) handball and 60 (n = 60) basketball players, also members of the First Croatian Junior League Teams, were tested for the purpose of cross-validation. The SAR and V-SAR had the greatest AVR and ICC. The within-subjects variation ranged between 0.3 and 3.8%. The lowest value of CV was found between the LSPL and LSPR. Low to moderate statistically significant correlation coefficients were found among all the measured flexibility tests. It was observed that the greatest correlations existed between the SAR and V-SAR (r = 0.65) and between the LLSR and LLSL (r = 0.56). Statistically significant correlations were also observed between the BLPL and BLPR (r = 0.62). The principal components factor analysis of 9 flexibility tests resulted in the extraction of 3 significant components. The results of this study have the following implications for the assessment of flexibility in soccer: (a) all flexibility tests used in this study have acceptable between- and within-subjects reliability and they can be used to estimate the flexibility of soccer players; (b) the LSPL and LSPR tests are the most reliable and valid flexibility tests for the estimation of flexibility of professional soccer players.
Prioritisation of patients on waiting lists for hip and knee arthroplasties and cataract surgery: Instruments validation

PubMed Central

Allepuz, Alejandro; Espallargues, Mireia; Moharra, Montse; Comas, Mercè; Pons, Joan MV

2008-01-01

Background: Prioritisation instruments were developed for patients on waiting lists for hip and knee arthroplasties (AI) and cataract surgery (CI). The aim of the study was to assess their convergent and discriminant validity and inter-observer reliability. Methods: Multicentre validation study which included orthopaedic surgeons and ophthalmologists from 10 hospitals. Participating doctors were asked to include all eligible patients placed on the waiting list for the procedures under study during the medical visit. Doctors assessed patients' priority through a visual analogue scale (VAS) and administered the prioritisation instrument. Information on socio-demographic data and health-related quality of life (HRQOL) (HUI3, EQ-5D, WOMAC and VF-14) was obtained through a telephone interview with patients. The correlation coefficients between the prioritisation instrument score and VAS and HRQOL were calculated. For the reliability study a self-administered questionnaire, which included hypothetical patient scenarios, was sent via postal mail to the doctors. The priority of these scenarios was assessed through the prioritisation instrument. The intraclass correlation coefficient (ICC) between doctors was calculated. Results: Correlations with VAS were strong for the AI (0.64, CI95%: 0.59–0.68) and for the CI (0.65, CI95%: 0.62–0.69), and moderate between the WOMAC and the AI (0.39, CI95%: 0.33–0.45) and the VF-14 and the CI (0.38, CI95%: 0.33–0.43). The results of the discriminant analysis were in general as expected. Inter-observer reliability was 0.79 (CI95%: 0.64–0.94) for the AI, and 0.79 (CI95%: 0.63–0.95) for the CI. Conclusion: The results show acceptable validity and reliability of the prioritisation instruments in establishing priority for surgery. PMID:18397519
Reliability and Validity of the Footprint Assessment Method Using Photoshop CS5 Software in Young People with Down Syndrome.

PubMed

Gutiérrez-Vilahú, Lourdes; Massó-Ortigosa, Núria; Rey-Abella, Ferran; Costa-Tutusaus, Lluís; Guerra-Balic, Myriam

2016-05-01

People with Down syndrome present skeletal abnormalities in their feet that can be analyzed by commonly used gold standard indices (the Hernández-Corvo index, the Chippaux-Smirak index, the Staheli arch index, and the Clarke angle) based on footprint measurements. The use of Photoshop CS5 software (Adobe Systems Software Ireland Ltd, Dublin, Ireland) to measure footprints has been validated in the general population. The present study aimed to assess the reliability and validity of this footprint assessment technique in the population with Down syndrome. Using optical podography and photography, 44 footprints from 22 patients with Down syndrome (11 men [mean ± SD age, 23.82 ± 3.12 years] and 11 women [mean ± SD age, 24.82 ± 6.81 years]) were recorded in a static bipedal standing position. A blinded observer performed the measurements using a validated manual method three times during the 4-month study, with 2 months between measurements. Test-retest was used to check the reliability of the Photoshop CS5 software measurements. Validity and reliability were obtained by intraclass correlation coefficient (ICC). The reliability test for all of the indices showed very good values for the Photoshop CS5 method (ICC, 0.982-0.995). Validity testing also found no differences between the techniques (ICC, 0.988-0.999). The Photoshop CS5 software method is reliable and valid for the study of footprints in young people with Down syndrome.

The Multitheoretical List of Therapeutic Interventions - 30 items (MULTI-30).

PubMed

Solomonov, Nili; McCarthy, Kevin S; Gorman, Bernard S; Barber, Jacques P

2018-01-16

To develop a brief version of the Multitheoretical List of Therapeutic Interventions (MULTI-60) in order to decrease completion time burden by approximately half, while maintaining content coverage. Study 1 aimed to select 30 items. Study 2 aimed to examine the reliability and internal consistency of the MULTI-30. Study 3 aimed to validate the MULTI-30 and ensure content coverage. In Study 1, the sample included 186 therapist and 255 patient MULTI ratings, and 164 ratings of sessions coded by trained observers. Internal consistency (Cronbach's alpha and McDonald's omega) was calculated and confirmatory factor analysis was conducted. Psychotherapy experts rated content relevance. Study 2 included a sample of 644 patient and 522 therapist ratings, and 793 codings of psychotherapy sessions. In Study 3, the sample included 33 codings of sessions. A series of regression analyses was conducted to examine replication of previously published findings using the MULTI-30. The MULTI-30 was found valid, reliable, and internally consistent across 2564 ratings examined across the three studies presented. The MULTI-30 is a brief and reliable process measure. Future studies are required for further validation.
Alternate Forms Reliability of the Behavioral Relaxation Scale: Preliminary Results

ERIC Educational Resources Information Center

Lundervold, Duane A.; Dunlap, Angel L.

2006-01-01

Alternate forms reliability of the Behavioral Relaxation Scale (BRS; Poppen, 1998), a direct observation measure of relaxed behavior, was examined. A single BRS score, based on long duration observation (5-minute), has been found to be a valid measure of relaxation and is correlated with self-report and some physiological measures. Recently,…
Using image J to document healing in ulcers of the foot in diabetes.

PubMed

Jeffcoate, William J; Musgrove, Alison J; Lincoln, Nadina B

2017-12-01

The aim of the study was to assess the reliability of measuring the cross-sectional area of diabetic foot ulcers using Image J software. The inter- and intra-rater reliability of ulcer area measures were assessed using digital images of acetate tracings of ulcers of the foot affecting 31 participants in an off-loading randomised trial. Observations were made independently by five specialist podiatrists, one of whom was experienced in the use of Image J software and educated the other four in a single session. The mean (±SD) of the mean cross-sectional areas of the 31 ulcers determined independently by the five observers was 1386·7 (±22·7) mm². The correlation between all pairs of observers was >0·99 (P < 0·001). There was no significant difference overall between the five observers (ANOVA F1.538; P = 0·165) and no difference between any two (paired samples test t = -0·787 to 1·396; P ≥ 0·088). The correlation between the areas determined by two observers on two occasions separated by not less than 1 week was very high (0·997 and 0·999; P < 0·001 and <0·001, respectively). The inter- and intra-reliability of the Image J software is very high, with no evidence of a difference either between or within observers. This technique should be considered for both research and clinical use in order to document changes in ulcer area. © 2017 Medicalhelplines.com Inc and John Wiley & Sons Ltd.
Validation of personal digital photography to assess dietary quality among people with intellectual disabilities.

PubMed

Elinder, L S; Brunosson, A; Bergström, H; Hagströmer, M; Patterson, E

2012-02-01

Dietary assessment is a challenge in general, and specifically in individuals with intellectual disabilities (ID). This study aimed to evaluate personal digital photography as a method of assessing different aspects of dietary quality in this target group. Eighteen adults with ID were recruited from community residences and activity centres in Stockholm County. Participants were instructed to photograph all foods and beverages consumed during 1 day, while observed. Photographs were coded by two raters. Observations and photographs of meal frequency, intake occasions of four specific food and beverage items, meal quality and dietary diversity were compared. Evaluation of inter-rater reliability and validity of the method was performed by intra-class correlation analysis. With reminders from staff, 85% of all observed eating or drinking occasions were photographed. The inter-rater reliability was excellent for all assessed variables (ICC ≥ 0.88), except for meal quality where ICC was 0.66. The correlations between items assessed in photos and observations were strong to almost perfect with ICC values ranging from 0.71 to 0.92 and all were statistically significant. Personal digital photography appears to be a feasible, reliable and valid method for assessing dietary quality in people with mild to moderate ID, who have daily staff support. © 2011 The Authors. Journal of Intellectual Disability Research © 2011 Blackwell Publishing Ltd.
Development and validation of self-reported line drawings of the modified Beighton score for the assessment of generalised joint hypermobility.

PubMed

Cooper, Dale J; Scammell, Brigitte E; Batt, Mark E; Palmer, Debbie

2018-01-17

The impracticalities and comparative expense of carrying out a clinical assessment are an obstacle in many large epidemiological studies. The purpose of this study was to develop and validate a series of electronic self-reported line drawing instruments based on the modified Beighton scoring system for the assessment of self-reported generalised joint hypermobility. Five sets of line drawings were created to depict the 9-point Beighton score criteria. Each instrument consisted of an explanatory question whereby participants were asked to select the line drawing which best represented their joints. Fifty participants completed the self-report online instrument on two occasions, before attending a clinical assessment. A blinded expert clinical observer then assessed participants on two occasions, using a standardised goniometry measurement protocol. Validity of the instrument was assessed by participant-observer agreement and reliability by participant repeatability and observer repeatability using unweighted Cohen's kappa (k). Validity and reliability were assessed for each item in the self-reported instrument separately, and for the sum of the total scores. An aggregate score for generalised joint hypermobility was determined based on a Beighton score of 4 or more out of 9. Observer-repeatability between the two clinical assessments demonstrated perfect agreement (k 1.00; 95% CI 1.00, 1.00). Self-reported participant-repeatability was lower but it was still excellent (k 0.91; 95% CI 0.74, 1.00). The participant-observer agreement was excellent (k 0.96; 95% CI 0.87, 1.00). Validity was excellent for the self-report instrument, with a good sensitivity of 0.87 (95% CI 0.81, 0.91) and excellent specificity of 0.99 (95% CI 0.98, 1.00). The self-reported instrument provides a valid and reliable assessment of the presence of generalised joint hypermobility and may have practical use in epidemiological studies.
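As an illustration of the unweighted Cohen's kappa used above for repeatability and agreement, the following Python sketch computes kappa for two sets of categorical ratings. It is not the authors' code; the function name and the sample ratings are hypothetical, and the data are invented purely to show the arithmetic.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical hypermobile (1) / not hypermobile (0) classifications.
    first_visit = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    second_visit = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    print(cohen_kappa(first_visit, second_visit))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple proportion of matching ratings whenever the categories are unevenly used.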
Cross-Cultural Adaptation, Validity, and Reliability of the Persian Version of the Orebro Musculoskeletal Pain Screening Questionnaire.

PubMed

Shafeei, Asrin; Mokhtarinia, Hamid Reza; Maleki-Ghahfarokhi, Azam; Piri, Leila

2017-08-01

Observational study. To cross-culturally translate the Orebro Musculoskeletal Pain Screening Questionnaire (OMPQ) into Persian and then evaluate its psychometric properties (reliability, validity, and ceiling and floor effects). To the authors' knowledge, prior to this study there has been no validated instrument to screen the risk of chronicity in Persian-speaking patients with low back pain (LBP) in Iran. The OMPQ was specifically developed as a self-administered screening tool for assessing the risk of LBP chronicity. The forward-backward translation method was used for the translation and cross-cultural adaptation of the original questionnaire. In total, 202 patients with subacute LBP completed the OMPQ and the pain disability questionnaire (PDQ), which was used to assess convergent validity. Sixty-two patients completed the OMPQ a week later as a retest. Slight changes were made to the OMPQ during the translation/cultural adaptation process; face validity of the Persian version was obtained. The Persian OMPQ showed excellent test-retest reliability (intraclass correlation coefficient = 0.89). Its internal consistency was 0.71, and its convergent validity was confirmed by a good correlation coefficient between the OMPQ and PDQ total scores (r = 0.72, p < 0.05). No ceiling or floor effects were observed. The Persian version of the OMPQ is acceptable for the target society in terms of face validity, construct validity, reliability, and consistency. It is therefore considered a useful instrument for screening Iranian patients with LBP.
Reliability and measurement error of sagittal spinal motion parameters in 220 patients with chronic low back pain using a three-dimensional measurement device.

PubMed

Mieritz, Rune M; Bronfort, Gert; Jakobsen, Markus D; Aagaard, Per; Hartvigsen, Jan

2014-09-01

A basic premise for any instrument measuring spinal motion is that reliable outcomes can be obtained on a relevant sample under standardized conditions. The purpose of this study was to assess the overall reliability and measurement error of regional spinal sagittal plane motion in patients with chronic low back pain (LBP), and then to evaluate the influence of body mass index, examiner, gender, stability of pain, and pain distribution on reliability and measurement error. This study comprises a test-retest design separated by 7 to 14 days. The patient cohort consisted of 220 individuals with chronic LBP. Kinematics of the lumbar spine were sampled during standardized spinal extension-flexion testing using a 6-df instrumented spatial linkage system. Test-retest reliability and measurement error were evaluated using intraclass correlation coefficients (ICC(1,1)) and Bland-Altman limits of agreement (LOAs). The overall test-retest reliability (ICC(1,1)) for various motion parameters ranged from 0.51 to 0.70, and relatively wide LOAs were observed for all parameters. Reliability measures in patient subgroups (ICC(1,1)) ranged between 0.34 and 0.77. In general, greater ICC(1,1) coefficients and smaller LOAs were found in subgroups with patients examined by the same examiner, patients with a stable pain level, patients with a body mass index below 30 kg/m², patients who were men, and patients in the Quebec Task Force classifications Group 1. This study shows that sagittal plane kinematic data from patients with chronic LBP may be sufficiently reliable in measurements of groups of patients. However, because of the large LOAs, this test procedure appears unusable at the individual patient level. Furthermore, reliability and measurement error vary substantially among subgroups of patients. Copyright © 2014 Elsevier Inc. All rights reserved.
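The two statistics named above, ICC(1,1) and Bland-Altman limits of agreement, can be illustrated with a short Python sketch. It is not the authors' analysis code. It assumes a complete table with one row per patient and one column per session, uses the one-way random-effects form ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), and takes the limits as the mean difference ± 1.96 standard deviations of the paired differences. Function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def icc_1_1(rows):
    """One-way random-effects, single-measure ICC for n subjects x k sessions."""
    n, k = len(rows), len(rows[0])
    if n < 2 or k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least 2 subjects and 2 sessions, no missing values")
    grand = mean(x for row in rows for x in row)
    row_means = [mean(row) for row in rows]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(rows, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def limits_of_agreement(test, retest):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(test, retest)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical range-of-motion values (degrees) at test and retest.
    test = [52.0, 47.5, 60.1, 38.9, 55.3]
    retest = [50.2, 49.0, 57.8, 41.5, 54.0]
    print(icc_1_1([list(pair) for pair in zip(test, retest)]))
    print(limits_of_agreement(test, retest))
```

The ICC summarises reliability relative to between-patient spread, whereas the limits of agreement are in the measurement's own units, which is why a group-level ICC can look acceptable while the limits remain too wide for decisions about an individual patient.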
A hierarchical Bayesian approach to adaptive vision testing: A case study with the contrast sensitivity function

PubMed Central

Gu, Hairong; Kim, Woojae; Hou, Fang; Lesmes, Luis Andres; Pitt, Mark A.; Lu, Zhong-Lin; Myung, Jay I.

2016-01-01

Measurement efficiency is of concern when a large number of observations are required to obtain reliable estimates for parametric models of vision. The standard entropy-based Bayesian adaptive testing procedures addressed the issue by selecting the most informative stimulus in sequential experimental trials. Noninformative, diffuse priors were commonly used in those tests. Hierarchical adaptive design optimization (HADO; Kim, Pitt, Lu, Steyvers, & Myung, 2014) further improves the efficiency of the standard Bayesian adaptive testing procedures by constructing an informative prior using data from observers who have already participated in the experiment. The present study represents an empirical validation of HADO in estimating the human contrast sensitivity function. The results show that HADO significantly improves the accuracy and precision of parameter estimates, and therefore requires many fewer observations to obtain reliable inference about contrast sensitivity, compared to the method of quick contrast sensitivity function (Lesmes, Lu, Baek, & Albright, 2010), which uses the standard Bayesian procedure. The improvement with HADO was maintained even when the prior was constructed from heterogeneous populations or a relatively small number of observers. These results of this case study support the conclusion that HADO can be used in Bayesian adaptive testing by replacing noninformative, diffuse priors with statistically justified informative priors without introducing unwanted bias. PMID:27105061
The Teacher's Role in Quality Classroom Interactions: Q&A with Dr. Drew Gitomer. REL Mid-Atlantic Teacher Effectiveness Webinar Series

ERIC Educational Resources Information Center

Regional Educational Laboratory Mid-Atlantic, 2013

2013-01-01

In this webinar, Dr. Drew Gitomer, professor at Rutgers University, shared results from recent studies of classroom observations that helped participants understand both general findings about the qualities of classroom interactions and also the challenges to carrying out valid and reliable observations. This Q&A addressed the questions…
Exposure assessment in different occupational groups at a hospital using Quick Exposure Check (QEC) - a pilot study.

PubMed

Ericsson, Pernilla; Björklund, Martin; Wahlström, Jens

2012-01-01

In order to test the feasibility and sensitivity of the ergonomic exposure assessment tool Quick Exposure Check (QEC), a pilot study was conducted. The aim was to test QEC in different occupational groups to compare the exposure in the most common work task with the exposure in the work task perceived as the most strenuous for the neck/shoulder region, and to test intra-observer reliability. One experienced ergonomist observed 23 workers. The mean observation time was 45 minutes, waiting time and time for complementary questions included. The exposure scores varied between the different occupational groups as well as between workers within the occupational groups. Eighteen workers rated their most common work task as also being the most strenuous for the neck/shoulder region. For the remaining five workers, the mean exposure scores were higher both for the neck and shoulder/arm in the most common work task. Intra-observer reliability shows agreement in 86% of the exposure interactions in the neck and in 71% in the shoulder/arm. QEC seems to fulfill the expectations of being a quick, sensible and practical exposure assessment tool that covers physical risk factors in the neck, upper extremities and low back.
Development and evaluation of the nurse quality of communication with patient questionnaire.

PubMed

Vuković, Mira; Gvozdenović, Branislav S; Stamatović-Gajić, Branka; Ilić, Miodrag; Gajić, Tomislav

2010-01-01

Nurse/patient relationship as a complex interrelation or as an interaction of the factor patient and factor nurse has been a subject of a number of studies during the past ten years. Nurse/patient communication is a special entity, usually observed within a framework of the wider nurse/patient relationship. In that regard, we wanted to develop a standardized questionnaire that could reliably measure the quality of communication between nurse and patient, and be used by nurses. The main goal of this study was to develop and evaluate construct validity of the Nurse Quality of Communication with Patient Questionnaire (NQCPQ), as well as to evaluate its reliability. The goal was also to establish a measure of inter-rater reliability, using two repeated measurements of results by items and scores of the NQCPQ, on the same observed units by two assessors. The starting NQCPQ, which consists of 25 items, was filled in by two groups of nurses. Each nurse was questioned during morning and afternoon shifts, in order to evaluate their communication with hospitalized patients, using marks from 1 to 6. To evaluate construct validity, we used principal component analysis, while reliability was assessed using the intraclass correlation coefficient and Cronbach's alpha coefficient. To evaluate inter-rater reliability, we used the Pearson correlation coefficient. Using a group of 118 patients, we explained 86% of the variance in the investigated phenomenon (nurse/patient communication) using one component, by which we separated 6 items of the questionnaire. Inter-item correlation (alpha) in this component was 0.96. The Pearson correlation coefficient was highly significant, with a value of 0.7 by item, and the correlation coefficient for scores at repeated measurements was 0.84. The NQCPQ is a 6-item instrument with high construct validity. It can be used to measure quality of nurse/patient communication in a simple, fast and reliable way. It could contribute to more adequate research and defining of this problem, and as such could be used in studies of interaction of psychometric, clinical, biochemical, socio-cultural, demographic and other parameters as well.
Reliability analysis of the objective structured clinical examination using generalizability theory.

PubMed

Trejo-Mejía, Juan Andrés; Sánchez-Mendiola, Melchor; Méndez-Ramírez, Ignacio; Martínez-González, Adrián

2016-01-01

The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies of OSCE reliability measurement with generalizability theory (G-theory) in Latin America. The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and explore its usefulness for quality improvement. An observational cross-sectional study was conducted at National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossover random effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered as a single facet of analysis. The exam was applied to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. The sites and the versions of the tests had minimum variance. Our study achieved a G coefficient similar to that found in other reports, which is acceptable for summative tests. G-theory allows the estimation of the magnitude of multiple sources of error and helps decision makers to determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
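The closing point above, that G-theory helps decide how many stations are needed, can be sketched as a simple decision (D) study. The Python below is a simplified illustration, not the study's model: it assumes a single-facet students × stations design, in which the relative G coefficient is var_person / (var_person + var_residual / n_stations). The variance components and function names are hypothetical and are not taken from the abstract.

```python
def g_coefficient(var_person, var_residual, n_stations):
    """Relative G coefficient for a one-facet (persons x stations) design."""
    if n_stations < 1:
        raise ValueError("n_stations must be at least 1")
    # Averaging over more stations shrinks the error variance of the mean score.
    return var_person / (var_person + var_residual / n_stations)


def stations_needed(var_person, var_residual, target, max_stations=200):
    """Smallest number of stations whose projected G coefficient reaches target."""
    for n in range(1, max_stations + 1):
        if g_coefficient(var_person, var_residual, n) >= target:
            return n
    return None  # target not reachable within max_stations


if __name__ == "__main__":
    # Hypothetical variance components from an earlier G study.
    var_person, var_residual = 1.0, 4.0
    for n in (6, 12, 18, 24):
        print(n, round(g_coefficient(var_person, var_residual, n), 3))
    print(stations_needed(var_person, var_residual, target=0.85))
```

A multi-facet analysis, with sites and exam versions as additional facets, would add one error term per facet and interaction, each divided by the number of conditions sampled for that facet.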
Reliability and validity of the English and Malay versions of the Driving and Riding Questionnaire: a pilot study amongst older car drivers and motorcycle riders.

PubMed

Ang, B H; Chen, W S; Ngin, C K; Oxley, J A; Lee, S W H

2018-02-01

This study aimed to examine the reliability and validity of the English and Malay versions of the Driving and Riding Questionnaire. An observational study with a mixed-method approach utilising both a questionnaire and short debriefing interviews. Forward and backward translations of the original questionnaire were performed. The translated questionnaire was assessed for clarity by a multidisciplinary research team, translators, and several Malay native speakers. A total of 24 subjects participated in the pilot study. Reliability (Cronbach's alpha) and validity (content validity) of the original and translated questionnaires were examined. The English and Malay versions of the Driving and Riding Questionnaire were found to be reliable tools in measuring driving behaviours amongst older drivers and riders, with Cronbach's alpha of 0.9158 and 0.8919, respectively. For content validity, the questionnaires were critically reviewed in terms of relevance, clarity, simplicity, and ambiguity. The feedback obtained from participants addressed various aspects of the questionnaire related to the improvement of wordings used and inclusion of visual guide to enhance the understanding of the items in the questionnaire. This feedback was incorporated into the final versions of the English and Malay questionnaires. The findings of this study demonstrated both the English and Malay versions of the Driving and Riding Questionnaire to be valid and reliable. Copyright © 2017 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
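For readers unfamiliar with the Cronbach's alpha values reported above, the following Python sketch shows the standard calculation, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). It is an illustration only; the function name and the sample responses are hypothetical and unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and no missing responses")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical responses: 5 respondents x 4 Likert-type items.
    responses = [
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 5, 5, 4],
        [3, 3, 2, 3],
        [1, 2, 1, 2],
    ]
    print(round(cronbach_alpha(responses), 3))
```

Alpha rises both with stronger inter-item correlation and with the number of items, so a high value on a long questionnaire does not by itself show that the items measure a single construct.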
Medicine is not science: guessing the future, predicting the past.

PubMed

Miller, Clifford

2014-12-01

Irregularity limits human ability to know, understand and predict. A better understanding of irregularity may improve the reliability of knowledge. Irregularity and its consequences for knowledge are considered. Reliable predictive empirical knowledge of the physical world has always been obtained by observation of regularities, without needing science or theory. Prediction from observational knowledge can remain reliable despite some theories based on it proving false. A naïve theory of irregularity is outlined. Reducing irregularity and/or increasing regularity can increase the reliability of knowledge. Beyond long experience and specialization, improvements include implementing supporting knowledge systems of libraries of appropriately classified prior cases and clinical histories and education about expertise, intuition and professional judgement. A consequence of irregularity and complexity is that classical reductionist science cannot provide reliable predictions of the behaviour of complex systems found in nature, including of the human body. Expertise, expert judgement and their exercise appear overarching. Diagnosis involves predicting the past will recur in the current patient applying expertise and intuition from knowledge and experience of previous cases and probabilistic medical theory. Treatment decisions are an educated guess about the future (prognosis). Benefits of the improvements suggested here are likely in fields where paucity of feedback for practitioners limits development of reliable expert diagnostic intuition. Further analysis, definition and classification of irregularity is appropriate. Observing and recording irregularities are initial steps in developing irregularity theory to improve the reliability and extent of knowledge, albeit some forms of irregularity present inherent difficulties. © 2014 John Wiley & Sons, Ltd.
Assessment of radial torsion using computed tomography in dogs with and without antebrachial limb deformity.

PubMed

Kroner, Kevin; Cooley, Katie; Hoey, Seamus; Hetzel, Scott J; Bleedorn, Jason A

2017-01-01

To evaluate the reliability of radial torsion assessment in dogs using computed tomography (CT). Cadaveric and retrospective observational clinical study. Thoracic limbs (n = 40) from bilateral normal cadaveric canine specimens (10 pairs) and unilateral antebrachial angular limb deformity (ALD) dogs (10 uniapical and 10 biapical deformities). Limbs were evaluated using CT. Frontal, sagittal, and axial plane (torsion) values were obtained using published guidelines and compared between groups and limbs. Radial torsion reliability was assessed among 3 observers using intraclass correlation coefficients (ICC). The mean (±SD) radial torsion of normal dogs was 3.6° ± 6.4° and contained a significant right to left limb variation of 2.6°. Mean radial torsion in uniapical ALD limbs (3.6° ± 18.7°) was not significantly different from biapical ALD limbs (8.9° ± 17.9°). There was a wide range of torsion values in normal and ALD limbs. The interobserver reliability was excellent (ICC > 0.8) for normal dogs, good (0.73) for uniapical, and excellent (0.89) for biapical ALD limbs. The intraobserver reliability was excellent (>0.8) for all groups. There was a small side-to-side variation of radial torsion in normal dogs. With directed training, torsion assessment using CT is reliable in dogs with and without antebrachial bone deformity. © 2016 The American College of Veterinary Surgeons.
Development and validation of a tool to evaluate the quality of medical education websites in pathology.

PubMed

Alyusuf, Raja H; Prasad, Kameshwar; Abdel Satir, Ali M; Abalkhail, Ali A; Arora, Roopa K

2013-01-01

The exponential use of the internet as a learning resource, coupled with the varied quality of many websites, has led to a need to identify suitable websites for teaching purposes. The aim of this study is to develop and to validate a tool, which evaluates the quality of undergraduate medical educational websites; and apply it to the field of pathology. A tool was devised through several steps of item generation, reduction, weightage, pilot testing, post-pilot modification of the tool and validating the tool. Tool validation included measurement of inter-observer reliability; and generation of criterion related, construct related and content related validity. The validated tool was subsequently tested by applying it to a population of pathology websites. Reliability testing showed a high internal consistency reliability (Cronbach's alpha = 0.92), high inter-observer reliability (Pearson's correlation r = 0.88), intraclass correlation coefficient = 0.85 and κ = 0.75. It showed high criterion related, construct related and content related validity. The tool showed moderately high concordance with the gold standard (κ = 0.61); 92.2% sensitivity, 67.8% specificity, 75.6% positive predictive value and 88.9% negative predictive value. The validated tool was applied to 278 websites; 29.9% were rated as recommended, 41.0% as recommended with caution and 29.1% as not recommended. A systematic tool was devised to evaluate the quality of websites for medical educational purposes. The tool was shown to yield reliable and valid inferences through its application to pathology websites.
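The sensitivity, specificity and predictive values quoted above all derive from a 2×2 table of tool rating against gold standard. The Python sketch below shows the definitions; the counts are hypothetical, since the abstract does not report the underlying table, and the function name is an assumption made for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion-matrix counts."""
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    return {
        "sensitivity": tp / (tp + fn),  # gold-standard positives the tool flags
        "specificity": tn / (tn + fp),  # gold-standard negatives the tool clears
        "ppv": tp / (tp + fp),          # tool positives that are truly positive
        "npv": tn / (tn + fn),          # tool negatives that are truly negative
    }


if __name__ == "__main__":
    # Hypothetical counts: tool verdict versus gold-standard verdict.
    for name, value in diagnostic_metrics(tp=90, fp=30, fn=10, tn=70).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the tool itself, while the predictive values also depend on how common good websites are in the sample being rated.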
Impact of image quality on reliability of the measurements of left ventricular systolic function and global longitudinal strain in 2D echocardiography

PubMed Central

Nagata, Yasufumi; Kado, Yuichiro; Onoue, Takeshi; Otani, Kyoko; Nakazono, Akemi; Otsuji, Yutaka; Takeuchi, Masaaki

2018-01-01

Background Left ventricular ejection fraction (LVEF) and global longitudinal strain (GLS) play important roles in diagnosis and management of cardiac diseases. However, the issue of the accuracy and reliability of LVEF and GLS remains to be solved. Image quality is one of the most important factors affecting measurement variability. The aim of this study was to investigate whether improved image quality could reduce observer variability. Methods Two sets of three apical images were acquired using relatively old- and new-generation ultrasound imaging systems (Vivid 7 and Vivid E95) in 308 subjects. Image quality was assessed by endocardial border delineation index (EBDI) using a 3-point scoring system. Three observers measured the LVEF and GLS, and these values and inter-observer variability were investigated. Results Image quality was significantly better with Vivid E95 (EBDI: 26.8 ± 5.9) than that with Vivid 7 (22.8 ± 6.3, P < 0.0001). Regarding the inter-observer variability of LVEF, the r-value, bias, 95% limit of agreement and intra-class correlation coefficient for Vivid 7 were comparable to those for Vivid E95. The % variabilities were significantly lower for Vivid E95 (5.3–6.5%) than those for Vivid 7 (6.5–7.5%). Regarding GLS, all observer variability parameters were better for Vivid E95 than for Vivid 7. Improvements in image quality yielded benefits to both LVEF and GLS measurement reliability. Multivariate analysis showed that image quality was indeed an important factor of observer variability in the measurement of LVEF and GLS. Conclusions The new-generation ultrasound imaging system offers improved image quality and reduces inter-observer variability in the measurement of LVEF and GLS. PMID:29432198
Impact of image quality on reliability of the measurements of left ventricular systolic function and global longitudinal strain in 2D echocardiography.

PubMed

Nagata, Yasufumi; Kado, Yuichiro; Onoue, Takeshi; Otani, Kyoko; Nakazono, Akemi; Otsuji, Yutaka; Takeuchi, Masaaki

2018-03-01

Left ventricular ejection fraction (LVEF) and global longitudinal strain (GLS) play important roles in diagnosis and management of cardiac diseases. However, the issue of the accuracy and reliability of LVEF and GLS remains to be solved. Image quality is one of the most important factors affecting measurement variability. The aim of this study was to investigate whether improved image quality could reduce observer variability. Two sets of three apical images were acquired using relatively old- and new-generation ultrasound imaging systems (Vivid 7 and Vivid E95) in 308 subjects. Image quality was assessed by endocardial border delineation index (EBDI) using a 3-point scoring system. Three observers measured the LVEF and GLS, and these values and inter-observer variability were investigated. Image quality was significantly better with Vivid E95 (EBDI: 26.8 ± 5.9) than that with Vivid 7 (22.8 ± 6.3, P < 0.0001). Regarding the inter-observer variability of LVEF, the r-value, bias, 95% limit of agreement and intra-class correlation coefficient for Vivid 7 were comparable to those for Vivid E95. The % variabilities were significantly lower for Vivid E95 (5.3-6.5%) than those for Vivid 7 (6.5-7.5%). Regarding GLS, all observer variability parameters were better for Vivid E95 than for Vivid 7. Improvements in image quality yielded benefits to both LVEF and GLS measurement reliability. Multivariate analysis showed that image quality was indeed an important factor of observer variability in the measurement of LVEF and GLS. The new-generation ultrasound imaging system offers improved image quality and reduces inter-observer variability in the measurement of LVEF and GLS. © 2018 The authors.
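The bias and 95% limit of agreement reported for inter-observer comparisons are commonly obtained with a Bland-Altman analysis. The following is a minimal sketch of that calculation, assuming paired readings from two observers; the function name and the sample LVEF values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def bland_altman(obs_a, obs_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the pairwise differences (a - b); the limits of
    agreement are bias +/- 1.96 times the sample SD of those differences.
    """
    diffs = [a - b for a, b in zip(obs_a, obs_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical LVEF readings (%) by two observers on the same four subjects.
observer_1 = [60, 55, 62, 48]
observer_2 = [58, 56, 60, 50]
bias, lower, upper = bland_altman(observer_1, observer_2)
```

A narrower interval between the lower and upper limits indicates closer agreement between the two observers.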

Are distal radius fracture classifications reproducible? Intra and interobserver agreement.

PubMed

Belloti, João Carlos; Tamaoki, Marcel Jun Sugawara; Franciozi, Carlos Eduardo da Silveira; Santos, João Baptista Gomes dos; Balbachevsky, Daniel; Chap Chap, Eduardo; Albertoni, Walter Manna; Faloppa, Flávio

2008-05-01

Various classification systems have been proposed for fractures of the distal radius, but the reliability of these classifications is seldom addressed. For a fracture classification to be useful, it must provide prognostic significance, interobserver reliability and intraobserver reproducibility. The aim here was to evaluate the intraobserver and interobserver agreement of distal radius fracture classifications. This was a validation study on interobserver and intraobserver reliability. It was developed in the Department of Orthopedics and Traumatology, Universidade Federal de São Paulo - Escola Paulista de Medicina. X-rays from 98 cases of displaced distal radius fracture were evaluated by five observers: one third-year orthopedic resident (R3), one sixth-year undergraduate medical student (UG6), one radiologist physician (XRP), one orthopedic trauma specialist (OT) and one orthopedic hand surgery specialist (OHS). The radiographs were classified on three different occasions (times T1, T2 and T3) using the Universal (Cooney), Arbeitsgemeinschaft für Osteosynthesefragen/Association for the Study of Internal Fixation (AO/ASIF), Frykman and Fernández classifications. The kappa coefficient (kappa) was applied to assess the degree of agreement. Among the three occasions, the highest mean intraobserver kappa was observed in the Universal classification (0.61), followed by Fernández (0.59), Frykman (0.55) and AO/ASIF (0.49). The interobserver agreement was unsatisfactory in all classifications. The Fernández classification showed the best agreement (0.44) and the worst was the Frykman classification (0.26). The low agreement levels observed in this study suggest that there is still no classification method with high reproducibility.
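For readers unfamiliar with the kappa coefficient, the sketch below computes Cohen's kappa for two raters, that is, observed agreement corrected for the agreement expected by chance. It is an illustration only: the abstract does not say which kappa variant was used for its five observers, and the function name and the fracture labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters classifying the same cases."""
    n = len(rater_1)
    # Observed proportion of cases on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    p_chance = sum(counts_1[c] * counts_2[c] for c in counts_1) / n ** 2
    if p_chance == 1:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical classifications of four radiographs by two observers.
kappa = cohen_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

Here the raters agree on three of four cases (0.75) while chance agreement is 0.5, so kappa is 0.5.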
Methods to Improve Reliability of Video Recorded Behavioral Data

PubMed Central

Haidet, Kim Kopenhaver; Tate, Judith; Divirgilio-Thomas, Dana; Kolanowski, Ann; Happ, Mary Beth

2009-01-01

Behavioral observation is a fundamental component of nursing practice and a primary source of clinical research data. The use of video technology in behavioral research offers important advantages to nurse scientists in assessing complex behaviors and relationships between behaviors. The appeal of using this method should be balanced, however, by an informed approach to reliability issues. In this paper, we focus on factors that influence reliability, such as the use of sensitizing sessions to minimize participant reactivity and the importance of training protocols for video coders. In addition, we discuss data quality, the selection and use of observational tools, calculating reliability coefficients, and coding considerations for special populations based on our collective experiences across three different populations and settings. PMID:19434651
Reproducibility and reliability of the ankle-brachial index as assessed by vascular experts, family physicians and nurses.

PubMed

Holland-Letz, Tim; Endres, Heinz G; Biedermann, Stefanie; Mahn, Matthias; Kunert, Joachim; Groh, Sabine; Pittrow, David; von Bilderling, Peter; Sternitzky, Reinhardt; Diehm, Curt

2007-05-01

The reliability of ankle-brachial index (ABI) measurements performed by different observer groups in primary care has not yet been determined. The aims of the study were to provide precise estimates for all effects influencing the variability of the ABI (patients' individual variability, intra- and inter-observer variability), with particular focus on the performance of different observer groups. Using a partially balanced incomplete block design, 144 unselected individuals aged ≥ 65 years underwent double ABI measurements by one vascular surgeon or vascular physician, one family physician and one nurse with training in Doppler sonography. Three groups comprising a total of 108 individuals were analyzed (only two with ABI < 0.90). Errors for two repeated measurements for all three observer groups did not differ (experts 8.5%, family physicians 7.7%, and nurses 7.5%, p = 0.39). There was no relevant bias among observer groups. Intra-observer variability expressed as standard deviation divided by the mean was 8%, and inter-observer variability was 9%. In conclusion, reproducibility of the ABI measurement was good in this cohort of elderly patients who almost all had values in the normal range. The mean error of 8-9% within or between observers is smaller than with established screening measures. Since there were no differences among observers with different training backgrounds, our study confirms the appropriateness of ABI assessment for screening peripheral arterial disease (PAD) and generalized atherosclerosis in the primary care setting. Given the importance of the early detection and management of PAD, this diagnostic tool should be used routinely as a standard for PAD screening. Additional studies will be required to confirm our observations in patients with PAD of various severities.
Reliability analysis of a phaser measurement unit using a generalized fuzzy lambda-tau(GFLT) technique.

PubMed

Komal

2018-05-01

Nowadays power consumption is increasing day by day. To fulfill failure-free power requirements, planning and implementation of an effective and reliable power management system are essential. The phasor measurement unit (PMU) is one of the key devices in wide area measurement and control systems. The reliable performance of a PMU assures a failure-free power supply for any power system. So, the purpose of the present study is to analyse the reliability of a PMU used for controllability and observability of power systems utilizing available uncertain data. In this paper, a generalized fuzzy lambda-tau (GFLT) technique has been proposed for this purpose. In GFLT, system components' uncertain failure and repair rates are fuzzified using fuzzy numbers having different shapes such as triangular, normal, cauchy, sharp gamma and trapezoidal. To select a suitable fuzzy number for quantifying data uncertainty, system experts' opinions have been considered. The GFLT technique applies a fault tree, the lambda-tau method, data fuzzified using different membership functions, and alpha-cut based fuzzy arithmetic operations to compute some important reliability indices. Furthermore, in this study ranking of critical components of the system using RAM-Index and sensitivity analysis have also been performed. The developed technique may be helpful to improve system performance significantly and can be applied to analyse fuzzy reliability of other engineering systems. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
Brief Report: An Exploratory Study of the Diagnostic Reliability for Autism Spectrum Disorder

ERIC Educational Resources Information Center

Taylor, Lauren J.; Eapen, Valsamma; Maybery, Murray; Midford, Sue; Paynter, Jessica; Quarmby, Lyndsay; Smith, Timothy; Williams, Katrina; Whitehouse, Andrew J.

2017-01-01

Previous research shows inconsistency in clinician-assigned diagnoses of Autism Spectrum Disorder (ASD). We conducted an exploratory study that examined the concordance of diagnoses between a multidisciplinary assessment team and a range of independent clinicians throughout Australia. Nine video-taped Autism Diagnostic Observation Schedule (ADOS)…
Evaluating the Sensitivity of Agricultural Model Performance to Different Climate Inputs: Supplemental Material

NASA Technical Reports Server (NTRS)

Glotter, Michael J.; Ruane, Alex C.; Moyer, Elisabeth J.; Elliott, Joshua W.

2015-01-01

Projections of future food production necessarily rely on models, which must themselves be validated through historical assessments comparing modeled and observed yields. Reliable historical validation requires both accurate agricultural models and accurate climate inputs. Problems with either may compromise the validation exercise. Previous studies have compared the effects of different climate inputs on agricultural projections but either incompletely or without a ground truth of observed yields that would allow distinguishing errors due to climate inputs from those intrinsic to the crop model. This study is a systematic evaluation of the reliability of a widely used crop model for simulating U.S. maize yields when driven by multiple observational data products. The parallelized Decision Support System for Agrotechnology Transfer (pDSSAT) is driven with climate inputs from multiple sources (reanalysis, reanalysis that is bias corrected with observed climate, and a control dataset) and compared with observed historical yields. The simulations show that model output is more accurate when driven by any observation-based precipitation product than when driven by non-bias-corrected reanalysis. The simulations also suggest, in contrast to previous studies, that biased precipitation distribution is significant for yields only in arid regions. Some issues persist for all choices of climate inputs: crop yields appear to be oversensitive to precipitation fluctuations but undersensitive to floods and heat waves. These results suggest that the most important issue for agricultural projections may be not climate inputs but structural limitations in the crop models themselves.
Evaluating the sensitivity of agricultural model performance to different climate inputs

PubMed Central

Glotter, Michael J.; Moyer, Elisabeth J.; Ruane, Alex C.; Elliott, Joshua W.

2017-01-01

Projections of future food production necessarily rely on models, which must themselves be validated through historical assessments comparing modeled to observed yields. Reliable historical validation requires both accurate agricultural models and accurate climate inputs. Problems with either may compromise the validation exercise. Previous studies have compared the effects of different climate inputs on agricultural projections, but either incompletely or without a ground truth of observed yields that would allow distinguishing errors due to climate inputs from those intrinsic to the crop model. This study is a systematic evaluation of the reliability of a widely-used crop model for simulating U.S. maize yields when driven by multiple observational data products. The parallelized Decision Support System for Agrotechnology Transfer (pDSSAT) is driven with climate inputs from multiple sources – reanalysis, reanalysis bias-corrected with observed climate, and a control dataset – and compared to observed historical yields. The simulations show that model output is more accurate when driven by any observation-based precipitation product than when driven by un-bias-corrected reanalysis. The simulations also suggest, in contrast to previous studies, that biased precipitation distribution is significant for yields only in arid regions. However, some issues persist for all choices of climate inputs: crop yields appear oversensitive to precipitation fluctuations but undersensitive to floods and heat waves. These results suggest that the most important issue for agricultural projections may be not climate inputs but structural limitations in the crop models themselves. PMID:29097985
Validity and reliability of acoustic analysis of respiratory sounds in infants

PubMed Central

Elphick, H; Lancaster, G; Solis, A; Majumdar, A; Gupta, R; Smyth, R

2004-01-01

Objective: To investigate the validity and reliability of computerised acoustic analysis in the detection of abnormal respiratory noises in infants. Methods: Blinded, prospective comparison of acoustic analysis with stethoscope examination. Validity and reliability of acoustic analysis were assessed by calculating the degree of observer agreement using the κ statistic with 95% confidence intervals (CI). Results: 102 infants under 18 months were recruited. Convergent validity for agreement between stethoscope examination and acoustic analysis was poor for wheeze (κ = 0.07 (95% CI, –0.13 to 0.26)) and rattles (κ = 0.11 (–0.05 to 0.27)) and fair for crackles (κ = 0.36 (0.18 to 0.54)). Both the stethoscope and acoustic analysis distinguished well between sounds (discriminant validity). Agreement between observers for the presence of wheeze was poor for both stethoscope examination and acoustic analysis. Agreement for rattles was moderate for the stethoscope but poor for acoustic analysis. Agreement for crackles was moderate using both techniques. Within-observer reliability for all sounds using acoustic analysis was moderate to good. Conclusions: The stethoscope is unreliable for assessing respiratory sounds in infants. This has important implications for its use as a diagnostic tool for lung disorders in infants, and confirms that it cannot be used as a gold standard. Because of the unreliability of the stethoscope, the validity of acoustic analysis could not be demonstrated, although it could discriminate between sounds well and showed good within-observer reliability. For acoustic analysis, targeted training and the development of computerised pattern recognition systems may improve reliability so that it can be used in clinical practice. PMID:15499065
Strength Analysis and Reliability Evaluation for Speed Reducers

NASA Astrophysics Data System (ADS)

Tsai, Yuo-Tern; Hsu, Yung-Yuan

2017-09-01

This paper studies the structural stresses of differential drive (DD) and harmonic drive (HD) for design improvement of reducers. The designed principles of the two reducers are reported for function comparison. The critical components of the reducers are constructed for performing motion simulation and stress analysis. DD is designed based on differential displacement of the decelerated gear ring as well as HD on a flexible spline. Finite element method (FEM) is used to analyze the structural stresses including the dynamic properties of the reducers. The stresses including kinematic properties of the two reducers are compared to observe the properties of the designs. The analyzed results are applied to identify the allowable loads of the reducers in use. The reliabilities of the reducers in different loads are further calculated according to the variation of stress. The studied results are useful on engineering analysis and reliability evaluation for designing a speed reducer with high ratios.
Is One Trial Sufficient to Obtain Excellent Pressure Pain Threshold Reliability in the Low Back of Asymptomatic Individuals? A Test-Retest Study.

PubMed

Balaguier, Romain; Madeleine, Pascal; Vuillerme, Nicolas

2016-01-01

The assessment of pressure pain threshold (PPT) provides a quantitative value related to the mechanical sensitivity to pain of deep structures. Although excellent reliability of PPT has been reported in numerous anatomical locations, its absolute and relative reliability in the lower back region remains to be determined. Because of the high prevalence of low back pain in the general population and because low back pain is one of the leading causes of disability in industrialized countries, assessing pressure pain thresholds over the low back is particularly of interest. The purpose of this study was (1) to evaluate the intra- and inter-session absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals and (2) to determine the number of trials required to ensure reliable PPT measurements. Fifteen asymptomatic subjects were included in this study. PPTs were assessed among 14 anatomical locations in the low back region over two sessions separated by a one-hour interval. For the two sessions, three PPT assessments were performed on each location. Reliability was assessed by computing intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for all possible combinations between trials and sessions. Bland-Altman plots were also generated to assess potential bias in the dataset. Relative reliability for both intra- and inter-session was almost perfect, with ICCs ranging from 0.85 to 0.99. With respect to the intra-session, no statistical difference was reported for ICCs and SEM regardless of the conducted comparisons between trials. Conversely, for inter-session, ICCs and SEM values were significantly larger when two consecutive PPT measurements were used for data analysis. No significant difference was observed for the comparison between two consecutive measurements and three measurements.
Excellent relative and absolute reliabilities were reported for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of two or three consecutive PPT measurements, as usually proposed in the literature, or with only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using two consecutive measurements to obtain higher short term absolute reliability.
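The absolute reliability indices named in this abstract are usually derived from the ICC with two standard formulas: SEM = SD × sqrt(1 − ICC) and MDC at the 95% level = 1.96 × sqrt(2) × SEM. The sketch below applies them, assuming these conventional definitions (the abstract does not state which formulas the authors used); the function names and the input values are hypothetical.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    return sd * math.sqrt(1.0 - icc)

def minimum_detectable_change_95(sem):
    """MDC95 = 1.96 * sqrt(2) * SEM; sqrt(2) accounts for two measurements."""
    return 1.96 * math.sqrt(2.0) * sem

# Hypothetical values: SD of 10 units across subjects and an ICC of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
mdc = minimum_detectable_change_95(sem)
```

A change between two sessions that is smaller than the MDC cannot be distinguished from measurement error at the 95% level.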
Observational research rigour alone does not justify causal inference.

PubMed

Ejima, Keisuke; Li, Peng; Smith, Daniel L; Nagy, Tim R; Kadish, Inga; van Groen, Thomas; Dawson, John A; Yang, Yongbin; Patki, Amit; Allison, David B

2016-12-01

Differing opinions exist on whether associations obtained in observational studies can be reliable indicators of a causal effect if the observational study is sufficiently well controlled and executed. To test this, we conducted two animal observational studies that were rigorously controlled and executed beyond what is achieved in studies of humans. In study 1, we randomized 332 genetically identical C57BL/6J mice into three diet groups with differing food energy allotments and recorded individual self-selected daily energy intake and lifespan. In study 2, 60 male mice (CD1) were paired and divided into two groups for a 2-week feeding regimen. We evaluated the association between weight gain and food consumption. Within each pair, one animal was randomly assigned to an S group in which the animals had free access to food. The second paired animal (R group) was provided exactly the same diet that their S partner ate the day before. In study 1, across all three groups, we found a significant negative effect of energy intake on lifespan. However, we found a positive association between food intake and lifespan among the ad libitum feeding group: 29·99 (95% CI: 8·2-51·7) days per daily kcal. In study 2, we found a significant (P = 0·003) group (randomized vs. self-selected)-by-food consumption interaction effect on weight gain. At least in nutrition research, associations derived from observational studies may not be reliable indicators of causal effects, even with the most rigorous study designs achievable. © 2016 Stichting European Society for Clinical Investigation Journal Foundation. Published by John Wiley & Sons Ltd.
Repeatability of self-report measures of physical activity, sedentary and travel behaviour in Hong Kong adolescents for the iHealt(H) and IPEN - Adolescent studies.

PubMed

Cerin, Ester; Sit, Cindy H P; Huang, Ya-Jun; Barnett, Anthony; Macfarlane, Duncan J; Wong, Stephen S H

2014-06-06

Physical activity and sedentary behaviour are important contributors to adolescents' health. These behaviours may be affected by the school and neighbourhood built environments. However, current evidence on such effects is mainly limited to Western countries. The International Physical Activity and the Environment Network (IPEN)-Adolescent study aims to examine associations of the built environment with adolescent physical activity and sedentary behaviour across five continents. We report on the repeatability of measures of in-school and out-of-school physical activity, plus measures of out-of-school sedentary and travel behaviours adopted by the IPEN - Adolescent study and adapted for Chinese-speaking Hong Kong adolescents participating in the international Healthy environments and active living in teenagers-(Hong Kong) [iHealt(H)] study, which is part of IPEN-Adolescent. Items gauging in-school physical activity and out-of-school physical activity, and out-of-school sedentary and travel behaviours developed for the IPEN - Adolescent study were translated from English into Chinese, adapted, and pilot tested. Sixty-eight Chinese-speaking 12-17 year old secondary school students (36 boys; 32 girls) residing in areas of Hong Kong differing in transport-related walkability were recruited. They self-completed the survey items twice, 8-16 days apart. Test-retest reliability was assessed for the whole sample and by gender using one-way random effects intra-class correlation coefficients (ICC). Test-retest reliability of items with restricted variability was assessed using percentage agreement. Overall test-retest reliability of items and scales was moderate to excellent (ICC = 0.47-0.92). Items with restricted variability in responses had a high percentage agreement (92%-100%).
Test-retest reliability was similar in girls and boys, with the exception of daily hours of homework (reliability higher in girls) and number of school-based sports teams or after-school physical activity classes (reliability higher in boys). The translated and adapted self-report measures of physical activity, sedentary and travel behaviours used in the iHealt(H) study are sufficiently reliable. Levels of reliability are comparable or slightly higher than those observed for the original measures.
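The one-way random effects ICC used in this study can be computed from a one-way ANOVA as ICC(1,1) = (MSB − MSW) / (MSB + (k − 1) × MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of measurements per subject. The sketch below implements the single-measurement form, which is an assumption since the abstract does not specify single or average measures; the function name and the sample data are hypothetical.

```python
def icc_one_way(data):
    """ICC(1,1) from a list of subjects, each a list of k repeated scores."""
    n = len(data)      # number of subjects
    k = len(data[0])   # measurements per subject (assumed equal for all)
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical test and retest scores for three respondents.
icc = icc_one_way([[2, 3], [4, 4], [6, 5]])
```

Identical test and retest scores for every respondent give an ICC of 1, while within-subject scatter relative to between-subject spread pulls the value down.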
Assessing the environmental characteristics of cycling routes to school: a study on the reliability and validity of a Google Street View-based audit.

PubMed

Vanwolleghem, Griet; Van Dyck, Delfien; Ducheyne, Fabian; De Bourdeaudhuij, Ilse; Cardon, Greet

2014-06-10

Google Street View provides a valuable and efficient alternative to observe the physical environment compared to on-site fieldwork. However, studies on the use, reliability and validity of Google Street View in a cycling-to-school context are lacking. We aimed to study the intra-, inter-rater reliability and criterion validity of EGA-Cycling (Environmental Google Street View Based Audit - Cycling to school), a newly developed audit using Google Street View to assess the physical environment along cycling routes to school. Parents (n = 52) of 11-to-12-year old Flemish children, who mostly cycled to school, completed a questionnaire and identified their child's cycling route to school on a street map. Fifty cycling routes of 11-to-12-year olds were identified and physical environmental characteristics along the identified routes were rated with EGA-Cycling (5 subscales; 37 items), based on Google Street View. To assess reliability, two researchers performed the audit. Criterion validity of the audit was examined by comparing the ratings based on Google Street View with ratings through on-site assessments. Intra-rater reliability was high (kappa range 0.47-1.00). Large variations in the inter-rater reliability (kappa range -0.03-1.00) and criterion validity scores (kappa range -0.06-1.00) were reported, with acceptable inter-rater reliability values for 43% of all items and acceptable criterion validity for 54% of all items. EGA-Cycling can be used to assess physical environmental characteristics along cycling routes to school. However, to assess the micro-environment specifically related to cycling, on-site assessments have to be added.
Construct Validity and Reliability of the SARA Gait and Posture Sub-scale in Early Onset Ataxia

PubMed Central

Lawerman, Tjitske F.; Brandsma, Rick; Verbeek, Renate J.; van der Hoeven, Johannes H.; Lunsing, Roelineke J.; Kremer, Hubertus P. H.; Sival, Deborah A.

2017-01-01

Aim: In children, gait and posture assessment provides a crucial marker for the early characterization, surveillance and treatment evaluation of early onset ataxia (EOA). For reliable data entry of studies targeting at gait and posture improvement, uniform quantitative biomarkers are necessary. Until now, the pediatric test construct of gait and posture scores of the Scale for Assessment and Rating of Ataxia sub-scale (SARA) is still unclear. In the present study, we aimed to validate the construct validity and reliability of the pediatric (SARAGAIT/POSTURE) sub-scale. Methods: We included 28 EOA patients [15.5 (6–34) years; median (range)]. For inter-observer reliability, we determined the ICC on EOA SARAGAIT/POSTURE sub-scores by three independent pediatric neurologists. For convergent validity, we associated SARAGAIT/POSTURE sub-scores with: (1) Ataxic gait Severity Measurement by Klockgether (ASMK; dynamic balance), (2) Pediatric Balance Scale (PBS; static balance), (3) Gross Motor Function Classification Scale -extended and revised version (GMFCS-E&R), (4) SARA-kinetic scores (SARAKINETIC; kinetic function of the upper and lower limbs), (5) Archimedes Spiral (AS; kinetic function of the upper limbs), and (6) total SARA scores (SARATOTAL; i.e., summed SARAGAIT/POSTURE, SARAKINETIC, and SARASPEECH sub-scores). For discriminant validity, we investigated whether EOA co-morbidity factors (myopathy and myoclonus) could influence SARAGAIT/POSTURE sub-scores. Results: The inter-observer agreement (ICC) on EOA SARAGAIT/POSTURE sub-scores was high (0.97). SARAGAIT/POSTURE was strongly correlated with the other ataxia and functional scales [ASMK (rs = -0.819; p < 0.001); PBS (rs = -0.943; p < 0.001); GMFCS-E&R (rs = -0.862; p < 0.001); SARAKINETIC (rs = 0.726; p < 0.001); AS (rs = 0.609; p = 0.002); and SARATOTAL (rs = 0.935; p < 0.001)]. 
Comorbid myopathy influenced SARAGAIT/POSTURE scores by concurrent muscle weakness, whereas comorbid myoclonus predominantly influenced SARAKINETIC scores. Conclusion: In young EOA patients, separate SARAGAIT/POSTURE parameters reveal a good inter-observer agreement and convergent validity, implicating the reliability of the scale. In perspective of incomplete discriminant validity, it is advisable to interpret SARAGAIT/POSTURE scores for comorbid muscle weakness. PMID:29326569
Resting-state test-retest reliability of a priori defined canonical networks over different preprocessing steps.

PubMed

Varikuti, Deepthi P; Hoffstaedter, Felix; Genon, Sarah; Schwender, Holger; Reid, Andrew T; Eickhoff, Simon B

2017-04-01

Resting-state functional connectivity analysis has become a widely used method for the investigation of human brain connectivity and pathology. The measurement of neuronal activity by functional MRI, however, is impeded by various nuisance signals that reduce the stability of functional connectivity. Several methods exist to address this predicament, but little consensus has yet been reached on the most appropriate approach. Given the crucial importance of reliability for the development of clinical applications, we here investigated the effect of various confound removal approaches on the test-retest reliability of functional-connectivity estimates in two previously defined functional brain networks. Our results showed that gray matter masking improved the reliability of connectivity estimates, whereas denoising based on principal components analysis reduced it. We additionally observed that refraining from using any correction for global signals provided the best test-retest reliability, but failed to reproduce anti-correlations between what have been previously described as antagonistic networks. This suggests that improved reliability can come at the expense of potentially poorer biological validity. Consistent with this, we observed that reliability was proportional to the retained variance, which presumably included structured noise, such as reliable nuisance signals (for instance, noise induced by cardiac processes). We conclude that compromises are necessary between maximizing test-retest reliability and removing variance that may be attributable to non-neuronal sources.
Resting-state test-retest reliability of a priori defined canonical networks over different preprocessing steps

PubMed Central

Varikuti, Deepthi P.; Hoffstaedter, Felix; Genon, Sarah; Schwender, Holger; Reid, Andrew T.; Eickhoff, Simon B.

2016-01-01

Resting-state functional connectivity analysis has become a widely used method for the investigation of human brain connectivity and pathology. The measurement of neuronal activity by functional MRI, however, is impeded by various nuisance signals that reduce the stability of functional connectivity. Several methods exist to address this predicament, but little consensus has yet been reached on the most appropriate approach. Given the crucial importance of reliability for the development of clinical applications, we here investigated the effect of various confound removal approaches on the test-retest reliability of functional-connectivity estimates in two previously defined functional brain networks. Our results showed that grey matter masking improved the reliability of connectivity estimates, whereas de-noising based on principal components analysis reduced it. We additionally observed that refraining from using any correction for global signals provided the best test-retest reliability, but failed to reproduce anti-correlations between what have been previously described as antagonistic networks. This suggests that improved reliability can come at the expense of potentially poorer biological validity. Consistent with this, we observed that reliability was proportional to the retained variance, which presumably included structured noise, such as reliable nuisance signals (for instance, noise induced by cardiac processes). We conclude that compromises are necessary between maximizing test-retest reliability and removing variance that may be attributable to non-neuronal sources. PMID:27550015
AO Distal Radius Fracture Classification: Global Perspective on Observer Agreement.

PubMed

Jayakumar, Prakash; Teunis, Teun; Giménez, Beatriz Bravo; Verstreken, Frederik; Di Mascio, Livio; Jupiter, Jesse B

2017-02-01

Background The primary objective of this study was to test interobserver reliability when classifying fractures by consensus by AO types and groups among a large international group of surgeons. Secondarily, we assessed the difference in inter- and intraobserver agreement of the AO classification in relation to geographical location, level of training, and subspecialty. Methods A randomized set of radiographic and computed tomographic images from a consecutive series of 96 distal radius fractures (DRFs), treated between October 2010 and April 2013, was classified using an electronic web-based portal by an invited group of participants on two occasions. Results Interobserver reliability was substantial when classifying AO type A fractures but fair and moderate for type B and C fractures, respectively. No difference was observed by location, except for an apparent difference between participants from India and Australia classifying type B fractures. No statistically significant associations were observed comparing interobserver agreement by level of training and no differences were shown comparing subspecialties. Intra-rater reproducibility was "substantial" for fracture types and "fair" for fracture groups with no difference accounting for location, training level, or specialty. Conclusion Improved definition of reliability and reproducibility of this classification may be achieved using large international groups of raters, empowering decision making on which system to utilize. Level of Evidence Level III.
AO Distal Radius Fracture Classification: Global Perspective on Observer Agreement

PubMed Central

Jayakumar, Prakash; Teunis, Teun; Giménez, Beatriz Bravo; Verstreken, Frederik; Di Mascio, Livio; Jupiter, Jesse B.

2016-01-01

Background The primary objective of this study was to test interobserver reliability when classifying fractures by consensus by AO types and groups among a large international group of surgeons. Secondarily, we assessed the difference in inter- and intraobserver agreement of the AO classification in relation to geographical location, level of training, and subspecialty. Methods A randomized set of radiographic and computed tomographic images from a consecutive series of 96 distal radius fractures (DRFs), treated between October 2010 and April 2013, was classified using an electronic web-based portal by an invited group of participants on two occasions. Results Interobserver reliability was substantial when classifying AO type A fractures but fair and moderate for type B and C fractures, respectively. No difference was observed by location, except for an apparent difference between participants from India and Australia classifying type B fractures. No statistically significant associations were observed comparing interobserver agreement by level of training and no differences were shown comparing subspecialties. Intra-rater reproducibility was “substantial” for fracture types and “fair” for fracture groups with no difference accounting for location, training level, or specialty. Conclusion Improved definition of reliability and reproducibility of this classification may be achieved using large international groups of raters, empowering decision making on which system to utilize. Level of Evidence Level III PMID:28119795
Validity of an Observation Method for Assessing Pain Behavior in Individuals With Multiple Sclerosis

PubMed Central

Cook, Karon F.; Roddey, Toni S.; Bamer, Alyssa M.; Amtmann, Dagmar; Keefe, Francis J

2012-01-01

Context Pain is a common and complex experience for individuals who live with multiple sclerosis (MS) that interferes with physical, psychological and social function. A valid and reliable tool for quantifying observed pain behaviors in MS is critical to understanding how pain behaviors contribute to pain-related disability in this clinical population. Objectives To evaluate the reliability and validity of a pain behavioral observation protocol in individuals who have MS. Methods Community-dwelling volunteers with multiple sclerosis (N=30), back pain (N=5), or arthritis (N=8) were recruited based on clinician referrals, advertisements, fliers, web postings, and participation in previous research. Participants completed measures of pain severity, pain interference, and self-reported pain behaviors and were videotaped doing typical activities (e.g., walking, sitting). Two coders independently recorded frequencies of pain behaviors by category (e.g., guarding, bracing) and inter-rater reliability statistics were calculated. Naïve observers reviewed videotapes of individuals with MS and rated their pain. Spearman correlations were calculated between pain behavior frequencies and self-reported pain and pain ratings by naïve observers. Results Inter-rater reliability estimates indicated the reliability of pain codes in the MS sample. Kappa coefficients ranged from moderate agreement (sighing = 0.40) to substantial agreement (guarding = 0.83). These values were comparable to those obtained in the combined back pain and arthritis sample. Concurrent validity was supported by correlations with self-reported pain (0.46-0.53) and with self-reports of pain behaviors (0.58). Construct validity was supported by finding of 0.87 correlation between total pain behaviors observed by coders and mean pain ratings by naïve observers. Conclusion Results support use of the pain behavior observation protocol for assessing pain behaviors of individuals with MS. 
Valid assessments of pain behaviors of individuals with MS could lead to creative interventions in the management of chronic pain in this population. PMID:23159684
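The kappa coefficients reported above correct the raw agreement between two coders for the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa follows; the function name and the two coders' presence/absence codes are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters coding the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(ratings_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codes (1 = behavior present, 0 = absent) for ten video segments.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))
```

In this made-up example the coders agree on 8 of 10 segments, chance agreement is 0.5, and kappa is therefore (0.8 - 0.5) / (1 - 0.5) = 0.6.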
Mixed methods evaluation of a quality improvement and audit tool for nurse-to-nurse bedside clinical handover in ward settings.

PubMed

Redley, Bernice; Waugh, Rachael

2018-04-01

Nurse bedside handover quality is influenced by complex interactions related to the content, processes used and the work environment. Audit tools are seldom tested in 'real' settings. Examine the reliability, validity and usability of a quality improvement tool for audit of nurse bedside handover. Naturalistic, descriptive, mixed-methods. Six inpatient wards at a single large not-for-profit private health service in Victoria, Australia. Five nurse experts and 104 nurses involved in 199 change-of-shift bedside handovers. A focus group with experts and pilot test were used to examine content and face validity, and usability of the handover audit tool. The tool was examined for inter-rater reliability and usability using observation audits of handovers across six wards. Data were collected in 2013-2014. Two independent observers for 72 audits demonstrated acceptable inter-observer agreement for 27 (77%) items. Reliability was weak for items examining the handover environment. Seventeen items were not observed reflecting gaps in practices. Across 199 observation audits, gaps in nurse bedside handover practice most often related to process and environment, rather than content items. Usability was impacted by high observer burden, familiarity and non-specific illustrative behaviours. The reliability and validity of most items to audit handover content was acceptable. Gaps in practices for process and environment items were identified. Context specific exemplars and reducing the items used at each handover audit can enhance usability. Further research is needed to develop context specific exemplars and undertake additional reliability testing using a wide range of handover settings. Copyright © 2017 Elsevier Inc. All rights reserved.

Validation of Observations Obtained with a Liquid Mirror Telescope by Comparison with Sloan Digital Sky Survey Observations

NASA Astrophysics Data System (ADS)

Borra, E. F.

2015-06-01

The results of a search for peculiar astronomical objects using very low resolution spectra obtained with the NASA Orbital Debris Observatory (NODO) 3 m diameter liquid mirror telescope (LMT) are compared with results of spectra obtained with the Sloan Digital Sky Survey (SDSS). The main purpose of this comparison is to verify whether observations taken with this novel type of telescope are reliable. This comparison is important because LMTs are an inexpensive novel type of telescope that is very useful for astronomical surveys, particularly surveys in the time domain, and validation of the data taken with an LMT by comparison with data from a classical telescope will validate their reliability. We start from a published data analysis that classified as peculiar only 206 of the 18,000 astronomical objects observed with the NODO LMT. A total of 29 of these 206 objects were found in the SDSS. The reliability of the NODO data can be seen through the results of the detailed analysis that, in practice, incorrectly identified less than 0.3% of the 18,000 spectra as peculiar objects, most likely because they are variable stars. We conclude that the LMT gave reliable observations, comparable to those that would have been obtained with a telescope using a glass mirror.
A fatigue resistance test for elderly persons based on grip strength: reliability and comparison with healthy young subjects.

PubMed

Bautmans, Ivan; Mets, Tony

2005-06-01

Although a wide variety of protocols are available for evaluating skeletal muscle fatigue resistance, they often necessitate important technological resources or are too complicated for elderly subjects. We present here a new test, designed for elderly persons, based on maintaining maximal voluntary grip strength as long as possible. The aim of the study was to determine the reliability of this test procedure in hospitalized geriatric patients and in young healthy persons. Fatigue resistance was considered as the time in which grip strength decreases to 50% of its maximum value. Twenty geriatric, hospitalized patients (age 83 +/- 6 yrs) and thirty-nine young, healthy persons (age 23 +/- 4 yrs) were evaluated for fatigue resistance by two different observers. Height, weight and body mass index were determined for each participant and the current amount of sports activity was recorded in the young subjects. All participants were able to perform the test. Inter- and intra-rater reliability in both subgroups was good to excellent, with ICC(3,1) values ranging from 0.77 to 0.94. No significant differences in inter- and intra-rater measurements were found, except for inter-observer evaluations of the dominant hand in hospitalized geriatric patients. No significant relationships were found between fatigue resistance and maximal grip strength, anthropometrics or gender. The proposed fatigue resistance test is a reliable tool to evaluate geriatric hospitalized patients as well as young, active and healthy persons. Fatigue resistance scores are not related to gender, maximal strength or anthropometrics within the observed subgroups.
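The ICC(3,1) values cited above come from a two-way model in which the raters are treated as fixed and reliability is expressed for a single rating. A minimal sketch of that calculation from the ANOVA mean squares is given below; the function name is hypothetical, and the small ratings table is illustrative only (it is not data from this study, which used two observers).

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed model, consistency, single rating.

    `ratings` is a list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


# Illustrative table: 6 subjects (rows) scored by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_3_1(example)
print(round(icc, 2))
```

Because the rater effect is removed before the ratio is formed, a rater who scores consistently higher than another does not lower ICC(3,1); only inconsistency in how subjects are ranked does.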
Reliability of the measures of weight-bearing distribution obtained during quiet stance by digital scales in subjects with and without hemiparesis.

PubMed

de Araujo-Barbosa, Paulo Henrique Ferreira; de Menezes, Lidiane Teles; Costa, Abraão Souza; Couto Paz, Clarissa Cardoso Dos Santos; Fachin-Martins, Emerson

2015-05-01

Described as an alternative way of assessing weight-bearing asymmetries, the measures obtained from digital scales have been used as an index to classify weight-bearing distribution. This study aimed to describe the intra-test and the test/retest reliability of measures in subjects with and without hemiparesis during quiet stance. The percentage of body weight borne by one limb was calculated for a sample of subjects with hemiparesis and for a control group that was matched by gender and age. A two-way analysis of variance was used to verify the intra-test reliability. This analysis was calculated using the differences between the averages of the measures obtained during single, double or triple trials. The intra-class correlation coefficient (ICC) was utilized and data plotted using the Bland-Altman method. The intra-test analysis showed significant differences, only observed in the hemiparesis group, between the measures obtained by single and triple trials. Excellent and moderate ICC values (0.69-0.84) between test and retest were observed in the hemiparesis group, while for control groups ICC values (0.41-0.74) were classified as moderate, progressing from almost poor for measures obtained by a single trial to almost excellent for those obtained by triple trials. In conclusion, good reliability ranging from moderate to excellent classifications was found for participants with and without hemiparesis. Moreover, an improvement of the repeatability was observed with fewer trials for participants with hemiparesis, and with more trials for participants without hemiparesis.
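The Bland-Altman method mentioned above summarizes test/retest agreement as the mean difference (bias) and the 95% limits of agreement around it. The sketch below shows the usual calculation; the function name and the weight-bearing percentages are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    # 95% limits of agreement: bias +/- 1.96 sample standard deviations.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical % body weight on one limb at test and at retest.
test_session = [50, 52, 48, 55, 51]
retest_session = [51, 50, 49, 54, 53]
bias, lower, upper = bland_altman_limits(test_session, retest_session)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A bias near zero with narrow limits would suggest that the two sessions can be used interchangeably; wide limits indicate poor agreement even when the ICC is acceptable.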
Measurement properties of existing clinical assessment methods evaluating scapular positioning and function. A systematic review.

PubMed

Larsen, Camilla Marie; Juul-Kristensen, Birgit; Lund, Hans; Søgaard, Karen

2014-10-01

The aims were to compile a schematic overview of clinical scapular assessment methods and critically appraise the methodological quality of the involved studies. A systematic, computer-assisted literature search using Medline, CINAHL, SportDiscus and EMBASE was performed from inception to October 2013. Reference lists in articles were also screened for publications. From 50 articles, 54 method names were identified and categorized into three groups: (1) Static positioning assessment (n = 19); (2) Semi-dynamic (n = 13); and (3) Dynamic functional assessment (n = 22). Fifteen studies were excluded for evaluation due to no/few clinimetric results, leaving 35 studies for evaluation. Graded according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN checklist), the methodological quality in the reliability and validity domains was "fair" (57%) to "poor" (43%), with only one study rated as "good". The reliability domain was most often investigated. Few of the assessment methods in the included studies that had "fair" or "good" measurement property ratings demonstrated acceptable results for both reliability and validity. We found a substantially larger number of clinical scapular assessment methods than previously reported. Using the COSMIN checklist the methodological quality of the included measurement properties in the reliability and validity domains were in general "fair" to "poor". None were examined for all three domains: (1) reliability; (2) validity; and (3) responsiveness. Observational evaluation systems and assessment of scapular upward rotation seem suitably evidence-based for clinical use. Future studies should test and improve the clinimetric properties, and especially diagnostic accuracy and responsiveness, to increase utility for clinical practice.
Measurement of patient safety: a systematic review of the reliability and validity of adverse event detection with record review

PubMed Central

Hanskamp-Sebregts, Mirelle; Zegers, Marieke; Vincent, Charles; van Gurp, Petra J; de Vet, Henrica C W; Wollersheim, Hub

2016-01-01

Objectives Record review is the most used method to quantify patient safety. We systematically reviewed the reliability and validity of adverse event detection with record review. Design A systematic review of the literature. Methods We searched PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Library from their inception through February 2015. We included all studies that aimed to describe the reliability and/or validity of record review. Two reviewers conducted data extraction. We pooled κ values (κ) and analysed the differences in subgroups according to number of reviewers, reviewer experience and training level, adjusted for the prevalence of adverse events. Results In 25 studies, the psychometric data of the Global Trigger Tool (GTT) and the Harvard Medical Practice Study (HMPS) were reported and 24 studies were included for statistical pooling. The inter-rater reliability of the GTT and HMPS showed a pooled κ of 0.65 and 0.55, respectively. The inter-rater agreement was statistically significantly higher when the group of reviewers within a study consisted of a maximum of five reviewers. We found no studies reporting on the validity of the GTT and HMPS. Conclusions The reliability of record review is moderate to substantial and improved when a small group of reviewers carried out record review. The validity of the record review method has never been evaluated, while clinical data registries, autopsy or direct observations of patient care are potential reference methods that can be used to test concurrent validity. PMID:27550650
Clinical photographic observation of plantar corns and callus associated with a nominal scale classification and inter-observer reliability study in a student population.

PubMed

Tollafield, David R

2017-01-01

The management of plantar corns and callus has a low cost-benefit with reduced prioritisation in healthcare. The distinction between types of keratin lesions that forms corns and callus has attracted limited interest. Observation is imperative to improving diagnostic predictions and a number of studies point to some confusion as to how best to achieve this. The use of photographic observation has been proposed to improve our understanding of intractable keratin lesions. Students from a podiatry school reviewed photographs where plantar keratin lesions were divided into four nominal groups; light callus (Grade 1), heavy defined callus (Grade 2), concentric keratin plugs (Grade 3) and callus with deeper density changes under the forefoot (Grade 4). A group of 'experts' assigned from qualified podiatrists validated the observer rated responses by the students. Cohen's weighted statistic (k) was used to measure inter-observer reliability. First year students (unskilled) performed less well when viewing photographs (k = 0.33) compared to third year students (semi-skilled, k = 0.62). The experts performed better than students (k = 0.88) providing consistency with wound care models in other studies. Improved clinical annotation of clinical features, supported by classification of keratin-based lesions, combined with patient outcome tools, could improve the scientific rationale to prioritise patient care. Problems associated with photographic assessment involve trying to differentiate similar lesions without the benefit of direct palpation. Direct observation of callus with and without debridement requires further investigation alongside the model proposed in this paper.
Health Service Quality Scale: Brazilian Portuguese translation, reliability and validity.

PubMed

Rocha, Luiz Roberto Martins; Veiga, Daniela Francescato; e Oliveira, Paulo Rocha; Song, Elaine Horibe; Ferreira, Lydia Masako

2013-01-17

The Health Service Quality Scale is a multidimensional hierarchical scale that is based on interdisciplinary approach. This instrument was specifically created for measuring health service quality based on marketing and health care concepts. The aim of this study was to translate and culturally adapt the Health Service Quality Scale into Brazilian Portuguese and to assess the validity and reliability of the Brazilian Portuguese version of the instrument. We conducted a cross-sectional, observational study, with public health system patients in a Brazilian university hospital. Validity was assessed using Pearson's correlation coefficient to measure the strength of the association between the Brazilian Portuguese version of the instrument and the SERVQUAL scale. Internal consistency was evaluated using Cronbach's alpha coefficient; the intraclass (ICC) and Pearson's correlation coefficients were used for test-retest reliability. One hundred and sixteen consecutive postoperative patients completed the questionnaire. Pearson's correlation coefficient for validity was 0.20. Cronbach's alpha for the first and second administrations of the final version of the instrument were 0.982 and 0.986, respectively. For test-retest reliability, Pearson's correlation coefficient was 0.89 and ICC was 0.90. The culturally adapted, Brazilian Portuguese version of the Health Service Quality Scale is a valid and reliable instrument to measure health service quality.
TEST-RETEST RELIABILITY OF THE CLOSED KINETIC CHAIN UPPER EXTREMITY STABILITY TEST (CKCUEST) IN ADOLESCENTS: RELIABILITY OF CKCUEST IN ADOLESCENTS.

PubMed

de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C

2017-02-01

The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, there are few studies that support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and establish clinimetric values for this test. Test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed two CKCUEST with an interval of one week between the tests. An intraclass correlation coefficient (ICC 3,3) two-way mixed model with a 95% interval of confidence was utilized to determine intersession reliability. A Bland-Altman graph was plotted to analyze the agreement between assessments. The presence of systematic error was evaluated by a one-sample t test. The difference between the evaluation and reevaluation was observed using a paired-sample t test. The level of significance was set at 0.05. Standard error of measurements and minimum detectable changes were calculated. The intersession reliability of the average touches score, normalized score, and power score were 0.68, 0.68 and 0.87, the standard error of measurement were 2.17, 1.35 and 6.49, and the minimal detectable change was 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p < 0.014), the significant difference between the measurements (p < 0.05), and the analysis of the Bland-Altman graph infer that CKCUEST is a discordant test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. Level of evidence: 2b.
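The abstract reports a standard error of measurement (SEM) and a minimal detectable change (MDC) for each score without giving the formulas. A sketch under the commonly used definitions SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM is shown below; that the authors used exactly these formulas is an assumption, although the reported MDC values are consistent with the MDC95 formula to within rounding. The function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


# SEM values as reported in the abstract for the three CKCUEST scores.
reported_sem = {"average touches": 2.17, "normalized": 1.35, "power": 6.49}
mdc = {name: round(mdc95(sem), 2) for name, sem in reported_sem.items()}
print(mdc)
```

The recomputed values land within about 0.01 of the reported 6.01, 3.74 and 17.98; the small gap for the power score is the kind of difference expected when the published SEM has already been rounded.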
Reliability and agreement in student ratings of the class environment.

PubMed

Nelson, Peter M; Christ, Theodore J

2016-09-01

The current study estimated the reliability and agreement of student ratings of the classroom environment obtained using the Responsive Environmental Assessment for Classroom Teaching (REACT; Christ, Nelson, & Demers, 2012; Nelson, Demers, & Christ, 2014). Coefficient alpha, class-level reliability, and class agreement indices were evaluated as each index provides important information for different interpretations and uses of student rating scale data. Data for 84 classes across 29 teachers in a suburban middle school were sampled to derive reliability and agreement indices for the REACT subscales across 4 class sizes: 25, 20, 15, and 10. All participating teachers were White and a larger number of 6th-grade classes were included (42%) relative to 7th- (33%) or 8th- (23%) grade classes. Teachers were responsible for a variety of content areas, including language arts (26%), science (26%), math (20%), social studies (19%), communications (6%), and Spanish (3%). Coefficient alpha estimates were generally high across all subscales and class sizes (α = .70-.95); class-mean estimates were greatly impacted by the number of students sampled from each class, with class-level reliability values generally falling below .70 when class size was reduced from 25 to 20. Further, within-class student agreement varied widely across the REACT subscales (mean agreement = .41-.80). Although coefficient alpha and test-retest reliability are commonly reported in research with student rating scales, class-level reliability and agreement are not. The observed differences across coefficient alpha, class-level reliability, and agreement indices provide evidence for evaluating students' ratings of the class environment according to their intended use (e.g., differentiating between classes, class-level instructional decisions). (PsycINFO Database Record (c) 2016 APA, all rights reserved).
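The dependence of class-level reliability on the number of students sampled can be illustrated with the Spearman-Brown relationship between the reliability of a single rating and the reliability of a mean of k ratings. The sketch below is an illustration only: the single-student ICC of 0.10 is a hypothetical value, and the abstract does not state which estimator the authors used.

```python
def mean_rating_reliability(single_rater_icc, k):
    """Spearman-Brown reliability of the mean of k ratings."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_rater_icc / (1 + (k - 1) * single_rater_icc)


# Hypothetical single-student ICC; class sizes match those in the abstract.
single_icc = 0.10
by_class_size = {
    k: round(mean_rating_reliability(single_icc, k), 2)
    for k in (25, 20, 15, 10)
}
print(by_class_size)
```

With this assumed ICC the class-mean reliability is about .74 for 25 students and drops to about .69 for 20, which mirrors the qualitative pattern described in the abstract.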
Functional gait assessment and balance evaluation system test: reliability, validity, sensitivity, and specificity for identifying individuals with Parkinson disease who fall.

PubMed

Leddy, Abigail L; Crowner, Beth E; Earhart, Gammon M

2011-01-01

Gait impairments, balance impairments, and falls are prevalent in individuals with Parkinson disease (PD). Although the Berg Balance Scale (BBS) can be considered the reference standard for the determination of fall risk, it has a noted ceiling effect. Development of ceiling-free measures that can assess balance and are good at discriminating "fallers" from "nonfallers" is needed. The purpose of this study was to compare the Functional Gait Assessment (FGA) and the Balance Evaluation Systems Test (BESTest) with the BBS among individuals with PD and evaluate the tests' reliability, validity, and discriminatory sensitivity and specificity for fallers versus nonfallers. This was an observational study of community-dwelling individuals with idiopathic PD. The BBS, FGA, and BESTest were administered to 80 individuals with PD. Interrater reliability (n=15) was assessed by 3 raters. Test-retest reliability was based on 2 tests of participants (n=24), 2 weeks apart. Intraclass correlation coefficients (2,1) were used to calculate reliability, and Spearman correlation coefficients were used to assess validity. Cutoff points, sensitivity, and specificity were based on receiver operating characteristic plots. Test-retest reliability was .80 for the BBS, .91 for the FGA, and .88 for the BESTest. Interrater reliability was greater than .93 for all 3 tests. The FGA and BESTest were correlated with the BBS (r=.78 and r=.87, respectively). Cutoff scores to identify fallers were 47/56 for the BBS, 15/30 for the FGA, and 69% for the BESTest. The overall accuracy (area under the curve) for the BBS, FGA, and BESTest was .79, .80, and .85, respectively. Fall reports were retrospective. Both the FGA and the BESTest have reliability and validity for assessing balance in individuals with PD. The BESTest is most sensitive for identifying fallers.
Cost-effective solutions to maintaining smart grid reliability

NASA Astrophysics Data System (ADS)

Qin, Qiu

As the aging power systems are increasingly working closer to the capacity and thermal limits, maintaining sufficient reliability has been of great concern to the government agency, utility companies and users. This dissertation focuses on improving the reliability of transmission and distribution systems. Based on the wide area measurements, multiple model algorithms are developed to diagnose transmission line three-phase short to ground faults in the presence of protection misoperations. The multiple model algorithms utilize the electric network dynamics to provide prompt and reliable diagnosis outcomes. Computational complexity of the diagnosis algorithm is reduced by using a two-step heuristic. The multiple model algorithm is incorporated into a hybrid simulation framework, which consists of both continuous state simulation and discrete event simulation, to study the operation of transmission systems. With hybrid simulation, line switching strategy for enhancing the tolerance to protection misoperations is studied based on the concept of security index, which involves the faulted mode probability and stability coverage. Local measurements are used to track the generator state and faulty mode probabilities are calculated in the multiple model algorithms. FACTS devices are considered as controllers for the transmission system. The placement of FACTS devices into power systems is investigated with a criterion of maintaining a prescribed level of control reconfigurability. Control reconfigurability measures the small signal combined controllability and observability of a power system with an additional requirement on fault tolerance. For the distribution systems, a hierarchical framework, including a high level recloser allocation scheme and a low level recloser placement scheme, is presented. The impacts of recloser placement on the reliability indices are analyzed. 
Evaluation of reliability indices in the placement process is carried out via discrete event simulation. The reliability requirements are described with probabilities and evaluated from the empirical distributions of reliability indices.
Assessing Attachment Security With the Attachment Q Sort: Meta-Analytic Evidence for the Validity of the Observer AQS

ERIC Educational Resources Information Center

van IJzendoorn, Marinus H.; Vereijken, Carolus M.J.L.; Bakermans-Kranenburg, Marian J.; Riksen-Walraven, Marianne J.

2004-01-01

The reliability and validity of the Attachment Q Sort (AQS; Waters & Deane, 1985) was tested in a series of meta-analyses on 139 studies with 13,835 children. The observer AQS security score showed convergent validity with Strange Situation procedure (SSP) security (r=.31) and excellent predictive validity with sensitivity measures (r=.39). Its…
Relationship between the alpha and beta angles in diagnosing CAM-type femoroacetabular impingement on frog-leg lateral radiographs.

PubMed

Khan, Moin; Ranawat, Anil; Williams, Dale; Gandhi, Rajiv; Choudur, Hema; Parasu, Naveen; Simunovic, Nicole; Ayeni, Olufemi R

2015-09-01

Alpha and beta angles are commonly used radiographic measures to assess the sphericity of the proximal femur and distance between the pathologic head-neck junction and the acetabular rim, respectively. The aim of this study was to explore the relationship between these two measurements on frog-leg lateral hip radiographs. Fifty frog-leg lateral hip radiographs were evaluated by two orthopaedic surgeons and two radiologists. Each reviewer measured the alpha and beta angles on two separate occasions to determine the relationship between positive alpha and beta angles and the inter- and intra-observer reliability of these measurements. There was no significant association between positive alpha and beta angles, [kappa range -0.043 (95% CI -0.17 to 0.086) to 0.54 (95% CI 0.33-0.75)]. Intra-observer reliability was high [alpha angle intra-class correlation coefficient (ICC) range 0.74 (95% CI 0.58-0.84) to 0.99 (95% CI 0.98-0.99) and beta angle ICC range 0.86 (95% CI 0.76-0.92) to 0.97 (95% CI 0.95-0.98)]. There is no statistical or functional relationship between readings of positive alpha and beta angles. The radiographic measurements resulted in high intra-observer and fair-to-moderate inter-observer reliability. Results of this study suggest that the presence of a CAM lesion on lateral radiographs as suggested by a positive alpha angle does not necessitate a decrease in clearance between the femoral head and acetabular rim as measured by the beta angle and thus may not be the best measure of functional impingement. Understanding the relationship between these two aspects of femoroacetabular impingement improves a surgeon's ability to anticipate potential operative management.
Validity and reliability of Arabic version of the ID Pain screening questionnaire in the assessment of neuropathic pain.

PubMed

Abu-Shaheen, Amani; Yousef, Shehu; Riaz, Muhammad; Nofal, Abdullah; Khan, Sarfaraz; Heena, Humariya

2018-01-01

Diagnosis of neuropathic pain (NP) can be challenging. The ID Pain (ID-P) questionnaire, a screening tool for NP, has been used widely both in the original version and translated forms. The aim of this study was to develop an Arabic version of ID-P and assess its validity and reliability in detecting neuropathic pain. The original ID-P was translated into Arabic and administered to the study population. Reliability of the Arabic version was evaluated by percentage observed agreement, and Cohen's kappa; and validity by sensitivity, specificity, correctly classified, and receiver operating characteristic (ROC) curve. Physician diagnosis was considered as the gold standard for comparing the diagnostic accuracy. The study included 375 adult patients (153 [40.8%] with NP; 222 [59.2%] with nociceptive pain). Overall observed percentage agreement and Cohen's kappa were >90% and >0.80, respectively. Median (range) score of ID-P scale was 3 (2-4) and 1 (0-2) in the NP group and NocP group, respectively (p<0.001). Area under the ROC curve was 0.808 (95% CI, 0.764-0.851). For the cut-off value of ≥2, sensitivity was 84.3%, specificity was 66.7%, and correct classification was 73.9%. Thus, the Arabic version of ID-P showed moderate reliability and validity as a pain assessment tool. This article presents the psychometric properties of the Arabic version of ID Pain questionnaire. This Arabic version may serve as a simple yet important screening tool, and help in appropriate management of neuropathic pain, specifically in primary care centers in the Kingdom of Saudi Arabia.
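The sensitivity, specificity and correct-classification figures quoted for the ≥2 cut-off all follow from a single 2 × 2 table against physician diagnosis. The sketch below shows the arithmetic; the cell counts (129 true positives, 148 true negatives) are not given in the abstract and are back-calculated here, as an assumption, from the group sizes (153 NP, 222 nociceptive) and the reported percentages. The function name is hypothetical.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts: 153 NP patients (129 screen-positive) and
# 222 nociceptive-pain patients (148 screen-negative).
sens, spec, acc = screening_metrics(tp=129, fn=24, tn=148, fp=74)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")
```

Under these assumed counts the three figures reproduce the reported 84.3%, 66.7% and 73.9%, which also shows that overall accuracy is a prevalence-weighted average of sensitivity and specificity.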
Stability of measures from children's interviews: the effects of time, sample length, and topic.

PubMed

Heilmann, John; DeBrock, Lindsay; Riley-Tillman, T Chris

2013-08-01

The purpose of this study was to examine the reliability of, and sources of variability in, language measures from interviews collected from young school-age children. Two 10-min interviews were collected from 20 at-risk kindergarten children by an examiner using a standardized set of questions. Test-retest reliability coefficients were calculated for 8 language measures. Generalizability theory (G-theory) analyses were completed to document the variability introduced into the measures from the child, session, sample length, and topic. Significant and strong reliability correlation coefficients were observed for most of the language sample measures. The G-theory analyses revealed that most of the variance in the language measures was attributed to the child. Session, sample length, and topic accounted for negligible amounts of variance in most of the language measures. Measures from interviews were reliable across sessions, and the sample length and topic did not have a substantial impact on the reliability of the language measures. Implications regarding the clinical feasibility of language sample analysis for assessment and progress monitoring are discussed.
Skin colour assessment of replanted fingers in digital images and its reliability for the incorporation of images in nursing progress notes.

PubMed

Terashima, Taiko; Yoshimura, Sadako

2018-03-01

To determine whether nurses can accurately assess the skin colour of replanted fingers displayed as digital images on a computer screen. Colour measurement and clinical diagnostic methods for medical digital images have been studied, but reproducing skin colour on a computer screen remains difficult. The inter-rater reliability of skin colour assessment scores was evaluated. In May 2014, 21 nurses who worked on a trauma ward in Japan participated in testing. Six digital images with different skin colours were used. Colours were scored from both digital images and direct patient observation. The score from a digital image was defined as the test score, and its difference from the direct assessment score as the difference score. Intraclass correlation coefficients were calculated. Nurses' opinions were classified and summarised. The intraclass correlation coefficients for the test scores were fair. Although the intraclass correlation coefficients for the difference scores were poor, they improved to good when three images that might have contributed to poor reliability were excluded. Most nurses stated that it is difficult to assess skin colour in digital images; they did not think it could be a substitute for direct visual assessment. However, most nurses were in favour of including images in nursing progress notes. Although the inter-rater reliability was fairly high, the reliability of colour reproduction in digital images as indicated by the difference scores was poor. Nevertheless, nurses expect the incorporation of digital images in nursing progress notes to be useful. This gap between the reliability of digital colour reproduction and nurses' expectations towards it must be addressed. High inter-rater reliability for digital images in nursing progress notes was not observed. Assessments of future improvements in colour reproduction technologies are required. Further digitisation and visualisation of nursing records might pose challenges.
© 2017 John Wiley & Sons Ltd.
Impact of Isothermal Aging on Long-Term Reliability of Fine-Pitch Ball Grid Array Packages with Sn-Ag-Cu Solder Interconnects: Surface Finish Effects

NASA Astrophysics Data System (ADS)

Lee, Tae-Kyu; Ma, Hongtao; Liu, Kuo-Chuan; Xue, Jie

2010-12-01

The interaction between isothermal aging and the long-term reliability of fine-pitch ball grid array (BGA) packages with Sn-3.0Ag-0.5Cu (wt.%) solder ball interconnects was investigated. In this study, 0.4-mm fine-pitch packages with 300-μm-diameter Sn-Ag-Cu solder balls were used. Two different package substrate surface finishes were selected to compare their effects on the final solder composition, especially the effect of Ni, during thermal cycling. To study the impact on thermal performance and long-term reliability, samples were isothermally aged and thermally cycled from 0°C to 100°C with 10 min dwell time. Based on Weibull plots for each aging condition, package lifetime was reduced by approximately 44% by aging at 150°C. Aging at 100°C showed a smaller impact but similar trend. The microstructure evolution was observed during thermal aging and thermal cycling with different phase microstructure transformations between electrolytic Ni/Au and organic solderability preservative (OSP) surface finishes, focusing on the microstructure evolution near the package-side interface. Different mechanisms after aging at various conditions were observed, and their impacts on the fatigue lifetime of solder joints are discussed.
A French adaptation of the Overt Behaviour Scale (OBS) measuring challenging behaviours following acquired brain injury: The Échelle des comportements observables (ÉCO).

PubMed

Gagnon, Jean; Simpson, Grahame Kenneth; Kelly, Glenn; Godbout, Denis; Ouellette, Michel; Drolet, Jacques

2016-01-01

To develop a French version of the Overt Behaviour Scale (OBS) and examine some of its psychometric properties. The scale was adapted and validated according to standard guidelines for cross-cultural adaptation of questionnaires (Échelle des comportements observables; ÉCO). The reliability and construct validity of the ÉCO were studied among 29 inpatients and outpatients who sustained an acquired brain injury. The instruments were administered by 12 clinicians located at eight rehabilitation centres and the local brain injury association. The ÉCO provided behaviour profile descriptives much like the original scale. It showed excellent reliability and good convergent and divergent validity, as reflected by significant associations with other measures that contained similar behavioural items and by the absence of significant correlations with broader constructs such as physical and cognitive abilities. This study provides evidence that the ÉCO behaves much like the original OBS, has promising initial findings with respect to reliability and validity and is a valuable research and clinical instrument to assess the severity and typology of challenging behaviour after an acquired brain injury and to monitor the evolution of behaviours after intervention in French and bilingual communities.
The Social Science Observation Record: A Guide for Pre-service and In-service Teachers Participating in Microteaching.

ERIC Educational Resources Information Center

Casteel, J. Doyle; Stahl, Robert J.

Systematic and reliable feedback are critical elements of microteaching. One system whereby pre-service and in-service teachers may obtain systematic and reliable feedback during microteaching is called the Social Science Observation Record (SSOR). This monograph is intended to meet three purposes: (1) To explain the SSOR as a verbal system for…
Validity and Reliability of the Upper Extremity Work Demands Scale.

PubMed

Jacobs, Nora W; Berduszek, Redmar J; Dijkstra, Pieter U; van der Sluis, Corry K

2017-12-01

Purpose To evaluate validity and reliability of the upper extremity work demands (UEWD) scale. Methods Participants from different levels of physical work demands, based on the Dictionary of Occupational Titles categories, were included. A historical database of 74 workers was added for factor analysis. Criterion validity was evaluated by comparing observed and self-reported UEWD scores. To assess structural validity, a factor analysis was executed. For reliability, the difference between two self-reported UEWD scores, the smallest detectable change (SDC), test-retest reliability and internal consistency were determined. Results Fifty-four participants were observed at work and 51 of them filled in the UEWD twice with a mean interval of 16.6 days (SD 3.3, range = 10-25 days). Criterion validity of the UEWD scale was moderate (r = .44, p = .001). Factor analysis revealed that 'force and posture' and 'repetition' subscales could be distinguished with Cronbach's alpha of .79 and .84, respectively. Reliability was good; there was no significant difference between repeated measurements. An SDC of 5.0 was found. Test-retest reliability was good (intraclass correlation coefficient for agreement = .84) and all item-total correlations were >.30. There were two pairs of highly related items. Conclusion Reliability of the UEWD scale was good, but criterion validity was moderate. Based on current results, a modified UEWD scale (2 items removed, 1 item reworded, divided into 2 subscales) was proposed. Since observation appeared to be an inappropriate gold standard, we advise to investigate other types of validity, such as construct validity, in further research.

Reliability of Health-Related Physical Fitness Tests among Colombian Children and Adolescents: The FUPRECOL Study.

PubMed

Ramírez-Vélez, Robinson; Rodrigues-Bezerra, Diogo; Correa-Bautista, Jorge Enrique; Izquierdo, Mikel; Lobelo, Felipe

2015-01-01

Substantial evidence indicates that youth physical fitness levels are an important marker of lifestyle and cardio-metabolic health profiles and predict future risk of chronic diseases. The reliability of physical fitness tests has not been explored in the Latin American youth population. This study's aim was to examine the reliability of health-related physical fitness tests that were used in the Colombian health promotion "Fuprecol study". Participants were 229 Colombian youth (boys n = 124 and girls n = 105) aged 9 to 17.9 years old. Five components of health-related physical fitness were measured: 1) morphological component: height, weight, body mass index (BMI), waist circumference, triceps skinfold, subscapular skinfold, and body fat (%) via impedance; 2) musculoskeletal component: handgrip and standing long jump test; 3) motor component: speed/agility test (4x10 m shuttle run); 4) flexibility component (hamstring and lumbar extensibility, sit-and-reach test); 5) cardiorespiratory component: 20-meter shuttle-run test (SRT) to estimate maximal oxygen consumption. The tests were performed two times, 1 week apart on the same day of the week, except for the SRT which was performed only once. Intra-observer technical errors of measurement (TEMs) and inter-rater reliability were assessed in the morphological component. Reliability for the musculoskeletal, motor and cardiorespiratory fitness components was examined using Bland-Altman tests. For the morphological component, TEMs were small and reliability was greater than 95% in all cases. For the musculoskeletal, motor, flexibility and cardiorespiratory components, we found adequate reliability patterns in terms of systematic errors (bias) and random error (95% limits of agreement). When the fitness assessments were performed twice, the systematic error was nearly 0 for all tests, except for the sit and reach (mean difference: -1.03% [95% CI = -4.35% to -2.28%]).
The results from this study indicate that the "Fuprecol study" health-related physical fitness battery, administered by physical education teachers, was reliable for measuring health-related components of fitness in children and adolescents aged 9-17.9 years old in a school setting in Colombia.
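The Bland-Altman analysis mentioned in this record summarises test-retest agreement as a systematic error (bias, the mean of the paired differences) and a random error (95% limits of agreement, bias ± 1.96 standard deviations of the differences). A minimal sketch follows; the handgrip values are hypothetical and are not data from the FUPRECOL study.

```python
from statistics import mean, stdev

def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(test) != len(retest):
        raise ValueError("test and retest must have the same length")
    diffs = [second - first for first, second in zip(test, retest)]
    bias = mean(diffs)             # systematic error
    spread = 1.96 * stdev(diffs)   # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread

# Hypothetical handgrip scores (kg) from two sessions one week apart.
trial_1 = [20.0, 25.5, 18.0, 30.0, 22.5]
trial_2 = [21.0, 25.0, 18.5, 29.0, 23.5]

bias, lower, upper = bland_altman(trial_1, trial_2)
print(f"bias = {bias:.2f} kg, 95% limits of agreement = [{lower:.2f}, {upper:.2f}] kg")
```

A bias near zero with narrow limits indicates that the second session reproduces the first; whether the limits are narrow enough is a judgment that depends on the measure.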
Assessing the surgical skills of trainees in the operating theatre: a prospective observational study of the methodology.

PubMed

Beard, J D; Marriott, J; Purdie, H; Crossley, J

2011-01-01

To compare user satisfaction and acceptability, reliability and validity of three different methods of assessing the surgical skills of trainees by direct observation in the operating theatre across a range of different surgical specialties and index procedures. A 2-year prospective, observational study in the operating theatres of three teaching hospitals in Sheffield. The assessment methods were procedure-based assessment (PBA), Objective Structured Assessment of Technical Skills (OSATS) and Non-technical Skills for Surgeons (NOTSS). The specialties were obstetrics and gynaecology (O&G) and upper gastrointestinal, colorectal, cardiac, vascular and orthopaedic surgery. Two to four typical index procedures were selected from each specialty. Surgical trainees were directly observed performing typical index procedures and assessed using a combination of two of the three methods (OSATS or PBA and NOTSS for O&G, PBA and NOTSS for the other specialties) by the consultant clinical supervisor for the case and the anaesthetist and/or scrub nurse, as well as one or more independent assessors from the research team. Information on user satisfaction and acceptability of each assessment method from both assessor and trainee perspectives was obtained from structured questionnaires. The reliability of each method was measured using generalisability theory. Aspects of validity included the internal structure of each tool and correlation between tools, construct validity, predictive validity, interprocedural differences, the effect of assessor designation and the effect of assessment on performance. Of the 558 patients who were consented, a total of 437 (78%) cases were included in the study: 51 consultant clinical supervisors, 56 anaesthetists, 39 nurses, 2 surgical care practitioners and 4 independent assessors provided 1635 assessments on 85 trainees undertaking the 437 cases. A total of 749 PBAs, 695 NOTSS and 191 OSATSs were performed. 
Non-O&G clinical supervisors and trainees provided mixed, but predominantly positive, responses about a range of applications of PBA. Most felt that PBA was important in surgical education, and would use it again in the future and did not feel that it added time to the operating list. The overall satisfaction of O&G clinical supervisors and trainees with OSATS was not as high, and a majority of those who used both preferred PBA. A majority of anaesthetists and nurses felt that NOTSS allowed them to rate interpersonal skills (communication, teamwork and leadership) more easily than cognitive skills (situation awareness and decision-making), that it had formative value and that it was a valuable adjunct to the assessment of technical skills. PBA demonstrated high reliability (G > 0.8 for only three assessor judgements on the same index procedure). OSATS had lower reliability (G > 0.8 for five assessor judgements on the same index procedure). Both were less reliable on a mix of procedures because of strong procedure-specific factors. A direct comparison of PBA between O&G and non-O&G cases showed a striking difference in reliability. Within O&G, a good level of reliability (G > 0.8) could not be obtained using a feasible number of assessments. Conversely, the reliability within non-O&G cases was exceptionally high, with only two assessor judgements being required. The reasons for this difference probably include the more summative purpose of assessment in O&G and the much higher proportion of O&G trainees in this study with training concerns (42% vs 4%). The reliability of NOTSS was lower than that for PBA. Reliability for the same procedure (G > 0.8) required six assessor judgements. However, as procedure-specific factors exerted a lesser influence on NOTSS, reliability on a mix of procedures could be achieved using only eight assessor judgements. NOTSS also demonstrated a valid internal structure. 
The strongest correlations between NOTSS and PBA or OSATS were in the 'decision-making' domain. PBA and NOTSS showed better construct validity than OSATS, the year of training and the number of recent index procedures performed being significant independent predictors of performance. There was little variation in scoring between different procedures or different designations of assessor. The results suggest that PBA is a reliable and acceptable method of assessing surgical skills, with good construct validity. Specialties that use OSATS may wish to consider changing the design or switching to PBA. Whatever workplace-based assessment method is used, the purpose, timing and frequency of assessment require detailed guidance. NOTSS is a promising tool for the assessment of non-technical skills, and surgical specialties may wish to consider its inclusion in their assessment framework. Further research is required into the use of health-care professionals other than consultant surgeons to assess trainees, the relationship between performance and experience, the educational impact of assessment and the additional value of video recording.
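Statements such as "G > 0.8 for only three assessor judgements" come from a decision (D) study: once variance components have been estimated, the generalisability coefficient is projected for different numbers of judgements. The sketch below assumes the simplest single-facet design, where G(n) = var_trainee / (var_trainee + var_error / n). The variance components are hypothetical, not the study's estimates, and the function names are illustrative.

```python
def g_coefficient(var_trainee, var_error, n_judgements):
    """Projected G when scores are averaged over n_judgements assessments."""
    return var_trainee / (var_trainee + var_error / n_judgements)

def judgements_needed(var_trainee, var_error, target=0.8):
    """Smallest number of judgements whose projected G reaches the target.

    Assumes var_trainee > 0 and target < 1, otherwise the target is unreachable.
    """
    n = 1
    while g_coefficient(var_trainee, var_error, n) < target:
        n += 1
    return n

# Hypothetical variance components (proportions of total variance).
print(judgements_needed(0.60, 0.40))  # tool dominated by true trainee differences
print(judgements_needed(0.45, 0.55))  # noisier tool needs more judgements
```

The projection shows why a tool with more error variance relative to trainee variance needs more assessor judgements to reach the same reliability criterion.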
Chapter 3: Photovoltaic Module Stability and Reliability

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jordan, Dirk; Kurtz, Sarah

2017-01-01

Profits realized from investment in photovoltaics will benefit from decades of reliable operation. Service life prediction through accelerated tests is only possible if indoor tests duplicate power loss and failure modes observed in fielded systems. Therefore, detailing and quantifying power loss and failure modes is imperative. In the first section, we examine recent trends in degradation rates, the gradual power loss observed for different technologies, climates and other significant factors. In the second section, we provide a summary of the most commonly observed failure modes in fielded systems.
Reliability and Validity Assessment of a Linear Position Transducer

PubMed Central

Garnacho-Castaño, Manuel V.; López-Lastra, Silvia; Maté-Muñoz, José L.

2015-01-01

The objectives of the study were to determine the validity and reliability of peak velocity (PV), average velocity (AV), peak power (PP) and average power (AP) measurements made using a linear position transducer. Validity was assessed by comparing measurements simultaneously obtained using the Tendo Weightlifting Analyzer System and T-Force Dynamic Measurement System (Ergotech, Murcia, Spain) during two resistance exercises, bench press (BP) and full back squat (BS), performed by 71 trained male subjects. For the reliability study, a further 32 men completed both lifts using the Tendo Weightlifting Analyzer System in two identical testing sessions one week apart (session 1 vs. session 2). Intraclass correlation coefficients (ICCs) indicating the validity of the Tendo Weightlifting Analyzer System were high, with values ranging from 0.853 to 0.989. Systematic biases and random errors were low to moderate for almost all variables, being higher in the case of PP (bias ±157.56 W; error ±131.84 W). Proportional biases were identified for almost all variables. Test-retest reliability was strong with ICCs ranging from 0.922 to 0.988. Reliability results also showed minimal systematic biases and random errors, which were only significant for PP (bias -19.19 W; error ±67.57 W). Only PV recorded in the BS showed no significant proportional bias. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and estimating power in resistance exercises. The low biases and random errors observed here (mainly AV, AP) make this device a useful tool for monitoring resistance training. Key points: This study determined the validity and reliability of peak velocity, average velocity, peak power and average power measurements made using a linear position transducer. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and power. PMID: 25729300
Children's Physical Activity While Gardening: Development of a Valid and Reliable Direct Observation Tool.

PubMed

Myers, Beth M; Wells, Nancy M

2015-04-01

Gardens are a promising intervention to promote physical activity (PA) and foster health. However, because of the unique characteristics of gardening, no extant tool can capture PA, postures, and motions that take place in a garden. The Physical Activity Research and Assessment tool for Garden Observation (PARAGON) was developed to assess children's PA levels, tasks, postures, and motions, associations, and interactions while gardening. PARAGON uses momentary time sampling in which a trained observer watches a focal child for 15 seconds and then records behavior for 15 seconds. Sixty-five children (38 girls, 27 boys) at 4 elementary schools in New York State were observed over 8 days. During the observation, children simultaneously wore Actigraph GT3X+ accelerometers. The overall interrater reliability was 88% agreement, and Ebel was .97. Percent agreement values for activity level (93%), garden tasks (93%), motions (80%), associations (95%), and interactions (91%) also met acceptable criteria. Validity was established by previously validated PA codes and by expected convergent validity with accelerometry. PARAGON is a valid and reliable observation tool for assessing children's PA in the context of gardening.
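Percent agreement for an interval-based observation tool such as the one described here is usually computed interval by interval: two observers code the same intervals, and the share of intervals with identical codes is reported. The sketch below assumes that definition; the activity codes and records are hypothetical, not PARAGON data.

```python
def percent_agreement(codes_a, codes_b):
    """Interval-by-interval agreement between two observers, in percent."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both observers must code the same intervals")
    if not codes_a:
        raise ValueError("at least one interval is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Hypothetical activity-level codes for eight consecutive intervals.
observer_1 = ["sedentary", "light", "moderate", "moderate",
              "light", "sedentary", "vigorous", "moderate"]
observer_2 = ["sedentary", "light", "moderate", "light",
              "light", "sedentary", "vigorous", "moderate"]

agreement = percent_agreement(observer_1, observer_2)
print(f"{agreement:.1f}% agreement")
```

Percent agreement does not correct for agreement expected by chance, which is one reason studies often report a chance-corrected or variance-based coefficient alongside it.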
A model of scientific attitudes assessment by observation in physics learning based scientific approach: case study of dynamic fluid topic in high school

NASA Astrophysics Data System (ADS)

Yusliana Ekawati, Elvin

2017-01-01

This study aimed to produce a model for assessing scientific attitudes by observation in physics learning based on the scientific approach (case study of the dynamic fluid topic in high school). Instrument development adapted the Plomp model; the procedure comprised initial investigation, design, construction, testing, evaluation and revision. The test was conducted in Surakarta. The data obtained were analyzed using the Aiken formula to determine the content validity of the instrument, Cronbach's alpha to determine its reliability, and confirmatory factor analysis with the LISREL 8.50 program to determine construct validity. The results of this research were conceptual models, instruments and guidelines on scientific attitudes assessment by observation. The construct assessment instruments include components of curiosity, objectivity, suspended judgment, open-mindedness, honesty and perseverance. The construct validity of the instruments met the criterion (factor loadings > 0.3). The reliability of the model is quite good, with an alpha value of 0.899 (> 0.7). The test showed that the theoretical model is supported by the empirical data, with p-value 0.315 (≥ 0.05) and RMSEA 0.027 (≤ 0.08).
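The Aiken formula referred to in this record is commonly written as V = Σ(r − lo) / (n(c − 1)), where r is each expert's rating of an item, lo is the lowest possible rating, c is the number of rating categories and n is the number of experts. A minimal sketch under that definition follows; the ratings are hypothetical and the abstract does not report item-level data.

```python
def aikens_v(ratings, lowest, categories):
    """Aiken's V content-validity coefficient for one item (range 0 to 1)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    if categories < 2:
        raise ValueError("the rating scale needs at least two categories")
    total = sum(r - lowest for r in ratings)
    return total / (len(ratings) * (categories - 1))

# Hypothetical ratings of one item by five experts on a 1-5 relevance scale.
expert_ratings = [4, 5, 4, 5, 3]
v = aikens_v(expert_ratings, lowest=1, categories=5)
print(f"Aiken's V = {v:.2f}")
```

Values close to 1 indicate that the experts consistently rated the item near the top of the scale.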
Behavior States: Now You See Them, Now You Don't.

ERIC Educational Resources Information Center

Mudford, Oliver C.; Hogg, James; Roberts, Jessie

1999-01-01

A study attempted to replicate a previous study that presented reliability data from recordings of behavior state using a 13-category coding system. Replication was unsuccessful. Obtained mean percentage agreement on occurrence for individual behavior state and participants (n=34) ranged across observer pairs from 0 to 58 percent. (Contains 13…
Development of Multisensory Spatial Integration and Perception in Humans

ERIC Educational Resources Information Center

Neil, Patricia A.; Chee-Ruiter, Christine; Scheier, Christian; Lewkowicz, David J.; Shimojo, Shinsuke

2006-01-01

Previous studies have shown that adults respond faster and more reliably to bimodal compared to unimodal localization cues. The current study investigated for the first time the development of audiovisual (A-V) integration in spatial localization behavior in infants between 1 and 10 months of age. We observed infants' head and eye movements in…
Using Interval-Based Systems to Measure Behavior in Early Childhood Special Education and Early Intervention

ERIC Educational Resources Information Center

Lane, Justin D.; Ledford, Jennifer R.

2014-01-01

The purpose of this article is to summarize the current literature on the accuracy and reliability of interval systems using data from previously published experimental studies that used either human observations of behavior or computer simulations. Although multiple comparison studies provided mathematical adjustments or modifications to interval…
Reliable Control Using Disturbance Observer and Equivalent Transfer Function for Position Servo System in Current Feedback Loop Failure

NASA Astrophysics Data System (ADS)

Ishikawa, Kaoru; Nakamura, Taro; Osumi, Hisashi

A reliable control method is proposed for multiple-loop control systems. After a feedback loop failure, such as a sensor breakdown, the control system becomes unstable and fluctuates strongly even if it has a disturbance observer. To cope with this problem, the proposed method uses an equivalent transfer function (ETF) as active redundancy compensation after the loop failure. The ETF is designed so that the transfer function of the whole system is the same before and after the loop failure. In this paper, the characteristics of a reliable control system that uses an ETF and a disturbance observer are examined in an experiment using a DC servo motor, for a current feedback loop failure in the position servo system.
In vitro and in vivo evaluations of three computer-aided shade matching instruments.

PubMed

Yuan, Kun; Sun, Xiang; Wang, Fu; Wang, Hui; Chen, Ji-hua

2012-01-01

This study evaluated the accuracy and reliability of three computer-aided shade matching instruments (Shadepilot, VITA Easyshade, and ShadeEye NCC) using both in vitro and in vivo models. The in vitro model included the measurement of five VITA Classical shade guides. The in vivo model utilized three instruments to measure the central region of the labial surface of maxillary right central incisors of 85 people. The accuracy and reliability of the three instruments in these two evaluating models were calculated. Significant differences were observed in the accuracy of instruments both in vitro and in vivo. No significant differences were found in the reliability of instruments between and within the in vitro and the in vivo groups. VITA Easyshade was significantly different in accuracy between in vitro and in vivo models, while no significant difference was found for the other two instruments. Shadepilot was the only instrument tested in the present study that showed high accuracy and reliability both in vitro and in vivo. Significant differences were observed in the L*a*b* values of the 85 natural teeth measured using three instruments in the in vivo assessment. The pair-agreement rates of shade matching among the three instruments ranged from 37.7% to 48.2%, and the incidence of identical shade results shared by all three instruments was 25.9%. As different L*a*b* values and shade matching results were reported for the same tooth, a combination of the evaluated shade matching instruments and visual shade confirmation is recommended for clinical use.
Inter-rater reliability of the German version of the Nurses' Global Assessment of Suicide Risk scale.

PubMed

Kozel, Bernd; Grieser, Manuela; Abderhalden, Christoph; Cutcliffe, John R

2016-10-01

In comparison to the general population, the suicide rates of psychiatric inpatient populations in Germany and Switzerland are very high. An important preventive contribution to the lowering of the suicide rates in mental health care is to ensure that the risk of suicide of psychiatric inpatients is assessed as accurately as possible. While risk-assessment instruments can serve an important function in determining such risk, very few have been translated to German. Therefore, in the present study, we reported on the German version of Nurses' Global Assessment of Suicide Risk (NGASR) scale. After translating the original instrument into German and pretesting the German version, we tested the inter-rater reliability of the instrument. Twelve video case studies were evaluated by 13 raters with the NGASR scale in a 'laboratory' trial. In each case, the observer's agreement was calculated for the single items, the overall scale, the risk levels, and the sum scores. The statistical data analysis was conducted with kappa and AC1 statistics for dichotomous (items, scale) scales. A high-to-very high observers' agreement (AC1: 0.62-1.00, kappa: 0.00-1.00) was determined for 16 items of the German version of the NGASR scale. We conclude that the German version of the NGASR scale is a reliable instrument for evaluating risk factors for suicide. A reliable application in the clinical practise appears to be enhanced by training in the use of the instrument and the right implementation instructions. © 2016 Australian College of Mental Health Nurses Inc.
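The wide kappa range in this record (0.00-1.00) alongside a consistently high AC1 reflects a known property of Cohen's kappa: when nearly all ratings fall in one category, the expected chance agreement is close to the observed agreement and kappa collapses towards zero. The sketch below compares the two coefficients for two raters and one dichotomous item. It is a simplification, since the study used 13 raters, and the 2x2 counts are hypothetical.

```python
def kappa_and_ac1(both_yes, yes_no, no_yes, both_no):
    """Cohen's kappa and Gwet's AC1 for two raters on a yes/no item.

    Kappa is undefined (division by zero) if its chance agreement equals 1,
    i.e. both raters use a single category for every case.
    """
    n = both_yes + yes_no + no_yes + both_no
    observed = (both_yes + both_no) / n
    rater1_yes = (both_yes + yes_no) / n
    rater2_yes = (both_yes + no_yes) / n

    # Cohen: chance agreement from each rater's own marginal proportions.
    chance_kappa = rater1_yes * rater2_yes + (1 - rater1_yes) * (1 - rater2_yes)
    kappa = (observed - chance_kappa) / (1 - chance_kappa)

    # Gwet: chance agreement from the average prevalence of "yes".
    prevalence = (rater1_yes + rater2_yes) / 2
    chance_ac1 = 2 * prevalence * (1 - prevalence)
    ac1 = (observed - chance_ac1) / (1 - chance_ac1)
    return kappa, ac1

# Hypothetical item: 12 cases, raters disagree on one, "yes" is nearly universal.
kappa, ac1 = kappa_and_ac1(both_yes=11, yes_no=1, no_yes=0, both_no=0)
print(f"kappa = {kappa:.2f}, AC1 = {ac1:.2f}")
```

With 11 of 12 cases rated identically, kappa is zero while AC1 stays high, which is the pattern that motivates reporting both statistics for items with skewed prevalence.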
Context, culture and (non-verbal) communication affect handover quality.

PubMed

Frankel, Richard M; Flanagan, Mindy; Ebright, Patricia; Bergman, Alicia; O'Brien, Colleen M; Franks, Zamal; Allen, Andrew; Harris, Angela; Saleem, Jason J

2012-12-01

Transfers of care, also known as handovers, remain a substantial patient safety risk. Although research on handovers has been done since the 1980s, the science is incomplete. Surprisingly few interventions have been rigorously evaluated and, of those that have, few have resulted in long-term positive change. Researchers, both in medicine and other high reliability industries, agree that face-to-face handovers are the most reliable. It is not clear, however, what the term face-to-face means in actual practice. We studied the use of non-verbal behaviours, including gesture, posture, bodily orientation, facial expression, eye contact and physical distance, in the delivery of information during face-to-face handovers. To address this question and study the role of non-verbal behaviour on the quality and accuracy of handovers, we videotaped 52 nursing, medicine and surgery handovers covering 238 patients. Videotapes were analysed using immersion/crystallisation methods of qualitative data analysis. A team of six researchers met weekly for 18 months to view videos together using a consensus-building approach. Consensus was achieved on verbal, non-verbal, and physical themes and patterns observed in the data. We observed four patterns of non-verbal behaviour (NVB) during handovers: (1) joint focus of attention; (2) 'the poker hand'; (3) parallel play and (4) kerbside consultation. In terms of safety, joint focus of attention was deemed to have the best potential for high quality and reliability; however, it occurred infrequently, creating opportunities for education and improvement. Attention to patterns of NVB in face-to-face handovers coupled with education and practice can improve quality and reliability.
Reliability of a rapid hematology stain for sputum cytology

PubMed Central

Gonçalves, Jéssica; Pizzichini, Emilio; Pizzichini, Marcia Margaret Menezes; Steidle, Leila John Marques; Rocha, Cristiane Cinara; Ferreira, Samira Cardoso; Zimmermann, Célia Tânia

2014-01-01

Objective: To determine the reliability of a rapid hematology stain for the cytological analysis of induced sputum samples. Methods: This was a cross-sectional study comparing the standard technique (May-Grünwald-Giemsa stain) with a rapid hematology stain (Diff-Quik). Of the 50 subjects included in the study, 21 had asthma, 19 had COPD, and 10 were healthy (controls). From the induced sputum samples collected, we prepared four slides: two were stained with May-Grünwald-Giemsa, and two were stained with Diff-Quik. The slides were read independently by two trained researchers blinded to the identification of the slides. The reliability for cell counting using the two techniques was evaluated by determining the intraclass correlation coefficients (ICCs) for intraobserver and interobserver agreement. Agreement in the identification of neutrophilic and eosinophilic sputum between the observers and between the stains was evaluated with kappa statistics. Results: In our comparison of the two staining techniques, the ICCs indicated almost perfect interobserver agreement for neutrophil, eosinophil, and macrophage counts (ICC: 0.98-1.00), as well as substantial agreement for lymphocyte counts (ICC: 0.76-0.83). Intraobserver agreement was almost perfect for neutrophil, eosinophil, and macrophage counts (ICC: 0.96-0.99), whereas it was moderate to substantial for lymphocyte counts (ICC = 0.65 and 0.75 for the two observers, respectively). Interobserver agreement for the identification of eosinophilic and neutrophilic sputum using the two techniques ranged from substantial to almost perfect (kappa range: 0.91-1.00). Conclusions: The use of Diff-Quik can be considered a reliable alternative for the processing of sputum samples. PMID:25029648
SIERRA - A 3-D device simulator for reliability modeling

NASA Astrophysics Data System (ADS)

Chern, Jue-Hsien; Arledge, Lawrence A., Jr.; Yang, Ping; Maeda, John T.

1989-05-01

SIERRA is a three-dimensional general-purpose semiconductor-device simulation program which serves as a foundation for investigating integrated-circuit (IC) device and reliability issues. This program solves the Poisson and continuity equations in silicon under dc, transient, and small-signal conditions. Executing on a vector/parallel minisupercomputer, SIERRA utilizes a matrix solver which uses an incomplete LU (ILU) preconditioned conjugate gradient square (CGS, BCG) method. The ILU-CGS method provides a good compromise between memory size and convergence rate. The authors have observed a 5x to 7x speedup over standard direct methods in simulations of transient problems containing highly coupled Poisson and continuity equations such as those found in reliability-oriented simulations. The application of SIERRA to parasitic CMOS latchup and dynamic random-access memory single-event-upset studies is described.
Spatiotemporal image correlation-derived volumetric Doppler impedance indices from spherical samples of the placenta: intraobserver reliability and correlation with conventional umbilical artery Doppler indices.

PubMed

Welsh, A W; Hou, M; Meriki, N; Martins, W P

2012-10-01

Volumetric impedance indices derived from spatiotemporal image correlation (STIC) power Doppler ultrasound (PDU) might overcome the influence of machine settings and attenuation. We examined the feasibility of obtaining these indices from spherical samples of anterior placentas in healthy pregnancies, and assessed intraobserver reliability and correlation with conventional umbilical artery (UA) impedance indices. Uncomplicated singleton pregnancies with anterior placenta were included in the study. A single observer evaluated UA pulsatility index (PI), resistance index (RI) and systolic/diastolic ratio (S/D) and acquired three STIC-PDU datasets from the placenta just above the placental cord insertion. Another observer analyzed the STIC-PDU datasets using Virtual Organ Computer-aided AnaLysis (VOCAL) spherical samples from every frame to determine the vascularization index (VI) and vascularization flow index (VFI); maximum, minimum and average values were used to determine the three volumetric impedance indices (vPI, vRI, vS/D). Intraobserver reliability was examined by intraclass correlation coefficients (ICC) and association between volumetric indices from placenta, and UA Doppler indices were assessed by Pearson's correlation coefficient. A total of 25 pregnant women were evaluated but five were excluded because of artifacts observed during analysis. The reliability of measurement of volumetric indices of both VI and VFI from three STIC-PDU datasets was similar, with all ICCs ≥ 0.78. Pearson's r values showed a weak and non-significant correlation between UA pulsed-wave Doppler indices and their respective volumetric indices from spherical samples of placenta (all r ≥ 0.23). VOCAL indices from specific phases of the cardiac cycle showed good repeatability (ICC ≥ 0.92). Volumetric impedance indices determined from spherical samples of placenta are sufficiently reliable but do not correlate with UA Doppler indices in healthy pregnancies. Copyright © 2012 ISUOG. Published by John Wiley & Sons, Ltd.
Validity and reliability of the activPAL3 for measuring posture and stepping in adults and young people.

PubMed

Sellers, Ceri; Dall, Philippa; Grant, Margaret; Stansfield, Ben

2016-01-01

Characterisation of free-living physical activity requires the use of validated and reliable monitors. This study reports an evaluation of the validity and reliability of the activPAL3 monitor for the detection of posture and stepping in both adults and young people. Twenty adults (median 27.6 y; IQR 22.6 y) and 8 young people (12.0 y; IQR 4.1 y) performed standardised activities and activities of daily living (ADL) incorporating sedentary, upright and stepping activity. Agreement, specificity and positive predictive value were calculated between activPAL3 outcomes and the gold-standard of video observation. Inter-device reliability was calculated between 4 monitors. Sedentary and upright times for standardised activities were within ±5% of video observation as was step count (excluding jogging) for both adults and young people. Jogging step detection accuracy reduced with increasing cadence >150 steps min⁻¹. For ADLs, sensitivity to stepping was very low for adults (40.4%) but higher for young people (76.1%). Inter-device reliability was either good (ICC(1,1)>0.75) or excellent (ICC(1,1)>0.90) for all outcomes. An excellent level of detection of standardised postures was demonstrated by the activPAL3. Postures such as seat-perching, kneeling and crouching were misclassified when compared to video observation. The activPAL3 appeared to accurately detect 'purposeful' stepping during ADL, but detection of smaller stepping movements was poor. Small variations in outcomes between monitors indicated that differences in monitor placement or hardware may affect outcomes. In general, the detection of posture and purposeful stepping with the activPAL3 was excellent indicating that it is a suitable monitor for characterising free-living posture and purposeful stepping activity in healthy adults and young people. Copyright © 2015 Elsevier B.V. All rights reserved.
Review on pen-and-paper-based observational methods for assessing ergonomic risk factors of computer work.

PubMed

Rahman, Mohd Nasrull Abdol; Mohamad, Siti Shafika

2017-01-01

Computer work is associated with musculoskeletal disorders (MSDs). Several methods have been developed to assess computer work risk factors related to MSDs. This review aims to give an overview of current pen-and-paper-based observational methods for assessing ergonomic risk factors of computer work. We searched an electronic database for materials from 1992 until 2015. The selected methods focused on computer work, pen-and-paper observational methods, office risk factors and musculoskeletal disorders. This review was developed to assess the risk factors, reliability and validity of pen-and-paper observational methods associated with computer work. Two evaluators independently carried out this review. Seven observational methods used to assess exposure to office risk factors for work-related musculoskeletal disorders were identified. The risk factors covered by current pen-and-paper-based observational tools were postures, office components, force and repetition. Of the seven methods, only five had been tested for reliability. They were proven to be reliable and were rated as moderate to good. For validity, only four of the seven methods were tested, and the results were moderate. Many observational tools already exist, but no single tool appears to cover all of the risk factors, including working posture, office component, force, repetition and office environment, at office workstations and computer work. Although the most important factor in developing a tool is proper validation of exposure assessment techniques, some of the existing observational methods were not tested for reliability and validity. Furthermore, this review could provide researchers with ways to improve the pen-and-paper-based observational method for assessing ergonomic risk factors of computer work.
Development and validation of a tool to evaluate the quality of medical education websites in pathology

PubMed Central

Alyusuf, Raja H.; Prasad, Kameshwar; Abdel Satir, Ali M.; Abalkhail, Ali A.; Arora, Roopa K.

2013-01-01

Background: The exponential growth in use of the internet as a learning resource, coupled with the varied quality of many websites, has led to a need to identify suitable websites for teaching purposes. Aim: The aim of this study is to develop and validate a tool that evaluates the quality of undergraduate medical educational websites, and to apply it to the field of pathology. Methods: A tool was devised through several steps of item generation, reduction, weightage, pilot testing, post-pilot modification of the tool and validating the tool. Tool validation included measurement of inter-observer reliability and generation of criterion related, construct related and content related validity. The validated tool was subsequently tested by applying it to a population of pathology websites. Results and Discussion: Reliability testing showed a high internal consistency reliability (Cronbach's alpha = 0.92), high inter-observer reliability (Pearson's correlation r = 0.88), intraclass correlation coefficient = 0.85 and κ = 0.75. It showed high criterion related, construct related and content related validity. The tool showed moderately high concordance with the gold standard (κ = 0.61); 92.2% sensitivity, 67.8% specificity, 75.6% positive predictive value and 88.9% negative predictive value. The validated tool was applied to 278 websites; 29.9% were rated as recommended, 41.0% as recommended with caution and 29.1% as not recommended. Conclusion: A systematic tool was devised to evaluate the quality of websites for medical educational purposes. The tool was shown to yield reliable and valid inferences through its application to pathology websites. PMID:24392243
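Illustrative note (not part of the record above): the sensitivity, specificity and predictive values quoted in this abstract are all derived from a 2x2 table of tool ratings against a gold standard. A minimal Python sketch of that arithmetic follows. The counts are hypothetical placeholders chosen for illustration, not the study's data, and the names `Confusion` and `diagnostic_metrics` are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary rating compared with a gold standard."""
    tp: int  # tool positive, gold standard positive
    fp: int  # tool positive, gold standard negative
    fn: int  # tool negative, gold standard positive
    tn: int  # tool negative, gold standard negative


def diagnostic_metrics(c: Confusion) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the study.
    example = Confusion(tp=45, fp=10, fn=5, tn=40)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Predictive values depend on how common the positive class is in the sample, so they should not be expected to carry over unchanged to a population with a different mix of websites.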
An Observed Structured Teaching Evaluation Demonstrates the Impact of a Resident-as-Teacher Curriculum on Teaching Competency.

PubMed

Zackoff, Matthew; Jerardi, Karen; Unaka, Ndidi; Sucharew, Heidi; Klein, Melissa

2015-06-01

Residents play a critical role in the education of peers and medical students, yet attainment of teaching skills is not routinely assessed. The primary aim of this study was to develop a novel, skill-based Observed Structured Teaching Evaluation (OSTE) and self-assessment survey to measure the impact of a resident-as-teacher curriculum on teaching competency. The secondary aim was to determine interrater reliability of the OSTE. A prospective study quantitatively assessed intern teaching competency via videotaped teaching encounters (videos) before and after a month-long hospital medicine rotation and self-assessment surveys over a 5-month period. The intervention group received the resident-as-teacher curriculum. Videos were evaluated by 2 blinded faculty via an OSTE covering 9 skills within 3 core components: preparation, teaching, and reflection. Pre- to post-HM rotation month differences were evaluated within and between groups using the Wilcoxon signed rank test and Wilcoxon rank-sum test, respectively. Twenty-two of 25 (88%) control and 27 of 28 (96%) intervention interns participated; 100% of participants completed the study. The intervention group's pre-post difference for the total OSTE score and the average self-assessed competence statistically improved; however, no significant difference was seen between groups. The difference in preparation scores was significant for the intervention compared with the control. The OSTE's interrater reliability demonstrated good agreement with weighted kappas of 0.86 for preparation, 0.71 for teaching, and 0.93 for reflection. Implementation of an objective, skill-based OSTE detected observable changes in interns' teaching competency after implementation of a brief resident-as-teacher curriculum. The OSTE's good interrater reliability may allow standardized assessment of skill attainment over time. Copyright © 2015 by the American Academy of Pediatrics.

Comparison of MRI-based estimates of articular cartilage contact area in the tibiofemoral joint.

PubMed

Henderson, Christopher E; Higginson, Jill S; Barrance, Peter J

2011-01-01

Knee osteoarthritis (OA) detrimentally impacts the lives of millions of older Americans through pain and decreased functional ability. Unfortunately, the pathomechanics and associated deviations from joint homeostasis that OA patients experience are not well understood. Alterations in mechanical stress in the knee joint may play an essential role in OA; however, existing literature in this area is limited. The purpose of this study was to evaluate the ability of an existing magnetic resonance imaging (MRI)-based modeling method to estimate articular cartilage contact area in vivo. Imaging data of both knees were collected on a single subject with no history of knee pathology at three knee flexion angles. Intra-observer reliability and sensitivity studies were also performed to determine the role of operator-influenced elements of the data processing on the results. The method's articular cartilage contact area estimates were compared with existing contact area estimates in the literature. The method demonstrated an intra-observer reliability of 0.95 when assessed using Pearson's correlation coefficient and was found to be most sensitive to changes in the cartilage tracings on the peripheries of the compartment. The articular cartilage contact area estimates at full extension were similar to those reported in the literature. The relationships between tibiofemoral articular cartilage contact area and knee flexion were also qualitatively and quantitatively similar to those previously reported. The MRI-based knee modeling method was found to have high intra-observer reliability, sensitivity to peripheral articular cartilage tracings, and agreeability with previous investigations when using data from a single healthy adult. Future studies will implement this modeling method to investigate the role that mechanical stress may play in progression of knee OA through estimation of articular cartilage contact area.
Validation of the UNESP-Botucatu unidimensional composite pain scale for assessing postoperative pain in cattle.

PubMed

de Oliveira, Flávia Augusta; Luna, Stelio Pacca Loureiro; do Amaral, Jackson Barros; Rodrigues, Karoline Alves; Sant'Anna, Aline Cristina; Daolio, Milena; Brondani, Juliana Tabarelli

2014-09-06

The recognition and measurement of pain in cattle are important in determining the necessity for and efficacy of analgesic intervention. The aim of this study was to record behaviour and determine the validity and reliability of an instrument to assess acute pain in 40 cattle subjected to orchiectomy after sedation with xylazine and local anaesthesia. The animals were filmed before and after orchiectomy to record behaviour. The pain scale was based on previous studies, on a pilot study and on analysis of the camera footage. Three blinded observers and a local observer assessed the edited films obtained during the preoperative and postoperative periods, before and after rescue analgesia and 24 hours after surgery. Re-evaluation was performed one month after the first analysis. Criterion validity (agreement) and item-total correlation using Spearman's coefficient were employed to refine the scale. Based on factor analysis, a unidimensional scale was adopted. The internal consistency of the data was excellent after refinement (Cronbach's α coefficient = 0.866). There was a high correlation (p < 0.001) between the proposed scale and the visual analogue, simple descriptive and numerical rating scales. The construct validity and responsiveness were confirmed by the increase and decrease in pain scores after surgery and rescue analgesia, respectively (p < 0.001). Inter- and intra-observer reliability ranged from moderate to very good. The optimal cut-off point for rescue analgesia was > 4, and analysis of the area under the curve (AUC = 0.963) showed excellent discriminatory ability. The UNESP-Botucatu unidimensional pain scale for assessing acute postoperative pain in cattle is a valid, reliable and responsive instrument with excellent internal consistency and discriminatory ability. The cut-off point for rescue analgesia provides an additional tool for guiding analgesic therapy.
The psychometric properties of Observer OPTION(5), an observer measure of shared decision making.

PubMed

Barr, Paul J; O'Malley, Alistair James; Tsulukidze, Maka; Gionfriddo, Michael R; Montori, Victor; Elwyn, Glyn

2015-08-01

Observer OPTION(5) was designed as a more efficient version of OPTION(12), the most commonly used measure of shared decision making (SDM). The current paper assesses the psychometric properties of OPTION(5). Two raters used OPTION(5) to rate recordings of clinical encounters from two previous patient decision aid (PDA) trials (n=201; n=110). A subsample was re-rated two weeks later. We assessed discriminative validity, inter-rater reliability, intra-rater reliability, and concurrent validity. OPTION(5) demonstrated discriminative validity, with increases in SDM between usual care and PDA arms. OPTION(5) also demonstrated concurrent validity with OPTION(12), r=0.61 (95%CI 0.54, 0.68) and intra-rater reliability, r=0.93 (0.83, 0.97). The mean difference in rater score was 8.89 (95% Credibility Interval, 7.5, 10.3), with intraclass correlation (ICC) of 0.67 (95% Credibility Interval, 0.51, 0.91) for the accuracy of rater scores and 0.70 (95% Credibility Interval, 0.56, 0.94) for the consistency of rater scores across encounters, indicating good inter-rater reliability. Raters reported lower cognitive burden when using OPTION(5) compared to OPTION(12). OPTION(5) is a brief, theoretically grounded observer measure of SDM with promising psychometric properties in this sample and low burden on raters. OPTION(5) has potential to provide reliable, valid assessment of SDM in clinical encounters. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Attachment narratives in refugee children: interrater reliability and qualitative analysis in pilot findings from a two-site study.

PubMed

De Haene, Lucia; Dalgaard, Nina Thorup; Montgomery, Edith; Grietens, Hans; Verschueren, Karine

2013-06-01

Although forced migration research on refugee family functioning clearly points to the potential breakdown of parental availability and responsiveness in the context of cumulative migration stressors, studies exploring attachment security in refugee children are surprisingly lacking so far. The authors report their findings from a 2-site, small-scale administration of an attachment measure, adapted for use with refugee children aged between 4 and 9 years from a reliable and validated doll-play procedure. We evaluated interrater reliability and conducted a qualitative analysis of refugee children's narrative response to identify migration-specific representational markers of attachment quality. The level of agreement among 3 independent coders ranged between .54 to 1.00 for both study samples, providing initial psychometric evidence of the measure's value in assessing child attachment security in this population. The exploratory analysis of migration-related narrative markers pointed to specific parameters to be used in parent-child observational assessments in future validation of the attachment measure, such as parental withdrawal or trauma-communication within the parent-child dyad. Copyright © 2013 International Society for Traumatic Stress Studies.
CODEMamb - an observational communication behavior assessment tool for use in ambulatory dementia care.

PubMed

Knebel, Maren; Haberstroh, Julia; Kümmel, Anne; Pantel, Johannes; Schröder, Johannes

2016-12-01

Communication improves well-being and quality of life for both people with dementia and their professional and family caregivers. Individualized communication, as required in informed consent procedures and psychosocial interventions, can improve quality of life, especially in ambulatory settings. However, few valid and reliable instruments exist that enable communication to be assessed and communication and behavioral resources to be identified. We, therefore, extended and adapted the newly developed observational instrument CODEM for use in ambulatory settings (CODEMamb). Reliability and validity of the new instrument were studied in a total of 171 patients, whereby principal component analysis revealed three important factors: relationship aspects, verbal communication behavior and nonverbal communication behavior. CODEMamb's internal consistency, interrater and retest reliability were satisfactory to excellent. Convergent validity indices, as shown by examining correlations with similar but not identical constructs (CERAD-NP verbal subscales), were medium-high, while the divergent validity index (constructional praxis) was relatively low. The relationship to peer-rating remained nonsignificant. Criterion validity was investigated in groups of patients in accordance with their cognitive status. As expected, verbal communication abilities deteriorate faster than the relationship aspects of communication as the disease progresses. In summary, CODEMamb is a reliable and valid instrument that can be used to collect important information with the ultimate aim of supporting communication with people with dementia.
Of blind men and elephants: suggesting SDM-MASS as a compound measure for shared decision making integrating patient, physician and observer views.

PubMed

Geiger, Friedemann; Kasper, Jürgen

2012-01-01

Shared decision making (SDM) between patient and physician is an interpersonal process. Most SDM measures use the view of one party (patient, physician or observer) as a proxy to capture this process although these views typically diverge. This study suggests the compound measure SDM(MASS) (SDM Meeting its concept's ASSumptions) integrating these three perspectives in one single index. SDM(MASS) was derived theoretically and compared empirically to unilateral perspectives of patients, physicians and observers by application to a data set of 10 physicians (40 consultations) receiving an SDM training. The constituting parts of SDM(MASS) were highly reliable (Cronbach's alpha .94; interrater reliability .74-.87). Unilateral appraisal of training effects was divergent. SDM(MASS) revealed no effect. SDM(MASS) combines noteworthy information about SDM processes from different viewpoints and thereby delivers plausible assessments. It could overcome immanent shortcomings of unilateral approaches. However, it is a complex measure needing further validation. Copyright © 2012. Published by Elsevier GmbH.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Yue; Reeves, Geoffrey D.; Cunningham, Gregory S.

Our study demonstrates the feasibility and reliability of using observations from low Earth orbit (LEO) to forecast and nowcast relativistic electrons in the outer radiation belt. Furthermore, we first report a high cross-energy, cross-pitch-angle coherence discovered between the trapped MeV electrons and precipitating approximately hundreds (~100s) of keV electrons, observed by satellites with very different altitudes, with correlation coefficients as high as ≳ 0.85. We then tested the feasibility of applying linear prediction filters to LEO data to predict the arrival of new MeV electrons during geomagnetic storms, as well as their evolving distributions afterward, based on the coherence. Reliability of these predictive filters is quantified by the performance efficiency with values as high as 0.74 when driven merely by LEO observations (or up to 0.94 with the inclusion of in situ MeV electron measurements). Finally, a hypothesis based upon the wave-particle resonance theory is proposed to explain the coherence, and a first-principle electron tracing model yields supporting evidence.
Diagnosing prosopagnosia in East Asian individuals: Norms for the Cambridge Face Memory Test-Chinese.

PubMed

McKone, Elinor; Wan, Lulu; Robbins, Rachel; Crookes, Kate; Liu, Jia

2017-07-01

The Cambridge Face Memory Test (CFMT) is widely accepted as providing a valid and reliable tool in diagnosing prosopagnosia (inability to recognize people's faces). Previously, large-sample norms have been available only for Caucasian-face versions, suitable for diagnosis in Caucasian observers. These are invalid for observers of different races due to potentially severe other-race effects. Here, we provide large-sample norms (N = 306) for East Asian observers on an Asian-face version (CFMT-Chinese). We also demonstrate methodological suitability of the CFMT-Chinese for prosopagnosia diagnosis (high internal reliability, approximately normal distribution, norm-score range sufficiently far above chance). Additional findings were a female advantage on mean performance, plus a difference between participants living in the East (China) or the West (international students, second-generation children of immigrants), which we suggest might reflect personality differences associated with willingness to emigrate. Finally, we demonstrate suitability of the CFMT-Chinese for individual differences studies that use correlations within the normal range.
Reliability associated with the Roter Interaction Analysis System (RIAS) adapted for the telemedicine context.

PubMed

Nelson, Eve-Lynn; Miller, Edward Alan; Larson, Kiley A

2010-01-01

This study's purpose was to adapt the Roter Interaction Analysis System (RIAS) for telemedicine clinics and to investigate the adapted measure's reliability. The study also sought to better understand the volume of technology-related utterance in established telemedicine clinics and the feasibility of using the measure within the telemedicine setting. This initial evaluation is a first step before broadly using the adapted measure across technologies and raters. An expert panel adapted the RIAS for the telemedicine context. This involved accounting for all consultation participants (patient, provider, presenter, family) and adding technology-specific subcategories. Ten new and 36 follow-up telemedicine encounters were videotaped and double coded using the adapted RIAS. These consisted primarily of follow-up visits (78.0%) involving patients, providers, presenters, and other parties. Reliability was calculated for those categories with 15 or more utterances. Traditional RIAS categories related to socioemotional and task-focused clusters had fair to excellent levels of reliability in the telemedicine setting. Although there were too few utterances to calculate the reliability of the specific technology-related subcategories, the summary technology-related category proved reliable for patients, providers, and presenters. Overall patterns seen in traditional patient-provider interactions were observed, with the number of provider utterances far exceeding patient, presenter, and family utterances, and few technology-specific utterances. The traditional RIAS is reliable when applied across multiple participants in the telemedicine context. Reliability of technology-related subcategories could not be evaluated; however, the aggregate technology-related cluster was found to be reliable and may be especially relevant in understanding communication patterns with patients new to the telemedicine setting. 
Use of the RIAS instrument is encouraged to facilitate comparison between traditional, face-to-face clinics and telemedicine; among diverse consultation mediums and technologies; and across different specialties. Future research is necessary to further investigate the reliability and validity of adding technology-related subcategories to the RIAS. The limited number of technology-related utterances, however, implies a certain degree of comfort with two-way interactive video consultation among study participants. Telemedicine continues to increase access to healthcare. The technology-related categories of the adapted RIAS were reliable when aggregated, thereby providing a tool to better understand how telemedicine affects provider-patient communication and outcomes.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies

PubMed Central

Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry

2017-01-01

Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ −0.10). 
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies’ generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Conclusions Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. PMID:28122727
Inter-observer agreement, diagnostic sensitivity and specificity of animal-based indicators of young lamb welfare.

PubMed

Phythian, C J; Toft, N; Cripps, P J; Michalopoulou, E; Winter, A C; Jones, P H; Grove-White, D; Duncan, J S

2013-07-01

A scientific literature review and consensus of expert opinion used the welfare definitions provided by the Farm Animal Welfare Council (FAWC) Five Freedoms as the framework for selecting a set of animal-based indicators that were sensitive to the current on-farm welfare issues of young lambs (aged ≤ 6 weeks). Ten animal-based indicators assessed by observation (demeanour, response to stimulation, shivering, standing ability, posture, abdominal fill, body condition, lameness, eye condition and salivation) were tested as part of the objective of developing valid, reliable and feasible animal-based measures of lamb welfare. The indicators were independently tested on 966 young lambs from 17 sheep flocks across Northwest England and Wales during December 2008 to April 2009 by four trained observers. Inter-observer reliability was assessed using Fleiss's kappa (κ), and the pair-wise agreement with an experienced observer designated as the 'test standard observer' (TSO) was examined using Cohen's κ. Latent class analysis (LCA) estimated the sensitivity (Se) and specificity (Sp) of each observer without assuming a gold standard and predicted the Se and Sp of randomly selected observers who may apply the indicators in the future. Overall, good levels of inter-observer reliability, and high levels of Sp were identified for demeanour (κ = 0.54, Se ≥ 0.70, Sp ≥ 0.98), stimulation (κ = 0.57, Se = 0.30 to 0.77, Sp ≥ 0.98), shivering (κ = 0.55, Se = 0.37 to 0.85, Sp ≥ 0.99), standing ability (κ = 0.54, Se ≥ 0.80, Sp ≥ 0.99), posture (κ = 0.45, Se ≥ 0.56, Sp = 0.99), abdominal fill (κ = 0.44, Se = 0.39 to 0.98, Sp = 0.99), body condition (κ = 0.72, Se ⩾ 0.38 to 0.90, Sp = 0.99), lameness (κ = 0.68, Se > 0.73, Sp = 1.00), and eye condition (κ = 0.72, Se ≥ 0.86, Sp = 0.99). LCA predicted that randomly selected observers had Se > 0.77 (acceptable), and Sp ≥ 0.98 (high) for assessments of demeanour, lameness, abdominal fill, posture, body condition and eye condition. 
The diagnostic performance of some indicators was influenced by the composition of the study population, and it would be useful to test the indicators on lambs with a greater level of outcomes associated with poor welfare. The findings presented in this paper could be applied in the selection of valid, reliable and feasible indicators used for the purposes of on-farm assessments of lamb welfare.
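The abstract reports Fleiss's κ for four observers but not how it is computed. As a minimal sketch, the Python function below implements the standard Fleiss formula for a fixed number of raters per subject. The function name and the toy ratings are illustrative assumptions and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the category label given by each rater
    to subject i. Undefined (division by zero) if all ratings fall in one category.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]  # raters per category, per subject

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall proportion of ratings in each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories
    ]
    p_chance = sum(p ** 2 for p in proportions)

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical example: 4 observers score 4 lambs as 1 (sign present) or 0 (absent).
toy_ratings = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
toy_kappa = fleiss_kappa(toy_ratings)
```

Cohen's κ against a single reference observer follows the same observed-versus-chance logic, restricted to two raters.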
Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

PubMed Central

Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

2015-01-01

Pre-season screening is well established within the sporting arena and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced: The Performance Matrix (a battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants in real time, and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% (Rater 1) and 94% (Rater 2) for test re-test reliability; and 75% for real-time versus video. Intraclass correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% for Rater 1 and 75-100% for Rater 2; and real-time versus video, 31-100%. Cohen’s Kappa values were 0.0 to 1.0 for inter-rater reliability; 0.6 to 1.0 for intra-rater reliability of Rater 1; -0.1 to 1.0 for Rater 2; and -0.1 to 1.0 for real-time versus video. It is concluded that both inter- and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. 
Recommendations are made for modifying some of the criteria to improve reliability where excellence was not reached. Key points: The movement control tests of The Foundation Matrix had acceptable reliability between raters and within raters on different days. Agreement between observations made on tests performed real-time and on video recordings was low, indicating poor validity of use of video recordings. Some movement evaluation criteria related to specific tests that did not achieve excellent agreement could be modified to improve reliability. PMID:25983594
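The abstract reports both percentage agreement and Cohen's Kappa for pass/fail ratings, and the two can diverge sharply (component-level agreement of 68-100% alongside Kappa values as low as 0.0). A minimal Python sketch of both statistics, under the assumption of two raters scoring the same items, shows why; the function names and the ratings are hypothetical and are not taken from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal proportions per label.
    p_chance = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings: Rater 1 fails one participant, Rater 2 passes everyone.
rater_1 = ["pass"] * 9 + ["fail"]
rater_2 = ["pass"] * 10

agreement = percent_agreement(rater_1, rater_2)  # high raw agreement
kappa = cohen_kappa(rater_1, rater_2)            # no agreement beyond chance
```

When nearly every participant passes, chance agreement is already high, so a single disagreement can pull Kappa to zero even though raw agreement stays at 90%.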
Effects of Sterilization Cycles on PEEK for Medical Device Application

PubMed Central

Yap, Wai Teng; Foo, Soo Leong; Lee, Teck Kheng

2018-01-01

The effects of the sterilization process have been studied on medical grade thermoplastic polyetheretherketone (PEEK). For a reusable medical device, material reliability is an important parameter to decide its lifetime, as it will be subjected to the continuous steam sterilization process. A spring-like clip component was selected out of a newly designed medical device (patented) to perform this reliability study. This clip component was sterilized for a predetermined number of cycles (2, 4, 6, 8, 10, 20…100) at 121 °C for 30 min. A significant decrease of ~20% in the compression force of the spring was observed after 30 cycles, and a ~6% decrease in the lateral dimension of the clip was observed after 50 cycles. No further significant change in the compression force or dimension was observed for the subsequent sterilization cycles. Vickers hardness and differential scanning calorimetry (DSC) techniques were used to characterize the effects of sterilization. DSC results exhibited no significant change in the degree of cure and melting behavior of PEEK before and after the sterilization. Hardness measurement exhibited an increase of ~49% in hardness after just 20 cycles. When an unsterilized sample was heated for repetitive cycles without the presence of moisture (121 °C, 10 and 20 cycles), only ~7% of the maximum change in hardness was observed. PMID:29466289
Effects of Sterilization Cycles on PEEK for Medical Device Application.

PubMed

Kumar, Amit; Yap, Wai Teng; Foo, Soo Leong; Lee, Teck Kheng

2018-02-21

The effects of the sterilization process have been studied on medical grade thermoplastic polyetheretherketone (PEEK). For a reusable medical device, material reliability is an important parameter to decide its lifetime, as it will be subjected to the continuous steam sterilization process. A spring-like clip component was selected out of a newly designed medical device (patented) to perform this reliability study. This clip component was sterilized for a predetermined number of cycles (2, 4, 6, 8, 10, 20…100) at 121 °C for 30 min. A significant decrease of ~20% in the compression force of the spring was observed after 30 cycles, and a ~6% decrease in the lateral dimension of the clip was observed after 50 cycles. No further significant change in the compression force or dimension was observed for the subsequent sterilization cycles. Vickers hardness and differential scanning calorimetry (DSC) techniques were used to characterize the effects of sterilization. DSC results exhibited no significant change in the degree of cure and melting behavior of PEEK before and after the sterilization. Hardness measurement exhibited an increase of ~49% in hardness after just 20 cycles. When an unsterilized sample was heated for repetitive cycles without the presence of moisture (121 °C, 10 and 20 cycles), only ~7% of the maximum change in hardness was observed.
Automation in visual inspection tasks: X-ray luggage screening supported by a system of direct, indirect or adaptable cueing with low and high system reliability.

PubMed

Chavaillaz, Alain; Schwaninger, Adrian; Michel, Stefan; Sauer, Juergen

2018-05-25

The present study evaluated three automation modes for improving performance in an X-ray luggage screening task. 140 participants were asked to detect the presence of prohibited items in X-ray images of cabin luggage. Twenty participants conducted this task without automatic support (control group), whereas the others worked with either indirect cues (system indicated the target presence without specifying its location), or direct cues (system pointed out the exact target location) or adaptable automation (participants could freely choose between no cue, direct and indirect cues). Furthermore, automatic support reliability was manipulated (low vs. high). The results showed a clear advantage for direct cues regarding detection performance and response time. No benefits were observed for adaptable automation. Finally, high automation reliability led to better performance and higher operator trust. The findings overall confirmed that automatic support systems for luggage screening should be designed such that they provide direct, highly reliable cues.
Reliability and validity of a physical activity scale among urban pregnant women in eastern China.

PubMed

Jiang, Hong; He, Gengsheng; Li, Mu; Fan, Yanyan; Jiang, Hongyi; Bauman, Adrian; Qian, Xu

2015-03-01

This study aimed to determine the reliability and validity of the physical activity scale adapted from a Danish scale for assessing physical activity among urban pregnant women in eastern China. Participants recruited in an urban setting of eastern China were asked to complete the physical activity scale, the activity diary, and to wear a pedometer for the same 4 days, followed by repeating the activity scale for another 4 days within 2 weeks. A total of 109 pregnant women completed data recording. Good reliability of the physical activity scale was observed (intraclass correlation coefficient = .87). There was also a good comparability between the activity scale and the activity diary (Spearman's r = .75 for total energy expenditure). The agreement between the scale and pedometer reading was acceptable (Spearman's r = .45). The adapted physical activity scale is a reliable and reasonably accurate instrument for estimating physical activity among urban pregnant women in eastern China. © 2012 APJPH.
Failure Analysis Study and Long-Term Reliability of Optical Assemblies with End-Face Damage

NASA Technical Reports Server (NTRS)

Kichak, Robert A.; Ott, Melanie N.; Leidecker, Henning W.; Chuska, Richard F.; Greenwell, Christopher J.

2008-01-01

In June 2005, the NESC received a multi-faceted request to determine the long-term reliability of fiber optic termini on the ISS that exhibited flaws not manufactured to best workmanship practices. There was a lack of data related to fiber optic workmanship as it affects the long-term reliability of optical fiber assemblies in a harsh environment. A fiber optic defect analysis was requested which would find and/or create various types of chips, spalls, scratches, etc., that were identified by the ISS personnel. Once the defects and causes were identified, the next step would be to perform long-term reliability testing of similar assemblies with similar defects. The goal of the defect analysis would be for the defects to be observed and documented for deterioration of fiber optic performance. Though this report mostly discusses what has been determined to be evidence of poor manufacturing processes, it also concludes that the majority of the damage could have been avoided with a rigorous process in place.
Synaptic and Network Mechanisms of Sparse and Reliable Visual Cortical Activity during Nonclassical Receptive Field Stimulation

PubMed Central

Haider, Bilal; Krause, Matthew R.; Duque, Alvaro; Yu, Yuguo; Touryan, Jonathan; Mazer, James A.; McCormick, David A.

2011-01-01

SUMMARY During natural vision, the entire visual field is stimulated by images rich in spatiotemporal structure. Although many visual system studies restrict stimuli to the classical receptive field (CRF), it is known that costimulation of the CRF and the surrounding nonclassical receptive field (nCRF) increases neuronal response sparseness. The cellular and network mechanisms underlying increased response sparseness remain largely unexplored. Here we show that combined CRF + nCRF stimulation increases the sparseness, reliability, and precision of spiking and membrane potential responses in classical regular spiking (RSC) pyramidal neurons of cat primary visual cortex. Conversely, fast-spiking interneurons exhibit increased activity and decreased selectivity during CRF + nCRF stimulation. The increased sparseness and reliability of RSC neuron spiking is associated with increased inhibitory barrages and narrower visually evoked synaptic potentials. Our experimental observations were replicated with a simple computational model, suggesting that network interactions among neuronal subtypes ultimately sharpen recurrent excitation, producing specific and reliable visual responses. PMID:20152117
Earth Observing System/Advanced Microwave Sounding Unit-A (EOS/AMSU-A): Reliability prediction report for module A1 (channels 3 through 15) and module A2 (channels 1 and 2)

NASA Technical Reports Server (NTRS)

Geimer, W.

1995-01-01

This report documents the final reliability prediction performed on the Earth Observing System/Advanced Microwave Sounding Unit-A (EOS/AMSU-A). The A1 Module contains Channels 3 through 15, and is referred to herein as 'EOS/AMSU-A1'. The A2 Module contains Channels 1 and 2, and is referred to herein as 'EOS/AMSU-A2'. The 'specified' figures were obtained from Aerojet Reports 8897-1 and 9116-1. The predicted reliability figure for the EOS/AMSU-A1 meets the specified value and provides a Mean Time Between Failures (MTBF) of 74,390 hours. The predicted reliability figure for the EOS/AMSU-A2 meets the specified value and provides an MTBF of 193,110 hours.
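To give the two MTBF figures some intuition, the Python sketch below converts an MTBF into a survival probability over an operating period. It assumes a constant failure rate (the exponential model, R(t) = exp(-t / MTBF)) and, for the combined figure, that both modules must work (a series system). The abstract states neither assumption, and the three-year duration is a hypothetical value chosen only for illustration.

```python
import math

# MTBF values quoted in the abstract (hours).
MTBF_A1_HOURS = 74_390.0
MTBF_A2_HOURS = 193_110.0


def survival_probability(t_hours, mtbf_hours):
    """Probability of no failure up to t_hours, assuming a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


def series_mtbf(*mtbfs):
    """MTBF of a series system: constant failure rates add, so reciprocals add."""
    return 1.0 / sum(1.0 / m for m in mtbfs)


# Hypothetical operating period of three years (not taken from the report).
HYPOTHETICAL_HOURS = 3 * 365 * 24

r_a1 = survival_probability(HYPOTHETICAL_HOURS, MTBF_A1_HOURS)
r_a2 = survival_probability(HYPOTHETICAL_HOURS, MTBF_A2_HOURS)
combined_mtbf = series_mtbf(MTBF_A1_HOURS, MTBF_A2_HOURS)
r_both = survival_probability(HYPOTHETICAL_HOURS, combined_mtbf)
```

Under these assumptions the probability that both modules survive equals the product of the individual probabilities, which is why the combined MTBF is lower than either module's own figure.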
Reliability of the Superimposed-Burst Technique in Patients With Patellofemoral Pain: A Technical Report.

PubMed

Norte, Grant E; Frye, Jamie L; Hart, Joseph M

2015-11-01

Context: The superimposed-burst (SIB) technique is commonly used to quantify central activation failure after knee-joint injury, but its reliability has not been established in pathologic cohorts. Objective: To assess within-session and between-sessions reliability of the SIB technique in patients with patellofemoral pain. Design: Descriptive laboratory study. Setting: University laboratory. Participants: A total of 10 patients with self-reported patellofemoral pain (1 man, 9 women; age = 24.1 ± 3.8 years, height = 167.8 ± 15.2 cm, mass = 71.6 ± 17.5 kg) and 10 healthy control participants (3 men, 7 women; age = 27.4 ± 5.0 years, height = 173.5 ± 9.9 cm, mass = 78.2 ± 16.5 kg) volunteered. Intervention: Participants were assessed at 6 intervals spanning 21 days. Intraclass correlation coefficients (ICCs [3,3]) were used to assess reliability. Main outcome measures: Quadriceps central activation ratio, knee-extension maximal voluntary isometric contraction force, and SIB force. Results: The quadriceps central activation ratio was highly reliable within session (ICC [3,3] = 0.97) and between sessions through day 21 (ICC [3,3] = 0.90-0.95). Acceptable reliability of knee extension (ICC [3,3] = 0.75-0.91) and SIB force (ICC [3,3] = 0.77-0.89) was observed through day 21. Conclusions: The SIB technique was reliable for clinical research up to 21 days in patients with patellofemoral pain.
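The ICC [3,3] values above are reported without the underlying computation. As a minimal sketch, assuming the Shrout and Fleiss ICC(3,k) form (two-way mixed model, consistency, average of k measurements), the coefficient can be derived from two-way ANOVA mean squares as (BMS - EMS) / BMS. The function name and the toy data are hypothetical.

```python
def icc_3k(data):
    """ICC(3,k): two-way mixed, consistency, average of k measurements.

    data: list of n rows (subjects), each with k measurements (trials or sessions).
    Undefined (division by zero) if all subject means are identical.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_trials

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / bms


# Hypothetical data: 3 participants, 2 trials each.
offset_only = [[1, 2], [2, 3], [3, 4]]   # trial 2 is uniformly 1 unit higher
with_noise = [[1, 2], [2, 1], [3, 3]]

icc_offset = icc_3k(offset_only)
icc_noise = icc_3k(with_noise)
```

Because this form measures consistency rather than absolute agreement, a uniform shift between trials leaves the coefficient at 1, whereas inconsistent ordering of participants lowers it.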

Inter- and intraobserver reliability assessment of the axial trunk rotation: manual versus smartphone-aided measurement tools.

PubMed

Qiao, Jun; Xu, Leilei; Zhu, Zezhang; Zhu, Feng; Liu, Zhen; Qian, Bangping; Qiu, Yong

2014-10-11

Scoliogauge has been developed for the measurement of axial trunk rotation (ATR) on iPhone smartphones. This study aimed to evaluate the reliability of the smartphone-aided ATR measurement method and to compare it with that of the manual method. Sixty-four AIS patients with a single thoracic or lumbar curve participated in this study. Of these patients, thirty-two had main thoracic scoliosis while the other thirty-two had main thoracolumbar/lumbar scoliosis. Two spine surgeons performed the measurements with the Scoliometer and Scoliogauge. The Scoliogauge measurements were conducted on an iPhone 4 smartphone. The intraclass correlation coefficient (ICC), 2-way mixed model on absolute agreement, was used to analyze the reliability, categorized according to region (thoracic or lumbar) and Cobb angle (<20 degrees and >40 degrees). An ICC < 0.40 is considered poor, 0.40-0.59 fair, 0.60-0.74 good, and 0.75-1.00 excellent. The overall intraobserver ICC was 0.954 and the overall interobserver ICC was 0.943 for the Scoliometer set, whereas the intraobserver ICC was 0.965 and the interobserver ICC was 0.964 for the Scoliogauge set. Both the intraobserver and interobserver ICCs reached the excellent value in the 2 sets for both observers. The mean Cobb angle of thoracic curves in patients with main thoracic scoliosis was similar to that of lumbar curves in those with main thoracolumbar/lumbar scoliosis (35.7 degrees vs. 36.1 degrees). The intraobserver and interobserver reliability was similar between the two groups (thoracic vs. lumbar) in the 2 sets. There were 21 patients with Cobb angles < 20 degrees and 20 patients with Cobb angles > 40 degrees. The intraobserver and interobserver reliability was better in the severe curve (>40 degrees) group. 
Smartphone-aided measurement of ATR showed excellent reliability, and the reliability of measurement with either the Scoliometer or Scoliogauge could be influenced by Cobb angle: reliability was better for curves with larger Cobb angles.
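The interpretation bands quoted in the abstract (poor, fair, good, excellent) amount to a simple threshold lookup. The Python sketch below encodes those cut-offs and applies them to the four overall ICCs reported; the function name is an illustrative assumption.

```python
def icc_band(icc):
    """Map an ICC to the qualitative bands quoted in the abstract."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Overall ICCs reported in the abstract.
reported = {
    "Scoliometer intraobserver": 0.954,
    "Scoliometer interobserver": 0.943,
    "Scoliogauge intraobserver": 0.965,
    "Scoliogauge interobserver": 0.964,
}
bands = {name: icc_band(value) for name, value in reported.items()}
```

All four reported values fall in the top band, which matches the abstract's statement that both tools reached excellent reliability.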
Expert Reliability for the World Health Organization Standardized Ultrasound Classification of Cystic Echinococcosis

PubMed Central

Solomon, Nadia; Fields, Paul J.; Tamarozzi, Francesca; Brunetti, Enrico; Macpherson, Calum N. L.

2017-01-01

Cystic echinococcosis (CE), a parasitic zoonosis, results in cyst formation in the viscera. Cyst morphology depends on developmental stage. In 2003, the World Health Organization (WHO) published a standardized ultrasound (US) classification for CE, for use among experts as a standard of comparison. This study examined the reliability of this classification. Eleven international CE and US experts completed an assessment of eight WHO classification images and 88 test images representing cyst stages. Inter- and intraobserver reliability and observer performance were assessed using Fleiss' and Cohen's kappa. Interobserver reliability was moderate for WHO images (κ = 0.600, P < 0.0001) and substantial for test images (κ = 0.644, P < 0.0001), with substantial to almost perfect interobserver reliability for stages with pathognomonic signs (CE1, CE2, and CE3) for WHO (0.618 < κ < 0.904) and test images (0.642 < κ < 0.768). Comparisons of expert performances against the majority classification for each image were significant for WHO (0.413 < κ < 1.000, P < 0.005) and test images (0.718 < κ < 0.905, P < 0.0001); and intraobserver reliability was significant for WHO (0.520 < κ < 1.000, P < 0.005) and test images (0.690 < κ < 0.896, P < 0.0001). Findings demonstrate moderate to substantial interobserver and substantial to almost perfect intraobserver reliability for the WHO classification, with substantial to almost perfect interobserver reliability for pathognomonic stages. This confirms experts' abilities to reliably identify WHO-defined pathognomonic signs of CE, demonstrating that the WHO classification provides a reproducible way of staging CE. PMID:28070008
Inter-rater reliability of a food store checklist to assess availability of healthier alternatives to the energy-dense snacks and beverages commonly consumed by children.

PubMed

Izumi, Betty T; Findholt, Nancy E; Pickus, Hayley A; Nguyen, Thuan; Cuneo, Monica K

2014-06-01

Food stores have gained attention as potential intervention targets for improving children's eating habits. There is a need for valid and reliable instruments to evaluate changes in food store snack and beverage availability secondary to intervention. The aim of this study was to develop a valid, reliable, and resource-efficient instrument to evaluate the healthfulness of food store environments faced by children. The SNACZ food store checklist was developed to assess availability of healthier alternatives to the energy-dense snacks and beverages commonly consumed by children. After pretesting, two trained observers independently assessed the availability of 48 snack and beverage items in 50 food stores located near elementary and middle schools in Portland, Oregon, over a 2-week period in summer 2012. Inter-rater reliability was calculated using the kappa statistic. Overall, the instrument had mostly high inter-rater reliability. Seventy-three percent of items assessed had almost perfect or substantial reliability. Two items had moderate reliability (0.41-0.60), and no items had a reliability score less than 0.41. Eleven items occurred too infrequently to generate a kappa score. The SNACZ food store checklist is a first step toward developing a valid and reliable tool to evaluate the healthfulness of food store environments faced by children. The tool can be used to compare availability of healthier snack and beverage alternatives across communities and measure change secondary to intervention. As a wider variety of healthier snack and beverage alternatives become available in food stores, the checklist should be updated.
Estimating functional cognition in older adults using observational assessments of task performance in complex everyday activities: A systematic review and evaluation of measurement properties.

PubMed

Wesson, Jacqueline; Clemson, Lindy; Brodaty, Henry; Reppermund, Simone

2016-09-01

Functional cognition is a relatively new concept in assessment of older adults with mild cognitive impairment or dementia. Instruments need to be reliable and valid, hence we conducted a systematic review of observational assessments of task performance used to estimate functional cognition in this population. Two separate database searches were conducted: firstly to identify instruments; and secondly to identify studies reporting on the psychometric properties of the instruments. Studies were analysed using a published checklist and their quality reviewed according to specific published criteria. Clinical utility was reviewed and the information formulated into a best evidence synthesis. We found 21 instruments and included 58 studies reporting on measurement properties. The majority of studies were rated as being of fair methodological quality and the range of properties investigated was restricted. Most instruments had studies reporting on construct validity (hypothesis testing), none on content validity and there were few studies reporting on reliability. Overall the evidence on psychometric properties is lacking and there is an urgent need for further evaluation of instruments. Copyright © 2016 Elsevier Ltd. All rights reserved.
Development and Initial Psychometrics of Counseling Supervisor's Behavior Questionnaire

ERIC Educational Resources Information Center

Lee, Ahram; Park, Eun Hye; Byeon, Eunji; Lee, Sang Min

2016-01-01

This study describes the development and psychometric properties of the Counseling Supervisor's Behavior Questionnaire, designed to assess the specific behaviors of supervisors, which can be observed by supervisees during supervision sessions. Factor structure, construct and concurrent validity, and internal consistency reliability of the…
78 FR 35038 - Proposed Information Collection Activity; Comment Request

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-11

..., reliable, and transparent method for identifying high-quality programs that can receive continuing five... the system is working. The study will employ a mixed-methods design that integrates and layers administrative and secondary data sources, observational measures, and interviews to develop a rich knowledge...
Empirical Evidence for Childhood Depression.

ERIC Educational Resources Information Center

Lachar, David

Although several theoretical positions deal with the concept of childhood depression, accurate measurement of depression can only occur if valid and reliable measures are available. Current efforts emphasize direct questioning of the child and quantification of parents' observations. One scale used to study childhood depression, the Personality…
Evaluating the test-retest reliability of symptom indices associated with the ImPACT post-concussion symptom scale (PCSS).

PubMed

Merritt, Victoria C; Bradson, Megan L; Meyer, Jessica E; Arnett, Peter A

2018-05-01

The Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) is a commonly used tool in sports concussion assessment. While test-retest reliabilities have been established for the ImPACT cognitive composites, few studies have evaluated the psychometric properties of the ImPACT's Post-Concussion Symptom Scale (PCSS). The purpose of this study was to establish the test-retest reliability of symptom indices associated with the PCSS. Participants included 38 undergraduate students (50.0% male) who underwent neuropsychological testing as part of their participation in their psychology department's research subject pool. The majority of the participants were Caucasian (94.7%) and had no history of concussion (73.7%). All participants completed the ImPACT at two time points, approximately 6 weeks apart. The PCSS was the main outcome measure, and eight symptom indices were calculated (a total symptom score, three symptom summary indices, and four symptom clusters). Pearson correlations (r) and intraclass correlation coefficients (ICCs) were computed as measures of test-retest reliability. Overall, reliabilities ranged from low to high (r = .44 to .80; ICC = .44 to .77). The cognitive symptom cluster exhibited the highest test-retest reliability (r = .80, ICC = .77), followed by the positive symptom total (PST) index, an indicator of the total number of symptoms endorsed (r = .71, ICC = .69). In contrast, the commonly used total symptom score showed lower test-retest reliability (r = .67, ICC = .62). Paired-samples t tests revealed no significant differences between test and retest for any of the symptom variables (all p > .01). Finally, reliable change indices (RCI) were computed to determine whether differences observed between test and retest represented clinically significant change. RCI values were provided for each symptom index at the 80%, 90%, and 95% confidence intervals. 
These results suggest that evaluating additional symptom indices beyond the total symptom score from the PCSS is beneficial. Findings from this study can be applied to athlete samples to assess reliable change in symptoms following concussion.
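The abstract mentions reliable change indices at the 80%, 90%, and 95% levels without giving the formula. One common formulation, assumed here and not confirmed by the abstract, is the Jacobson and Truax approach: the standard error of measurement is SD × sqrt(1 − r), the standard error of the difference is sqrt(2) times that, and the threshold is the two-sided z value multiplied by it. In the sketch below the test-retest r of .67 is the total symptom score value from the abstract, while the baseline SD of 10 is purely hypothetical.

```python
import math
from statistics import NormalDist


def rci_threshold(sd_baseline, r_test_retest, confidence):
    """Smallest test-retest difference counted as reliable change.

    Jacobson-Truax form (an assumption; the abstract does not name its formula).
    confidence: two-sided level, e.g. 0.95.
    """
    sem = sd_baseline * math.sqrt(1.0 - r_test_retest)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem                      # standard error of the difference
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)    # two-sided critical value
    return z * se_diff


HYPOTHETICAL_SD = 10.0   # not reported in the abstract
R_TOTAL_SYMPTOMS = 0.67  # Pearson r for the total symptom score (from the abstract)

thresholds = {
    level: rci_threshold(HYPOTHETICAL_SD, R_TOTAL_SYMPTOMS, level)
    for level in (0.80, 0.90, 0.95)
}
```

A lower test-retest reliability widens the threshold, so indices with weaker reliability need a larger score change before the change can be called reliable.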
Reliability of the Q Force; a mobile instrument for measuring isometric quadriceps muscle strength.

PubMed

Douma, K W; Regterschot, G R H; Krijnen, W P; Slager, G E C; van der Schans, C P; Zijlstra, W

2016-01-01

The ability to generate muscle strength is a prerequisite for all human movement. Decreased quadriceps muscle strength is frequently observed in older adults and is associated with decreased performance and activity limitations. To quantify quadriceps muscle strength and to monitor changes over time, instruments and procedures with sufficient reliability are needed. The Q Force is an innovative mobile muscle strength measurement instrument suitable for measuring in various degrees of extension. Measurements between 110 and 130° extension present the highest values and the most significant increase after training. The objective of this study is to determine the test-retest reliability of muscle strength measurements by the Q Force in older adults in 110° extension. Forty-one healthy older adults, 13 males and 28 females, were included in the study. Mean (SD) age was 81.9 (4.89) years. Isometric muscle strength of the quadriceps muscle was assessed with the Q Force at 110° of knee extension. Participants were measured at two sessions with a three to eight day interval between sessions. To determine relative reliability, the intraclass correlation coefficient (ICC) was calculated. To determine absolute reliability, Bland and Altman Limits of Agreement (LOA) were calculated and t-tests were performed. Relative reliability of the Q Force is good to excellent, as all ICC coefficients are higher than 0.75. Generally, a large 95% LOA was found, reflecting only moderate absolute reliability, as exemplified by the peak torque of the left leg (-18.6 N to 33.8 N) and the right leg (-9.2 N to 26.4 N); the LOA was between 15.7 and 23.6 N, representing 25.2% to 39.9% of the size of the mean. Small systematic differences in mean were found between measurement sessions 1 and 2. The present study shows that the Q Force has excellent relative test-retest reliability, but limited absolute test-retest reliability. 
Since the Q Force is relatively cheap and mobile, it is suitable for application in various clinical settings; however, its capability to detect changes in muscle force over time is limited but comparable to existing instruments.
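The Bland and Altman limits of agreement used above for absolute reliability can be sketched in a few lines of Python: the bias is the mean of the session-to-session differences, and the 95% limits are the bias plus or minus 1.96 standard deviations of those differences. The function name and the force values are hypothetical and are not measurements from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(session_1, session_2):
    """Return (lower limit, bias, upper limit) for paired measurements.

    Differences are session 2 minus session 1; the 95% limits assume the
    differences are approximately normally distributed.
    """
    diffs = [b - a for a, b in zip(session_1, session_2)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias - spread, bias, bias + spread


# Hypothetical quadriceps force readings (N) for four participants.
force_session_1 = [100.0, 110.0, 120.0, 130.0]
force_session_2 = [102.0, 108.0, 125.0, 131.0]

lower, bias, upper = limits_of_agreement(force_session_1, force_session_2)
```

A non-zero bias corresponds to the small systematic difference between sessions noted in the abstract, while the width of the interval reflects how large a change must be before it exceeds measurement error.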
Psychometric performance of the Brazilian version of the Mini-cuestionario de calidad de vida en la hipertensión arterial (MINICHAL).

PubMed

Soutello, Ana Lúcia Soares; Rodrigues, Roberta Cunha Matheus; Jannuzzi, Fernanda Freire; Spana, Thaís Moreira; Gallani, Maria Cecília Bueno Jayme; Nadruz Junior, Wilson

2011-01-01

This study aimed to evaluate the feasibility, acceptability, ceiling and floor effects, reliability, and convergent construct validity of the Brazilian version of the Mini Cuestionario de Calidad de Vida en la Hipertensión Arterial (MINICHAL). The study included 200 hypertensive outpatients in a university hospital and a primary healthcare unit. The MINICHAL was applied in 3.0 (± 1.0) minutes with 100% of the items answered. A "ceiling effect" was observed in both dimensions and in the total score, as well as evidence of measurement stability (ICC=0.74). The convergent validity was confirmed by significant positive correlations between similar dimensions of the MINICHAL and the SF-36, and significant negative correlations with the Minnesota Living with Heart Failure Questionnaire (MLHFQ); however, correlations between dissimilar constructs were also observed. It was concluded that the Brazilian version of the MINICHAL presents evidence of reliability and validity when applied to hypertensive outpatients.
Reliability and concurrent validity of a peripheral pulse oximeter and health-app system for the quantification of heart rate in healthy adults.

PubMed

Losa-Iglesias, Marta Elena; Becerro-de-Bengoa-Vallejo, Ricardo; Becerro-de-Bengoa-Losa, Klark Ricardo

2016-06-01

There are downloadable applications (Apps) for cell phones that can measure heart rate in a simple and painless manner. The aim of this study was to assess the reliability of this type of App for a Smartphone using an Android system, compared to the radial pulse and a portable pulse oximeter. We performed a randomized pilot observational study of diagnostic accuracy in 46 healthy volunteers. The participants' demographic data and cardiac pulse were collected. Heart rate was measured with three systems: palpation of the radial artery with three fingers at the wrist over the radius; a low-cost, portable, liquid crystal display finger pulse oximeter; and the Heart Rate Plus App for Samsung Galaxy Note®. This study demonstrated high reliability and consistency between the three systems with respect to the heart rate of healthy adults. For all parameters, ICC was > 0.93, indicating excellent reliability. Moreover, CVME values for all parameters were between 1.66% and 4.06%. We found significant correlation coefficients, no systematic differences between radial pulse palpation and the pulse oximeter, and high precision. Low-cost pulse oximeter and App systems can serve as valid instruments for the assessment of heart rate in healthy adults. © The Author(s) 2014.
Mother-child bonding assessment tools

PubMed Central

Perrelli, Jaqueline Galdino Albuquerque; Zambaldi, Carla Fonseca; Cantilino, Amaury; Sougey, Everton Botelho

2014-01-01

Objective: To identify and describe research tools used to evaluate bonding between mother and child up to one year of age, as well as to provide information on reliability and validity measures related to these tools. Data source: Research studies available on PUBMED, LILACS, ScienceDirect, PsycINFO and CINAHL databases with the following descriptors: mother-child relations and mother infant relationship, as well as the expressions validity, reliability and scale. Data synthesis: 23 research studies were selected and fully analyzed. Thirteen evaluation research tools were identified concerning mother and child attachment: seven scales, three questionnaires, two inventories and one observation method. Of all the tools analyzed, the Prenatal Attachment Inventory presented the highest validity and reliability measures for assessing the mother-fetus relationship during pregnancy. Concerning the puerperal period, better consistency coefficients were found for the Maternal Attachment Inventory and the Postpartum Bonding Questionnaire. In addition, the latter showed higher sensitivity in identifying amenable and severe disorders in the affective relations between mother and child. Conclusions: The majority of research tools are reliable for studying the phenomenon presented, although there are some limitations regarding construct- and criterion-related validity. In addition, only two of them have been translated into Portuguese and adapted to populations of women and children in Brazil, which is a significant gap in scientific production in this area. PMID:25479859
Reliability testing of the Larsen and Sharp classifications for rheumatoid arthritis of the elbow.

PubMed

Jew, Nicholas B; Hollins, Anthony M; Mauck, Benjamin M; Smith, Richard A; Azar, Frederick M; Miller, Robert H; Throckmorton, Thomas W

2017-01-01

Two popular systems for classifying rheumatoid arthritis affecting the elbow are the Larsen and Sharp schemes. To our knowledge, no study has investigated the reliability of these 2 systems. We compared the intraobserver and interobserver agreement of the 2 systems to determine whether one is more reliable than the other. The radiographs of 45 patients diagnosed with rheumatoid arthritis affecting the elbow were evaluated. Anteroposterior and lateral radiographs were deidentified and distributed to 6 evaluators (4 fellowship-trained upper extremity surgeons and 2 orthopedic trainees). Each evaluator graded all 45 radiographs according to the Larsen and Sharp scoring methods on 2 occasions, at least 2 weeks apart. Overall intraobserver reliability was 0.93 (95% confidence interval [CI], 0.90-0.95) for the Larsen system and 0.92 (95% CI, 0.86-0.96) for the Sharp classification, both indicating substantial agreement. Overall interobserver reliability was 0.70 (95% CI, 0.60-0.80) for the Larsen classification and 0.68 (95% CI, 0.54-0.81) for the Sharp system, both indicating good agreement. There were no significant differences in the intraobserver or interobserver reliability of the systems overall and no significant differences in reliability between attending surgeons and trainees for either classification system. The Larsen and Sharp systems both show substantial intraobserver reliability and good interobserver agreement for the radiographic classification of rheumatoid arthritis affecting the elbow. Differences in training level did not result in substantial variances in reliability for either system. We conclude that both systems can be reliably used to evaluate rheumatoid arthritis of the elbow by observers of varying training levels. Copyright © 2017 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
How Many Sleep Diary Entries Are Needed to Reliably Estimate Adolescent Sleep?

PubMed

Short, Michelle A; Arora, Teresa; Gradisar, Michael; Taheri, Shahrad; Carskadon, Mary A

2017-03-01

To investigate (1) how many nights of sleep diary entries are required for reliable estimates of five sleep-related outcomes (bedtime, wake time, sleep onset latency [SOL], sleep duration, and wake after sleep onset [WASO]) and (2) the test-retest reliability of sleep diary estimates of school night sleep across 12 weeks. Data were drawn from four adolescent samples (Australia [n = 385], Qatar [n = 245], United Kingdom [n = 770], and United States [n = 366]), who provided 1766 eligible sleep diary weeks for reliability analyses. We performed reliability analyses for each cohort using complete data (7 days), one to five school nights, and one to two weekend nights. We also performed test-retest reliability analyses on 12-week sleep diary data available from a subgroup of 55 US adolescents. Intraclass correlation coefficients for bedtime, SOL, and sleep duration indicated good-to-excellent reliability from five weekday nights of sleep diary entries across all adolescent cohorts. Four school nights were sufficient for wake times in the Australian and UK samples, but not the US or Qatari samples. Only Australian adolescents showed good reliability for two weekend nights of bedtime reports; estimates of SOL were adequate for UK adolescents based on two weekend nights. WASO was not reliably estimated using 1 week of sleep diaries. We observed excellent test-retest reliability across 12 weeks of sleep diary data in a subsample of US adolescents. We recommend that at least five weekday nights of sleep diary entries be collected when studying adolescent bedtimes, SOL, and sleep duration. Adolescent sleep patterns were stable across 12 consecutive school weeks. © Sleep Research Society 2017. Published by Oxford University Press on behalf of the Sleep Research Society. All rights reserved. For permissions, please e-mail journals.permissions@oup.com.
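One common way to reason about "how many nights are enough" is the Spearman-Brown prophecy formula, which projects the reliability of a mean of k entries from the reliability of a single entry. The abstract does not say that the authors used this formula, so the sketch below is only an illustration under that assumption; the single-night reliability of 0.45, the target of 0.80, and the function names are hypothetical.

```python
def spearman_brown(single_night_icc, k):
    """Projected reliability of the mean of k nights, given single-night reliability."""
    return k * single_night_icc / (1 + (k - 1) * single_night_icc)

def nights_needed(single_night_icc, target, max_nights=365):
    """Smallest number of nights whose mean reaches the target reliability."""
    for k in range(1, max_nights + 1):
        if spearman_brown(single_night_icc, k) >= target:
            return k
    return None  # target not reachable within max_nights

# Hypothetical single-night reliability of 0.45 and a target of 0.80
print(nights_needed(0.45, 0.80))  # → 5
```

Searching over k rather than inverting the formula avoids off-by-one results when floating-point rounding lands exactly on the target.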
Narrative review: should teaching of the respiratory physical examination be restricted only to signs with proven reliability and validity?

PubMed

Benbassat, Jochanan; Baumal, Reuben

2010-08-01

To review the reported reliability (reproducibility, inter-examiner agreement) and validity (sensitivity, specificity and likelihood ratios) of respiratory physical examination (PE) signs, and suggest an approach to teaching these signs to medical students. Review of the literature. We searched Paper Chase between 1966 and June 2009 to identify and evaluate published studies on the diagnostic accuracy of respiratory PE signs. Most studies have reported low to fair reliability and sensitivity values. However, some studies have found high specificities for selected PE signs. None of the studies that we reviewed adhered to all of the STARD criteria for reporting diagnostic accuracy. Possible flaws in study designs may have led to underestimates of the observed diagnostic accuracy of respiratory PE signs. The reported poor reliabilities may have been due to differences in the PE skills of the participating examiners, while the sensitivities may have been confounded by variations in the severity of the diseases of the participating patients. IMPLICATION FOR PRACTICE AND MEDICAL EDUCATION: Pending the results of properly controlled studies, the reported poor reliability and sensitivity of most respiratory PE signs do not necessarily detract from their clinical utility. Therefore, we believe that a meticulously performed respiratory PE, which aims to explore a diagnostic hypothesis, as opposed to a PE that aims to detect a disease in an asymptomatic person, remains a cornerstone of clinical practice. We propose teaching the respiratory PE signs according to their importance, beginning with signs of life-threatening conditions and those that have been reported to have a high specificity, and ending with signs that are "nice to know," but are no longer employed because of the availability of more easily performed tests.
Narrative Review: Should Teaching of the Respiratory Physical Examination Be Restricted Only to Signs with Proven Reliability and Validity?

PubMed Central

Baumal, Reuben

2010-01-01

OBJECTIVE To review the reported reliability (reproducibility, inter-examiner agreement) and validity (sensitivity, specificity and likelihood ratios) of respiratory physical examination (PE) signs, and suggest an approach to teaching these signs to medical students. METHODS Review of the literature. We searched Paper Chase between 1966 and June 2009 to identify and evaluate published studies on the diagnostic accuracy of respiratory PE signs. RESULTS Most studies have reported low to fair reliability and sensitivity values. However, some studies have found high specificities for selected PE signs. None of the studies that we reviewed adhered to all of the STARD criteria for reporting diagnostic accuracy. CONCLUSIONS Possible flaws in study designs may have led to underestimates of the observed diagnostic accuracy of respiratory PE signs. The reported poor reliabilities may have been due to differences in the PE skills of the participating examiners, while the sensitivities may have been confounded by variations in the severity of the diseases of the participating patients. IMPLICATION FOR PRACTICE AND MEDICAL EDUCATION Pending the results of properly controlled studies, the reported poor reliability and sensitivity of most respiratory PE signs do not necessarily detract from their clinical utility. Therefore, we believe that a meticulously performed respiratory PE, which aims to explore a diagnostic hypothesis, as opposed to a PE that aims to detect a disease in an asymptomatic person, remains a cornerstone of clinical practice. We propose teaching the respiratory PE signs according to their importance, beginning with signs of life-threatening conditions and those that have been reported to have a high specificity, and ending with signs that are "nice to know," but are no longer employed because of the availability of more easily performed tests. PMID:20349154
Direct MALDI-TOF mass spectrometry assay of blood culture broths for rapid identification of Candida species causing bloodstream infections: an observational study in two large microbiology laboratories.

PubMed

Spanu, Teresa; Posteraro, Brunella; Fiori, Barbara; D'Inzeo, Tiziana; Campoli, Serena; Ruggeri, Alberto; Tumbarello, Mario; Canu, Giulia; Trecarichi, Enrico Maria; Parisi, Gabriella; Tronci, Mirella; Sanguinetti, Maurizio; Fadda, Giovanni

2012-01-01

We evaluated the reliability of the Bruker Daltonik's MALDI Biotyper system in species-level identification of yeasts directly from blood culture bottles. Identification results were concordant with those of the conventional culture-based method for 95.9% of Candida albicans (187/195) and 86.5% of non-albicans Candida species (128/148). Results were available in 30 min (median), suggesting that this approach is a reliable, time-saving tool for routine identification of Candida species causing bloodstream infection.
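The concordance percentages above follow directly from the reported counts. The sketch below recomputes them and, as an addition not reported in the abstract, attaches a Wilson score 95% confidence interval to each proportion; the function name `concordance` is hypothetical.

```python
from math import sqrt

def concordance(agree, total, z=1.96):
    """Return (percent agreement, Wilson lower bound %, Wilson upper bound %)."""
    p = agree / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return 100 * p, 100 * (centre - half), 100 * (centre + half)

# Counts taken from the abstract: concordant identifications / isolates tested
for label, agree, total in [
    ("C. albicans", 187, 195),
    ("non-albicans Candida", 128, 148),
]:
    pct, lo, hi = concordance(agree, total)
    print(f"{label}: {pct:.1f}% (95% CI {lo:.1f}-{hi:.1f}%)")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better for proportions close to 100%.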
Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology

ERIC Educational Resources Information Center

Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

2015-01-01

Purpose: Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable…
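The record above is truncated, so the authors' exact procedure is not available here. As a generic illustration of what a 95% credible interval for a difference in proportions can look like, the sketch below draws from independent Beta posteriors with uniform Beta(1, 1) priors (an assumption) and reads off the percentiles of the simulated differences. The counts and the name `diff_credible_interval` are hypothetical.

```python
import random

def diff_credible_interval(x1, n1, x2, n2, draws=20000, level=0.95, seed=0):
    """Monte Carlo credible interval for p1 - p2 under independent Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = sorted(
        rng.betavariate(1 + x1, 1 + n1 - x1) - rng.betavariate(1 + x2, 1 + n2 - x2)
        for _ in range(draws)
    )
    tail = (1 - level) / 2
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1 - tail) * draws) - 1]
    return lower, upper

# Hypothetical case: 30 of 200 target events before treatment, 5 of 200 after
low, high = diff_credible_interval(30, 200, 5, 200)
print(f"95% credible interval for the change in proportion: {low:.3f} to {high:.3f}")
```

An interval that excludes zero would be read as evidence of reliable change for that individual case, under the stated prior and independence assumptions.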
System reliability approaches for advanced propulsion system structures

NASA Technical Reports Server (NTRS)

Cruse, T. A.; Mahadevan, S.

1991-01-01

This paper identifies significant issues that pertain to the estimation and use of system reliability in the design of advanced propulsion system structures. Linkages between the reliabilities of individual components and their effect on system design issues such as performance, cost, availability, and certification are examined. The need for system reliability computation to address the continuum nature of propulsion system structures and synergistic progressive damage modes has been highlighted. Available system reliability models are observed to apply only to discrete systems. Therefore a sequential structural reanalysis procedure is formulated to rigorously compute the conditional dependencies between various failure modes. The method is developed in a manner that supports both top-down and bottom-up analyses in system reliability.
Visual-search model observer for assessing mass detection in CT

NASA Astrophysics Data System (ADS)

Karbaschi, Zohreh; Gifford, Howard C.

2017-03-01

Our aim is to devise model observers (MOs) to evaluate acquisition protocols in medical imaging. To optimize protocols for human observers, an MO must reliably interpret images containing quantum and anatomical noise under aliasing conditions. In this study of sampling parameters for simulated lung CT, the lesion-detection performance of human observers was compared with that of visual-search (VS) observers, a channelized nonprewhitening (CNPW) observer, and a channelized Hotelling (CH) observer. Scans of a mathematical torso phantom modeled single-slice parallel-hole CT with varying numbers of detector pixels and angular projections. Circular lung lesions had a fixed radius. Two-dimensional FBP reconstructions were performed. A localization ROC study was conducted with the VS, CNPW and human observers, while the CH observer was applied in a location-known ROC study. Changing the sampling parameters had negligible effect on the CNPW and CH observers, whereas several VS observers demonstrated a sensitivity to sampling artifacts that was in agreement with how the humans performed.
