Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz
2011-07-01
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data
Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han
2011-01-01
Summary Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for downloading that is being packaged into a user-friendly software. PMID:21517792
Power calculation for comparing diagnostic accuracies in a multi-reader, multi-test design.
Kim, Eunhee; Zhang, Zheng; Wang, Youdan; Zeng, Donglin
2014-12-01
Receiver operating characteristic (ROC) analysis is widely used to evaluate the performance of diagnostic tests with continuous or ordinal responses. A popular study design for assessing the accuracy of diagnostic tests involves multiple readers interpreting multiple diagnostic test results, called the multi-reader, multi-test design. Although several different approaches to analyzing data from this design exist, few methods have discussed the sample size and power issues. In this article, we develop a power formula to compare the correlated areas under the ROC curves (AUC) in a multi-reader, multi-test design. We present a nonparametric approach to estimate and compare the correlated AUCs by extending DeLong et al.'s (1988, Biometrics 44, 837-845) approach. A power formula is derived based on the asymptotic distribution of the nonparametric AUCs. Simulation studies are conducted to demonstrate the performance of the proposed power formula and an example is provided to illustrate the proposed procedure. © 2014, The International Biometric Society.
Paired split-plot designs of multireader multicase studies.
Chen, Weijie; Gong, Qi; Gallas, Brandon D
2018-07-01
The widely used multireader multicase ROC study design for comparing imaging modalities is the fully crossed (FC) design: every reader reads every case of both modalities. We investigate paired split-plot (PSP) designs that may allow for reduced cost and increased flexibility compared with the FC design. In the PSP design, case images from two modalities are read by the same readers, thereby the readings are paired across modalities. However, within each modality, not every reader reads every case. Instead, both the readers and the cases are partitioned into a fixed number of groups and each group of readers reads its own group of cases-a split-plot design. Using a [Formula: see text]-statistic based variance analysis for AUC (i.e., area under the ROC curve), we show analytically that precision can be gained by the PSP design as compared with the FC design with the same number of readers and readings. Equivalently, we show that the PSP design can achieve the same statistical power as the FC design with a reduced number of readings. The trade-off for the increased precision in the PSP design is the cost of collecting a larger number of truth-verified patient cases than the FC design. This means that one can trade-off between different sources of cost and choose a least burdensome design. We provide a validation study to show the iMRMC software can be reliably used for analyzing data from both FC and PSP designs. Finally, we demonstrate the advantages of the PSP design with a reader study comparing full-field digital mammography with screen-film mammography.
Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.
Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L
2012-12-01
Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.
Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis.
Hillis, Stephen L; Berbaum, Kevin S; Metz, Charles E
2008-05-01
The Dorfman-Berbaum-Metz (DBM) method has been one of the most popular methods for analyzing multireader receiver-operating characteristic (ROC) studies since it was proposed in 1992. Despite its popularity, the original procedure has several drawbacks: it is limited to jackknife accuracy estimates, it is substantially conservative, and it is not based on a satisfactory conceptual or theoretical model. Recently, solutions to these problems have been presented in three papers. Our purpose is to summarize and provide an overview of these recent developments. We present and discuss the recently proposed solutions for the various drawbacks of the original DBM method. We compare the solutions in a simulation study and find that they result in improved performance for the DBM procedure. We also compare the solutions using two real data studies and find that the modified DBM procedure that incorporates these solutions yields more significant results and clearer interpretations of the variance component parameters than the original DBM procedure. We recommend using the modified DBM procedure that incorporates the recent developments.
Chen, Weijie; Wunderlich, Adam; Petrick, Nicholas; Gallas, Brandon D
2014-10-01
We treat multireader multicase (MRMC) reader studies for which a reader's diagnostic assessment is converted to binary agreement (1: agree with the truth state, 0: disagree with the truth state). We present a mathematical model for simulating binary MRMC data with a desired correlation structure across readers, cases, and two modalities, assuming the expected probability of agreement is equal for the two modalities ([Formula: see text]). This model can be used to validate the coverage probabilities of 95% confidence intervals (of [Formula: see text], [Formula: see text], or [Formula: see text] when [Formula: see text]), validate the type I error of a superiority hypothesis test, and size a noninferiority hypothesis test (which assumes [Formula: see text]). To illustrate the utility of our simulation model, we adapt the Obuchowski-Rockette-Hillis (ORH) method for the analysis of MRMC binary agreement data. Moreover, we use our simulation model to validate the ORH method for binary data and to illustrate sizing in a noninferiority setting. Our software package is publicly available on the Google code project hosting site for use in simulation, analysis, validation, and sizing of MRMC reader studies with binary agreement data.
Chen, Weijie; Wunderlich, Adam; Petrick, Nicholas; Gallas, Brandon D.
2014-01-01
Abstract. We treat multireader multicase (MRMC) reader studies for which a reader’s diagnostic assessment is converted to binary agreement (1: agree with the truth state, 0: disagree with the truth state). We present a mathematical model for simulating binary MRMC data with a desired correlation structure across readers, cases, and two modalities, assuming the expected probability of agreement is equal for the two modalities (P1=P2). This model can be used to validate the coverage probabilities of 95% confidence intervals (of P1, P2, or P1−P2 when P1−P2=0), validate the type I error of a superiority hypothesis test, and size a noninferiority hypothesis test (which assumes P1=P2). To illustrate the utility of our simulation model, we adapt the Obuchowski–Rockette–Hillis (ORH) method for the analysis of MRMC binary agreement data. Moreover, we use our simulation model to validate the ORH method for binary data and to illustrate sizing in a noninferiority setting. Our software package is publicly available on the Google code project hosting site for use in simulation, analysis, validation, and sizing of MRMC reader studies with binary agreement data. PMID:26158051
Coping Strategies in Reading: Multi-Readers in the Norwegian General Education System
ERIC Educational Resources Information Center
Vik, Astrid Kristin; Fellenius, Kerstin
2007-01-01
Six primary school-aged braille students were taught to name 4 to 10 braille letters as phonemes and another 4 to 10 braille letters as graphemes (Study 1). They were then taught to name 10 braille words as onset-rimes and another 10 braille words as whole words (Study 2). Instruction in phonemes and onset rimes resulted in fewer trials and a…
Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods
Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.
2012-01-01
Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs fall close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570
The average receiver operating characteristic curve in multireader multicase imaging studies
Samuelson, F W
2014-01-01
Objective: In multireader, multicase (MRMC) receiver operating characteristic (ROC) studies for evaluating medical imaging systems, the area under the ROC curve (AUC) is often used as a summary metric. Owing to the limitations of AUC, plotting the average ROC curve to accompany the rigorous statistical inference on AUC is recommended. The objective of this article is to investigate methods for generating the average ROC curve from ROC curves of individual readers. Methods: We present both a non-parametric method and a parametric method for averaging ROC curves that produce a ROC curve, the area under which is equal to the average AUC of individual readers (a property we call area preserving). We use hypothetical examples, simulated data and a real-world imaging data set to illustrate these methods and their properties. Results: We show that our proposed methods are area preserving. We also show that the method of averaging the ROC parameters, either the conventional bi-normal parameters (a, b) or the proper bi-normal parameters (c, da), is generally not area preserving and may produce a ROC curve that is intuitively not an average of multiple curves. Conclusion: Our proposed methods are useful for making plots of average ROC curves in MRMC studies as a companion to the rigorous statistical inference on the AUC end point. The software implementing these methods is freely available from the authors. Advances in knowledge: Methods for generating the average ROC curve in MRMC ROC studies are formally investigated. The area-preserving criterion we defined is useful to evaluate such methods. PMID:24884728
Glemser, Philip A; Pfleiderer, Michael; Heger, Anna; Tremper, Jan; Krauskopf, Astrid; Schlemmer, Heinz-Peter; Yen, Kathrin; Simons, David
2017-03-01
The aim of this multi-reader feasibility study was to evaluate new post-processing CT imaging tools in rib fracture assessment of forensic cases by analyzing detection time and diagnostic accuracy. Thirty autopsy cases (20 with and 10 without rib fractures in autopsy) were randomly selected and included in this study. All cases received a native whole body CT scan prior to the autopsy procedure, which included dissection and careful evaluation of each rib. In addition to standard transverse sections (modality A), CT images were subjected to a reconstruction algorithm to compute axial labelling of the ribs (modality B) as well as "unfolding" visualizations of the rib cage (modality C, "eagle tool"). Three radiologists with different clinical and forensic experience who were blinded to autopsy results evaluated all cases in a random manner of modality and case. Rib fracture assessment of each reader was evaluated compared to autopsy and a CT consensus read as radiologic reference. A detailed evaluation of relevant test parameters revealed a better accordance to the CT consensus read as to the autopsy. Modality C was the significantly quickest rib fracture detection modality despite slightly reduced statistic test parameters compared to modalities A and B. Modern CT post-processing software is able to shorten reading time and to increase sensitivity and specificity compared to standard autopsy alone. The eagle tool as an easy to use tool is suited for an initial rib fracture screening prior to autopsy and can therefore be beneficial for forensic pathologists.
Ghobadi, Comeron W; Hayman, Emily L; Finkle, Joshua H; Walter, Jessica R; Xu, Shuai
2017-01-01
The aim of this study was to critically assess the clinical evidence leading to radiologic medical device approvals via the premarket approval pathway from 2000 to 2015. This study used the publically available FDA premarket database for radiologic device approvals over the past 15 years (September 1, 2000, to August 31, 2015). Approval characteristics were collected for each device, and statistical analysis was performed on the data for each pivotal trial. Additionally, methodological quality of the pivotal trial was determined using the Quality Assessment of Diagnostic Accuracy Studies tool. Twenty-three class III radiologic device approvals were identified, with breast imaging accounting for 16 (70%) and computer-aided detection software accounting for 9 (39%) approvals. The median premarket approval time was 475 days (range, 180-1,116). Twenty-one devices were approved on the basis of multireader, multicenter studies, one on the basis of a randomized controlled trial, and one on the basis of a preclinical technical equivalence trial. The median number of patients per pivotal trial was 201 (range, 25-3,946). Twenty-six of the 34 pivotal trials (76%) had at least one methodologic bias. Breast imaging devices had a greater number of patients per pivotal trial (P = .009) and more prospective studies. With regard to all modalities, increased time to device approval correlated with weaker trial quality (r = 0.600, P < .001). Radiologic devices are largely approved by multireader, multicenter studies, the recommended standard for assessing diagnostic technologies. Given that radiologic devices play a key role in modern medicine, further efforts should be made to increase transparency of clinical data leading to approval. Copyright © 2016 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Digital Breast Tomosynthesis: State of the Art
Vedantham, Srinivasan; Vijayaraghavan, Gopal R.; Kopans, Daniel B.
2015-01-01
This topical review on digital breast tomosynthesis (DBT) is provided with the intent of describing the state of the art in terms of technology, results from recent clinical studies, advanced applications, and ongoing efforts to develop multimodality imaging systems that include DBT. Particular emphasis is placed on clinical studies. The observations of increase in cancer detection rates, particularly for invasive cancers, and the reduction in false-positive rates with DBT in prospective trials indicate its benefit for breast cancer screening. Retrospective multireader multicase studies show either noninferiority or superiority of DBT compared with mammography. Methods to curtail radiation dose are of importance. © RSNA, 2015 PMID:26599926
Fallenberg, Eva M; Schmitzberger, Florian F; Amer, Heba; Ingold-Heppner, Barbara; Balleyguier, Corinne; Diekmann, Felix; Engelken, Florian; Mann, Ritse M; Renz, Diane M; Bick, Ulrich; Hamm, Bernd; Dromain, Clarisse
2017-07-01
To compare the diagnostic performance of contrast-enhanced spectral mammography (CESM) to digital mammography (MG) and magnetic resonance imaging (MRI) in a prospective two-centre, multi-reader study. One hundred seventy-eight women (mean age 53 years) with invasive breast cancer and/or DCIS were included after ethics board approval. MG, CESM and CESM + MG were evaluated by three blinded radiologists based on amended ACR BI-RADS criteria. MRI was assessed by another group of three readers. Receiver-operating characteristic (ROC) curves were compared. Size measurements for the 70 lesions detected by all readers in each modality were correlated with pathology. Reading results for 604 lesions were available (273 malignant, 4 high-risk, 327 benign). The area under the ROC curve was significantly larger for CESM alone (0.84) and CESM + MG (0.83) compared to MG (0.76) (largest advantage in dense breasts) while it was not significantly different from MRI (0.85). Pearson correlation coefficients for size comparison were 0.61 for MG, 0.69 for CESM, 0.70 for CESM + MG and 0.79 for MRI. This study showed that CESM, alone and in combination with MG, is as accurate as MRI but is superior to MG for lesion detection. Patients with dense breasts benefitted most from CESM with the smallest additional dose compared to MG. • CESM has comparable diagnostic performance (ROC-AUC) to MRI for breast cancer diagnostics. • CESM in combination with MG does not improve diagnostic performance. • CESM has lower sensitivity but higher specificity than MRI. • Sensitivity differences are more pronounced in dense and not significant in non-dense breasts. • CESM and MRI are significantly superior to MG, particularly in dense breasts.
NASA Astrophysics Data System (ADS)
Obuchowski, Nancy A.; Bullen, Jennifer A.
2018-04-01
Receiver operating characteristic (ROC) analysis is a tool used to describe the discrimination accuracy of a diagnostic test or prediction model. While sensitivity and specificity are the basic metrics of accuracy, they have many limitations when characterizing test accuracy, particularly when comparing the accuracies of competing tests. In this article we review the basic study design features of ROC studies, illustrate sample size calculations, present statistical methods for measuring and comparing accuracy, and highlight commonly used ROC software. We include descriptions of multi-reader ROC study design and analysis, address frequently seen problems of verification and location bias, discuss clustered data, and provide strategies for testing endpoints in ROC studies. The methods are illustrated with a study of transmission ultrasound for diagnosing breast lesions.
Mallett, Susan; Halligan, Steve; Collins, Gary S.; Altman, Doug G.
2014-01-01
Background Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. Methods In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Results Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. Conclusions The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests. PMID:25353643
Mallett, Susan; Halligan, Steve; Collins, Gary S; Altman, Doug G
2014-01-01
Different methods of evaluating diagnostic performance when comparing diagnostic tests may lead to different results. We compared two such approaches, sensitivity and specificity with area under the Receiver Operating Characteristic Curve (ROC AUC) for the evaluation of CT colonography for the detection of polyps, either with or without computer assisted detection. In a multireader multicase study of 10 readers and 107 cases we compared sensitivity and specificity, using radiological reporting of the presence or absence of polyps, to ROC AUC calculated from confidence scores concerning the presence of polyps. Both methods were assessed against a reference standard. Here we focus on five readers, selected to illustrate issues in design and analysis. We compared diagnostic measures within readers, showing that differences in results are due to statistical methods. Reader performance varied widely depending on whether sensitivity and specificity or ROC AUC was used. There were problems using confidence scores; in assigning scores to all cases; in use of zero scores when no polyps were identified; the bimodal non-normal distribution of scores; fitting ROC curves due to extrapolation beyond the study data; and the undue influence of a few false positive results. Variation due to use of different ROC methods exceeded differences between test results for ROC AUC. The confidence scores recorded in our study violated many assumptions of ROC AUC methods, rendering these methods inappropriate. The problems we identified will apply to other detection studies using confidence scores. We found sensitivity and specificity were a more reliable and clinically appropriate method to compare diagnostic tests.
Hostetter, Jason; Khanna, Nishanth; Mandell, Jacob C
2018-06-01
The purpose of this study was to integrate web-based forms with a zero-footprint cloud-based Picture Archiving and Communication Systems (PACS) to create a tool of potential benefit to radiology research and education. Web-based forms were created with a front-end and back-end architecture utilizing common programming languages including Vue.js, Node.js and MongoDB, and integrated into an existing zero-footprint cloud-based PACS. The web-based forms application can be accessed in any modern internet browser on desktop or mobile devices and allows the creation of customizable forms consisting of a variety of questions types. Each form can be linked to an individual DICOM examination or a collection of DICOM examinations. Several uses are demonstrated through a series of case studies, including implementation of a research platform for multi-reader multi-case (MRMC) studies and other imaging research, and creation of an online Objective Structure Clinical Examination (OSCE) and an educational case file. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Kawashima, Hiroki; Hayashi, Norio; Ohno, Naoki; Matsuura, Yukihiro; Sanada, Shigeru
2015-08-01
To evaluate the patient identification ability of radiographers, previous and current chest radiographs were assessed with observer study utilizing a receiver operating characteristics (ROCs) analysis. This study included portable and conventional chest radiographs from 43 same and 43 different patients. The dataset used in this study was divided into the three following groups: (1) a pair of portable radiographs, (2) a pair of conventional radiographs, and (3) a combination of each type of radiograph. Seven observers participated in this ROC study, which aimed to identify same or different patients, using these datasets. ROC analysis was conducted to calculate the average area under ROC curve obtained by each observer (AUCave), and a statistical test was performed using the multi-reader multi-case method. Comparable results were obtained with pairs of portable (AUCave: 0.949) and conventional radiographs (AUCave: 0.951). In a comparison between the same modality, there were no significant differences. In contrast, the ability to identify patients by comparing a portable and conventional radiograph (AUCave: 0.873) was lower than with the matching datasets (p=0.002 and p=0.004, respectively). In conclusion, the use of different imaging modalities reduces radiographers' ability to identify their patients.
Balleyguier, Corinne; Arfi-Rouche, Julia; Levy, Laurent; Toubiana, Patrick R; Cohen-Scali, Franck; Toledano, Alicia Y; Boyer, Bruno
2017-12-01
Evaluate concurrent Computer-Aided Detection (CAD) with Digital Breast Tomosynthesis (DBT) to determine impact on radiologist performance and reading time. The CAD system detects and extracts suspicious masses, architectural distortions and asymmetries from DBT planes that are blended into corresponding synthetic images to form CAD-enhanced synthetic images. Review of CAD-enhanced images and navigation to corresponding planes to confirm or dismiss potential lesions allows radiologists to more quickly review DBT planes. A retrospective, crossover study with and without CAD was conducted with six radiologists who read an enriched sample of 80 DBT cases including 23 malignant lesions in 21 women. Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) compared the readings with and without CAD to determine the effect of CAD on overall interpretation performance. Sensitivity, specificity, recall rate and reading time were also assessed. Multi-reader, multi-case (MRMC) methods accounting for correlation and requiring correct lesion localization were used to analyze all endpoints. AUCs were based on a 0-100% probability of malignancy (POM) score. Sensitivity and specificity were based on BI-RADS scores, where 3 or higher was positive. Average AUC across readers without CAD was 0.854 (range: 0.785-0.891, 95% confidence interval (CI): 0.769,0.939) and 0.850 (range: 0.746-0.905, 95% CI: 0.751,0.949) with CAD (95% CI for difference: -0.046,0.039), demonstrating non-inferiority of AUC. Average reduction in reading time with CAD was 23.5% (95% CI: 7.0-37.0% improvement), from an average 48.2 (95% CI: 39.1,59.6) seconds without CAD to 39.1 (95% CI: 26.2,54.5) seconds with CAD. Per-patient sensitivity was the same with and without CAD (0.865; 95% CI for difference: -0.070,0.070), and there was a small 0.022 improvement (95% CI for difference: -0.046,0.089) in per-lesion sensitivity from 0.790 without CAD to 0.812 with CAD. A slight reduction in specificity with a -0.014 difference (95% CI for difference: -0.079,0.050) and a small 0.025 increase (95% CI for difference: -0.036,0.087) in recall rate in non-cancer cases were observed with CAD. Concurrent CAD resulted in faster reading time with non-inferiority of radiologist interpretation performance. Radiologist sensitivity, specificity and recall rate were similar with and without CAD. Copyright © 2017 Elsevier B.V. All rights reserved.
ABMapper: a suffix array-based tool for multi-location searching and splice-junction mapping.
Lou, Shao-Ke; Ni, Bing; Lo, Leung-Yau; Tsui, Stephen Kwok-Wing; Chan, Ting-Fung; Leung, Kwong-Sak
2011-02-01
Sequencing reads generated by RNA-sequencing (RNA-seq) must first be mapped back to the genome through alignment before they can be further analyzed. Current fast and memory-saving short-read mappers could give us a quick view of the transcriptome. However, they are neither designed for reads that span across splice junctions nor for repetitive reads, which can be mapped to multiple locations in the genome (multi-reads). Here, we describe a new software package: ABMapper, which is specifically designed for exploring all putative locations of reads that are mapped to splice junctions or repetitive in nature. The software is freely available at: http://abmapper.sourceforge.net/. The software is written in C++ and PERL. It runs on all major platforms and operating systems including Windows, Mac OS X and LINUX.
A multiple reader scoring system for Nasal Potential Difference parameters.
Solomon, George M; Liu, Bo; Sermet-Gaudelus, Isabelle; Fajac, Isabelle; Wilschanski, Michael; Vermeulen, Francois; Rowe, Steven M
2017-09-01
Nasal Potential Difference (NPD) is a biomarker of CFTR activity used to diagnose CF and monitor experimental therapies. Limited studies have been performed to assess agreement between expert readers of NPD interpretation using a scoring algorithm. We developed a standardized scoring algorithm for "interpretability" and "confidence" for PD (potential difference) measures, and sought to determine the degree of agreement on NPD parameters between trained readers. There was excellent agreement for interpretability between NPD readers for CF and fair agreement for normal tracings but slight agreement of interpretability in indeterminate tracings. Amongst interpretable tracings, excellent correlation of mean scores for Ringer's Baseline PD, Δ amiloride , and Δ Cl-free+Isoproterenol was observed. There was slight agreement regarding confidence of the interpretable PD tracings, resulting in divergence of the Ringers and Δ amiloride , and ΔCl -free+Isoproterenol PDs between "high" and "low" confidence CF tracings. A multi-reader process with adjudication is important for scoring NPDs for diagnosis and in monitoring of CF clinical trials. Copyright © 2017 European Cystic Fibrosis Society. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Platisa, Ljiljana; Vansteenkiste, Ewout; Goossens, Bart; Marchessoux, Cédric; Kimpe, Tom; Philips, Wilfried
2009-02-01
Medical-imaging systems are designed to aid medical specialists in a specific task. Therefore, the physical parameters of a system need to optimize the task performance of a human observer. This requires measurements of human performance in a given task during the system optimization. Typically, psychophysical studies are conducted for this purpose. Numerical observer models have been successfully used to predict human performance in several detection tasks. Especially, the task of signal detection using a channelized Hotelling observer (CHO) in simulated images has been widely explored. However, there are few studies done for clinically acquired images that also contain anatomic noise. In this paper, we investigate the performance of a CHO in the task of detecting lung nodules in real radiographic images of the chest. To evaluate variability introduced by the limited available data, we employ a commonly used study of a multi-reader multi-case (MRMC) scenario. It accounts for both case and reader variability. Finally, we use the "oneshot" methods to estimate the MRMC variance of the area under the ROC curve (AUC). The obtained AUC compares well to those reported for human observer study on a similar data set. Furthermore, the "one-shot" analysis implies a fairly consistent performance of the CHO with the variance of AUC below 0.002. This indicates promising potential for numerical observers in optimization of medical imaging displays and encourages further investigation on the subject.
Incorporation of a clinical history into the interpretation process in a PACS environment
NASA Astrophysics Data System (ADS)
Cooperstein, Lawrence A.; Good, Barbara C.; Miketic, Linda M.; Tabor, Ellen K.; Yousem, Samuel A.; King, Jill L.; Gennari, Rose C.; Felice, Marc A.; Sidorovich, Kathleen
1990-08-01
In a large-scale, multi-reader study to investigate questions surrounding the issue of the implementation of picture archiving and communications systems (PACS) into the modern radiology environment, we examined the effect that the incorporation of a clinical history into the reading process would have on levels of diagnostic accuracy. Because we wanted to test the inclusion of the clinical history in an environment as close to that of the clinical situation as possible, we defined "history" to be a concise, objective, and potentially computer-extractable version of what appears in the patient records, including a statement from the referring physician when this is available. In a series of studies, four radiologists interpreted 247 posteroanterior normal and abnormal chest images on conventional film both with and without accompanying patient histories; five radiologists read the same number of images presented on a high-resolution video workstation with and without clinical histories. There were no significant differences (p = .05) in diagnostic accuracy rates with or without clinical history for either the film or the workstation in cases of interstitial disease, nodules, or pneumothorax. Diagnostic accuracy for the radiologists as a group was not affected by the presence of the clinical history. We concluded that for the interpretation of these abnormalities, the incorporation of clinical history with images in the PACS environment should not be a major goal.
MRMC analysis of agreement studies
NASA Astrophysics Data System (ADS)
Gallas, Brandon D.; Anam, Amrita; Chen, Weijie; Wunderlich, Adam; Zhang, Zhiwei
2016-03-01
The purpose of this work is to present and evaluate methods based on U-statistics to compare intra- or inter-reader agreement across different imaging modalities. We apply these methods to multi-reader multi-case (MRMC) studies. We measure reader-averaged agreement and estimate its variance accounting for the variability from readers and cases (an MRMC analysis). In our application, pathologists (readers) evaluate patient tissue mounted on glass slides (cases) in two ways. They evaluate the slides on a microscope (reference modality) and they evaluate digital scans of the slides on a computer display (new modality). In the current work, we consider concordance as the agreement measure, but many of the concepts outlined here apply to other agreement measures. Concordance is the probability that two readers rank two cases in the same order. Concordance can be estimated with a U-statistic and thus it has some nice properties: it is unbiased, asymptotically normal, and its variance is given by an explicit formula. Another property of a U-statistic is that it is symmetric in its inputs; it doesn't matter which reader is listed first or which case is listed first, the result is the same. Using this property and a few tricks while building the U-statistic kernel for concordance, we get a mathematically tractable problem and efficient software. Simulations show that our variance and covariance estimates are unbiased.
Shih, Joanna H; Greer, Matthew D; Turkbey, Baris
2018-03-16
To point out the problems with Cohen kappa statistic and to explore alternative metrics to determine interobserver agreement on lesion detection when locations are not prespecified. Use of kappa and two alternative methods, namely index of specific agreement (ISA) and modified kappa, for measuring interobserver agreement on the location of detected lesions are presented. These indices of agreement are illustrated by application to a retrospective multireader study in which nine readers detected and scored prostate cancer lesions in 163 consecutive patients (n = 110 cases, n = 53 controls) using the guideline of Prostate Imaging Reporting and Data System version 2 on multiparametric magnetic resonance imaging. The proposed modified kappa, which properly corrects for the amount of agreement by chance, is shown to be approximately equivalent to the ISA. In the prostate cancer data, average kappa, modified kappa, and ISA equaled 30%, 55%, and 57%, respectively, for all lesions and 20%, 87%, and 87%, respectively, for index lesions. The application of kappa could result in a substantial downward bias in reader agreement on lesion detection when locations are not prespecified. ISA is recommended for assessment of reader agreement on lesion detection. Published by Elsevier Inc.
Demonstration of Multi- and Single-Reader Sample Size Program for Diagnostic Studies software.
Hillis, Stephen L; Schartz, Kevin M
2015-02-01
The recently released software Multi- and Single-Reader Sample Size Sample Size Program for Diagnostic Studies , written by Kevin Schartz and Stephen Hillis, performs sample size computations for diagnostic reader-performance studies. The program computes the sample size needed to detect a specified difference in a reader performance measure between two modalities, when using the analysis methods initially proposed by Dorfman, Berbaum, and Metz (DBM) and Obuchowski and Rockette (OR), and later unified and improved by Hillis and colleagues. A commonly used reader performance measure is the area under the receiver-operating-characteristic curve. The program can be used with typical common reader-performance measures which can be estimated parametrically or nonparametrically. The program has an easy-to-use step-by-step intuitive interface that walks the user through the entry of the needed information. Features of the software include the following: (1) choice of several study designs; (2) choice of inputs obtained from either OR or DBM analyses; (3) choice of three different inference situations: both readers and cases random, readers fixed and cases random, and readers random and cases fixed; (4) choice of two types of hypotheses: equivalence or noninferiority; (6) choice of two output formats: power for specified case and reader sample sizes, or a listing of case-reader combinations that provide a specified power; (7) choice of single or multi-reader analyses; and (8) functionality in Windows, Mac OS, and Linux.
Lee, S Y; Lee, K J
2001-04-01
To develop a personal optically stimulated luminescence (OSL) dosimetry system for mixed radiation fields using alpha-Al2O3:C, a discriminating badge filter system was designed by taking advantage of its optically stimulable properties and energy dependencies. This was done by designing a multi-element badge system for powder layered alpha-Al2O3:C material and an optical reader system based on high-intensity blue light-emitting diode (LED). The design of the multielement OSL dosimeter badge system developed allows the measurement of a personal dose equivalent value Hp(d) in mixed radiation fields of beta and gamma. Dosimetric properties of the personal OSL dosimeter badge system investigated here were the dose response, energy response and multi-readability. Based on the computational simulations and experiments of the proposed dosimeter design, it was demonstrated that a multi-element dosimeter system with an OSL technology based on alpha-Al2O3:C is suitable to obtain personal dose equivalent information in mixed radiation fields.
NASA Astrophysics Data System (ADS)
Abbey, Craig K.; Samuelson, Frank W.; Gallas, Brandon D.; Boone, John M.; Niklason, Loren T.
2013-03-01
The receiver operating characteristic (ROC) curve has become a common tool for evaluating diagnostic imaging technologies, and the primary endpoint of such evaluations is the area under the curve (AUC), which integrates sensitivity over the entire false positive range. An alternative figure of merit for ROC studies is expected utility (EU), which focuses on the relevant region of the ROC curve as defined by disease prevalence and the relative utility of the task. However if this measure is to be used, it must also have desirable statistical properties keep the burden of observer performance studies as low as possible. Here, we evaluate effect size and variability for EU and AUC. We use two observer performance studies recently submitted to the FDA to compare the EU and AUC endpoints. The studies were conducted using the multi-reader multi-case methodology in which all readers score all cases in all modalities. ROC curves from the study were used to generate both the AUC and EU values for each reader and modality. The EU measure was computed assuming an iso-utility slope of 1.03. We find mean effect sizes, the reader averaged difference between modalities, to be roughly 2.0 times as big for EU as AUC. The standard deviation across readers is roughly 1.4 times as large, suggesting better statistical properties for the EU endpoint. In a simple power analysis of paired comparison across readers, the utility measure required 36% fewer readers on average to achieve 80% statistical power compared to AUC.
Re-use of pilot data and interim analysis of pivotal data in MRMC studies: a simulation study
NASA Astrophysics Data System (ADS)
Chen, Weijie; Samuelson, Frank; Sahiner, Berkman; Petrick, Nicholas
2017-03-01
Novel medical imaging devices are often evaluated with multi-reader multi-case (MRMC) studies in which radiologists read images of patient cases for a specified clinical task (e.g., cancer detection). A pilot study is often used to measure the effect size and variance parameters that are necessary for sizing a pivotal study (including sizing readers, non-diseased and diseased cases). Due to the practical difficulty of collecting patient cases or recruiting clinical readers, some investigators attempt to include the pilot data as part of their pivotal study. In other situations, some investigators attempt to perform an interim analysis of their pivotal study data based upon which the sample sizes may be re-estimated. Re-use of the pilot data or interim analyses of the pivotal data may inflate the type I error of the pivotal study. In this work, we use the Roe and Metz model to simulate MRMC data under the null hypothesis (i.e., two devices have equal diagnostic performance) and investigate the type I error rate for several practical designs involving re-use of pilot data or interim analysis of pivotal data. Our preliminary simulation results indicate that, under the simulation conditions we investigated, the inflation of type I error is none or only marginal for some design strategies (e.g., re-use of patient data without re-using readers, and size re-estimation without using the effect-size estimated in the interim analysis). Upon further verifications, these are potentially useful design methods in that they may help make a study less burdensome and have a better chance to succeed without substantial loss of the statistical rigor.
Probing the Boundaries of Orthology: The Unanticipated Rapid Evolution of Drosophila centrosomin
Eisman, Robert C.; Kaufman, Thomas C.
2013-01-01
The rapid evolution of essential developmental genes and their protein products is both intriguing and problematic. The rapid evolution of gene products with simple protein folds and a lack of well-characterized functional domains typically result in a low discovery rate of orthologous genes. Additionally, in the absence of orthologs it is difficult to study the processes and mechanisms underlying rapid evolution. In this study, we have investigated the rapid evolution of centrosomin (cnn), an essential gene encoding centrosomal protein isoforms required during syncytial development in Drosophila melanogaster. Until recently the rapid divergence of cnn made identification of orthologs difficult and questionable because Cnn violates many of the assumptions underlying models for protein evolution. To overcome these limitations, we have identified a group of insect orthologs and present conserved features likely to be required for the functions attributed to cnn in D. melanogaster. We also show that the rapid divergence of Cnn isoforms is apparently due to frequent coding sequence indels and an accelerated rate of intronic additions and eliminations. These changes appear to be buffered by multi-exon and multi-reading frame maximum potential ORFs, simple protein folds, and the splicing machinery. These buffering features also occur in other genes in Drosophila and may help prevent potentially deleterious mutations due to indels in genes with large coding exons and exon-dense regions separated by small introns. This work promises to be useful for future investigations of cnn and potentially other rapidly evolving genes and proteins. PMID:23749319
An MILP-based cross-layer optimization for a multi-reader arbitration in the UHF RFID system.
Choi, Jinchul; Lee, Chaewoo
2011-01-01
In RFID systems, the performance of each reader such as interrogation range and tag recognition rate may suffer from interferences from other readers. Since the reader interference can be mitigated by output signal power control, spectral and/or temporal separation among readers, the system performance depends on how to adapt the various reader arbitration metrics such as time, frequency, and output power to the system environment. However, complexity and difficulty of the optimization problem increase with respect to the variety of the arbitration metrics. Thus, most proposals in previous study have been suggested to primarily prevent the reader collision with consideration of one or two arbitration metrics. In this paper, we propose a novel cross-layer optimization design based on the concept of combining time division, frequency division, and power control not only to solve the reader interference problem, but also to achieve the multiple objectives such as minimum interrogation delay, maximum reader utilization, and energy efficiency. Based on the priority of the multiple objectives, our cross-layer design optimizes the system sequentially by means of the mixed-integer linear programming. In spite of the multi-stage optimization, the optimization design is formulated as a concise single mathematical form by properly assigning a weight to each objective. Numerical results demonstrate the effectiveness of the proposed optimization design.
An MILP-Based Cross-Layer Optimization for a Multi-Reader Arbitration in the UHF RFID System
Choi, Jinchul; Lee, Chaewoo
2011-01-01
In RFID systems, the performance of each reader such as interrogation range and tag recognition rate may suffer from interferences from other readers. Since the reader interference can be mitigated by output signal power control, spectral and/or temporal separation among readers, the system performance depends on how to adapt the various reader arbitration metrics such as time, frequency, and output power to the system environment. However, complexity and difficulty of the optimization problem increase with respect to the variety of the arbitration metrics. Thus, most proposals in previous study have been suggested to primarily prevent the reader collision with consideration of one or two arbitration metrics. In this paper, we propose a novel cross-layer optimization design based on the concept of combining time division, frequency division, and power control not only to solve the reader interference problem, but also to achieve the multiple objectives such as minimum interrogation delay, maximum reader utilization, and energy efficiency. Based on the priority of the multiple objectives, our cross-layer design optimizes the system sequentially by means of the mixed-integer linear programming. In spite of the multi-stage optimization, the optimization design is formulated as a concise single mathematical form by properly assigning a weight to each objective. Numerical results demonstrate the effectiveness of the proposed optimization design. PMID:22163743
Barr, R Graham; Berkowitz, Eugene A; Bigazzi, Francesca; Bode, Frederick; Bon, Jessica; Bowler, Russell P; Chiles, Caroline; Crapo, James D; Criner, Gerard J; Curtis, Jeffrey L; Dass, Chandra; Dirksen, Asger; Dransfield, Mark T; Edula, Goutham; Erikkson, Leif; Friedlander, Adam; Galperin-Aizenberg, Maya; Gefter, Warren B; Gierada, David S; Grenier, Philippe A; Goldin, Jonathan; Han, MeiLan K; Hanania, Nicola A; Hansel, Nadia N; Jacobson, Francine L; Kauczor, Hans-Ulrich; Kinnula, Vuokko L; Lipson, David A; Lynch, David A; MacNee, William; Make, Barry J; Mamary, A James; Mann, Howard; Marchetti, Nathaniel; Mascalchi, Mario; McLennan, Geoffrey; Murphy, James R; Naidich, David; Nath, Hrudaya; Newell, John D; Pistolesi, Massimo; Regan, Elizabeth A; Reilly, John J; Sandhaus, Robert; Schroeder, Joyce D; Sciurba, Frank; Shaker, Saher; Sharafkhaneh, Amir; Silverman, Edwin K; Steiner, Robert M; Strange, Charlton; Sverzellati, Nicola; Tashjian, Joseph H; van Beek, Edwin J R; Washington, Lacey; Washko, George R; Westney, Gloria; Wood, Susan A; Woodruff, Prescott G
2012-04-01
The purposes of this study were: to describe chest CT findings in normal non-smoking controls and cigarette smokers with and without COPD; to compare the prevalence of CT abnormalities with severity of COPD; and to evaluate concordance between visual and quantitative chest CT (QCT) scoring. Volumetric inspiratory and expiratory CT scans of 294 subjects, including normal non-smokers, smokers without COPD, and smokers with GOLD Stage I-IV COPD, were scored at a multi-reader workshop using a standardized worksheet. There were 58 observers (33 pulmonologists, 25 radiologists); each scan was scored by 9-11 observers. Interobserver agreement was calculated using kappa statistic. Median score of visual observations was compared with QCT measurements. Interobserver agreement was moderate for the presence or absence of emphysema and for the presence of panlobular emphysema; fair for the presence of centrilobular, paraseptal, and bullous emphysema subtypes and for the presence of bronchial wall thickening; and poor for gas trapping, centrilobular nodularity, mosaic attenuation, and bronchial dilation. Agreement was similar for radiologists and pulmonologists. The prevalence on CT readings of most abnormalities (e.g. emphysema, bronchial wall thickening, mosaic attenuation, expiratory gas trapping) increased significantly with greater COPD severity, while the prevalence of centrilobular nodularity decreased. Concordances between visual scoring and quantitative scoring of emphysema, gas trapping and airway wall thickening were 75%, 87% and 65%, respectively. Despite substantial inter-observer variation, visual assessment of chest CT scans in cigarette smokers provides information regarding lung disease severity; visual scoring may be complementary to quantitative evaluation.
Lalji, U C; Houben, I P L; Prevos, R; Gommers, S; van Goethem, M; Vanwetswinkel, S; Pijnappel, R; Steeman, R; Frotscher, C; Mok, W; Nelemans, P; Smidt, M L; Beets-Tan, R G; Wildberger, J E; Lobbes, M B I
2016-12-01
Contrast-enhanced spectral mammography (CESM) is a promising problem-solving tool in women referred from a breast cancer screening program. We aimed to study the validity of preliminary results of CESM using a larger panel of radiologists with different levels of CESM experience. All women referred from the Dutch breast cancer screening program were eligible for CESM. 199 consecutive cases were viewed by ten radiologists. Four had extensive CESM experience, three had no CESM experience but were experienced breast radiologists, and three were residents. All readers provided a BI-RADS score for the low-energy CESM images first, after which the score could be adjusted when viewing the entire CESM exam. BI-RADS 1-3 were considered benign and BI-RADS 4-5 malignant. With this cutoff, we calculated sensitivity, specificity and area under the ROC curve. CESM increased diagnostic accuracy in all readers. The performance for all readers using CESM was: sensitivity 96.9 % (+3.9 %), specificity 69.7 % (+33.8 %) and area under the ROC curve 0.833 (+0.188). CESM is superior to conventional mammography, with excellent problem-solving capabilities in women referred from the breast cancer screening program. Previous results were confirmed even in a larger panel of readers with varying CESM experience. • CESM is consistently superior to conventional mammography • CESM increases diagnostic accuracy regardless of a reader's experience • CESM is an excellent problem-solving tool in recalls from screening programs.
Rcount: simple and flexible RNA-Seq read counting.
Schmid, Marc W; Grossniklaus, Ueli
2015-02-01
Analysis of differential gene expression by RNA sequencing (RNA-Seq) is frequently done using feature counts, i.e. the number of reads mapping to a gene. However, commonly used count algorithms (e.g. HTSeq) do not address the problem of reads aligning with multiple locations in the genome (multireads) or reads aligning with positions where two or more genes overlap (ambiguous reads). Rcount specifically addresses these issues. Furthermore, Rcount allows the user to assign priorities to certain feature types (e.g. higher priority for protein-coding genes compared to rRNA-coding genes) or to add flanking regions. Rcount provides a fast and easy-to-use graphical user interface requiring no command line or programming skills. It is implemented in C++ using the SeqAn (www.seqan.de) and the Qt libraries (qt-project.org). Source code and 64 bit binaries for (Ubuntu) Linux, Windows (7) and MacOSX are released under the GPLv3 license and are freely available on github.com/MWSchmid/Rcount. marcschmid@gmx.ch Test data, genome annotation files, useful Python and R scripts and a step-by-step user guide (including run-time and memory usage tests) are available on github.com/MWSchmid/Rcount. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
2012-01-01
Introduction The purpose of this study was to compare the diagnostic accuracy of dual-energy contrast-enhanced digital mammography (CEDM) as an adjunct to mammography (MX) ± ultrasonography (US) with the diagnostic accuracy of MX ± US alone. Methods One hundred ten consenting women with 148 breast lesions (84 malignant, 64 benign) underwent two-view dual-energy CEDM in addition to MX and US using a specially modified digital mammography system (Senographe DS, GE Healthcare). Reference standard was histology for 138 lesions and follow-up for 12 lesions. Six radiologists from 4 institutions interpreted the images using high-resolution softcopy workstations. Confidence of presence (5-point scale), probability of cancer (7-point scale), and BI-RADS scores were evaluated for each finding. Sensitivity, specificity and ROC curve areas were estimated for each reader and overall. Visibility of findings on MX ± CEDM and MX ± US was evaluated with a Likert scale. Results The average per-lesion sensitivity across all readers was significantly higher for MX ± US ± CEDM than for MX ± US (0.78 vs. 0.71 using BIRADS, p = 0.006). All readers improved their clinical performance and the average area under the ROC curve was significantly superior for MX ± US ± CEDM than for MX ± US ((0.87 vs 0.83, p = 0.045). Finding visibility was similar or better on MX ± CEDM than MX ± US in 80% of cases. Conclusions Dual-energy contrast-enhanced digital mammography as an adjunct to MX ± US improves diagnostic accuracy compared to MX ± US alone. Addition of iodinated contrast agent to MX facilitates the visualization of breast lesions. PMID:22697607
NASA Astrophysics Data System (ADS)
Welge, Weston A.; DeMarco, Andrew T.; Watson, Jennifer M.; Rice, Photini S.; Barton, Jennifer K.; Kupinski, Matthew A.
2014-03-01
Ovarian cancer is particularly deadly because it is usually diagnosed after it has begun to spread. Transvaginal sonography (TVS) is the most common imaging screening technique. However, routine use of TVS has not reduced ovarian cancer mortality. The superior resolution of optical imaging techniques may make them attractive alternatives to TVS. We have previously identified features of ovarian cancer using optical coherence tomography (OCT) and secondharmonic generation (SHG) microscopy (with collagen as the targeted fluorophore). OCT provides a gross anatomical image of the ovary while SHG provides a closer look at a particular region. Knowing these anatomical features, we sought to investigate the diagnostic potential of OCT and SHG. We conducted a fully crossed, multi-reader, multi-case study using seven human observers. Each observer classified 44 ex vivo mouse ovaries as normal or abnormal from OCT, SHG, and simultaneous, co-registered OCT and SHG images and provided a confidence rating on a three-point ordinal scale. We determined the average receiver operating characteristic (ROC) curves, area under the ROC curves (AUC), and other quantitative figures of merit. The results show that OCT has diagnostic potential with an average AUC of 0.91 +/- 0.03. The average AUC for SHG was less promising at 0.71 +/- 0.06. Interestingly, the average AUC for simultaneous, co-registered OCT and SHG was not significantly different from OCT alone. This suggests that collagen may not be a useful fluorophore for ovarian cancer screening. The high performance of OCT warrants further investigation.
Comparison of Breast Density Between Synthesized Versus Standard Digital Mammography.
Haider, Irfanullah; Morgan, Matthew; McGow, Anna; Stein, Matthew; Rezvani, Maryam; Freer, Phoebe; Hu, Nan; Fajardo, Laurie; Winkler, Nicole
2018-06-12
To evaluate perceptual difference in breast density classification using synthesized mammography (SM) compared with standard or full-field digital mammography (FFDM) for screening. This institutional review board-approved, retrospective, multireader study evaluated breast density on 200 patients who underwent baseline screening mammogram during which both SM and FFDM were obtained contemporaneously from June 1, 2016, through November 30, 2016. Qualitative breast density was independently assigned by seven readers initially evaluating FFDM alone. Then, in a separate session, these same readers assigned breast density using synthetic views alone on the same 200 patients. The readers were again blinded to each other's assignment. Qualitative density assessment was based on BI-RADS fifth edition. Interreader agreement was evaluated with κ statistic using 95% confidence intervals. Testing for homogeneity in paired proportions was performed using McNemar's test with a level of significance of .05. For patients across the SM and standard 2-D data set, diagnostic testing with McNemar's test with P = 0.32 demonstrates that the minimal density transitions across FFDM and SM are not statistically significant density shifts. Taking clinical significance into account, only 8 of 200 (4%) patients had clinically significant transition (dense versus not dense). There was substantial interreader agreement with overall κ in FFDM of 0.71 (minimum 0.53, maximum 0.81) and overall SM κ average of 0.63 (minimum 0.56, maximum 0.87). Overall subjective breast density assignment by radiologists on SM is similar to density assignment on standard 2-D mammogram. Copyright © 2018 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Chirindel, Alin; Adebahr, Sonja; Schuster, Daniel; Schimek-Jasch, Tanja; Schanne, Daniel H; Nemer, Ursula; Mix, Michael; Meyer, Philipp; Grosu, Anca-Ligia; Brunner, Thomas; Nestle, Ursula
2015-06-01
Evaluation of the effect of co-registered 4D-(18)FDG-PET/CT for SBRT target delineation in patients with central versus peripheral lung tumors. Analysis of internal target volume (ITV) delineation of central and peripheral lung lesions in 21 SBRT-patients. Manual delineation was performed by 4 observers in 2 contouring phases: on respiratory gated 4DCT with diagnostic 3DPET available aside (CT-ITV) and on co-registered 4DPET/CT (PET/CT-ITV). Comparative analysis of volumes and inter-reader agreement. 11 cases of peripheral and 10 central lesions were evaluated. In peripheral lesions, average CT-ITV was 6.2 cm(3) and PET/CT-ITV 8.6 cm(3), resembling a mean change in hypothetical radius of 2 mm. For both CT-ITVs and PET/CT-ITVs inter reader agreement was good and unchanged (0.733 and 0.716; p=0.58). All PET/CT-ITVs stayed within the PTVs derived from CT-ITVs. In central lesions, average CT-ITVs were 42.1 cm(3), PET/CT-ITVs 44.2 cm(3), without significant overall volume changes. Inter-reader agreement improved significantly (0.665 and 0.750; p<0.05). 2/10 PET/CT-ITVs exceeded the PTVs derived from CT-ITVs by >1 ml in average for all observers. The addition of co-registered 4DPET data to 4DCT based target volume delineation for SBRT of centrally located lung tumors increases the inter-observer agreement and may help to avoid geographic misses. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Chan, Heang-Ping; Helvie, Mark A.; Petrick, Nicholas; Sahiner, Berkman; Adler, Dorit D.; Blane, Caroline E.; Joynt, Lynn K.; Paramagul, Chintana; Roubidoux, Marilyn A.; Wilson, Todd E.; Hadjiiski, Lubomir M.; Goodsitt, Mitchell M.
1999-05-01
A receiver operating characteristic (ROC) experiment was conducted to evaluate the effects of pixel size on the characterization of mammographic microcalcifications. Digital mammograms were obtained by digitizing screen-film mammograms with a laser film scanner. One hundred twelve two-view mammograms with biopsy-proven microcalcifications were digitized at a pixel size of 35 micrometer X 35 micrometer. A region of interest (ROI) containing the microcalcifications was extracted from each image. ROI images with pixel sizes of 70 micrometers, 105 micrometers, and 140 micrometers were derived from the ROI of 35 micrometer pixel size by averaging 2 X 2, 3 X 3, and 4 X 4 neighboring pixels, respectively. The ROI images were printed on film with a laser imager. Seven MQSA-approved radiologists participated as observers. The likelihood of malignancy of the microcalcifications was rated on a 10-point confidence rating scale and analyzed with ROC methodology. The classification accuracy was quantified by the area, Az, under the ROC curve. The statistical significance of the differences in the Az values for different pixel sizes was estimated with the Dorfman-Berbaum-Metz (DBM) method for multi-reader, multi-case ROC data. It was found that five of the seven radiologists demonstrated a higher classification accuracy with the 70 micrometer or 105 micrometer images. The average Az also showed a higher classification accuracy in the range of 70 to 105 micrometer pixel size. However, the differences in A(subscript z/ between different pixel sizes did not achieve statistical significance. The low specificity of image features of microcalcifications an the large interobserver and intraobserver variabilities may have contributed to the relatively weak dependence of classification accuracy on pixel size.
Østergaard, Mikkel; Eshed, Iris; Althoff, Christian E; Poggenborg, Rene P; Diekhoff, Torsten; Krabbe, Simon; Weckbach, Sabine; Lambert, Robert G W; Pedersen, Susanne J; Maksymowych, Walter P; Peterfy, Charles G; Freeston, Jane; Bird, Paul; Conaghan, Philip G; Hermann, Kay-Geert A
2017-11-01
Whole-body magnetic resonance imaging (WB-MRI) is a relatively new technique that can enable assessment of the overall inflammatory status of people with arthritis, but standards for image acquisition, definitions of key pathologies, and a quantification system are required. Our aim was to perform a systematic literature review (SLR) and to develop consensus definitions of key pathologies, anatomical locations for assessment, a set of MRI sequences and imaging planes for the different body regions, and a preliminary scoring system for WB-MRI in inflammatory arthritis. An SLR was initially performed, searching for WB-MRI studies in arthritis, osteoarthritis, spondyloarthritis, or enthesitis. These results were presented to a meeting of the MRI in Arthritis Working Group together with an MR image review. Following this, preliminary standards for WB-MRI in inflammatory arthritides were developed with further iteration at the Working Group meetings at the Outcome Measures in Rheumatology (OMERACT) 2016. The SLR identified 10 relevant original articles (7 cross-sectional and 3 longitudinal, mostly focusing on synovitis and/or enthesitis in spondyloarthritis, 4 with reproducibility data). The Working Group decided on inflammation in peripheral joints and entheses as primary focus areas, and then developed consensus MRI definitions for these pathologies, selected anatomical locations for assessment, agreed on a core set of MRI sequences and imaging planes for the different regions, and proposed a preliminary scoring system. It was decided to test and further develop the system by iterative multireader exercises. These first steps in developing an OMERACT WB-MRI scoring system for use in inflammatory arthritides offer a framework for further testing and refinement.
Non-localization and localization ROC analyses using clinically based scoring
NASA Astrophysics Data System (ADS)
Paquerault, Sophie; Samuelson, Frank W.; Myers, Kyle J.; Smith, Robert C.
2009-02-01
We are investigating the potential for differences in study conclusions when assessing the estimated impact of a computer-aided detection (CAD) system on readers' performance. The data utilized in this investigation were derived from a multi-reader multi-case observer study involving one hundred mammographic background images to which fixed-size and fixed-intensity Gaussian signals were added, generating a low- and high-intensity signal sets. The study setting allowed CAD assessment in two situations: when CAD sensitivity was 1) superior or 2) lower than the average reader. Seven readers were asked to review each set in the unaided and CAD-aided reading modes, mark and rate their findings. Using this data, we studied the effect on study conclusion of three clinically-based receiver operating characteristic (ROC) scoring definitions. These scoring definitions included both location-specific and non-location-specific rules. The results showed agreement in the estimated impact of CAD on the overall reader performance. In the study setting where CAD sensitivity is superior to the average reader, the mean difference in AUC between the CAD-aided read and unaided read was 0.049 (95%CIs: -0.027; 0.130) for the image scoring definition that is based on non-location-specific rules, and 0.104 (95%CIs: 0.036; 0.174) and 0.090 (95%CIs: 0.031; 0.155) for image scoring definitions that are based on location-specific rules. The increases in AUC were statistically significant for the location-specific scoring definitions. It was further observed that the variance on these estimates was reduced when using the location-specific scoring definitions compared to that using a non-location-specific scoring definition. In the study setting where CAD sensitivity is equivalent or lower than the average reader, the mean differences in AUC are slightly above 0.01 for all image scoring definitions. These increases in AUC were not statistical significant for any of the image scoring definitions. The results on the variance analysis differed from those observed in the other study setting. This investigation furthers our understanding of the relationships between non-localization-specific and localization-specific ROC assessment methodologies and their relevance to clinical practice.
Contrast-enhanced spectral mammography improves diagnostic accuracy in the symptomatic setting.
Tennant, S L; James, J J; Cornford, E J; Chen, Y; Burrell, H C; Hamilton, L J; Girio-Fragkoulakis, C
2016-11-01
To assess the diagnostic accuracy of contrast-enhanced spectral mammography (CESM), and gauge its "added value" in the symptomatic setting. A retrospective multi-reader review of 100 consecutive CESM examinations was performed. Anonymised low-energy (LE) images were reviewed and given a score for malignancy. At least 3 weeks later, the entire examination (LE and recombined images) was reviewed. Histopathology data were obtained for all cases. Differences in performance were assessed using receiver operator characteristic (ROC) analysis. Sensitivity, specificity, and lesion size (versus MRI or histopathology) differences were calculated. Seventy-three percent of cases were malignant at final histology, 27% were benign following standard triple assessment. ROC analysis showed improved overall performance of CESM over LE alone, with area under the curve of 0.93 versus 0.83 (p<0.025). CESM showed increased sensitivity (95% versus 84%, p<0.025) and specificity (81% versus 63%, p<0.025) compared to LE alone, with all five readers showing improved accuracy. Tumour size estimation at CESM was significantly more accurate than LE alone, the latter tending to undersize lesions. In 75% of cases, CESM was deemed a useful or significant aid to diagnosis. CESM provides immediately available, clinically useful information in the symptomatic clinic in patients with suspicious palpable abnormalities. Radiologist sensitivity, specificity, and size accuracy for breast cancer detection and staging are all improved using CESM as the primary mammographic investigation. Copyright © 2016 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
Kim, Hyunji; Cha, Joo Hee; Oh, Ha-Yeun; Kim, Hak Hee; Shin, Hee Jung; Chae, Eun Young
2014-07-01
To compare the performance of radiologists in the use of conventional ultrasound (US) and automated breast volume ultrasound (ABVU) for the characterization of benign and malignant solid breast masses based on breast imaging and reporting data system (BI-RADS) criteria. Conventional US and ABVU images were obtained in 87 patients with 106 solid breast masses (52 cancers, 54 benign lesions). Three experienced radiologists who were blinded to all examination results independently characterized the lesions and reported a BI-RADS assessment category and a level of suspicion of malignancy. The results were analyzed by calculation of Cohen's κ coefficient and by receiver operating characteristic (ROC) analysis. Assessment of the agreement of conventional US and ABVU indicated that the posterior echo feature was the most discordant feature of seven features (κ = 0.371 ± 0.225) and that orientation had the greatest agreement (κ = 0.608 ± 0.210). The final assessment showed substantial agreement (κ = 0.773 ± 0.104). The areas under the ROC curves (Az) for conventional US and ABVU were not statistically significant for each reader, but the mean Az values of conventional US and ABVU by multi-reader multi-case analysis were significantly different (conventional US 0.991, ABVU 0.963; 95 % CI -0.0471 to -0.0097). The means for sensitivity, specificity, positive predictive value, and negative predictive value of conventional US and ABVU did not differ significantly. There was substantial inter-observer agreement in the final assessment of solid breast masses by conventional US and ABVU. ROC analysis comparing the performance of conventional US and ABVU indicated a marginally significant difference in mean Az, but not in mean sensitivity, specificity, positive predictive value, or negative predictive value.
Sakabe, N; Sakabe, K; Sasaki, K
2004-01-01
Galaxy is a Weissenberg-type high-speed high-resolution and highly accurate fully automatic data-collection system using two cylindrical IP-cassettes each with a radius of 400 mm and a width of 450 mm. It was originally developed for static three-dimensional analysis using X-ray diffraction and was installed on bending-magnet beamline BL6C at the Photon Factory. It was found, however, that Galaxy was also very useful for time-resolved protein crystallography on a time scale of minutes. This has prompted us to design a new IP-conveyor-belt Weissenberg-mode data-collection system called Super Galaxy for time-resolved crystallography with improved time and crystallographic resolution over that achievable with Galaxy. Super Galaxy was designed with a half-cylinder-shaped cassette with a radius of 420 mm and a width of 690 mm. Using 1.0 A incident X-rays, these dimensions correspond to a maximum resolutions of 0.71 A in the vertical direction and 1.58 A in the horizontal. Upper and lower screens can be used to set the frame size of the recorded image. This function is useful not only to reduce the frame exchange time but also to save disk space on the data server. The use of an IP-conveyor-belt and many IP-readers make Super Galaxy well suited for time-resolved, monochromatic X-ray crystallography at a very intense third-generation SR beamline. Here, Galaxy and a conceptual design for Super Galaxy are described, and their suitability for use as data-collection systems for macromolecular time-resolved monochromatic X-ray crystallography are compared.
Lim, Hyun-ju; Chung, Myung Jin; Lee, Geewon; Yie, Miyeon; Shin, Kyung Eun; Moon, Jung Won; Lee, Kyung Soo
2013-01-01
To compare the diagnostic performance of light emitting diode (LED) backlight monitors and cold cathode fluorescent lamp (CCFL) monitors for the interpretation of digital chest radiographs. We selected 130 chest radiographs from health screening patients. The soft copy image data were randomly sorted and displayed on a 3.5 M LED (2560 × 1440 pixels) monitor and a 3 M CCFL (2048 × 1536 pixels) monitor. Eight radiologists rated their confidence in detecting nodules and abnormal interstitial lung markings (ILD). Low dose chest CT images were used as a reference standard. The performance of the monitor systems was assessed by analyzing 2080 observations and comparing them by multi-reader, multi-case receiver operating characteristic analysis. The observers reported visual fatigue and a sense of heat. Radiant heat and brightness of the monitors were measured. Measured brightness was 291 cd/m(2) for the LED and 354 cd/m(2) for the CCFL monitor. Area under curves for nodule detection were 0.721 ± 0.072 and 0.764 ± 0.098 for LED and CCFL (p = 0.173), whereas those for ILD were 0.871 ± 0.073 and 0.844 ± 0.068 (p = 0.145), respectively. There were no significant differences in interpretation time (p = 0.446) or fatigue score (p = 0.102) between the two monitors. Sense of heat was lower for the LED monitor (p = 0.024). The temperature elevation was 6.7℃ for LED and 12.4℃ for the CCFL monitor. Although the LED monitor had lower maximum brightness compared with the CCFL monitor, soft copy reading of the digital chest radiographs on LED and CCFL showed no difference in terms of diagnostic performance. In addition, LED emitted less heat.
The Americleft Speech Project: A Training and Reliability Study.
Chapman, Kathy L; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie
2016-01-01
To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. The participants were speech-language pathologists from the Americleft Speech Project. In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function.
The Americleft Speech Project: A Training and Reliability Study
Chapman, Kathy L.; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie
2017-01-01
Objective To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. Design The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech–Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. Participants The participants were speech-language pathologists from the Americleft Speech Project. Results In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. Conclusion The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function. PMID:25531738
Takasaki, Hiroshi; Okuyama, Kousuke; Rosedale, Richard
2017-02-01
Mechanical Diagnosis and Therapy (MDT) is used in the treatment of extremity problems. Classifying clinical problems is one method of providing effective treatment to a target population. Classification reliability is a key factor to determine the precise clinical problem and to direct an appropriate intervention. To explore inter-examiner reliability of the MDT classification for extremity problems in three reliability designs: 1) vignette reliability using surveys with patient vignettes, 2) concurrent reliability, where multiple assessors decide a classification by observing someone's assessment, 3) successive reliability, where multiple assessors independently assess the same patient at different times. Systematic review with data synthesis in a quantitative format. Agreement of MDT subgroups was examined using the Kappa value, with the operational definition of acceptable reliability set at ≥ 0.6. The level of evidence was determined considering the methodological quality of the studies. Six studies were included and all studies met the criteria for high quality. Kappa values for the vignette reliability design (five studies) were ≥ 0.7. There was data from two cohorts in one study for the concurrent reliability design and the Kappa values ranged from 0.45 to 1.0. Kappa values for the successive reliability design (data from three cohorts in one study) were < 0.6. The current review found strong evidence of acceptable inter-examiner reliability of MDT classification for extremity problems in the vignette reliability design, limited evidence of acceptable reliability in the concurrent reliability design and unacceptable reliability in the successive reliability design. Copyright © 2017 Elsevier Ltd. All rights reserved.
The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL).
Lucas, Nicholas; Macaskill, Petra; Irwig, Les; Moran, Robert; Rickards, Luke; Turner, Robin; Bogduk, Nikolai
2013-09-09
The aim of this project was to investigate the reliability of a new 11-item quality appraisal tool for studies of diagnostic reliability (QAREL). The tool was tested on studies reporting the reliability of any physical examination procedure. The reliability of physical examination is a challenging area to study given the complex testing procedures, the range of tests, and lack of procedural standardisation. Three reviewers used QAREL to independently rate 29 articles, comprising 30 studies, published during 2007. The articles were identified from a search of relevant databases using the following string: "Reproducibility of results (MeSH) OR reliability (t.w.) AND Physical examination (MeSH) OR physical examination (t.w.)." A total of 415 articles were retrieved and screened for inclusion. The reviewers undertook an independent trial assessment prior to data collection, followed by a general discussion about how to score each item. At no time did the reviewers discuss individual papers. Reliability was assessed for each item using multi-rater kappa (κ). Multi-rater reliability estimates ranged from κ = 0.27 to 0.92 across all items. Six items were recorded with good reliability (κ > 0.60), three with moderate reliability (κ = 0.41 - 0.60), and two with fair reliability (κ = 0.21 - 0.40). Raters found it difficult to agree about the spectrum of patients included in a study (Item 1) and the correct application and interpretation of the test (Item 10). In this study, we found that QAREL was a reliable assessment tool for studies of diagnostic reliability when raters agreed upon criteria for the interpretation of each item. Nine out of 11 items had good or moderate reliability, and two items achieved fair reliability. The heterogeneity in the tests included in this study may have resulted in an underestimation of the reliability of these two items. We discuss these and other factors that could affect our results and make recommendations for the use of QAREL.
Reliability studies of diagnostic methods in Indian traditional Ayurveda medicine: An overview
Kurande, Vrinda Hitendra; Waagepetersen, Rasmus; Toft, Egon; Prasad, Ramjee
2013-01-01
Recently, a need to develop supportive new scientific evidence for contemporary Ayurveda has emerged. One of the research objectives is an assessment of the reliability of diagnoses and treatment. Reliability is a quantitative measure of consistency. It is a crucial issue in classification (such as prakriti classification), method development (pulse diagnosis), quality assurance for diagnosis and treatment and in the conduct of clinical studies. Several reliability studies are conducted in western medicine. The investigation of the reliability of traditional Chinese, Japanese and Sasang medicine diagnoses is in the formative stage. However, reliability studies in Ayurveda are in the preliminary stage. In this paper, examples are provided to illustrate relevant concepts of reliability studies of diagnostic methods and their implication in practice, education, and training. An introduction to reliability estimates and different study designs and statistical analysis is given for future studies in Ayurveda. PMID:23930037
Du, Han; Wang, Lijuan
2018-04-23
Intraindividual variability can be measured by the intraindividual standard deviation ([Formula: see text]), intraindividual variance ([Formula: see text]), estimated hth-order autocorrelation coefficient ([Formula: see text]), and mean square successive difference ([Formula: see text]). Unresolved issues exist in the research on reliabilities of intraindividual variability indicators: (1) previous research only studied conditions with 0 autocorrelations in the longitudinal responses; (2) the reliabilities of [Formula: see text] and [Formula: see text] have not been studied. The current study investigates reliabilities of [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and the intraindividual mean, with autocorrelated longitudinal data. Reliability estimates of the indicators were obtained through Monte Carlo simulations. The impact of influential factors on reliabilities of the intraindividual variability indicators is summarized, and the reliabilities are compared across the indicators. Generally, all the studied indicators of intraindividual variability were more reliable with a more reliable measurement scale and more assessments. The reliabilities of [Formula: see text] were generally lower than those of [Formula: see text] and [Formula: see text], the reliabilities of [Formula: see text] were usually between those of [Formula: see text] and [Formula: see text] unless the scale reliability was large and/or the interindividual standard deviation in autocorrelation coefficients was large, and the reliabilities of the intraindividual mean were generally the highest. An R function is provided for planning longitudinal studies to ensure sufficient reliabilities of the intraindividual indicators are achieved.
Wells, Michael L; Froemming, Adam T; Kawashima, Akira; Vrtiska, Terri J; Kim, Bohyun; Hartman, Robert P; Holmes, David R; Carter, Rickey E; Bartley, Adam C; Leng, Shuai; McCollough, Cynthia H; Fletcher, Joel G
2017-08-01
Background Detection of small renal calculi has benefitted from recent advances in computed tomography (CT) scanner design. Information regarding observer performance when using state-of-the-art CT scanners for this application is needed. Purpose To assess observer performance and the impact of radiation dose for detection and size measurement of <4 mm renal stones using CT with integrated circuit detectors and iterative reconstruction. Material and Methods Twenty-nine <4 mm calcium oxalate stones were randomly placed in 20 porcine kidneys in an anthropomorphic phantom. Four radiologists used a workstation to record each calculus detection and size. JAFROC Figure of Merit (FOM), sensitivity, false positive detections, and calculus size were calculated. Results Mean calculus size was 2.2 ± 0.7 mm. The CTDI vol values corresponding to the automatic exposure control settings of 160, 80, 40, 25, and 10 Quality Reference mAs (QRM) were 15.2, 7.9, 4.2, 2.7, and 1.3 mGy, respectively. JAFROC FOM was ≥ 0.97 at ≥ 80 QRM, ≥ 0.89 at ≥ 25 QRM, and was inferior to routine dose (160 QRM) at 10 QRM (0.72, P < 0.05). Per-calculus sensitivity remained ≥ 85% for every reader at ≥ 25 QRM. Mean total false positive detections per reader were ≤ 3 at ≥ 80 QRM, but increased substantially for two readers ( ≥ 12) at ≤ 40 QRM. Measured calculus size significantly decreased at ≤ 25 QRM ( P ≤ 0.01). Conclusion Using low dose renal CT with iterative reconstruction and ≥ 25 QRM results in high sensitivity, but false positive detections increase for some readers at very low dose levels (≤ 40 QRM). At very low doses with iterative reconstruction, measured calculus size will artifactually decrease.
The reliability of the Glasgow Coma Scale: a systematic review.
Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R
2016-01-01
The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
Measuring eating disorder attitudes and behaviors: a reliability generalization study
2014-01-01
Background Although score reliability is a sample-dependent characteristic, researchers often only report reliability estimates from previous studies as justification for employing particular questionnaires in their research. The present study followed reliability generalization procedures to determine the mean score reliability of the Eating Disorder Inventory and its most commonly employed subscales (Drive for Thinness, Bulimia, and Body Dissatisfaction) and the Eating Attitudes Test as a way to better identify those characteristics that might impact score reliability. Methods Published studies that used these measures were coded based on their reporting of reliability information and additional study characteristics that might influence score reliability. Results Score reliability estimates were included in 26.15% of studies using the EDI and 36.28% of studies using the EAT. Mean Cronbach’s alphas for the EDI (total score = .91; subscales = .75 to .89), EAT-40 (total score = .81) and EAT-26 (total score = .86; subscales = .56 to .80) suggested variability in estimated internal consistency. Whereas some EDI subscales exhibited higher score reliability in clinical eating disorder samples than in nonclinical samples, other subscales did not exhibit these differences. Score reliability information for the EAT was primarily reported for nonclinical samples, making it difficult to characterize the effect of type of sample on these measures. However, there was a tendency for mean score reliability to be higher in the adult (vs. adolescent) samples and in female (vs. male) samples. Conclusions Overall, this study highlights the importance of assessing and reporting internal consistency during every test administration because reliability is affected by characteristics of the participants being examined. PMID:24764530
General Aviation Aircraft Reliability Study
NASA Technical Reports Server (NTRS)
Pettit, Duane; Turnbull, Andrew; Roelant, Henk A. (Technical Monitor)
2001-01-01
This reliability study was performed in order to provide the aviation community with an estimate of Complex General Aviation (GA) Aircraft System reliability. To successfully improve the safety and reliability for the next generation of GA aircraft, a study of current GA aircraft attributes was prudent. This was accomplished by benchmarking the reliability of operational Complex GA Aircraft Systems. Specifically, Complex GA Aircraft System reliability was estimated using data obtained from the logbooks of a random sample of the Complex GA Aircraft population.
Dobbins, James T; McAdams, H Page; Sabol, John M; Chakraborty, Dev P; Kazerooni, Ella A; Reddy, Gautham P; Vikgren, Jenny; Båth, Magnus
2017-01-01
Purpose To conduct a multi-institutional, multireader study to compare the performance of digital tomosynthesis, dual-energy (DE) imaging, and conventional chest radiography for pulmonary nodule detection and management. Materials and Methods In this binational, institutional review board-approved, HIPAA-compliant prospective study, 158 subjects (43 subjects with normal findings) were enrolled at four institutions. Informed consent was obtained prior to enrollment. Subjects underwent chest computed tomography (CT) and imaging with conventional chest radiography (posteroanterior and lateral), DE imaging, and tomosynthesis with a flat-panel imaging device. Three experienced thoracic radiologists identified true locations of nodules (n = 516, 3-20-mm diameters) with CT and recommended case management by using Fleischner Society guidelines. Five other radiologists marked nodules and indicated case management by using images from conventional chest radiography, conventional chest radiography plus DE imaging, tomosynthesis, and tomosynthesis plus DE imaging. Sensitivity, specificity, and overall accuracy were measured by using the free-response receiver operating characteristic method and the receiver operating characteristic method for nodule detection and case management, respectively. Results were further analyzed according to nodule diameter categories (3-4 mm, >4 mm to 6 mm, >6 mm to 8 mm, and >8 mm to 20 mm). Results Maximum lesion localization fraction was higher for tomosynthesis than for conventional chest radiography in all nodule size categories (3.55-fold for all nodules, P < .001; 95% confidence interval [CI]: 2.96, 4.15). Case-level sensitivity was higher with tomosynthesis than with conventional chest radiography for all nodules (1.49-fold, P < .001; 95% CI: 1.25, 1.73). Case management decisions showed better overall accuracy with tomosynthesis than with conventional chest radiography, as given by the area under the receiver operating characteristic curve (1.23-fold, P < .001; 95% CI: 1.15, 1.32). There were no differences in any specificity measures. DE imaging did not significantly affect nodule detection when paired with either conventional chest radiography or tomosynthesis. Conclusion Tomosynthesis outperformed conventional chest radiography for lung nodule detection and determination of case management; DE imaging did not show significant differences over conventional chest radiography or tomosynthesis alone. These findings indicate performance likely achievable with a range of reader expertise. © RSNA, 2016 Online supplemental material is available for this article.
McAdams, H. Page; Sabol, John M.; Chakraborty, Dev P.; Kazerooni, Ella A.; Reddy, Gautham P.; Vikgren, Jenny; Båth, Magnus
2017-01-01
Purpose To conduct a multi-institutional, multireader study to compare the performance of digital tomosynthesis, dual-energy (DE) imaging, and conventional chest radiography for pulmonary nodule detection and management. Materials and Methods In this binational, institutional review board–approved, HIPAA-compliant prospective study, 158 subjects (43 subjects with normal findings) were enrolled at four institutions. Informed consent was obtained prior to enrollment. Subjects underwent chest computed tomography (CT) and imaging with conventional chest radiography (posteroanterior and lateral), DE imaging, and tomosynthesis with a flat-panel imaging device. Three experienced thoracic radiologists identified true locations of nodules (n = 516, 3–20-mm diameters) with CT and recommended case management by using Fleischner Society guidelines. Five other radiologists marked nodules and indicated case management by using images from conventional chest radiography, conventional chest radiography plus DE imaging, tomosynthesis, and tomosynthesis plus DE imaging. Sensitivity, specificity, and overall accuracy were measured by using the free-response receiver operating characteristic method and the receiver operating characteristic method for nodule detection and case management, respectively. Results were further analyzed according to nodule diameter categories (3–4 mm, >4 mm to 6 mm, >6 mm to 8 mm, and >8 mm to 20 mm). Results Maximum lesion localization fraction was higher for tomosynthesis than for conventional chest radiography in all nodule size categories (3.55-fold for all nodules, P < .001; 95% confidence interval [CI]: 2.96, 4.15). Case-level sensitivity was higher with tomosynthesis than with conventional chest radiography for all nodules (1.49-fold, P < .001; 95% CI: 1.25, 1.73). Case management decisions showed better overall accuracy with tomosynthesis than with conventional chest radiography, as given by the area under the receiver operating characteristic curve (1.23-fold, P < .001; 95% CI: 1.15, 1.32). There were no differences in any specificity measures. DE imaging did not significantly affect nodule detection when paired with either conventional chest radiography or tomosynthesis. Conclusion Tomosynthesis outperformed conventional chest radiography for lung nodule detection and determination of case management; DE imaging did not show significant differences over conventional chest radiography or tomosynthesis alone. These findings indicate performance likely achievable with a range of reader expertise. © RSNA, 2016 Online supplemental material is available for this article. PMID:27439324
ERIC Educational Resources Information Center
Fazeli, Seyed Hossein
2010-01-01
The purpose of research described in the current study is the psychological reliability, its importance, application, and more to investigate on the impact analysis of psychological reliability of population pilot study for selection of particular reliable multi-choice item test in foreign language research work. The population for subject…
Field reliability of competency and sanity opinions: A systematic review and meta-analysis.
Guarnera, Lucy A; Murrie, Daniel C
2017-06-01
We know surprisingly little about the interrater reliability of forensic psychological opinions, even though courts and other authorities have long called for known error rates for scientific procedures admitted as courtroom testimony. This is particularly true for opinions produced during routine practice in the field, even for some of the most common types of forensic evaluations-evaluations of adjudicative competency and legal sanity. To address this gap, we used meta-analytic procedures and study space methodology to systematically review studies that examined the interrater reliability-particularly the field reliability-of competency and sanity opinions. Of 59 identified studies, 9 addressed the field reliability of competency opinions and 8 addressed the field reliability of sanity opinions. These studies presented a wide range of reliability estimates; pairwise percentage agreements ranged from 57% to 100% and kappas ranged from .28 to 1.0. Meta-analytic combinations of reliability estimates obtained by independent evaluators returned estimates of κ = .49 (95% CI: .40-.58) for competency opinions and κ = .41 (95% CI: .29-.53) for sanity opinions. This wide range of reliability estimates underscores the extent to which different evaluation contexts tend to produce different reliability rates. Unfortunately, our study space analysis illustrates that available field reliability studies typically provide little information about contextual variables crucial to understanding their findings. Given these concerns, we offer suggestions for improving research on the field reliability of competency and sanity opinions, as well as suggestions for improving reliability rates themselves. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Lange, Toni; Freiberg, Alice; Dröge, Patrik; Lützner, Jörg; Schmitt, Jochen; Kopkow, Christian
2015-06-01
Systematic literature review. Despite their frequent application in routine care, a systematic review on the reliability of clinical examination tests to evaluate the integrity of the ACL is missing. To summarize and evaluate intra- and interrater reliability research on physical examination tests used for the diagnosis of ACL tears. A comprehensive systematic literature search was conducted in MEDLINE, EMBASE and AMED until May 30th 2013. Studies were included if they assessed the intra- and/or interrater reliability of physical examination tests for the integrity of the ACL. Methodological quality was evaluated with the Quality Appraisal of Reliability Studies (QAREL) tool by two independent reviewers. 110 hits were achieved of which seven articles finally met the inclusion criteria. These studies examined the reliability of four physical examination tests. Intrarater reliability was assessed in three studies and ranged from fair to almost perfect (Cohen's k = 0.22-1.00). Interrater reliability was assessed in all included studies and ranged from slight to almost perfect (Cohen's k = 0.02-0.81). The Lachman test is the physical tests with the highest intrarater reliability (Cohen's k = 1.00), the Lachman test performed in prone position the test with the highest interrater reliability (Cohen's k = 0.81). Included studies were partly of low methodological quality. A meta-analysis could not be performed due to the heterogeneity in study populations, reliability measures and methodological quality of included studies. Systematic investigations on the reliability of physical examination tests to assess the integrity of the ACL are scarce and of varying methodological quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
Estimating Between-Person and Within-Person Subscore Reliability with Profile Analysis.
Bulut, Okan; Davison, Mark L; Rodriguez, Michael C
2017-01-01
Subscores are of increasing interest in educational and psychological testing due to their diagnostic function for evaluating examinees' strengths and weaknesses within particular domains of knowledge. Previous studies about the utility of subscores have mostly focused on the overall reliability of individual subscores and ignored the fact that subscores should be distinct and have added value over the total score. This study introduces a profile reliability approach that partitions the overall subscore reliability into within-person and between-person subscore reliability. The estimation of between-person reliability and within-person reliability coefficients is demonstrated using subscores from number-correct scoring, unidimensional and multidimensional item response theory scoring, and augmented scoring approaches via a simulation study and a real data study. The effects of various testing conditions, such as subtest length, correlations among subscores, and the number of subtests, are examined. Results indicate that there is a substantial trade-off between within-person and between-person reliability of subscores. Profile reliability coefficients can be useful in determining the extent to which subscores provide distinct and reliable information under various testing conditions.
The Balanced Inventory of Desirable Responding (BIDR): A Reliability Generalization Study
ERIC Educational Resources Information Center
Li, Andrew; Bagger, Jessica
2007-01-01
The Balanced Inventory of Desirable Responding (BIDR) is one of the most widely used social desirability scales. The authors conducted a reliability generalization study to examine the typical reliability coefficients of BIDR scores and explored factors that explained the variability of reliability estimates across studies. The results indicated…
ERIC Educational Resources Information Center
Lane, Ginny G.; White, Amy E.; Henson, Robin K.
2002-01-01
Conducted a reliability generalizability study on the Coopersmith Self-Esteem Inventory (CSEI; S. Coopersmith, 1967) to examine the variability of reliability estimates across studies and to identify study characteristics that may predict this variability. Results show that reliability for CSEI scores can vary considerably, especially at the…
Gorgos, Kara S; Wasylyk, Nicole T; Van Lunen, Bonnie L; Hoch, Matthew C
2014-04-01
Joint mobilizations are commonly used by clinicians to decrease pain and restore joint arthrokinematics following musculoskeletal injury. The force applied during a joint mobilization treatment is subjective to the individual clinician but may have an effect on patient outcomes. The purpose of this systematic review was to critically appraise and synthesize the studies which examined the reliability of clinicians' force application during joint mobilization. A systematic search of PubMed and EBSCO Host databases from inception to March 1, 2013 was conducted to identify studies assessing the reliability of force application during joint mobilizations. Two reviewers utilized the Quality Appraisal of Reliability Studies (QAREL) assessment tool to determine the quality of included studies. The relative reliability of the included studies was examined through intraclass correlation coefficients (ICC) to synthesize study findings. All results were collated qualitatively with a level of evidence approach. A total of seven studies met the eligibility and were included. Five studies were included that assessed inter-clinician reliability, and six studies were included that assessed intra-clinician reliability. The overall level of evidence for inter-clinician reliability was strong for poor-to-moderate reliability (ICC = -0.04 to 0.70). The overall level of evidence for intra-clinician reliability was strong for good reliability (ICC = 0.75-0.99). This systematic review indicates there is variability in force application between clinicians but individual clinicians apply forces consistently. The results of this systematic review suggest innovative instructional methods are needed to improve consistency and validate the forces applied during of joint mobilization treatments. This is particularly evident for improving the consistency of force application across clinicians. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
White, Mark; Huang, Bing; Qin, Jin; Gur, Zvi; Talmor, Michael; Chen, Yuan; Heidecker, Jason; Nguyen, Duc; Bernstein, Joseph
2005-01-01
As microelectronics are scaled in to the deep sub-micron regime, users of advanced technology CMOS, particularly in high-reliability applications, should reassess how scaling effects impact long-term reliability. An experimental based reliability study of industrial grade SRAMs, consisting of three different technology nodes, is proposed to substantiate current acceleration models for temperature and voltage life-stress relationships. This reliability study utilizes step-stress techniques to evaluate memory technologies (0.25mum, 0.15mum, and 0.13mum) embedded in many of today's high-reliability space/aerospace applications. Two acceleration modeling approaches are presented to relate experimental FIT calculations to Mfr's qualification data.
Reliability generalization: a viable key for establishing validity generalization
NASA Technical Reports Server (NTRS)
Kennedy, R. S.; Turnage, J. J.
1991-01-01
Even with radical restriction of range, reliability coefficients from 10 studies gave an average interstudy value of .74, suggesting constancy of reliability over diverse experiments. A value from a new test can help index reliability of tests not previously studied.
Clayson, Peter E; Miller, Gregory A
2017-01-01
Failing to consider psychometric issues related to reliability and validity, differential deficits, and statistical power potentially undermines the conclusions of a study. In research using event-related brain potentials (ERPs), numerous contextual factors (population sampled, task, data recording, analysis pipeline, etc.) can impact the reliability of ERP scores. The present review considers the contextual factors that influence ERP score reliability and the downstream effects that reliability has on statistical analyses. Given the context-dependent nature of ERPs, it is recommended that ERP score reliability be formally assessed on a study-by-study basis. Recommended guidelines for ERP studies include 1) reporting the threshold of acceptable reliability and reliability estimates for observed scores, 2) specifying the approach used to estimate reliability, and 3) justifying how trial-count minima were chosen. A reliability threshold for internal consistency of at least 0.70 is recommended, and a threshold of 0.80 is preferred. The review also advocates the use of generalizability theory for estimating score dependability (the generalizability theory analog to reliability) as an improvement on classical test theory reliability estimates, suggesting that the latter is less well suited to ERP research. To facilitate the calculation and reporting of dependability estimates, an open-source Matlab program, the ERP Reliability Analysis Toolbox, is presented. Copyright © 2016 Elsevier B.V. All rights reserved.
Citronberg, Jessica S; Wilkens, Lynne R; Lim, Unhee; Hullar, Meredith A J; White, Emily; Newcomb, Polly A; Le Marchand, Loïc; Lampe, Johanna W
2016-09-01
Plasma lipopolysaccharide-binding protein (LBP), a measure of internal exposure to bacterial lipopolysaccharide, has been associated with several chronic conditions and may be a marker of chronic inflammation; however, no studies have examined the reliability of this biomarker in a healthy population. We examined the temporal reliability of LBP measured in archived samples from participants in two studies. In Study one, 60 healthy participants had blood drawn at two time points: baseline and follow-up (either three, six, or nine months). In Study two, 24 individuals had blood drawn three to four times over a seven-month period. We measured LBP in archived plasma by ELISA. Test-retest reliability was estimated by calculating the intraclass correlation coefficient (ICC). Plasma LBP concentrations showed moderate reliability in Study one (ICC 0.60, 95 % CI 0.43-0.75) and Study two (ICC 0.46, 95 % CI 0.26-0.69). Restricting the follow-up period improved reliability. In Study one, the reliability of LBP over a three-month period was 0.68 (95 % CI: 0.41-0.87). In Study two, the ICC of samples taken ≤seven days apart was 0.61 (95 % CI 0.29-0.86). Plasma LBP concentrations demonstrated moderate test-retest reliability in healthy individuals with reliability improving over a shorter follow-up period.
Score Reliability: A Retrospective Look Back at 12 Years of Reliability Generalization Studies
ERIC Educational Resources Information Center
Vacha-Haase, Tammi; Thompson, Bruce
2011-01-01
The present study was conducted to characterize (a) the features of the thousands of primary reports synthesized in 47 reliability generalization (RG) measurement meta-analysis studies and (b) typical methodological practice within the RG literature to date. With respect to the treatment of score reliability in the literature, in an astounding…
A Reliability Generalization Study of the Marlowe-Crowne Social Desirability Scale.
ERIC Educational Resources Information Center
Beretvas, S, Natasha; Meyers, Jason L.; Leite, Walter L.
2002-01-01
Conducted a reliability generalization study of the Marlowe-Crowne Social Desirability Scale (D. Crowne and D. Marlowe, 1960). Analysis of 93 studies show that the predicted score reliability for male adolescents was 0.53, and reliability for men's responses was lower than for women's. Discusses the need for further analysis of the scale. (SLD)
Lucas, Nicholas; Macaskill, Petra; Irwig, Les; Moran, Robert; Bogduk, Nikolai
2009-01-01
Trigger points are promoted as an important cause of musculoskeletal pain. There is no accepted reference standard for the diagnosis of trigger points, and data on the reliability of physical examination for trigger points are conflicting. To systematically review the literature on the reliability of physical examination for the diagnosis of trigger points. MEDLINE, EMBASE, and other sources were searched for articles reporting the reliability of physical examination for trigger points. Included studies were evaluated for their quality and applicability, and reliability estimates were extracted and reported. Nine studies were eligible for inclusion. None satisfied all quality and applicability criteria. No study specifically reported reliability for the identification of the location of active trigger points in the muscles of symptomatic participants. Reliability estimates varied widely for each diagnostic sign, for each muscle, and across each study. Reliability estimates were generally higher for subjective signs such as tenderness (kappa range, 0.22-1.0) and pain reproduction (kappa range, 0.57-1.00), and lower for objective signs such as the taut band (kappa range, -0.08-0.75) and local twitch response (kappa range, -0.05-0.57). No study to date has reported the reliability of trigger point diagnosis according to the currently proposed criteria. On the basis of the limited number of studies available, and significant problems with their design, reporting, statistical integrity, and clinical applicability, physical examination cannot currently be recommended as a reliable test for the diagnosis of trigger points. The reliability of trigger point diagnosis needs to be further investigated with studies of high quality that use current diagnostic criteria in clinically relevant patients.
Park, Ji Eun; Han, Kyunghwa; Sung, Yu Sub; Chung, Mi Sun; Koo, Hyun Jung; Yoon, Hee Mang; Choi, Young Jun; Lee, Seung Soo; Kim, Kyung Won; Shin, Youngbin; An, Suah; Cho, Hyo-Min
2017-01-01
Objective To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. Materials and Methods Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA), and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. Results Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies and regarding the model and assumptions of intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. Reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. Conclusion Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology is necessary. PMID:29089821
Salyers, M P; McHugo, G J; Cook, J A; Razzano, L A; Drake, R E; Mueser, K T
2001-09-01
Reliability of well-known instruments was examined in 202 people with severe mental illness participating in a multisite vocational study. We examined interrater reliability of the Positive and Negative Syndrome Scale (PANSS) and the internal consistency and test-retest reliability of the PANSS, the Rosenberg Self-Esteem Scale, the Medical Outcomes Study Short Form-36 (SF-36), and the Quality of Life Interview. Most scales had good levels of reliability, with intraclass correlation coefficients (ICCs) and coefficient alphas above .70. However, the SF-36 scales were generally less stable over time, particularly Social Functioning (ICC = .55). Test-retest reliability was lower among less educated respondents and among ethnic minorities. We recommend close monitoring of psychometric issues in future multisite studies.
Reliability of intracerebral hemorrhage classification systems: A systematic review.
Rannikmäe, Kristiina; Woodfield, Rebecca; Anderson, Craig S; Charidimou, Andreas; Chiewvit, Pipat; Greenberg, Steven M; Jeng, Jiann-Shing; Meretoja, Atte; Palm, Frederic; Putaala, Jukka; Rinkel, Gabriel Je; Rosand, Jonathan; Rost, Natalia S; Strbian, Daniel; Tatlisumak, Turgut; Tsai, Chung-Fen; Wermer, Marieke Jh; Werring, David; Yeh, Shin-Joe; Al-Shahi Salman, Rustam; Sudlow, Cathie Lm
2016-08-01
Accurately distinguishing non-traumatic intracerebral hemorrhage (ICH) subtypes is important since they may have different risk factors, causal pathways, management, and prognosis. We systematically assessed the inter- and intra-rater reliability of ICH classification systems. We sought all available reliability assessments of anatomical and mechanistic ICH classification systems from electronic databases and personal contacts until October 2014. We assessed included studies' characteristics, reporting quality and potential for bias; summarized reliability with kappa value forest plots; and performed meta-analyses of the proportion of cases classified into each subtype. We included 8 of 2152 studies identified. Inter- and intra-rater reliabilities were substantial to perfect for anatomical and mechanistic systems (inter-rater kappa values: anatomical 0.78-0.97 [six studies, 518 cases], mechanistic 0.89-0.93 [three studies, 510 cases]; intra-rater kappas: anatomical 0.80-1 [three studies, 137 cases], mechanistic 0.92-0.93 [two studies, 368 cases]). Reporting quality varied but no study fulfilled all criteria and none was free from potential bias. All reliability studies were performed with experienced raters in specialist centers. Proportions of ICH subtypes were largely consistent with previous reports suggesting that included studies are appropriately representative. Reliability of existing classification systems appears excellent but is unknown outside specialist centers with experienced raters. Future reliability comparisons should be facilitated by studies following recently published reporting guidelines. © 2016 World Stroke Organization.
Intersession reliability of fMRI activation for heat pain and motor tasks
Quiton, Raimi L.; Keaser, Michael L.; Zhuo, Jiachen; Gullapalli, Rao P.; Greenspan, Joel D.
2014-01-01
As the practice of conducting longitudinal fMRI studies to assess mechanisms of pain-reducing interventions becomes more common, there is a great need to assess the test–retest reliability of the pain-related BOLD fMRI signal across repeated sessions. This study quantitatively evaluated the reliability of heat pain-related BOLD fMRI brain responses in healthy volunteers across 3 sessions conducted on separate days using two measures: (1) intraclass correlation coefficients (ICC) calculated based on signal amplitude and (2) spatial overlap. The ICC analysis of pain-related BOLD fMRI responses showed fair-to-moderate intersession reliability in brain areas regarded as part of the cortical pain network. Areas with the highest intersession reliability based on the ICC analysis included the anterior midcingulate cortex, anterior insula, and second somatosensory cortex. Areas with the lowest intersession reliability based on the ICC analysis also showed low spatial reliability; these regions included pregenual anterior cingulate cortex, primary somatosensory cortex, and posterior insula. Thus, this study found regional differences in pain-related BOLD fMRI response reliability, which may provide useful information to guide longitudinal pain studies. A simple motor task (finger-thumb opposition) was performed by the same subjects in the same sessions as the painful heat stimuli were delivered. Intersession reliability of fMRI activation in cortical motor areas was comparable to previously published findings for both spatial overlap and ICC measures, providing support for the validity of the analytical approach used to assess intersession reliability of pain-related fMRI activation. A secondary finding of this study is that the use of standard ICC alone as a measure of reliability may not be sufficient, as the underlying variance structure of an fMRI dataset can result in inappropriately high ICC values; a method to eliminate these false positive results was used in this study and is recommended for future studies of test–retest reliability. PMID:25161897
López-Pina, José Antonio; Sánchez-Meca, Julio; López-López, José Antonio; Marín-Martínez, Fulgencio; Núñez-Núñez, Rosa Ma; Rosa-Alcázar, Ana I; Gómez-Conesa, Antonia; Ferrer-Requena, Josefa
2015-01-01
The Yale-Brown Obsessive-Compulsive Scale for children and adolescents (CY-BOCS) is a frequently applied test to assess obsessive-compulsive symptoms. We conducted a reliability generalization meta-analysis on the CY-BOCS to estimate the average reliability, search for reliability moderators, and propose a predictive model that researchers and clinicians can use to estimate the expected reliability of the CY-BOCS scores. A total of 47 studies reporting a reliability coefficient with the data at hand were included in the meta-analysis. The results showed good reliability and a large variability associated to the standard deviation of total scores and sample size.
INFLUENCES OF RESPONSE RATE AND DISTRIBUTION ON THE CALCULATION OF INTEROBSERVER RELIABILITY SCORES
Rolider, Natalie U.; Iwata, Brian A.; Bullock, Christopher E.
2012-01-01
We examined the effects of several variations in response rate on the calculation of total, interval, exact-agreement, and proportional reliability indices. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates during separate sessions so that reliability results based on the four calculations could be compared across a range of values. Total reliability was uniformly high, interval reliability was spuriously high for high-rate responding, proportional reliability was somewhat lower for high-rate responding, and exact-agreement reliability was the lowest of the measures, especially for high-rate responding. In Study 2, we examined the separate effects of response rate per se, bursting, and end-of-interval responding. Response rate and bursting had little effect on reliability scores; however, the distribution of some responses at the end of intervals decreased interval reliability somewhat, proportional reliability noticeably, and exact-agreement reliability markedly. PMID:23322930
The reliability of the Australasian Triage Scale: a meta-analysis
Ebrahimi, Mohsen; Heydari, Abbas; Mazlom, Reza; Mirhaghi, Amir
2015-01-01
BACKGROUND: Although the Australasian Triage Scale (ATS) has been developed two decades ago, its reliability has not been defined; therefore, we present a meta-analyis of the reliability of the ATS in order to reveal to what extent the ATS is reliable. DATA SOURCES: Electronic databases were searched to March 2014. The included studies were those that reported samples size, reliability coefficients, and adequate description of the ATS reliability assessment. The guidelines for reporting reliability and agreement studies (GRRAS) were used. Two reviewers independently examined abstracts and extracted data. The effect size was obtained by the z-transformation of reliability coefficients. Data were pooled with random-effects models, and meta-regression was done based on the method of moment’s estimator. RESULTS: Six studies were included in this study at last. Pooled coefficient for the ATS was substantial 0.428 (95%CI 0.340–0.509). The rate of mis-triage was less than fifty percent. The agreement upon the adult version is higher than the pediatric version. CONCLUSION: The ATS has shown an acceptable level of overall reliability in the emergency department, but it needs more development to reach an almost perfect agreement. PMID:26056538
Duncan, Laura; Comeau, Jinette; Wang, Li; Vitoroulis, Irene; Boyle, Michael H; Bennett, Kathryn
2018-02-19
A better understanding of factors contributing to the observed variability in estimates of test-retest reliability in published studies on standardized diagnostic interviews (SDI) is needed. The objectives of this systematic review and meta-analysis were to estimate the pooled test-retest reliability for parent and youth assessments of seven common disorders, and to examine sources of between-study heterogeneity in reliability. Following a systematic review of the literature, multilevel random effects meta-analyses were used to analyse 202 reliability estimates (Cohen's kappa = ҡ) from 31 eligible studies and 5,369 assessments of 3,344 children and youth. Pooled reliability was moderate at ҡ = .58 (CI 95% 0.53-0.63) and between-study heterogeneity was substantial (Q = 2,063 (df = 201), p < .001 and I 2 = 79%). In subgroup analysis, reliability varied across informants for specific types of psychiatric disorder (ҡ = .53-.69 for parent vs. ҡ = .39-.68 for youth) with estimates significantly higher for parents on attention deficit hyperactivity disorder, oppositional defiant disorder and the broad groupings of externalizing and any disorder. Reliability was also significantly higher in studies with indicators of poor or fair study methodology quality (sample size <50, retest interval <7 days). Our findings raise important questions about the meaningfulness of published evidence on the test-retest reliability of SDIs and the usefulness of these tools in both clinical and research contexts. Potential remedies include the introduction of standardized study and reporting requirements for reliability studies, and exploration of other approaches to assessing and classifying child and adolescent psychiatric disorder. © 2018 Association for Child and Adolescent Mental Health.
A study on reliability of power customer in distribution network
NASA Astrophysics Data System (ADS)
Liu, Liyuan; Ouyang, Sen; Chen, Danling; Ma, Shaohua; Wang, Xin
2017-05-01
The existing power supply reliability index system is oriented to power system without considering actual electricity availability in customer side. In addition, it is unable to reflect outage or customer’s equipment shutdown caused by instantaneous interruption and power quality problem. This paper thus makes a systematic study on reliability of power customer. By comparing with power supply reliability, reliability of power customer is defined and extracted its evaluation requirements. An indexes system, consisting of seven customer indexes and two contrast indexes, are designed to describe reliability of power customer from continuity and availability. In order to comprehensively and quantitatively evaluate reliability of power customer in distribution networks, reliability evaluation method is proposed based on improved entropy method and the punishment weighting principle. Practical application has proved that reliability index system and evaluation method for power customer is reasonable and effective.
Seeking high reliability in primary care: Leadership, tools, and organization.
Weaver, Robert R
2015-01-01
Leaders in health care increasingly recognize that improving health care quality and safety requires developing an organizational culture that fosters high reliability and continuous process improvement. For various reasons, a reliability-seeking culture is lacking in most health care settings. Developing a reliability-seeking culture requires leaders' sustained commitment to reliability principles using key mechanisms to embed those principles widely in the organization. The aim of this study was to examine how key mechanisms used by a primary care practice (PCP) might foster a reliability-seeking, system-oriented organizational culture. A case study approach was used to investigate the PCP's reliability culture. The study examined four cultural artifacts used to embed reliability-seeking principles across the organization: leadership statements, decision support tools, and two organizational processes. To decipher their effects on reliability, the study relied on observations of work patterns and the tools' use, interactions during morning huddles and process improvement meetings, interviews with clinical and office staff, and a "collective mindfulness" questionnaire. The five reliability principles framed the data analysis. Leadership statements articulated principles that oriented the PCP toward a reliability-seeking culture of care. Reliability principles became embedded in the everyday discourse and actions through the use of "problem knowledge coupler" decision support tools and daily "huddles." Practitioners and staff were encouraged to report unexpected events or close calls that arose and which often initiated a formal "process change" used to adjust routines and prevent adverse events from recurring. Activities that foster reliable patient care became part of the taken-for-granted routine at the PCP. The analysis illustrates the role leadership, tools, and organizational processes play in developing and embedding a reliable-seeking culture across an organization. Progress toward a reliability-seeking, system-oriented approach to care remains ongoing, and movement in that direction requires deliberate and sustained effort by committed leaders in health care.
Barrett, Eva; McCreesh, Karen; Lewis, Jeremy
2014-02-01
A wide array of instruments are available for non-invasive thoracic kyphosis measurement. Guidelines for selecting outcome measures for use in clinical and research practice recommend that properties such as validity and reliability are considered. This systematic review reports on the reliability and validity of non-invasive methods for measuring thoracic kyphosis. A systematic search of 11 electronic databases located studies assessing reliability and/or validity of non-invasive thoracic kyphosis measurement techniques. Two independent reviewers used a critical appraisal tool to assess the quality of retrieved studies. Data was extracted by the primary reviewer. The results were synthesized qualitatively using a level of evidence approach. 27 studies satisfied the eligibility criteria and were included in the review. The reliability, validity and both reliability and validity were investigated by sixteen, two and nine studies respectively. 17/27 studies were deemed to be of high quality. In total, 15 methods of thoracic kyphosis were evaluated in retrieved studies. All investigated methods showed high (ICC ≥ .7) to very high (ICC ≥ .9) levels of reliability. The validity of the methods ranged from low to very high. The strongest levels of evidence for reliability exists in support of the Debrunner kyphometer, Spinal Mouse and Flexicurve index, and for validity supports the arcometer and Flexicurve index. Further reliability and validity studies are required to strengthen the level of evidence for the remaining methods of measurement. This should be addressed by future research. Copyright © 2013 Elsevier Ltd. All rights reserved.
Koch, Michael S; DeSesso, John M; Williams, Amy Lavin; Michalek, Suzanne; Hammond, Bruce
2016-01-01
To determine the reliability of food safety studies carried out in rodents with genetically modified (GM) crops, a Food Safety Study Reliability Tool (FSSRTool) was adapted from the European Centre for the Validation of Alternative Methods' (ECVAM) ToxRTool. Reliability was defined as the inherent quality of the study with regard to use of standardized testing methodology, full documentation of experimental procedures and results, and the plausibility of the findings. Codex guidelines for GM crop safety evaluations indicate toxicology studies are not needed when comparability of the GM crop to its conventional counterpart has been demonstrated. This guidance notwithstanding, animal feeding studies have routinely been conducted with GM crops, but their conclusions on safety are not always consistent. To accurately evaluate potential risks from GM crops, risk assessors need clearly interpretable results from reliable studies. The development of the FSSRTool, which provides the user with a means of assessing the reliability of a toxicology study to inform risk assessment, is discussed. Its application to the body of literature on GM crop food safety studies demonstrates that reliable studies report no toxicologically relevant differences between rodents fed GM crops or their non-GM comparators.
Psychometrics Matter in Health Behavior: A Long-term Reliability Generalization Study.
Pickett, Andrew C; Valdez, Danny; Barry, Adam E
2017-09-01
Despite numerous calls for increased understanding and reporting of reliability estimates, social science research, including the field of health behavior, has been slow to respond and adopt such practices. Therefore, we offer a brief overview of reliability and common reporting errors; we then perform analyses to examine and demonstrate the variability of reliability estimates by sample and over time. Using meta-analytic reliability generalization, we examined the variability of coefficient alpha scores for a well-designed, consistent, nationwide health study, covering a span of nearly 40 years. For each year and sample, reliability varied. Furthermore, reliability was predicted by a sample characteristic that differed among age groups within each administration. We demonstrated that reliability is influenced by the methods and individuals from which a given sample is drawn. Our work echoes previous calls that psychometric properties, particularly reliability of scores, are important and must be considered and reported before drawing statistical conclusions.
Comprehensive Design Reliability Activities for Aerospace Propulsion Systems
NASA Technical Reports Server (NTRS)
Christenson, R. L.; Whitley, M. R.; Knight, K. C.
2000-01-01
This technical publication describes the methodology, model, software tool, input data, and analysis result that support aerospace design reliability studies. The focus of these activities is on propulsion systems mechanical design reliability. The goal of these activities is to support design from a reliability perspective. Paralleling performance analyses in schedule and method, this requires the proper use of metrics in a validated reliability model useful for design, sensitivity, and trade studies. Design reliability analysis in this view is one of several critical design functions. A design reliability method is detailed and two example analyses are provided-one qualitative and the other quantitative. The use of aerospace and commercial data sources for quantification is discussed and sources listed. A tool that was developed to support both types of analyses is presented. Finally, special topics discussed include the development of design criteria, issues of reliability quantification, quality control, and reliability verification.
Rabelo, Michelle; Nunes, Guilherme S; da Costa Amante, Natália Menezes; de Noronha, Marcos; Fachin-Martins, Emerson
2016-02-01
Muscle weakness is the main cause of motor impairment among stroke survivors and is associated with reduced peak muscle torque. To systematically investigate and organize the evidence of the reliability of muscle strength evaluation measures in post-stroke survivors with chronic hemiparesis. Two assessors independently searched four electronic databases in January 2014 (Medline, Scielo, CINAHL, Embase). Inclusion criteria comprised studies on reliability on muscle strength assessment in adult post-stroke patients with chronic hemiparesis. We extracted outcomes from included studies about reliability data, measured by intraclass correlation coefficient (ICC) and/or similar. The meta-analyses were conducted only with isokinetic data. Of 450 articles, eight articles were included for this review. After quality analysis, two studies were considered of high quality. Five different joints were analyzed within the included studies (knee, hip, ankle, shoulder, and elbow). Their reliability results varying from low to very high reliability (ICCs from 0.48 to 0.99). Results of meta-analysis for knee extension varying from high to very high reliability (pooled ICCs from 0.89 to 0.97), for knee flexion varying from high to very high reliability (pooled ICCs from 0.84 to 0.91) and for ankle plantar flexion showed high reliability (pooled ICC = 0.85). Objective muscle strength assessment can be reliably used in lower and upper extremities in post-stroke patients with chronic hemiparesis.
The development of a quality appraisal tool for studies of diagnostic reliability (QAREL).
Lucas, Nicholas P; Macaskill, Petra; Irwig, Les; Bogduk, Nikolai
2010-08-01
In systematic reviews of the reliability of diagnostic tests, no quality assessment tool has been used consistently. The aim of this study was to develop a specific quality appraisal tool for studies of diagnostic reliability. Key principles for the quality of studies of diagnostic reliability were identified with reference to epidemiologic principles, existing quality appraisal checklists, and the Standards for Reporting of Diagnostic Accuracy (STARD) and Quality Assessment of Diagnostic Accuracy Studies (QUADAS) resources. Specific items that encompassed each of the principles were developed. Experts in diagnostic research provided feedback on the items that were to form the appraisal tool. This process was iterative and continued until consensus among experts was reached. The Quality Appraisal of Reliability Studies (QAREL) checklist includes 11 items that explore seven principles. Items cover the spectrum of subjects, spectrum of examiners, examiner blinding, order effects of examination, suitability of the time interval among repeated measurements, appropriate test application and interpretation, and appropriate statistical analysis. QAREL has been developed as a specific quality appraisal tool for studies of diagnostic reliability. The reliability of this tool in different contexts needs to be evaluated. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Suzuki, T; Sato, Y; Sotome, S; Arai, H; Arai, A; Yoshida, H
2017-06-01
This study was designed to investigate the reliability and validity of measurements of finger diameters with a ring gauge. A reliability study enrolled two independent samples (50 participants and seven examiners in Study I; 26 participants and 26 examiners in Study II). The sizes of each participant's little fingers were measured twice with a ring gauge by each examiner. To investigate the validity of the measurements, five hand therapists compared the finger size and hand volume of 30 participants with the ring gauge and with a figure-of-eight technique (Study III). The intra-class correlation coefficient for intra-observer reliability ranged from 0.97 to 0.99 in Study I, and 0.90 to 0.97 in Study II. The intra-class correlation coefficient for inter-observer reliability was 0.95 in Study I and 0.94 in Study II. The validity study showed a Pearson product moment correlation coefficient of 0.75. The ring gauge showed high reliability and validity for measurement of finger size. III, diagnostic.
Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline
2013-06-01
What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intrarater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
Use of Internal Consistency Coefficients for Estimating Reliability of Experimental Tasks Scores
Green, Samuel B.; Yang, Yanyun; Alt, Mary; Brinkley, Shara; Gray, Shelley; Hogan, Tiffany; Cowan, Nelson
2017-01-01
Reliabilities of scores for experimental tasks are likely to differ from one study to another to the extent that the task stimuli change, the number of trials varies, the type of individuals taking the task changes, the administration conditions are altered, or the focal task variable differs. Given reliabilities vary as a function of the design of these tasks and the characteristics of the individuals taking them, making inferences about the reliability of scores in an ongoing study based on reliability estimates from prior studies is precarious. Thus, it would be advantageous to estimate reliability based on data from the ongoing study. We argue that internal consistency estimates of reliability are underutilized for experimental task data and in many applications could provide this information using a single administration of a task. We discuss different methods for computing internal consistency estimates with a generalized coefficient alpha and the conditions under which these estimates are accurate. We illustrate use of these coefficients using data for three different tasks. PMID:26546100
A study of fault prediction and reliability assessment in the SEL environment
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Patnaik, Debabrata
1986-01-01
An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
Reliability of diagnosis and clinical efficacy of visceral osteopathy: a systematic review.
Guillaud, Albin; Darbois, Nelly; Monvoisin, Richard; Pinsault, Nicolas
2018-02-17
In 2010, the World Health Organization published benchmarks for training in osteopathy in which osteopathic visceral techniques are included. The purpose of this study was to identify and critically appraise the scientific literature concerning the reliability of diagnosis and the clinical efficacy of techniques used in visceral osteopathy. Databases MEDLINE, OSTMED.DR, the Cochrane Library, Osteopathic Research Web, Google Scholar, Journal of American Osteopathic Association (JAOA) website, International Journal of Osteopathic Medicine (IJOM) website, and the catalog of Académie d'ostéopathie de France website were searched through December 2017. Only inter-rater reliability studies including at least two raters or the intra-rater reliability studies including at least two assessments by the same rater were included. For efficacy studies, only randomized-controlled-trials (RCT) or crossover studies on unhealthy subjects (any condition, duration and outcome) were included. Risk of bias was determined using a modified version of the quality appraisal tool for studies of diagnostic reliability (QAREL) in reliability studies. For the efficacy studies, the Cochrane risk of bias tool was used to assess their methodological design. Two authors performed data extraction and analysis. Eight reliability studies and six efficacy studies were included. The analysis of reliability studies shows that the diagnostic techniques used in visceral osteopathy are unreliable. Regarding efficacy studies, the least biased study shows no significant difference for the main outcome. The main risks of bias found in the included studies were due to the absence of blinding of the examiners, an unsuitable statistical method or an absence of primary study outcome. The results of the systematic review lead us to conclude that well-conducted and sound evidence on the reliability and the efficacy of techniques in visceral osteopathy is absent. The review is registered PROSPERO 12th of December 2016. Registration number is CRD4201605286 .
A reliability study of the new sensors for movement analysis (SHARIF-HMIS).
Abedi, Mohen; Manshadi, Farideh Dehghan; Zavieh, Minoo Khalkhali; Ashouri, Sajad; Azimi, Hadi; Parnanpour, Mohamad
2016-04-01
SHARIF-HMIS is a new inertial sensor designed for movement analysis. The aim of the present study was to assess the inter-tester and intra-tester reliability of some kinematic parameters in different lumbar motions making use of this sensor. 24 healthy persons and 28 patients with low back pain participated in the current reliability study. The test was performed in five different lumbar motions consisting of lumbar flexion in 0, 15, and 30° in the right and left directions. For measuring inter-tester reliability, all the tests were carried out twice on the same day separately by two physiotherapists. Intra-tester reliability was assessed by reproducing the tests after 3 days by the same physiotherapist. The present study revealed satisfactory inter- and intra-tester reliability indices in different positions. ICCs for intra-tester reliability ranged from 0.65 to 0.98 and 0.59 to 0.81 for healthy and patient participants, respectively. Also, ICCs for inter-tester reliability ranged from 0.65 to 0.92 for the healthy and 0.65 to 0.87 for patient participants. In general, it can be inferred from the results that measuring the kinematic parameters in lumbar movements using inertial sensors enjoys acceptable reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Chiang, Hsin-Yu; Lu, Wen-Shian; Yu, Wan-Hui; Hsueh, I-Ping; Hsieh, Ching-Lin
2018-04-11
To examine the interrater and intrarater reliability of the Balance Computerized Adaptive Test (Balance CAT) in patients with chronic stroke having a wide range of balance functions. Repeated assessments design (1wk apart). Seven teaching hospitals. A pooled sample (N=102) including 2 independent groups of outpatients (n=50 for the interrater reliability study; n=52 for the intrarater reliability study) with chronic stroke. Not applicable. Balance CAT. For the interrater reliability study, the values of intraclass correlation coefficient, minimal detectable change (MDC), and percentage of MDC (MDC%) for the Balance CAT were .84, 1.90, and 31.0%, respectively. For the intrarater reliability study, the values of intraclass correlation coefficient, MDC, and MDC% ranged from .89 to .91, from 1.14 to 1.26, and from 17.1% to 18.6%, respectively. The Balance CAT showed sufficient intrarater reliability in patients with chronic stroke having balance functions ranging from sitting with support to independent walking. Although the Balance CAT may have good interrater reliability, we found substantial random measurement error between different raters. Accordingly, if the Balance CAT is used as an outcome measure in clinical or research settings, same raters are suggested over different time points to ensure reliable assessments. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Taghipour, Morteza; Mohseni-Bandpei, Mohammad Ali; Behtash, Hamid; Abdollahi, Iraj; Rajabzadeh, Fatemeh; Pourahmadi, Mohammad Reza; Emami, Mahnaz
2018-04-24
Rehabilitative ultrasound (US) imaging is one of the popular methods for investigating muscle morphologic characteristics and dimensions in recent years. The reliability of this method has been investigated in different studies. As studies have been performed with different designs and quality, reported values of rehabilitative US have a wide range. The objective of this study was to systematically review the literature conducted on the reliability of rehabilitative US imaging for the assessment of deep abdominal and lumbar trunk muscle dimensions. The PubMed/MEDLINE, Scopus, Google Scholar, Science Direct, Embase, Physiotherapy Evidence, Ovid, and CINAHL databases were searched to identify original research articles conducted on the reliability of rehabilitative US imaging published from June 2007 to August 2017. The articles were qualitatively assessed; reliability data were extracted; and the methodological quality was evaluated by 2 independent reviewers. Of the 26 included studies, 16 were considered of high methodological quality. Except for 2 studies, all high-quality studies reported intraclass correlation coefficients (ICCs) for intra-rater reliability of 0.70 or greater. Also, ICCs reported for inter-rater reliability in high-quality studies were generally greater than 0.70. Among low-quality studies, reported ICCs ranged from 0.26 to 0.99 and 0.68 to 0.97 for intra- and inter-rater reliability, respectively. Also, the reported standard error of measurement and minimal detectable change for rehabilitative US were generally in an acceptable range. Generally, the results of the reviewed studies indicate that rehabilitative US imaging has good levels of both inter- and intra-rater reliability. © 2018 by the American Institute of Ultrasound in Medicine.
Test Reliability at the Individual Level
Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly
2016-01-01
Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
Reliability of infrared thermometric measurements of skin temperature in the hand.
Packham, Tara L; Fok, Diana; Frederiksen, Karen; Thabane, Lehana; Buckley, Norman
2012-01-01
Clinical measurement study. Skin temperature asymmetries (STAs) are used in the diagnosis of complex regional pain syndrome (CRPS), but little evidence exists for reliability of the equipment and methods. This study examined the reliability of an inexpensive infrared (IR) thermometer and measurement points in the hand for the study of STA. ST was measured three times at five points on both hands with an IR thermometer by two raters in 20 volunteers (12 normals and 8 CRPS). ST measurement results using IR thermometers support inter-rater reliability: intraclass correlation coefficient (ICC) estimate for single measures 0.80; all ST measurement points were also highly reliable (ICC single measures, 0.83-0.91). The equipment demonstrated excellent reliability, with little difference in the reliability of the five measurement sites. These preliminary findings support their use in future CRPS research. Not applicable. Copyright © 2012 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
2016-03-01
A BOUNCE? A STUDY ON RESILIENCE AND HUMAN RELATIONS IN A HIGH RELIABILITY ORGANIZATION by Robert D. Johns March 2016 Thesis Advisor...RELATIONS IN A HIGH RELIABILITY ORGANIZATION 5. FUNDING NUMBERS 6. AUTHOR(S) Robert D. Johns 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES...200 words) This study analyzes the various resilience factors associated with a military high reliability organization (HRO). The data measuring
Bottema-Beutel, Kristen; Lloyd, Blair; Carter, Erik W; Asmus, Jennifer M
2014-11-01
Attaining reliable estimates of observational measures can be challenging in school and classroom settings, as behavior can be influenced by multiple contextual factors. Generalizability (G) studies can enable researchers to estimate the reliability of observational data, and decision (D) studies can inform how many observation sessions are necessary to achieve a criterion level of reliability. We conducted G and D studies using observational data from a randomized control trial focusing on social and academic participation of students with severe disabilities in inclusive secondary classrooms. Results highlight the importance of anchoring observational decisions to reliability estimates from existing or pilot data sets. We outline steps for conducting G and D studies and address options when reliability estimates are lower than desired.
Assessing reliability and validity measures in managed care studies.
Montoya, Isaac D
2003-01-01
To review the reliability and validity literature and develop an understanding of these concepts as applied to managed care studies. Reliability is a test of how well an instrument measures the same input at varying times and under varying conditions. Validity is a test of how accurately an instrument measures what one believes is being measured. A review of reliability and validity instructional material was conducted. Studies of managed care practices and programs abound. However, many of these studies utilize measurement instruments that were developed for other purposes or for a population other than the one being sampled. In other cases, instruments have been developed without any testing of the instrument's performance. The lack of reliability and validity information may limit the value of these studies. This is particularly true when data are collected for one purpose and used for another. The usefulness of certain studies without reliability and validity measures is questionable, especially in cases where the literature contradicts itself
The reliability of WorkWell Systems Functional Capacity Evaluation: a systematic review
2014-01-01
Background Functional capacity evaluation (FCE) determines a person’s ability to perform work-related tasks and is a major component of the rehabilitation process. The WorkWell Systems (WWS) FCE (formerly known as Isernhagen Work Systems FCE) is currently the most commonly used FCE tool in German rehabilitation centres. Our systematic review investigated the inter-rater, intra-rater and test-retest reliability of the WWS FCE. Methods We performed a systematic literature search of studies on the reliability of the WWS FCE and extracted item-specific measures of inter-rater, intra-rater and test-retest reliability from the identified studies. Intraclass correlation coefficients ≥ 0.75, percentages of agreement ≥ 80%, and kappa coefficients ≥ 0.60 were categorised as acceptable, otherwise they were considered non-acceptable. The extracted values were summarised for the five performance categories of the WWS FCE, and the results were classified as either consistent or inconsistent. Results From 11 identified studies, 150 item-specific reliability measures were extracted. 89% of the extracted inter-rater reliability measures, all of the intra-rater reliability measures and 96% of the test-retest reliability measures of the weight handling and strength tests had an acceptable level of reliability, compared to only 67% of the test-retest reliability measures of the posture/mobility tests and 56% of the test-retest reliability measures of the locomotion tests. Both of the extracted test-retest reliability measures of the balance test were acceptable. Conclusions Weight handling and strength tests were found to have consistently acceptable reliability. Further research is needed to explore the reliability of the other tests as inconsistent findings or a lack of data prevented definitive conclusions. PMID:24674029
Factors Influencing the Reliability of the Glasgow Coma Scale: A Systematic Review.
Reith, Florence Cm; Synnot, Anneliese; van den Brande, Ruben; Gruen, Russell L; Maas, Andrew Ir
2017-06-01
The Glasgow Coma Scale (GCS) characterizes patients with diminished consciousness. In a recent systematic review, we found overall adequate reliability across different clinical settings, but reliability estimates varied considerably between studies, and methodological quality of studies was overall poor. Identifying and understanding factors that can affect its reliability is important, in order to promote high standards for clinical use of the GCS. The aim of this systematic review was to identify factors that influence reliability and to provide an evidence base for promoting consistent and reliable application of the GCS. A comprehensive literature search was undertaken in MEDLINE, EMBASE, and CINAHL from 1974 to July 2016. Studies assessing the reliability of the GCS in adults or describing any factor that influences reliability were included. Two reviewers independently screened citations, selected full texts, and undertook data extraction and critical appraisal. Methodological quality of studies was evaluated with the consensus-based standards for the selection of health measurement instruments checklist. Data were synthesized narratively and presented in tables. Forty-one studies were included for analysis. Factors identified that may influence reliability are education and training, the level of consciousness, and type of stimuli used. Conflicting results were found for experience of the observer, the pathology causing the reduced consciousness, and intubation/sedation. No clear influence was found for the professional background of observers. Reliability of the GCS is influenced by multiple factors and as such is context dependent. This review points to the potential for improvement from training and education and standardization of assessment methods, for which recommendations are presented. Copyright © 2017 by the Congress of Neurological Surgeons.
Feijen, Stef; Kuppens, Kevin; Tate, Angela; Baert, Isabel; Struyf, Thomas; Struyf, Filip
2018-04-17
Measuring thoracic spine mobility can be of interest to competitive swimmers as it has been associated with shoulder girdle function and scapular position in subjects with and without shoulder pain. At present, no reliability data of thoracic spine mobility measurements are available in the swimming population. This study aims to evaluate the within-session intra- and interrater reliability of the "lumbar-locked rotation test" for thoracic spine rotation in competitive swimmers aged 10 to 18 years. This reliability study is part of a larger prospective cohort study investigating potential risk factors for the development of shoulder pain in competitive swimmers. Within-session, intra- and inter-rater reliability. Competitive swimming clubs in Belgium. 21 competitive swimmers. Intra- and inter-rater reliability of the lumbar-locked thoracic rotation test. Intraclass correlation coefficients (ICCs) ranged from 0.91 (95% CI 0.78 to 0.96) to 0.96 (0.89-0.98) for intra-rater reliability. Results for inter-rater reliability ranged from 0.89 (0.72-0.95) to 0.86 (0.65-0.94) respectively for right and left thoracic rotation. Results suggest good to excellent reliability of the lumbar-locked thoracic rotation test, indicating this test can be used reliably in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.
Hulteen, Ryan M; Lander, Natalie J; Morgan, Philip J; Barnett, Lisa M; Robertson, Samuel J; Lubans, David R
2015-10-01
It has been suggested that young people should develop competence in a variety of 'lifelong physical activities' to ensure that they can be active across the lifespan. The primary aim of this systematic review is to report the methodological properties, validity, reliability, and test duration of field-based measures that assess movement skill competency in lifelong physical activities. A secondary aim was to clearly define those characteristics unique to lifelong physical activities. A search of four electronic databases (Scopus, SPORTDiscus, ProQuest, and PubMed) was conducted between June 2014 and April 2015 with no date restrictions. Studies addressing the validity and/or reliability of lifelong physical activity tests were reviewed. Included articles were required to assess lifelong physical activities using process-oriented measures, as well as report either one type of validity or reliability. Assessment criteria for methodological quality were adapted from a checklist used in a previous review of sport skill outcome assessments. Movement skill assessments for eight different lifelong physical activities (badminton, cycling, dance, golf, racquetball, resistance training, swimming, and tennis) in 17 studies were identified for inclusion. Methodological quality, validity, reliability, and test duration (time to assess a single participant), for each article were assessed. Moderate to excellent reliability results were found in 16 of 17 studies, with 71% reporting inter-rater reliability and 41% reporting intra-rater reliability. Only four studies in this review reported test-retest reliability. Ten studies reported validity results; content validity was cited in 41% of these studies. Construct validity was reported in 24% of studies, while criterion validity was only reported in 12% of studies. Numerous assessments for lifelong physical activities may exist, yet only assessments for eight lifelong physical activities were included in this review. Generalizability of results may be more applicable if more heterogeneous samples are used in future research. Moderate to excellent levels of inter- and intra-rater reliability were reported in the majority of studies. However, future work should look to establish test-retest reliability. Validity was less commonly reported than reliability, and further types of validity other than content validity need to be established in future research. Specifically, predictive validity of 'lifelong physical activity' movement skill competency is needed to support the assertion that such activities provide the foundation for a lifetime of activity.
NASA Astrophysics Data System (ADS)
Li, Lin; Zeng, Li; Lin, Zi-Jing; Cazzell, Mary; Liu, Hanli
2015-05-01
Test-retest reliability of neuroimaging measurements is an important concern in the investigation of cognitive functions in the human brain. To date, intraclass correlation coefficients (ICCs), originally used in inter-rater reliability studies in behavioral sciences, have become commonly used metrics in reliability studies on neuroimaging and functional near-infrared spectroscopy (fNIRS). However, as there are six popular forms of ICC, the adequateness of the comprehensive understanding of ICCs will affect how one may appropriately select, use, and interpret ICCs toward a reliability study. We first offer a brief review and tutorial on the statistical rationale of ICCs, including their underlying analysis of variance models and technical definitions, in the context of assessment on intertest reliability. Second, we provide general guidelines on the selection and interpretation of ICCs. Third, we illustrate the proposed approach by using an actual research study to assess intertest reliability of fNIRS-based, volumetric diffuse optical tomography of brain activities stimulated by a risk decision-making protocol. Last, special issues that may arise in reliability assessment using ICCs are discussed and solutions are suggested.
Reliability of 3D laser-based anthropometry and comparison with classical anthropometry.
Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Broda, Anja; Scholz, Markus
2016-05-26
Anthropometric quantities are widely used in epidemiologic research as possible confounders, risk factors, or outcomes. 3D laser-based body scans (BS) allow evaluation of dozens of quantities in short time with minimal physical contact between observers and probands. The aim of this study was to compare BS with classical manual anthropometric (CA) assessments with respect to feasibility, reliability, and validity. We performed a study on 108 individuals with multiple measurements of BS and CA to estimate intra- and inter-rater reliabilities for both. We suggested BS equivalents of CA measurements and determined validity of BS considering CA the gold standard. Throughout the study, the overall concordance correlation coefficient (OCCC) was chosen as indicator of agreement. BS was slightly more time consuming but better accepted than CA. For CA, OCCCs for intra- and inter-rater reliability were greater than 0.8 for all nine quantities studied. For BS, 9 of 154 quantities showed reliabilities below 0.7. BS proxies for CA measurements showed good agreement (minimum OCCC > 0.77) after offset correction. Thigh length showed higher reliability in BS while upper arm length showed higher reliability in CA. Except for these issues, reliabilities of CA measurements and their BS equivalents were comparable.
Cutolo, Maurizio; Vanhaecke, Amber; Ruaro, Barbara; Deschepper, Ellen; Ickinger, Claudia; Melsens, Karin; Piette, Yves; Trombetta, Amelia Chiara; De Keyser, Filip; Smith, Vanessa
2018-06-06
A reliable tool to evaluate flow is paramount in systemic sclerosis (SSc). We describe herein on the one hand a systematic literature review on the reliability of laser speckle contrast analysis (LASCA) to measure the peripheral blood perfusion (PBP) in SSc and perform an additional pilot study, investigating the intra- and inter-rater reliability of LASCA. A systematic search was performed in 3 electronic databases, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. In the pilot study, 30 SSc patients and 30 healthy subjects (HS) underwent LASCA assessment. Intra-rater reliability was assessed by having a first anchor rater performing the measurements at 2 time-points and inter-rater reliability by having the anchor rater and a team of second raters performing the measurements in 15 SSc and 30 HS. The measurements were repeated with a second anchor rater in the other 15 SSc patients, as external validation. Only 1 of the 14 records of interest identified through the systematic search was included in the final analysis. In the additional pilot study: intra-class correlation coefficient (ICC) for intra-rater reliability of the first anchor rater was 0.95 in SSc and 0.93 in HS, the ICC for inter-rater reliability was 0.97 in SSc and 0.93 in HS. Intra- and inter-rater reliability of the second anchor rater was 0.78 and 0.87. The identified literature regarding the reliability of LASCA measurements reports good to excellent inter-rater agreement. This very pilot study could confirm the reliability of LASCA measurements with good to excellent inter-rater agreement and found additionally good to excellent intra-rater reliability. Furthermore, similar results were found in the external validation. Copyright © 2018. Published by Elsevier B.V.
1975-04-01
to 2-16 Category 3: Fabrie’ition Methods and Techniques 3-01 to 3-21 Category 4: ReliabiLity Studies 4-01 to 4-15 Category 5: C,,rputeiized Analysis...RAC icrodrcuit Thesaurus. The ternis are arranged in alphabetical order with sub-term description followinti each main term. Cosvreferencing is...Reliability aspects of vrocircuit manufacturi’. 4. Reliability Studies : Technics) reports !:datig to ;ormal ve isbbty studies and investi- sations
[Study of the relationship between human quality and reliability].
Long, S; Wang, C; Wang, L i; Yuan, J; Liu, H; Jiao, X
1997-02-01
To clarify the relationship between human quality and reliability, 1925 experiments in 20 subjects were carried out to study the relationship between disposition character, digital memory, graphic memory, multi-reaction time and education level and simulated aircraft operation. Meanwhile, effects of task difficulty and enviromental factor on human reliability were also studied. The results showed that human quality can be predicted and evaluated through experimental methods. The better the human quality, the higher the human reliability.
McCreesh, Karen M; Crotty, James M; Lewis, Jeremy S
2015-03-01
Narrowing of the subacromial space has been noted as a common feature of rotator cuff (RC) tendinopathy. It has been implicated in the development of symptoms and forms the basis for some surgical and rehabilitation approaches. Various radiological methods have been used to measure the subacromial space, which is represented by a two-dimensional measurement of acromiohumeral distance (AHD). A reliable method of measurement could be used to assess the impact of rehabilitation or surgical interventions for RC tendinopathy; however, there are no published reviews assessing the reliability of AHD measurement. The aim of this review was to systematically assess the evidence for the intrarater and inter-rater reliability of radiological methods of measuring AHD, in order to identify the most reliable method for use in RC tendinopathy. An electronic literature search was carried out and studies describing the reliability of any radiological method of measuring AHD in either healthy or RC tendinopathy groups were included. Eighteen studies met the inclusion criteria and were appraised by two reviewers using the Quality Appraisal for reliability Studies checklist. Eight studies were deemed to be of high methodological quality. Study weaknesses included lack of tester blinding, inadequate description of tester experience, lack of inclusion of symptomatic populations, poor reporting of statistical methods and unclear diagnosis. There was strong evidence for the reliability of ultrasound for measuring AHD, with moderate evidence for MRI and CT measures and conflicting evidence for radiographic methods. Overall, there was lack of research in RC tendinopathy populations, with only six studies including participants with shoulder pain. The results support the reliability of ultrasound and CT or MRI for the measurement of AHD; however, more studies in symptomatic populations are required. The reliability of AHD measurement using radiographs has not been supported by the studies reviewed. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Study of complete interconnect reliability for a GaAs MMIC power amplifier
NASA Astrophysics Data System (ADS)
Lin, Qian; Wu, Haifeng; Chen, Shan-ji; Jia, Guoqing; Jiang, Wei; Chen, Chao
2018-05-01
By combining the finite element analysis (FEA) and artificial neural network (ANN) technique, the complete prediction of interconnect reliability for a monolithic microwave integrated circuit (MMIC) power amplifier (PA) at the both of direct current (DC) and alternating current (AC) operation conditions is achieved effectively in this article. As a example, a MMIC PA is modelled to study the electromigration failure of interconnect. This is the first time to study the interconnect reliability for an MMIC PA at the conditions of DC and AC operation simultaneously. By training the data from FEA, a high accuracy ANN model for PA reliability is constructed. Then, basing on the reliability database which is obtained from the ANN model, it can give important guidance for improving the reliability design for IC.
What to Do With "Moderate" Reliability and Validity Coefficients?
Post, Marcel W
2016-07-01
Clinimetric studies may use criteria for test-retest reliability and convergent validity such that correlation coefficients as low as .40 are supportive of reliability and validity. It can be argued that moderate (.40-.60) correlations should not be interpreted in this way and that reliability coefficients <.70 should be considered as indicative of unreliability. Convergent validity coefficients in the .40 to .60 or .40 to .70 range should be considered as indications of validity problems, or as inconclusive at best. Studies on reliability and convergent should be designed in such a way that it is realistic to expect high reliability and validity coefficients. Multitrait multimethod approaches are preferred to study construct (convergent-divergent) validity. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor
2013-06-01
Reliability measures precision or the extent to which test results can be replicated. This is the first ever systematic review to identify statistical methods used to measure reliability of equipment measuring continuous variables. This studyalso aims to highlight the inappropriate statistical method used in the reliability analysis and its implication in the medical practice. In 2010, five electronic databases were searched between 2007 and 2009 to look for reliability studies. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria. The Intra-class Correlation Coefficient (ICC) is the most popular method with 25 (60%) studies having used this method followed by the comparing means (8 or 19%). Out of 25 studies using the ICC, only 7 (28%) reported the confidence intervals and types of ICC used. Most studies (71%) also tested the agreement of instruments. This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue, and be able to correctly perform analysis in reliability studies.
Scale for positive aspects of caregiving experience: development, reliability, and factor structure.
Kate, N; Grover, S; Kulhara, P; Nehra, R
2012-06-01
OBJECTIVE. To develop an instrument (Scale for Positive Aspects of Caregiving Experience [SPACE]) that evaluates positive caregiving experience and assess its psychometric properties. METHODS. Available scales which assess some aspects of positive caregiving experience were reviewed and a 50-item questionnaire with a 5-point rating was constructed. In all, 203 primary caregivers of patients with severe mental disorders were asked to complete the questionnaire. Internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity were evaluated. Principal component factor analysis was run to assess the factorial validity of the scale. RESULTS. The scale developed as part of the study was found to have good internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity. Principal component factor analysis yielded a 4-factor structure, which also had good test-retest reliability and cross-language reliability. There was a strong correlation between the 4 factors obtained. CONCLUSION. The SPACE developed as part of this study has good psychometric properties.
Acute Respiratory Distress Syndrome Measurement Error. Potential Effect on Clinical Study Results
Cooke, Colin R.; Iwashyna, Theodore J.; Hofer, Timothy P.
2016-01-01
Rationale: Identifying patients with acute respiratory distress syndrome (ARDS) is a recognized challenge. Experts often have only moderate agreement when applying the clinical definition of ARDS to patients. However, no study has fully examined the implications of low reliability measurement of ARDS on clinical studies. Objectives: To investigate how the degree of variability in ARDS measurement commonly reported in clinical studies affects study power, the accuracy of treatment effect estimates, and the measured strength of risk factor associations. Methods: We examined the effect of ARDS measurement error in randomized clinical trials (RCTs) of ARDS-specific treatments and cohort studies using simulations. We varied the reliability of ARDS diagnosis, quantified as the interobserver reliability (κ-statistic) between two reviewers. In RCT simulations, patients identified as having ARDS were enrolled, and when measurement error was present, patients without ARDS could be enrolled. In cohort studies, risk factors as potential predictors were analyzed using reviewer-identified ARDS as the outcome variable. Measurements and Main Results: Lower reliability measurement of ARDS during patient enrollment in RCTs seriously degraded study power. Holding effect size constant, the sample size necessary to attain adequate statistical power increased by more than 50% as reliability declined, although the result was sensitive to ARDS prevalence. In a 1,400-patient clinical trial, the sample size necessary to maintain similar statistical power increased to over 1,900 when reliability declined from perfect to substantial (κ = 0.72). Lower reliability measurement diminished the apparent effectiveness of an ARDS-specific treatment from a 15.2% (95% confidence interval, 9.4–20.9%) absolute risk reduction in mortality to 10.9% (95% confidence interval, 4.7–16.2%) when reliability declined to moderate (κ = 0.51). In cohort studies, the effect on risk factor associations was similar. Conclusions: ARDS measurement error can seriously degrade statistical power and effect size estimates of clinical studies. The reliability of ARDS measurement warrants careful attention in future ARDS clinical studies. PMID:27159648
NASA Astrophysics Data System (ADS)
Miao, Yongchun; Kang, Rongxue; Chen, Xuefeng
2017-12-01
In recent years, with the gradual extension of reliability research, the study of production system reliability has become the hot topic in various industries. Man-machine-environment system is a complex system composed of human factors, machinery equipment and environment. The reliability of individual factor must be analyzed in order to gradually transit to the research of three-factor reliability. Meanwhile, the dynamic relationship among man-machine-environment should be considered to establish an effective blurry evaluation mechanism to truly and effectively analyze the reliability of such systems. In this paper, based on the system engineering, fuzzy theory, reliability theory, human error, environmental impact and machinery equipment failure theory, the reliabilities of human factor, machinery equipment and environment of some chemical production system were studied by the method of fuzzy evaluation. At last, the reliability of man-machine-environment system was calculated to obtain the weighted result, which indicated that the reliability value of this chemical production system was 86.29. Through the given evaluation domain it can be seen that the reliability of man-machine-environment integrated system is in a good status, and the effective measures for further improvement were proposed according to the fuzzy calculation results.
Mai, Zhi-Ming; Lin, Jia-Huang; Chiang, Shing-Chun; Ngan, Roger Kai-Cheong; Kwong, Dora Lai-Wan; Ng, Wai-Tong; Ng, Alice Wan-Ying; Yuen, Kam-Tong; Ip, Kai-Ming; Chan, Yap-Hang; Lee, Anne Wing-Mui; Ho, Sai-Yin; Lung, Maria Li; Lam, Tai-Hing
2018-05-04
We evaluated the reliability of early life nasopharyngeal carcinoma (NPC) aetiology factors in the questionnaire of an NPC case-control study in Hong Kong during 2014-2017. 140 subjects aged 18+ completed the same computer-assisted questionnaire twice, separated by at least 2 weeks. The questionnaire included most known NPC aetiology factors and the present analysis focused on early life exposure. Test-retest reliability of all the 285 questionnaire items was assessed in all subjects and in 5 subgroups defined by cases/controls, sex, time between 1 st and 2 nd questionnaire (2-29/≥30 weeks), education (secondary or less/postsecondary), and age (25-44/45-59/60+ years) at the first questionnaire. The reliability of items on dietary habits, body figure, skin tone and sun exposure in early life periods (age 6-12 and 13-18) was moderate-to-almost perfect, and most other items had fair-to-substantial reliability in all life periods (age 6-12, 13-18 and 19-30, and 10 years ago). Differences in reliability by strata of the 5 subgroups were only observed in a few items. This study is the first to report the reliability of an NPC questionnaire, and make the questionnaire available online. Overall, our questionnaire had acceptable reliability, suggesting that previous NPC study results on the same risk factors would have similar reliability.
Reliability of joint count assessment in rheumatoid arthritis: a systematic literature review.
Cheung, Peter P; Gossec, Laure; Mak, Anselm; March, Lyn
2014-06-01
Joint counts are central to the assessment of rheumatoid arthritis (RA) but reliability is an issue. To evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability through a systematic literature review. Articles reporting joint count reliability or agreement in RA in PubMed, EMBase, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability such as intraclass correlation coefficients (ICCs) were extracted. Data analysis was primarily descriptive due to high heterogeneity. Twenty-eight studies on health care professionals (HCP) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49-0.98). Inter-observer reliability between HCPs for TJCs was higher than for SJCs (range of ICC: 0.64-0.88 vs. 0.29-0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31-0.91) compared to SJCs (0.16-0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs. Intra- and inter-observer reliability was high for TJCs for HCPs and patients: among all groups, reliability was better for TJCs than SJCs. Inter-observer reliability of SJCs was poorer for patients than HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation for patient-reported joint counts as an outcome measure. © 2013 Published by Elsevier Inc.
The Handbook of Medical Image Perception and Techniques
NASA Astrophysics Data System (ADS)
Samei, Ehsan; Krupinski, Elizabeth
2014-07-01
1. Medical image perception Ehsan Samei and Elizabeth Krupinski; Part I. Historical Reflections and Theoretical Foundations: 2. A short history of image perception in medical radiology Harold Kundel and Calvin Nodine; 3. Spatial vision research without noise Arthur Burgess; 4. Signal detection theory, a brief history Arthur Burgess; 5. Signal detection in radiology Arthur Burgess; 6. Lessons from dinners with the giants of modern image science Robert Wagner; Part II. Science of Image Perception: 7. Perceptual factors in reading medical images Elizabeth Krupinski; 8. Cognitive factors in reading medical images David Manning; 9. Satisfaction of search in traditional radiographic imaging Kevin Berbaum, Edmund Franken, Robert Caldwell and Kevin Schartz; 10. The role of expertise in radiologic image interpretation Calvin Nodine and Claudia Mello-Thoms; 11. A primer of image quality and its perceptual relevance Robert Saunders and Ehsan Samei; 12. Beyond the limitations of human vision Maria Petrou; Part III. Perception Metrology: 13. Logistical issues in designing perception experiments Ehsan Samei and Xiang Li; 14. ROC analysis: basic concepts and practical applications Georgia Tourassi; 15. Multi-reader ROC Steve Hillis; 16. Recent developments in FROC methodology Dev Chakraborty; 17. Observer models as a surrogate to perception experiments Craig Abbey and Miguel Eckstein; 18. Implementation of observer models Matthew Kupinski; Part IV. Decision Support and Computer Aided Detection: 19. CAD: an image perception perspective Maryellen Giger and Weijie Chen; 20. Common designs of CAD studies Yulei Jiang; 21. Perceptual effect of CAD in reading chest images Matthew Freedman and Teresa Osicka; 22. Perceptual issues in mammography and CAD Michael Ulissey; 23. How perceptual factors affect the use and accuracy of CAD for interpretation of CT images Ronald Summers; 24. CAD: risks and benefits for radiologists' decisions Eugenio Alberdi, Andrey Povyakalo, Lorenzo Strigini and Peter Ayton; Part V. Optimization and Practical Issues: 25. Optimization of 2D and 3D radiographic systems Jeff Siewerdson; 26. Applications of AFC methodology in optimization of CT imaging systems Kent Ogden and Walter Huda; 27. Perceptual issues in reading mammograms Margarita Zuley; 28. Perceptual optimization of display processing techniques Richard Van Metter; 29. Optimization of display systems Elizabeth Krupinski and Hans Roehrig; 30. Ergonomic radiologist workplaces in the PACS environment Carl Zylack; Part VI. Epilogue: 31. Future prospects of medical image perception Ehsan Samei and Elizabeth Krupinski; Index.
The Handbook of Medical Image Perception and Techniques
NASA Astrophysics Data System (ADS)
Samei, Ehsan; Krupinski, Elizabeth
2009-12-01
1. Medical image perception Ehsan Samei and Elizabeth Krupinski; Part I. Historical Reflections and Theoretical Foundations: 2. A short history of image perception in medical radiology Harold Kundel and Calvin Nodine; 3. Spatial vision research without noise Arthur Burgess; 4. Signal detection theory, a brief history Arthur Burgess; 5. Signal detection in radiology Arthur Burgess; 6. Lessons from dinners with the giants of modern image science Robert Wagner; Part II. Science of Image Perception: 7. Perceptual factors in reading medical images Elizabeth Krupinski; 8. Cognitive factors in reading medical images David Manning; 9. Satisfaction of search in traditional radiographic imaging Kevin Berbaum, Edmund Franken, Robert Caldwell and Kevin Schartz; 10. The role of expertise in radiologic image interpretation Calvin Nodine and Claudia Mello-Thoms; 11. A primer of image quality and its perceptual relevance Robert Saunders and Ehsan Samei; 12. Beyond the limitations of human vision Maria Petrou; Part III. Perception Metrology: 13. Logistical issues in designing perception experiments Ehsan Samei and Xiang Li; 14. ROC analysis: basic concepts and practical applications Georgia Tourassi; 15. Multi-reader ROC Steve Hillis; 16. Recent developments in FROC methodology Dev Chakraborty; 17. Observer models as a surrogate to perception experiments Craig Abbey and Miguel Eckstein; 18. Implementation of observer models Matthew Kupinski; Part IV. Decision Support and Computer Aided Detection: 19. CAD: an image perception perspective Maryellen Giger and Weijie Chen; 20. Common designs of CAD studies Yulei Jiang; 21. Perceptual effect of CAD in reading chest images Matthew Freedman and Teresa Osicka; 22. Perceptual issues in mammography and CAD Michael Ulissey; 23. How perceptual factors affect the use and accuracy of CAD for interpretation of CT images Ronald Summers; 24. CAD: risks and benefits for radiologists' decisions Eugenio Alberdi, Andrey Povyakalo, Lorenzo Strigini and Peter Ayton; Part V. Optimization and Practical Issues: 25. Optimization of 2D and 3D radiographic systems Jeff Siewerdson; 26. Applications of AFC methodology in optimization of CT imaging systems Kent Ogden and Walter Huda; 27. Perceptual issues in reading mammograms Margarita Zuley; 28. Perceptual optimization of display processing techniques Richard Van Metter; 29. Optimization of display systems Elizabeth Krupinski and Hans Roehrig; 30. Ergonomic radiologist workplaces in the PACS environment Carl Zylack; Part VI. Epilogue: 31. Future prospects of medical image perception Ehsan Samei and Elizabeth Krupinski; Index.
NASA Applications and Lessons Learned in Reliability Engineering
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Fuller, Raymond P.
2011-01-01
Since the Shuttle Challenger accident in 1986, communities across NASA have been developing and extensively using quantitative reliability and risk assessment methods in their decision making process. This paper discusses several reliability engineering applications that NASA has used over the year to support the design, development, and operation of critical space flight hardware. Specifically, the paper discusses several reliability engineering applications used by NASA in areas such as risk management, inspection policies, components upgrades, reliability growth, integrated failure analysis, and physics based probabilistic engineering analysis. In each of these areas, the paper provides a brief discussion of a case study to demonstrate the value added and the criticality of reliability engineering in supporting NASA project and program decisions to fly safely. Examples of these case studies discussed are reliability based life limit extension of Shuttle Space Main Engine (SSME) hardware, Reliability based inspection policies for Auxiliary Power Unit (APU) turbine disc, probabilistic structural engineering analysis for reliability prediction of the SSME alternate turbo-pump development, impact of ET foam reliability on the Space Shuttle System risk, and reliability based Space Shuttle upgrade for safety. Special attention is given in this paper to the physics based probabilistic engineering analysis applications and their critical role in evaluating the reliability of NASA development hardware including their potential use in a research and technology development environment.
NASA Technical Reports Server (NTRS)
Karns, James
1993-01-01
The objective of this study was to establish the initial quantitative reliability bounds for nuclear electric propulsion systems in a manned Mars mission required to ensure crew safety and mission success. Finding the reliability bounds involves balancing top-down (mission driven) requirements and bottom-up (technology driven) capabilities. In seeking this balance we hope to accomplish the following: (1) provide design insights into the achievability of the baseline design in terms of reliability requirements, given the existing technology base; (2) suggest alternative design approaches which might enhance reliability and crew safety; and (3) indicate what technology areas require significant research and development to achieve the reliability objectives.
A systematic review of the factor structure and reliability of the Spence Children's Anxiety Scale.
Orgilés, Mireia; Fernández-Martínez, Iván; Guillén-Riquelme, Alejandro; Espada, José P; Essau, Cecilia A
2016-01-15
The Spence Children's Anxiety Scale (SCAS) is a widely used instrument for assessing symptoms of anxiety disorders among children and adolescents. Previous studies have demonstrated its good reliability for children and adolescents from different backgrounds. However, remarkable variability in the reliability of the SCAS across studies and inconsistent results regarding its factor structure has been found. The present study aims to examine the SCAS factor structure by means of a systematic review with narrative synthesis, the mean reliability of the SCAS by means of a meta-analysis, and the influence of the moderators on the SCAS reliability. Databases employed to collect the studies included Scholar Google, PsycARTICLES, PsycINFO, Web of Science, and Scopus since 1997. Twenty-nine and 32 studies, which examined the factor structure and the internal consistency of the SCAS, respectively, were included. The SCAS was found to have strong internal consistency, influenced by different moderators. The systematic review demonstrated that the original six-factor model was supported by most studies. Factorial invariance studies (across age, gender, country) and test-retest reliability of the SCAS were not examined in this study. It is concluded that the SCAS is a reliable instrument for cross-cultural use, and it is suggested that the original six-factor model is appropriate for cross-cultural application. Copyright © 2015 Elsevier B.V. All rights reserved.
Rosa-Rizzotto, M; Visonà Dalla Pozza, L; Corlatti, A; Luparia, A; Marchi, A; Molteni, F; Facchin, P; Pagliano, E; Fedrizzi, E
2014-10-01
In hemiplegic children, the recognition of the activity limitation pattern and the possibility of grading its severity are relevant for clinicians while planning interventions, monitoring results, predicting outcomes. Aim of the study is to examine the reliability and validity of Besta Scale, an instrument used to measure in hemiplegic children from 18 months to 12 years of age both grasp on request (capacity) and spontaneous use of upper limb (performance) in bimanual play activities and in ADL. Psychometric analysis of reliability and of validity of the Besta scale was performed. Outpatient study sample Reliability study: A sample of 39 patients was enrolled. The administration of Besta scale was video-recorded in a standardized manner. All videos were scored by 20 independent raters on subsequent viewing. 3 raters randomly selected from the 20-raters group rescored the same video two years later for intra-rater reliability. Intra and inter-rater reliability were calculated using Intraclass Correlation Coefficient (ICC) and Kendall's coefficient (K), respectively. Internal consistency reliability was assessed using Alpha's Chronbach coefficient. Validity study: a sample of 105 children was assessed 5 times (at t0 and 2, 3, 6 and 12 months later) by 20 independent raters. Each patient underwent at the same time to QUEST and Besta scale administration and assessment. Criterion validity was calculated using rho-Pearson coefficient. Reliability study: The inter-rater reliability calculated with Kendall's coefficient resulted moderate K=0.47. The intra-rater (or test-retest) reliability for 3 raters was excellent (ICC=0.927). The Cronbach's alpha for internal consistency was 0.972. Validity study: Besta scale showed a good criterion validity compared to QUEST increasing by age and severity of impairment. Rho Pearson's correlation coefficient r was 0.81 (P<0.0001). Limitations. Besta scales in infants finds hard to distinguish between mild to moderately impaired hand function. Besta scale scoring system is a valid and reliable tool, utilizable in a clinical setting to monitor evolution of unimanual and bimanual manipulation and to distinguish hand's capacity from performance.
Moore, Amy Lawson; Miller, Terissa M
2018-01-01
The purpose of the current study is to evaluate the validity and reliability of the revised Gibson Test of Cognitive Skills, a computer-based battery of tests measuring short-term memory, long-term memory, processing speed, logic and reasoning, visual processing, as well as auditory processing and word attack skills. This study included 2,737 participants aged 5-85 years. A series of studies was conducted to examine the validity and reliability using the test performance of the entire norming group and several subgroups. The evaluation of the technical properties of the test battery included content validation by subject matter experts, item analysis and coefficient alpha, test-retest reliability, split-half reliability, and analysis of concurrent validity with the Woodcock Johnson III Tests of Cognitive Abilities and Tests of Achievement. Results indicated strong sources of evidence of validity and reliability for the test, including internal consistency reliability coefficients ranging from 0.87 to 0.98, test-retest reliability coefficients ranging from 0.69 to 0.91, split-half reliability coefficients ranging from 0.87 to 0.91, and concurrent validity coefficients ranging from 0.53 to 0.93. The Gibson Test of Cognitive Skills-2 is a reliable and valid tool for assessing cognition in the general population across the lifespan.
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Lai, Cheng-Fei; Alonzo, Julie; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the fifth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Lai, Cheng-Fei; Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the second-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the fourth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Lai, Cheng-Fei; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the sixth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Park, Bitnara Jasmine; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the seventh-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Lai, Cheng-Fei; Irvin, P. Shawn; Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the third-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
The 747 primary flight control systems reliability and maintenance study
NASA Technical Reports Server (NTRS)
1979-01-01
The major operational characteristics of the 747 Primary Flight Control Systems (PFCS) are described. Results of reliability analysis for separate control functions are presented. The analysis makes use of a NASA computer program which calculates reliability of redundant systems. Costs for maintaining the 747 PFCS in airline service are assessed. The reliabilities and cost will provide a baseline for use in trade studies of future flight control system design.
How reliable are clinical systems in the UK NHS? A study of seven NHS organisations
Franklin, Bryony Dean; Moorthy, Krishna; Cooke, Matthew W; Vincent, Charles
2012-01-01
Background It is well known that many healthcare systems have poor reliability; however, the size and pervasiveness of this problem and its impact has not been systematically established in the UK. The authors studied four clinical systems: clinical information in surgical outpatient clinics, prescribing for hospital inpatients, equipment in theatres, and insertion of peripheral intravenous lines. The aim was to describe the nature, extent and variation in reliability of these four systems in a sample of UK hospitals, and to explore the reasons for poor reliability. Methods Seven UK hospital organisations were involved; each system was studied in three of these. The authors took delivery of the systems' intended outputs to be a proxy for the reliability of the system as a whole. For example, for clinical information, 100% reliability was defined as all patients having an agreed list of clinical information available when needed during their appointment. Systems factors were explored using semi-structured interviews with key informants. Common themes across the systems were identified. Results Overall reliability was found to be between 81% and 87% for the systems studied, with significant variation between organisations for some systems: clinical information in outpatient clinics ranged from 73% to 96%; prescribing for hospital inpatients 82–88%; equipment availability in theatres 63–88%; and availability of equipment for insertion of peripheral intravenous lines 80–88%. One in five reliability failures were associated with perceived threats to patient safety. Common factors causing poor reliability included lack of feedback, lack of standardisation, and issues such as access to information out of working hours. Conclusions Reported reliability was low for the four systems studied, with some common factors behind each. However, this hides significant variation between organisations for some processes, suggesting that some organisations have managed to create more reliable systems. Standardisation of processes would be expected to have significant benefit. PMID:22495099
Which symptom assessments and approaches are uniquely appropriate for paediatric concussion?
Gioia, G A; Schneider, J C; Vaughan, C G; Isquith, P K
2009-05-01
To (a) identify post-concussion symptom scales appropriate for children and adolescents in sports; (b) review evidence for reliability and validity; and (c) recommend future directions for scale development. Quantitative and qualitative literature review of symptom rating scales appropriate for children and adolescents aged 5 to 22 years. Literature identified via search of Medline, Ovid-Medline and PsycInfo databases; review of reference lists in identified articles; querying sports concussion specialists. 29 articles met study inclusion criteria. 5 symptom scales examined in 11 studies for ages 5-12 years and in 25 studies for ages 13-22. 10 of 11 studies for 5-12-year-olds presented validity evidence for three scales; 7 studies provided reliability evidence for two scales; 7 studies used serial administrations but no reliable change metrics. Two scales included parent-reports and one included a teacher report. 24 of 25 studies for 13-22 year-olds presented validity evidence for five measures; seven studies provided reliability evidence for four measures with 18 studies including serial administrations and two examining Reliable Change. Psychometric evidence for symptom scales is stronger for adolescents than for younger children. Most scales provide evidence of concurrent validity, discriminating concussed and non-concussed groups. Few report reliability and evidence for validity is narrow. Two measures include parent/teacher reports. Few scales examine reliable change statistics, limiting interpretability of temporal changes. Future studies are needed to fully define symptom scale psychometric properties with the greatest need in younger student-athletes.
Assessing the reliability of ecotoxicological studies: An overview of current needs and approaches.
Moermond, Caroline; Beasley, Amy; Breton, Roger; Junghans, Marion; Laskowski, Ryszard; Solomon, Keith; Zahner, Holly
2017-07-01
In general, reliable studies are well designed and well performed, and enough details on study design and performance are reported to assess the study. For hazard and risk assessment in various legal frameworks, many different types of ecotoxicity studies need to be evaluated for reliability. These studies vary in study design, methodology, quality, and level of detail reported (e.g., reviews, peer-reviewed research papers, or industry-sponsored studies documented under Good Laboratory Practice [GLP] guidelines). Regulators have the responsibility to make sound and verifiable decisions and should evaluate each study for reliability in accordance with scientific principles regardless of whether they were conducted in accordance with GLP and/or standardized methods. Thus, a systematic and transparent approach is needed to evaluate studies for reliability. In this paper, 8 different methods for reliability assessment were compared using a number of attributes: categorical versus numerical scoring methods, use of exclusion and critical criteria, weighting of criteria, whether methods are tested with case studies, domain of applicability, bias toward GLP studies, incorporation of standard guidelines in the evaluation method, number of criteria used, type of criteria considered, and availability of guidance material. Finally, some considerations are given on how to choose a suitable method for assessing reliability of ecotoxicity studies. Integr Environ Assess Manag 2017;13:640-651. © 2016 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC). © 2016 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC).
Reliability Generalization of the Alcohol Use Disorder Identification Test.
ERIC Educational Resources Information Center
Shields, Alan L.; Caruso, John C.
2002-01-01
Evaluated the reliability of scores from the Alcohol Use Disorders Identification Test (AUDIT; J. Sounders and others, 1993) in a reliability generalization study based on 17 empirical journal articles. Results show AUDIT scores to be generally reliable for basic assessment. (SLD)
Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard
2017-04-01
Purpose The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method A comprehensive systematic literature review was conducted using a number of electronic databases: PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL, published between January 2000 and May 2015. The studies examined the validity, inter- and intra-rater reliabilities of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies respectively. Results A total of 898 hits were achieved, of which 11 articles based on inclusion criteria were reviewed. Nine studies explored the concurrent validity, inter- and intra-rater reliabilities, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. The physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, orthopaedic special tests, neurodynamic testa and scar assessment.
Sandhu, Simrenjeet; Rudnisky, Chris; Arora, Sourabh; Kassam, Faazil; Douglas, Gordon; Edwards, Marianne C; Verstraten, Karin; Wong, Beatrice; Damji, Karim F
2018-03-01
Clinicians can feel confident compressed three-dimensional digital (3DD) and two-dimensional digital (2DD) imaging evaluating important features of glaucomatous disc damage is comparable to the previous gold standard of stereoscopic slide film photography, supporting the use of digital imaging for teleglaucoma applications. To compare the sensitivity and specificity of 3DD and 2DD photography with stereo slide film in detecting glaucomatous optic nerve head features. This prospective, multireader validation study imaged and compressed glaucomatous, suspicious or normal optic nerves using a ratio of 16:1 into 3DD and 2DD (1024×1280 pixels) and compared both to stereo slide film. The primary outcome was vertical cup-to-disc ratio (VCDR) and secondary outcomes, including disc haemorrhage and notching, were also evaluated. Each format was graded randomly by four glaucoma specialists. A protocol was implemented for harmonising data including consensus-based interpretation as needed. There were 192 eyes imaged with each format. The mean VCDR for slide, 3DD and 2DD was 0.59±0.20, 0.60±0.18 and 0.62±0.17, respectively. The agreement of VCDR for 3DD versus film was κ=0.781 and for 2DD versus film was κ=0.69. Sensitivity (95.2%), specificity (95.2%) and area under the curve (AUC; 0.953) of 3DD imaging to detect notching were better (p=0.03) than for 2DD (90.5%; 88.6%; AUC=0.895). Similarly, sensitivity (77.8%), specificity (98.9%) and AUC (0.883) of 3DD to detect disc haemorrhage were better (p=0.049) than for 2DD (44.4%; 99.5%; AUC=0.72). There was no difference between 3DD and 2DD imaging in detecting disc tilt (p=0.7), peripapillary atrophy (p=0.16), grey crescent (p=0.1) or pallor (p=0.43), although 3D detected sloping better (p=0.013). Both 3DD and 2DD imaging demonstrates excellent reproducibility in comparison to stereo slide film with experts evaluating VCDR, notching and disc haemorrhage. 3DD in this study was slightly more accurate than 2DD for evaluating disc haemorrhage, notching and sloping. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.
Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John
2016-05-01
Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to rater blinding. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Fuller, Catherine J; Bladon, Bruce M; Driver, Adam J; Barr, Alistair R S
2006-03-01
The objective of this study was to assess the reliability of lameness scoring in horses. One veterinary surgeon examined nineteen lame horses on four occasions. Gait was recorded by camcorder, and scored from 0 to 10 ranging from sound to non-weight bearing lameness. A global score of overall change in lameness during the study was also determined for each horse. To measure intra-assessor reliability of the scoring systems, one veterinary surgeon scored videotapes of the horses' gaits on two occasions. To measure inter-assessor reliability, three veterinary surgeons viewed the videotapes, assigning individual lameness scores plus global scores to each horse. Reliability of individual lameness scoring was good intra-assessor, but only just within our acceptable limit inter-assessor. However, global scoring of change in lameness throughout the study was found to be reliable overall. Since clinician scoring is commonly used to assess lameness in horses, this is an important finding, fundamental to future clinical studies.
Reliability assessments in qualitative health promotion research.
Cook, Kay E
2012-03-01
This article contributes to the debate about the use of reliability assessments in qualitative research in general, and health promotion research in particular. In this article, I examine the use of reliability assessments in qualitative health promotion research in response to health promotion researchers' commonly held misconception that reliability assessments improve the rigor of qualitative research. All qualitative articles published in the journal Health Promotion International from 2003 to 2009 employing reliability assessments were examined. In total, 31.3% (20/64) articles employed some form of reliability assessment. The use of reliability assessments increased over the study period, ranging from <20% in 2003/2004 to 50% and above in 2008/2009, while at the same time the total number of qualitative articles decreased. The articles were then classified into four types of reliability assessments, including the verification of thematic codes, the use of inter-rater reliability statistics, congruence in team coding and congruence in coding across sites. The merits of each type were discussed, with the subsequent discussion focusing on the deductive nature of reliable thematic coding, the limited depth of immediately verifiable data and the usefulness of such studies to health promotion and the advancement of the qualitative paradigm.
Boonstra, Anne M; Schiphorst Preuper, Henrica R; Reneman, Michiel F; Posthumus, Jitze B; Stewart, Roy E
2008-06-01
To determine the reliability and concurrent validity of a visual analogue scale (VAS) for disability as a single-item instrument measuring disability in chronic pain patients was the objective of the study. For the reliability study a test-retest design and for the validity study a cross-sectional design was used. A general rehabilitation centre and a university rehabilitation centre was the setting for the study. The study population consisted of patients over 18 years of age, suffering from chronic musculoskeletal pain; 52 patients in the reliability study, 344 patients in the validity study. Main outcome measures were as follows. Reliability study: Spearman's correlation coefficients (rho values) of the test and retest data of the VAS for disability; validity study: rho values of the VAS disability scores with the scores on four domains of the Short-Form Health Survey (SF-36) and VAS pain scores, and with Roland-Morris Disability Questionnaire scores in chronic low back pain patients. Results were as follows: in the reliability study rho values varied from 0.60 to 0.77; and in the validity study rho values of VAS disability scores with SF-36 domain scores varied from 0.16 to 0.51, with Roland-Morris Disability Questionnaire scores from 0.38 to 0.43 and with VAS pain scores from 0.76 to 0.84. The conclusion of the study was that the reliability of the VAS for disability is moderate to good. Because of a weak correlation with other disability instruments and a strong correlation with the VAS for pain, however, its validity is questionable.
Gilkison, C R; Fenton, M V; Lester, J W
1992-05-01
This study was designed to establish the reliability of a health history questionnaire used as a screening tool for incoming university students. The authors used a test-retest design, with a test interval of 6 months, on a sample of medical and nursing students. The analysis focused on overall reliability of the questionnaire and reproducibility of specific items, based on question format. Questionnaire items of specific interest were those with dichotomous yes/no response options versus open-ended format questions, those using the words frequently or recently, or those that asked multiple questions. Demographic characteristics of the subjects were considered in the evaluation of reliability. Overall reliability of the questionnaire (93.6%) was above the anticipated level of 90%, and subject sex or program of study did not show any significant differences in reproducibility of responses. Although wording of questions did not affect item reliability, dichotomous format questions demonstrated a higher degree of reliability (96.4%) than the overall reliability of the questionnaire. Recommendations for enhancing the reliability of the questionnaire are based on item analysis and information gathered from interviews with subjects.
VFS interjudge reliability using a free and directed search.
Bryant, Karen N; Finnegan, Eileen; Berbaum, Kevin
2012-03-01
Reports in the literature suggest that clinicians demonstrate poor reliability in rating videofluoroscopic swallow (VFS) variables. Contemporary perception theories suggest that the methods used in VFS reliability studies constrain subjects to make judgments in an abnormal way. The purpose of this study was to determine whether a directed search or a free search approach to rating swallow studies results in better interjudge reliability. Ten speech pathologists served as judges. Five clinical judges were assigned to the directed search group (use checklist) and five to the free search group (unguided observations). Clinical judges interpreted 20 VFS examinations of swallowing. Interjudge reliability of ratings of dysphagia severity, affected stage of swallow, dysphagia symptoms, and attributes identified by clinical judges using a directed search was compared with that using a free search approach. Interjudge reliability for rating the presence of aspiration and penetration was significantly better using a free search ("substantial" to "almost perfect" agreement) compared to a directed search ("moderate" agreement). Reliability of dysphagia severity ratings ranged from "moderate" to "almost perfect" agreement for both methods of search. Reliability for reporting all other symptoms and attributes of dysphagia was variable and was not significantly different between the groups.
Theory of reliable systems. [systems analysis and design
NASA Technical Reports Server (NTRS)
Meyer, J. F.
1973-01-01
The analysis and design of reliable systems are discussed. The attributes of system reliability studied are fault tolerance, diagnosability, and reconfigurability. Objectives of the study include: to determine properties of system structure that are conducive to a particular attribute; to determine methods for obtaining reliable realizations of a given system; and to determine how properties of system behavior relate to the complexity of fault tolerant realizations. A list of 34 references is included.
A Monte Carlo Simulation Study of the Reliability of Intraindividual Variability
Estabrook, Ryne; Grimm, Kevin J.; Bowles, Ryan P.
2012-01-01
Recent research has seen intraindividual variability (IIV) become a useful technique to incorporate trial-to-trial variability into many types of psychological studies. IIV as measured by individual standard deviations (ISDs) has shown unique prediction to several types of positive and negative outcomes (Ram, Rabbit, Stollery, & Nesselroade, 2005). One unanswered question regarding measuring intraindividual variability is its reliability and the conditions under which optimal reliability is achieved. Monte Carlo simulation studies were conducted to determine the reliability of the ISD compared to the intraindividual mean. The results indicate that ISDs generally have poor reliability and are sensitive to insufficient measurement occasions, poor test reliability, and unfavorable amounts and distributions of variability in the population. Secondary analysis of psychological data shows that use of individual standard deviations in unfavorable conditions leads to a marked reduction in statistical power, although careful adherence to underlying statistical assumptions allows their use as a basic research tool. PMID:22268793
Reliability reporting across studies using the Buss Durkee Hostility Inventory.
Vassar, Matt; Hale, William
2009-01-01
Empirical research on anger and hostility has pervaded the academic literature for more than 50 years. Accurate measurement of anger/hostility and subsequent interpretation of results requires that the instruments yield strong psychometric properties. For consistent measurement, reliability estimates must be calculated with each administration, because changes in sample characteristics may alter the scale's ability to generate reliable scores. Therefore, the present study was designed to address reliability reporting practices for a widely used anger assessment, the Buss Durkee Hostility Inventory (BDHI). Of the 250 published articles reviewed, 11.2% calculated and presented reliability estimates for the data at hand, 6.8% cited estimates from a previous study, and 77.1% made no mention of score reliability. Mean alpha estimates of scores for BDHI subscales generally fell below acceptable standards. Additionally, no detectable pattern was found between reporting practices and publication year or journal prestige. Areas for future research are also discussed.
Towards early software reliability prediction for computer forensic tools (case study).
Abu Talib, Manar
2016-01-01
Versatility, flexibility and robustness are essential requirements for software forensic tools. Researchers and practitioners need to put more effort into assessing this type of tool. A Markov model is a robust means for analyzing and anticipating the functioning of an advanced component based system. It is used, for instance, to analyze the reliability of the state machines of real time reactive systems. This research extends the architecture-based software reliability prediction model for computer forensic tools, which is based on Markov chains and COSMIC-FFP. Basically, every part of the computer forensic tool is linked to a discrete time Markov chain. If this can be done, then a probabilistic analysis by Markov chains can be performed to analyze the reliability of the components and of the whole tool. The purposes of the proposed reliability assessment method are to evaluate the tool's reliability in the early phases of its development, to improve the reliability assessment process for large computer forensic tools over time, and to compare alternative tool designs. The reliability analysis can assist designers in choosing the most reliable topology for the components, which can maximize the reliability of the tool and meet the expected reliability level specified by the end-user. The approach of assessing component-based tool reliability in the COSMIC-FFP context is illustrated with the Forensic Toolkit Imager case study.
A Note on the Score Reliability for the Satisfaction with Life Scale: An RG Study
ERIC Educational Resources Information Center
Vassar, Matt
2008-01-01
The purpose of the present study was to meta-analytically investigate the score reliability for the Satisfaction With Life Scale. Four-hundred and sixteen articles using the measure were located through electronic database searches and then separated to identify studies which had calculated reliability estimates from their own data. Sixty-two…
Test-Retest Reliability of Pediatric Heart Rate Variability: A Meta-Analysis.
Weiner, Oren M; McGrath, Jennifer J
2017-01-01
Heart rate variability (HRV), an established index of autonomic cardiovascular modulation, is associated with health outcomes (e.g., obesity, diabetes) and mortality risk. Time- and frequency-domain HRV measures are commonly reported in longitudinal adult and pediatric studies of health. While test-retest reliability has been established among adults, less is known about the psychometric properties of HRV among infants, children, and adolescents. The objective was to conduct a meta-analysis of the test-retest reliability of time- and frequency-domain HRV measures from infancy to adolescence. Electronic searches (PubMed, PsycINFO; January 1970-December 2014) identified studies with nonclinical samples aged ≤ 18 years; ≥ 2 baseline HRV recordings separated by ≥ 1 day; and sufficient data for effect size computation. Forty-nine studies ( N = 5,170) met inclusion criteria. Methodological variables coded included factors relevant to study protocol, sample characteristics, electrocardiogram (ECG) signal acquisition and preprocessing, and HRV analytical decisions. Fisher's Z was derived as the common effect size. Analyses were age-stratified (infant/toddler < 5 years, n = 3,329; child/adolescent 5-18 years, n = 1,841) due to marked methodological differences across the pediatric literature. Meta-analytic results revealed HRV demonstrated moderate reliability; child/adolescent studies ( Z = 0.62, r = 0.55) had significantly higher reliability than infant/toddler studies ( Z = 0.42, r = 0.40). Relative to other reported measures, HF exhibited the highest reliability among infant/toddler studies ( Z = 0.42, r = 0.40), while rMSSD exhibited the highest reliability among child/adolescent studies ( Z = 1.00, r = 0.76). Moderator analyses indicated greater reliability with shorter test-retest interval length, reported exclusion criteria based on medical illness/condition, lower proportion of males, prerecording acclimatization period, and longer recording duration; differences were noted across age groups. HRV is reliable among pediatric samples. Reliability is sensitive to pertinent methodological decisions that require careful consideration by the researcher. Limited methodological reporting precluded several a priori moderator analyses. Suggestions for future research, including standards specified by Task Force Guidelines, are discussed.
Test-Retest Reliability of Pediatric Heart Rate Variability
Weiner, Oren M.; McGrath, Jennifer J.
2017-01-01
Heart rate variability (HRV), an established index of autonomic cardiovascular modulation, is associated with health outcomes (e.g., obesity, diabetes) and mortality risk. Time- and frequency-domain HRV measures are commonly reported in longitudinal adult and pediatric studies of health. While test-retest reliability has been established among adults, less is known about the psychometric properties of HRV among infants, children, and adolescents. The objective was to conduct a meta-analysis of the test-retest reliability of time- and frequency-domain HRV measures from infancy to adolescence. Electronic searches (PubMed, PsycINFO; January 1970–December 2014) identified studies with nonclinical samples aged ≤ 18 years; ≥ 2 baseline HRV recordings separated by ≥ 1 day; and sufficient data for effect size computation. Forty-nine studies (N = 5,170) met inclusion criteria. Methodological variables coded included factors relevant to study protocol, sample characteristics, electrocardiogram (ECG) signal acquisition and preprocessing, and HRV analytical decisions. Fisher’s Z was derived as the common effect size. Analyses were age-stratified (infant/toddler < 5 years, n = 3,329; child/adolescent 5–18 years, n = 1,841) due to marked methodological differences across the pediatric literature. Meta-analytic results revealed HRV demonstrated moderate reliability; child/adolescent studies (Z = 0.62, r = 0.55) had significantly higher reliability than infant/toddler studies (Z = 0.42, r = 0.40). Relative to other reported measures, HF exhibited the highest reliability among infant/toddler studies (Z = 0.42, r = 0.40), while rMSSD exhibited the highest reliability among child/adolescent studies (Z = 1.00, r = 0.76). Moderator analyses indicated greater reliability with shorter test-retest interval length, reported exclusion criteria based on medical illness/condition, lower proportion of males, prerecording acclimatization period, and longer recording duration; differences were noted across age groups. HRV is reliable among pediatric samples. Reliability is sensitive to pertinent methodological decisions that require careful consideration by the researcher. Limited methodological reporting precluded several a priori moderator analyses. Suggestions for future research, including standards specified by Task Force Guidelines, are discussed. PMID:29307951
Integrating Reliability Analysis with a Performance Tool
NASA Technical Reports Server (NTRS)
Nicol, David M.; Palumbo, Daniel L.; Ulrey, Michael
1995-01-01
A large number of commercial simulation tools support performance oriented studies of complex computer and communication systems. Reliability of these systems, when desired, must be obtained by remodeling the system in a different tool. This has obvious drawbacks: (1) substantial extra effort is required to create the reliability model; (2) through modeling error the reliability model may not reflect precisely the same system as the performance model; (3) as the performance model evolves one must continuously reevaluate the validity of assumptions made in that model. In this paper we describe an approach, and a tool that implements this approach, for integrating a reliability analysis engine into a production quality simulation based performance modeling tool, and for modeling within such an integrated tool. The integrated tool allows one to use the same modeling formalisms to conduct both performance and reliability studies. We describe how the reliability analysis engine is integrated into the performance tool, describe the extensions made to the performance tool to support the reliability analysis, and consider the tool's performance.
Score Reliability of Adolescent Alcohol Screening Measures: A Meta-Analytic Inquiry
ERIC Educational Resources Information Center
Shields, Alan L.; Campfield, Delia C.; Miller, Christopher S.; Howell, Ryan T.; Wallace, Kimberly; Weiss, Roger D.
2008-01-01
This study describes the reliability reporting practices in empirical studies using eight adolescent alcohol screening tools and characterizes and explores variability in internal consistency estimates across samples. Of 119 observed administrations of these instruments, 40 (34%) reported usable reliability information. The Personal Experience…
Retest Reliability of the Rosenzweig Picture-Frustration Study and Similar Semiprojective Techniques
ERIC Educational Resources Information Center
Rosenzweig, Saul; And Others
1975-01-01
The research dealing with the reliability of the Rosenzweig Picture-Frustration Study is surveyed. Analysis of various split-half, and retest procedures are reviewed and their relative effectiveness evaluated. Reliability measures as applied to projective techniques in general are discussed. (Author/DEP)
Lee, Chin-Pang; Chiu, Yu-Wen; Chu, Chun-Lin; Chen, Yu; Jiang, Kun-Hao; Chen, Jiun-Liang; Chen, Ching-Yen
2016-12-01
The aging males' symptoms (AMS) scale is an instrument used to determine the health-related quality of life in adult and elderly men. The purpose of this study was to synthesize internal consistency (Cronbach's alpha) and test-retest reliability for the AMS scale and its three subscales. Of the 123 studies reviewed, 12 provided alpha coefficients which were then used in the meta-analyses of internal consistency. Seven of the 12 included studies provided test-retest coefficients, and these were used in the meta-analyses of test-retest reliability. The AMS scale had excellent internal consistency [α = 0.89 (95% CI 0.88-0.90)]; the mean alpha estimates across the AMS subscales ranged from 0.79 to 0.82. The AMS scale also had good test-retest reliability [r = 0.85 (95% CI 0.82-0.88]; the test-retest reliability coefficients of the AMS subscales ranged from 0.76 to 0.83. There was significant heterogeneity among the included studies. The AMS scale and the three subscales had fairly good internal consistency and test-retest reliability. Future psychometric studies of the AMS scale should report important characteristics of the participants, details of item scores, and test-retest reliability.
Validity and Reliability of Turkish Male Breast Self-Examination Instrument.
Erkin, Özüm; Göl, İlknur
2018-04-01
This study aims to measure the validity and reliability of Turkish male breast self-examination (MBSE) instrument. The methodological study was performed in 2016 at Ege University, Faculty of Nursing, İzmir, Turkey. The MBSE includes ten steps. For validity studies, face validity, content validity, and construct validity (exploratory factor analysis) were done. For reliability study, Kuder Richardson was calculated. The content validity index was found to be 0.94. Kendall W coefficient was 0.80 (p=0.551). The total variance explained by the two factors was found to be 63.24%. Kuder Richardson 21 was done for reliability study and found to be 0.97 for the instrument. The final instrument included 10 steps and two stages. The Turkish version of MBSE is a valid and reliable instrument for early diagnose. The MBSE can be used in Turkish speaking countries and cultures with two stages and 10 steps.
Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.
2016-01-01
Study design Observational inter-rater reliability study. Objectives To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others’ classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0·34 [95% confidence interval (CI): 0·11–0·57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0·29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0·33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279
Understanding a Widely Misunderstood Statistic: Cronbach's "Alpha"
ERIC Educational Resources Information Center
Ritter, Nicola L.
2010-01-01
It is important to explore score reliability in virtually all studies, because tests are not reliable. The present paper explains the most frequently used reliability estimate, coefficient alpha, so that the coefficient's conceptual underpinnings will be understood. Researchers need to understand score reliability because of the possible impact…
A Cross-Cultural Study of the Reliability of the Coopersmith Self-Esteem Inventory.
ERIC Educational Resources Information Center
Diaz, Joseph O.
1984-01-01
The purpose of this study was to determine the reliability of the Spanish translation of the Coopersmith Self-Esteem Inventory with a group of Puerto Rican students on the island and another on the mainlands. It was found to be reliable for both groups. (Author/BW)
Federal Register 2010, 2011, 2012, 2013, 2014
2013-01-28
...: The Sacramento River Water Reliability Study (SRWRS) was a water supply plan consistent with the Water... supplies to meet growing water supply demands and reliability objectives in their respective service areas.../Environmental Impact Report on the Sacramento River Water Reliability Study, California AGENCY: Bureau of...
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
Milanović, Zoran; Pantelić, Saša; Trajković, Nebojša; Jorgić, Bojan; Sporiš, Goran; Bratić, Milovan
2014-01-01
The purpose of this study was to determine the test-retest reliability of the International Physical Activity Questionnaire (IPAQ) for older adults in Serbia. Six hundred and sixty older adults (352 men, 53%; 308 women, 47%; mean age 67.65±5.76 years) participated in the study. To examine test-retest reliability, the participants were asked to complete the IPAQ on two occasions 2 weeks apart. Moderate reliability was observed between the repeated IPAQ, with intraclass correlation coefficients ranging from 0.53 to 0.91. The least reliability was established in leisure time activity (0.53) and the most reliability in the transport domain (0.91). Men and women had similar intraclass correlation coefficients for total physical activity (0.71 versus 0.74, respectively), while the biggest difference was obtained for housework in men (0.68) and in women (0.90). Our study shows that the long version of the IPAQ is a reliable instrument for assessing physical activity levels in older adults and that it may be useful for generating internationally comparable data.
Critically re-evaluating a common technique: Accuracy, reliability, and confirmation bias of EMG.
Narayanaswami, Pushpa; Geisbush, Thomas; Jones, Lyell; Weiss, Michael; Mozaffar, Tahseen; Gronseth, Gary; Rutkove, Seward B
2016-01-19
(1) To assess the diagnostic accuracy of EMG in radiculopathy. (2) To evaluate the intrarater reliability and interrater reliability of EMG in radiculopathy. (3) To assess the presence of confirmation bias in EMG. Three experienced academic electromyographers interpreted 3 compact discs with 20 EMG videos (10 normal, 10 radiculopathy) in a blinded, standardized fashion without information regarding the nature of the study. The EMGs were interpreted 3 times (discs A, B, C) 1 month apart. Clinical information was provided only with disc C. Intrarater reliability was calculated by comparing interpretations in discs A and B, interrater reliability by comparing interpretation between reviewers. Confirmation bias was estimated by the difference in correct interpretations when clinical information was provided. Sensitivity was similar to previous reports (77%, confidence interval [CI] 63%-90%); specificity was 71%, CI 56%-85%. Intrarater reliability was good (κ 0.61, 95% CI 0.41-0.81); interrater reliability was lower (κ 0.53, CI 0.35-0.71). There was no substantial confirmation bias when clinical information was provided (absolute difference in correct responses 2.2%, CI -13.3% to 17.7%); the study lacked precision to exclude moderate confirmation bias. This study supports that (1) serial EMG studies should be performed by the same electromyographer since intrarater reliability is better than interrater reliability; (2) knowledge of clinical information does not bias EMG interpretation substantially; (3) EMG has moderate diagnostic accuracy for radiculopathy with modest specificity and electromyographers should exercise caution interpreting mild abnormalities. This study provides Class III evidence that EMG has moderate diagnostic accuracy and specificity for radiculopathy. © 2015 American Academy of Neurology.
The reliability of knee joint position testing using electrogoniometry
Piriyaprasarth, Pagamas; Morris, Meg E; Winter, Adele; Bialocerkowski, Andrea E
2008-01-01
Background The current investigation examined the inter- and intra-tester reliability of knee joint angle measurements using a flexible Penny and Giles Biometric® electrogoniometer. The clinical utility of electrogoniometry was also addressed. Methods The first study examined the inter- and intra-tester reliability of measurements of knee joint angles in supine, sitting and standing in 35 healthy adults. The second study evaluated inter-tester and intra-tester reliability of knee joint angle measurements in standing and after walking 10 metres in 20 healthy adults, using an enhanced measurement protocol with a more detailed electrogoniometer attachment procedure. Both inter-tester reliability studies involved two testers. Results In the first study, inter-tester reliability (ICC[2,10]) ranged from 0.58–0.71 in supine, 0.68–0.79 in sitting and 0.57–0.80 in standing. The standard error of measurement between testers was less than 3.55° and the limits of agreement ranged from -12.51° to 12.21°. Reliability coefficients for intra-tester reliability (ICC[3,10]) ranged from 0.75–0.76 in supine, 0.86–0.87 in sitting and 0.87–0.88 in standing. The standard error of measurement for repeated measures by the same tester was less than 1.7° and the limits of agreement ranged from -8.13° to 7.90°. The second study showed that using a more detailed electrogoniometer attachment protocol reduced the error of measurement between testers to 0.5°. Conclusion Using a standardised protocol, reliable measures of knee joint angles can be gained in standing, supine and sitting by using a flexible goniometer. PMID:18211714
Wang, X; Jiao, Y; Tang, T; Wang, H; Lu, Z
2013-12-19
Intrinsic connectivity networks (ICNs) are composed of spatial components and time courses. The spatial components of ICNs were discovered with moderate-to-high reliability. So far as we know, few studies focused on the reliability of the temporal patterns for ICNs based their individual time courses. The goals of this study were twofold: to investigate the test-retest reliability of temporal patterns for ICNs, and to analyze these informative univariate metrics. Additionally, a correlation analysis was performed to enhance interpretability. Our study included three datasets: (a) short- and long-term scans, (b) multi-band echo-planar imaging (mEPI), and (c) eyes open or closed. Using dual regression, we obtained the time courses of ICNs for each subject. To produce temporal patterns for ICNs, we applied two categories of univariate metrics: network-wise complexity and network-wise low-frequency oscillation. Furthermore, we validated the test-retest reliability for each metric. The network-wise temporal patterns for most ICNs (especially for default mode network, DMN) exhibited moderate-to-high reliability and reproducibility under different scan conditions. Network-wise complexity for DMN exhibited fair reliability (ICC<0.5) based on eyes-closed sessions. Specially, our results supported that mEPI could be a useful method with high reliability and reproducibility. In addition, these temporal patterns were with physiological meanings, and certain temporal patterns were correlated to the node strength of the corresponding ICN. Overall, network-wise temporal patterns of ICNs were reliable and informative and could be complementary to spatial patterns of ICNs for further study. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Strauss, Gregory P.; Allen, Daniel N.; Jorgensen, Melinda L.; Cramer, Stacey L.
2005-01-01
Previous studies have examined the reliability of scores derived from various Stroop tasks. However, few studies have compared reliability of more recently developed Stroop variants such as emotional Stroop tasks to standard versions of the Stroop. The current study developed four different single-stimulus Stroop tasks and compared test-retest…
Cognitive Decline in Down Syndrome: A Validity/Reliability Study of the Test for Severe Impairment.
ERIC Educational Resources Information Center
Cosgrave, Mary P.; McCarron, Mary; Anderson, Mary; Tyrrell, Janette; Gill, Michael; Lawlor, Brian A.
1998-01-01
The utility of the Test for Severe Impairment was studied with 60 older persons who had Down Syndrome. Construct validity, test-retest reliability, and interrater reliability were established for the full study group and for subgroups based on degree of mental retardation and dementia status. Some possible applications and limitations of the test…
ERIC Educational Resources Information Center
Boonstra, Anne M.; Reneman, Michiel F.; Stewart, Roy E.; Balk, Gerlof A.
2012-01-01
The aim of this study was to determine the reliability and discriminant validity of the Dutch version of the life satisfaction questionnaire (Lisat-9 DV) to assess patients with an acquired brain injury. The reliability study used a test-retest design, and the validity study used a cross-sectional design. The setting was the general rehabilitation…
Validity and Reliability of the Academic Resilience Scale in Turkish High School
ERIC Educational Resources Information Center
Kapikiran, Sahin
2012-01-01
The present study aims to determine the validity and reliability of the academic resilience scale in Turkish high school. The participances of the study includes 378 high school students in total (192 female and 186 male). A set of analyses were conducted in order to determine the validity and reliability of the study. Firstly, both exploratory…
Cuchna, Jennifer W; Hoch, Matthew C; Hoch, Johanna M
2016-05-01
To synthesize the literature and perform a meta-analysis for both the interrater and intrarater reliability of the FMS™. Academic Search Complete, CINAHL, Medline and SportsDiscus databases were systematically searched from inception to March 2015. Studies were included if the primary purpose was to determine the interrater or intrarater reliability of the FMS™, assessed and scored all 7-items using the standard scoring criteria, provided a composite score and employed intraclass correlation coefficients (ICCs). Studies were excluded if reliability was not the primary aim, participants were injured at data collection, or a modified FMS™ or scoring system was utilized. Seven papers were included; 6 assessing interrater and 6 assessing intrarater reliability. There was moderate evidence in good interrater reliability with a summary ICC of 0.843 (95% CI = 0.640, 0.936; Q7 = 84.915, p < 0.0001). There was moderate evidence in good intrarater reliability with a summary ICC of 0.869 (95% CI = 0.785, 0.921; Q12 = 60.763, p < 0.0001). There was moderate evidence for both forms of reliability. The sensitivity assessments revealed this interpretation is stable and not influenced by any one study. Overall, the FMS™ is a reliable tool for clinical practice. Copyright © 2015 Elsevier Ltd. All rights reserved.
Hanson, Lisa C; Taylor, Nicholas F; McBurney, Helen
2016-09-01
To determine the retest reliability of the 10m incremental shuttle walk test (ISWT) in a mixed cardiac rehabilitation population. Participants completed two 10m ISWTs in a single session in a repeated measures study. Ten participants completed a third 10m ISWT as part of a pilot study. Hospital physiotherapy department. 62 adults aged a mean of 68 years (SD 10) referred to a cardiac rehabilitation program. Retest reliability of the 10m ISWT expressed as relative reliability and measurement error. Relative reliability was expressed in a ratio in the form of an intraclass correlation coefficient (ICC) and measurement error in the form of the standard error of measurement (SEM) and 95% confidence intervals for the group and individual. There was a high level of relative reliability over the two walks with an ICC of .99. The SEMagreement was 17m, and a change of at least 23m for the group and 54m for the individual would be required to be 95% confident of exceeding measurement error. The 10m ISWT demonstrated good retest reliability and is sufficiently reliable to be applied in practice in this population without the use of a practice test. Copyright © 2015 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
Test-retest and between-site reliability in a multicenter fMRI study.
Friedman, Lee; Stern, Hal; Brown, Gregory G; Mathalon, Daniel H; Turner, Jessica; Glover, Gary H; Gollub, Randy L; Lauriello, John; Lim, Kelvin O; Cannon, Tyrone; Greve, Douglas N; Bockholt, Henry Jeremy; Belger, Aysenil; Mueller, Bryon; Doty, Michael J; He, Jianchun; Wells, William; Smyth, Padhraic; Pieper, Steve; Kim, Seyoung; Kubicki, Marek; Vangel, Mark; Potkin, Steven G
2008-08-01
In the present report, estimates of test-retest and between-site reliability of fMRI assessments were produced in the context of a multicenter fMRI reliability study (FBIRN Phase 1, www.nbirn.net). Five subjects were scanned on 10 MRI scanners on two occasions. The fMRI task was a simple block design sensorimotor task. The impulse response functions to the stimulation block were derived using an FIR-deconvolution analysis with FMRISTAT. Six functionally-derived ROIs covering the visual, auditory and motor cortices, created from a prior analysis, were used. Two dependent variables were compared: percent signal change and contrast-to-noise-ratio. Reliability was assessed with intraclass correlation coefficients derived from a variance components analysis. Test-retest reliability was high, but initially, between-site reliability was low, indicating a strong contribution from site and site-by-subject variance. However, a number of factors that can markedly improve between-site reliability were uncovered, including increasing the size of the ROIs, adjusting for smoothness differences, and inclusion of additional runs. By employing multiple steps, between-site reliability for 3T scanners was increased by 123%. Dropping one site at a time and assessing reliability can be a useful method of assessing the sensitivity of the results to particular sites. These findings should provide guidance toothers on the best practices for future multicenter studies.
Partnering to Establish and Study Simulation in International Nursing Education.
Garner, Shelby L; Killingsworth, Erin; Raj, Leena
The purpose of this article was to describe an international partnership to establish and study simulation in India. A pilot study was performed to determine interrater reliability among faculty new to simulation when evaluating nursing student competency performance. Interrater reliability was below the ideal agreement level. Findings in this study underscore the need to obtain baseline interrater reliability data before integrating competency evaluation into a simulation program.
Reliability Generalization of Scores on the Spielberger State-Trait Anxiety Inventory.
ERIC Educational Resources Information Center
Barnes, Laura L. B.; Harp, Diane; Jung, Woo Sik
2002-01-01
Conducted a reliability generalization study for the State-Trait Anxiety Inventory (C. Spielberger, 1983) by reviewing and classifying 816 research articles. Average reliability coefficients were acceptable for both internal consistency and test-retest reliability, but variation was present among the estimates. Other differences are discussed.…
Meta-Analysis of Scale Reliability Using Latent Variable Modeling
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2013-01-01
A latent variable modeling approach is outlined that can be used for meta-analysis of reliability coefficients of multicomponent measuring instruments. Important limitations of efforts to combine composite reliability findings across multiple studies are initially pointed out. A reliability synthesis procedure is discussed that is based on…
Software reliability models for critical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, H.; Pham, M.
This report presents the results of the first phase of the ongoing EG G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the secondmore » place. 407 refs., 4 figs., 2 tabs.« less
Software reliability models for critical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, H.; Pham, M.
This report presents the results of the first phase of the ongoing EG&G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the second place.more » 407 refs., 4 figs., 2 tabs.« less
Reliability of Computerized Neurocognitive Tests for Concussion Assessment: A Meta-Analysis.
Farnsworth, James L; Dargo, Lucas; Ragan, Brian G; Kang, Minsoo
2017-09-01
Although widely used, computerized neurocognitive tests (CNTs) have been criticized because of low reliability and poor sensitivity. A systematic review was published summarizing the reliability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores; however, this was limited to a single CNT. Expansion of the previous review to include additional CNTs and a meta-analysis is needed. Therefore, our purpose was to analyze reliability data for CNTs using meta-analysis and examine moderating factors that may influence reliability. A systematic literature search (key terms: reliability, computerized neurocognitive test, concussion) of electronic databases (MEDLINE, PubMed, Google Scholar, and SPORTDiscus) was conducted to identify relevant studies. Studies were included if they met all of the following criteria: used a test-retest design, involved at least 1 CNT, provided sufficient statistical data to allow for effect-size calculation, and were published in English. Two independent reviewers investigated each article to assess inclusion criteria. Eighteen studies involving 2674 participants were retained. Intraclass correlation coefficients were extracted to calculate effect sizes and determine overall reliability. The Fisher Z transformation adjusted for sampling error associated with averaging correlations. Moderator analyses were conducted to evaluate the effects of the length of the test-retest interval, intraclass correlation coefficient model selection, participant demographics, and study design on reliability. Heterogeneity was evaluated using the Cochran Q statistic. The proportion of acceptable outcomes was greatest for the Axon Sports CogState Test (75%) and lowest for the ImPACT (25%). Moderator analyses indicated that the type of intraclass correlation coefficient model used significantly influenced effect-size estimates, accounting for 17% of the variation in reliability. The Axon Sports CogState Test, which has a higher proportion of acceptable outcomes and shorter test duration relative to other CNTs, may be a reliable option; however, future studies are needed to compare the diagnostic accuracy of these instruments.
An interrater reliability study of the Braden scale in two nursing homes.
Kottner, Jan; Dassen, Theo
2008-10-01
Adequate risk assessment is essential in pressure ulcer prevention. Assessment scales were designed to support practitioners in identifying persons at pressure ulcer risk. The Braden scale is one of the most extensively studied risk assessment instruments, although the majority of studies focused on validity rather than reliability. The first aim was to measure the interrater reliability of the Braden scale and its individual items. The second aim was to study different statistical approaches regarding interrater reliability estimation. An interrater reliability study was conducted in two German nursing homes. Residents (n = 152) from 8 units were assessed twice. The raters were trained nurses with a work experience ranging from 0.5 to 30 years. Data were analysed using an overall percentage of agreement, weighted and unweighted kappa and the intraclass correlation coefficient. Differences between nurses rating the overall Braden score ranged from 0 up to 9 points. Interrater reliability expressed by the intraclass correlation coefficient ranged from 0.73 (95% CI 0.26 - 0.91) to 0.95 (95% CI 0.87 - 0.98). Calculated intraclass correlation coefficients for individual items ranged from 0.06 (95% CI -0.31 to 0.48) to 0.97 (95% CI 0.93-0.99) with the lowest values being measured for the items "sensory perception" and "nutrition". There was no association between work experience and the level of interrater reliability. With two exceptions, simple kappa-values were always lower than weighted kappa-values and intraclass correlation coefficients. Although the calculated interrater reliability coefficients for the total Braden score were high in some cases, several clinically relevant differences occurred between the nurses. Due to interrater reliability being very low for the items "sensory perception" and "nutrition", it is doubtful if their assessment contributes to any valid results. The calculation of weighted kappa or intraclass correlation coefficients is the most appropriate interrater reliability estimates.
Milner, Clare E; Brindle, Richard A
2016-01-01
There has been increased interest recently in measuring kinematics within the foot during gait. While several multisegment foot models have appeared in the literature, the Oxford foot model has been used frequently for both walking and running. Several studies have reported the reliability for the Oxford foot model, but most studies to date have reported reliability for barefoot walking. The purpose of this study was to determine between-day (intra-rater) and within-session (inter-trial) reliability of the modified Oxford foot model during shod walking and running and calculate minimum detectable difference for common variables of interest. Healthy adult male runners participated. Participants ran and walked in the gait laboratory for five trials of each. Three-dimensional gait analysis was conducted and foot and ankle joint angle time series data were calculated. Participants returned for a second gait analysis at least 5 days later. Intraclass correlation coefficients and minimum detectable difference were determined for walking and for running, to indicate both within-session and between-day reliability. Overall, relative variables were more reliable than absolute variables, and within-session reliability was greater than between-day reliability. Between-day intraclass correlation coefficients were comparable to those reported previously for adults walking barefoot. It is an extension in the use of the Oxford foot model to incorporate wearing a shoe while maintaining marker placement directly on the skin for each segment. These reliability data for walking and running will aid in the determination of meaningful differences in studies which use this model during shod gait. Copyright © 2015 Elsevier B.V. All rights reserved.
Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco
2016-06-03
Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation performing bilaterally the 3D cervical segmental side-bending test (3D CSSB) from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61%-90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcome. Further studies are needed to investigate PIVMs reliability in combination with other assessment procedure in symptomatic patients.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rogue, F.; Binnall, E.P.
1982-10-01
Reliable instrumentation will be needed to monitor the performance of future high-level waste repository sites. A study has been made to assess instrument reliability at Department of Energy (DOE) waste repository related experiments. Though the study covers a wide variety of instrumentation, this paper concentrates on experiences with geotechnical instrumentation in hostile repository-type environments. Manufacturers have made some changes to improve the reliability of instruments for repositories. This paper reviews the failure modes, rates, and mechanisms, along with manufacturer modifications and recommendations for additional improvements to enhance instrument performance. 4 tables.
Wei, Meifen; Russell, Daniel W; Mallinckrodt, Brent; Vogel, David L
2007-04-01
We developed a 12-item, short form of the Experiences in Close Relationship Scale (ECR; Brennan, Clark, & Shaver, 1998) across 6 studies. In Study 1, we examined the reliability and factor structure of the measure. In Studies 2 and 3, we cross-validated the reliability, factor structure, and validity of the short form measure; whereas in Study 4, we examined test-retest reliability over a 1-month period. In Studies 5 and 6, we further assessed the reliability, factor structure, and validity of the short version of the ECR when administered as a stand-alone instrument. Confirmatory factor analyses indicated that 2 factors, labeled Anxiety and Avoidance, provided a good fit to the data after removing the influence of response sets. We found validity to be equivalent for the short and the original versions of the ECR across studies. Finally, the results were comparable when we embedded the short form within the original version of the ECR and when we administered it as a stand-alone measure.
Sarig Bahat, Hilla; Sprecher, Elliot; Sela, Itamar; Treleaven, Julia
2016-07-01
The use of virtual reality (VR) for assessment and intervention of neck pain has previously been used and shown reliable for cervical range of motion measures. Neck VR enables analysis of task-oriented neck movement by stimulating responsive movements to external stimuli. Therefore, the purpose of this study was to establish inter-tester reliability of neck kinematic measures so that it can be used as a reliable assessment and treatment tool between clinicians. This reliability study included 46 asymptomatic participants, who were assessed using the neck VR system which displayed an interactive VR scenario via a head-mounted device, controlled by neck movements. The objective of the interactive assessment was to hit 16 targets, randomly appearing in four directions, as fast as possible. Each participant was tested twice by two different testers. Good reliability was found of neck motion kinematic measures in flexion, extension, and rotation (0.64-0.93 inter-class correlation). High reliability was shown for peak velocity globally (0.93), in left rotation (0.9), right rotation and extension (0.88), and flexion (0.86). Mean velocity had a good global reliability (0.84), except for left rotation directed movement with moderate reliability (0.68). Minimal detectable change for peak velocity ranged from 41 to 53 °/s, while mean velocity ranged from 20 to 25 °/s. The results suggest high reliability for peak and mean velocity as measured by the interactive Neck VR assessment of neck motion kinematics. VR appears to provide a reliable and more ecologically valid method of cervical motion evaluation than previous conventional methodologies.
Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A
2016-12-01
Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (ICC P : 0.86-0.98; ICC S : 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
Reliability of videotaped observational gait analysis in patients with orthopedic impairments
Brunnekreef, Jaap J; van Uden, Caro JT; van Moorsel, Steven; Kooloos, Jan GM
2005-01-01
Background In clinical practice, visual gait observation is often used to determine gait disorders and to evaluate treatment. Several reliability studies on observational gait analysis have been described in the literature and generally showed moderate reliability. However, patients with orthopedic disorders have received little attention. The objective of this study is to determine the reliability levels of visual observation of gait in patients with orthopedic disorders. Methods The gait of thirty patients referred to a physical therapist for gait treatment was videotaped. Ten raters, 4 experienced, 4 inexperienced and 2 experts, individually evaluated these videotaped gait patterns of the patients twice, by using a structured gait analysis form. Reliability levels were established by calculating the Intraclass Correlation Coefficient (ICC), using a two-way random design and based on absolute agreement. Results The inter-rater reliability among experienced raters (ICC = 0.42; 95%CI: 0.38–0.46) was comparable to that of the inexperienced raters (ICC = 0.40; 95%CI: 0.36–0.44). The expert raters reached a higher inter-rater reliability level (ICC = 0.54; 95%CI: 0.48–0.60). The average intra-rater reliability of the experienced raters was 0.63 (ICCs ranging from 0.57 to 0.70). The inexperienced raters reached an average intra-rater reliability of 0.57 (ICCs ranging from 0.52 to 0.62). The two expert raters attained ICC values of 0.70 and 0.74 respectively. Conclusion Structured visual gait observation by use of a gait analysis form as described in this study was found to be moderately reliable. Clinical experience appears to increase the reliability of visual gait analysis. PMID:15774012
Marawar, Satyajit V; Madom, Ian A; Palumbo, Mark; Tallarico, Richard A; Ordway, Nathaniel R; Metkar, Umesh; Wang, Dongliang; Green, Adam; Lavelle, William F
2017-01-01
Treating surgeon's visual assessment of axial MRI images to ascertain the degree of stenosis has a critical impact on surgical decision-making. The purpose of this study was to prospectively analyze the impact of surgeon experience on inter-observer and intra-observer reliability of assessing severity of spinal stenosis on MRIs by spine surgeons directly involved in surgical decision-making. Seven fellowship trained spine surgeons reviewed MRI studies of 30 symptomatic patients with lumbar stenosis and graded the stenosis in the central canal, the lateral recess and the foramen at T12-L1 to L5-S1 as none, mild, moderate or severe. No specific instructions were provided to what constituted mild, moderate, or severe stenosis. Two surgeons were "senior" (>fifteen years of practice experience); two were "intermediate" (>four years of practice experience), and three "junior" (< one year of practice experience). The concordance correlation coefficient (CCC) was calculated to assess inter-observer reliability. Seven MRI studies were duplicated and randomly re-read to evaluate inter-observer reliability. Surgeon experience was found to be a strong predictor of inter-observer reliability. Senior inter-observer reliability was significantly higher assessing central(p<0.001), foraminal p=0.005 and lateral p=0.001 than "junior" group.Senior group also showed significantly higher inter-observer reliability that intermediate group assessing foraminal stenosis (p=0.036). In intra-observer reliability the results were contrary to that found in inter-observer reliability. Inter-observer reliability of assessing stenosis on MRIs increases with surgeon experience. Lower intra-observer reliability values among the senior group, although not clearly explained, may be due to the small number of MRIs evaluated and quality of MRI images.Level of evidence: Level 3.
Reliability of two social cognition tests: The combined stories test and the social knowledge test.
Thibaudeau, Élisabeth; Cellard, Caroline; Legendre, Maxime; Villeneuve, Karèle; Achim, Amélie M
2018-04-01
Deficits in social cognition are common in psychiatric disorders. Validated social cognition measures with good psychometric properties are necessary to assess and target social cognitive deficits. Two recent social cognition tests, the Combined Stories Test (COST) and the Social Knowledge Test (SKT), respectively assess theory of mind and social knowledge. Previous studies have shown good psychometric properties for these tests, but the test-retest reliability has never been documented. The aim of this study was to evaluate the test-retest reliability and the inter-rater reliability of the COST and the SKT. The COST and the SKT were administered twice to a group of forty-two healthy adults, with a delay of approximately four weeks between the assessments. Excellent test-retest reliability was observed for the COST, and a good test-retest reliability was observed for the SKT. There was no evidence of practice effect. Furthermore, an excellent inter-rater reliability was observed for both tests. This study shows a good reliability of the COST and the SKT that adds to the good validity previously reported for these two tests. These good psychometrics properties thus support that the COST and the SKT are adequate measures for the assessment of social cognition. Copyright © 2018. Published by Elsevier B.V.
Savage, Trevor Nicholas; McIntosh, Andrew Stuart
2017-03-01
It is important to understand factors contributing to and directly causing sports injuries to improve the effectiveness and safety of sports skills. The characteristics of injury events must be evaluated and described meaningfully and reliably. However, many complex skills cannot be effectively investigated quantitatively because of ethical, technological and validity considerations. Increasingly, qualitative methods are being used to investigate human movement for research purposes, but there are concerns about reliability and measurement bias of such methods. Using the tackle in Rugby union as an example, we outline a systematic approach for developing a skill analysis protocol with a focus on improving objectivity, validity and reliability. Characteristics for analysis were selected using qualitative analysis and biomechanical theoretical models and epidemiological and coaching literature. An expert panel comprising subject matter experts provided feedback and the inter-rater reliability of the protocol was assessed using ten trained raters. The inter-rater reliability results were reviewed by the expert panel and the protocol was revised and assessed in a second inter-rater reliability study. Mean agreement in the second study improved and was comparable (52-90% agreement and ICC between 0.6 and 0.9) with other studies that have reported inter-rater reliability of qualitative analysis of human movement.
Lindskog, Marcus; Winman, Anders; Juslin, Peter; Poom, Leo
2013-01-01
Two studies investigated the reliability and predictive validity of commonly used measures and models of Approximate Number System acuity (ANS). Study 1 investigated reliability by both an empirical approach and a simulation of maximum obtainable reliability under ideal conditions. Results showed that common measures of the Weber fraction (w) are reliable only when using a substantial number of trials, even under ideal conditions. Study 2 compared different purported measures of ANS acuity as for convergent and predictive validity in a within-subjects design and evaluated an adaptive test using the ZEST algorithm. Results showed that the adaptive measure can reduce the number of trials needed to reach acceptable reliability. Only direct tests with non-symbolic numerosity discriminations of stimuli presented simultaneously were related to arithmetic fluency. This correlation remained when controlling for general cognitive ability and perceptual speed. Further, the purported indirect measure of ANS acuity in terms of the Numeric Distance Effect (NDE) was not reliable and showed no sign of predictive validity. The non-symbolic NDE for reaction time was significantly related to direct w estimates in a direction contrary to the expected. Easier stimuli were found to be more reliable, but only harder (7:8 ratio) stimuli contributed to predictive validity. PMID:23964256
Janssen, Ellen M; Marshall, Deborah A; Hauber, A Brett; Bridges, John F P
2017-12-01
The recent endorsement of discrete-choice experiments (DCEs) and other stated-preference methods by regulatory and health technology assessment (HTA) agencies has placed a greater focus on demonstrating the validity and reliability of preference results. Areas covered: We present a practical overview of tests of validity and reliability that have been applied in the health DCE literature and explore other study qualities of DCEs. From the published literature, we identify a variety of methods to assess the validity and reliability of DCEs. We conceptualize these methods to create a conceptual model with four domains: measurement validity, measurement reliability, choice validity, and choice reliability. Each domain consists of three categories that can be assessed using one to four procedures (for a total of 24 tests). We present how these tests have been applied in the literature and direct readers to applications of these tests in the health DCE literature. Based on a stakeholder engagement exercise, we consider the importance of study characteristics beyond traditional concepts of validity and reliability. Expert commentary: We discuss study design considerations to assess the validity and reliability of a DCE, consider limitations to the current application of tests, and discuss future work to consider the quality of DCEs in healthcare.
Study on the Validity and Reliability of Melbourne Decision Making Scale in Turkey
ERIC Educational Resources Information Center
Çolakkadioglu, Oguzhan; Deniz, M. Engin
2015-01-01
This study is to analyze the validity and reliability of Melbourne Decision Making Questionnaire (MDMQ). The sample consisted of 650 university students. The structural validity of the MDMQ, as well as correlations among its sub-scales, measure-bound validity, internal consistency, item total correlations and test-retest reliability coefficients…
Reliability and Validity Tests of Singelis's Self-Construal Scale (1994).
ERIC Educational Resources Information Center
Wang, Qi
Two studies focused on the reliability and validity of T.M. Singelis's 24-item Self-Construal Scale (SCS) (1994). In the first study, Cronbach alphas were calculated to assess the internal consistency of the reliability of the two subscales that were supposed to measure individuals' independent and interdependent self construals. The sample was…
Reliability of reports of childhood trauma in bipolar disorder: A test-retest study over 18 months.
Shannon, Ciaran; Hanna, Donncha; Tumelty, Leo; Waldron, Daniel; Maguire, Chrissie; Mowlds, William; Meenagh, Ciaran; Mulholland, Ciaran
2016-01-01
This study aimed to explore the reliability of self-reported trauma histories in a population with a diagnosis of bipolar disorder using the Childhood Trauma Questionnaire. Previous studies in other populations suggest high reliability of trauma histories over time, and it was postulated that a similar high reliability would be demonstrated in this population. A total of 39 patients with a confirmed diagnosis (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, criteria) were followed up and readministered the Childhood Trauma Questionnaire after 18 months. Cohen's kappa scores and intraclass correlations suggested reasonable test-retest reliability over the 18-month time period of the study for all types of childhood abuse, namely, emotional, physical, and sexual abuse and physical and emotional neglect. Intraclass correlations ranged from r = .50 (sexual abuse) to r = .96 (physical abuse). Cohen's kappas ranged from .44 (sexual abuse) to .76 (physical abuse). Retrospective reports of childhood trauma can be seen as reliable and are in keeping with results found with other mental health populations.
Richler, Jennifer J.; Floyd, R. Jackie; Gauthier, Isabel
2014-01-01
Efforts to understand individual differences in high-level vision necessitate the development of measures that have sufficient reliability, which is generally not a concern in group studies. Holistic processing is central to research on face recognition and, more recently, to the study of individual differences in this area. However, recent work has shown that the most popular measure of holistic processing, the composite task, has low reliability. This is particularly problematic for the recent surge in interest in studying individual differences in face recognition. Here, we developed and validated a new measure of holistic face processing specifically for use in individual-differences studies. It avoids some of the pitfalls of the standard composite design and capitalizes on the idea that trial variability allows for better traction on reliability. Across four experiments, we refine this test and demonstrate its reliability. PMID:25228629
Reliability Generalization of the Psychopathy Checklist Applied in Youthful Samples
ERIC Educational Resources Information Center
Campbell, Justin S.; Pulos, Steven; Hogan, Mike; Murry, Francie
2005-01-01
This study examines the average reliability of Hare Psychopathy Checklists (PCLs) adapted for use in samples of youthful offenders (aged 12 to 21 years). Two forms of reliability are examined: 18 alpha estimates of internal consistency and 18 intraclass correlation (two or more raters) estimates of interrater reliability. The results, an average…
Performance Evaluation of Reliable Multicast Protocol for Checkout and Launch Control Systems
NASA Technical Reports Server (NTRS)
Shu, Wei Wennie; Porter, John
2000-01-01
The overall objective of this project is to study reliability and performance of Real Time Critical Network (RTCN) for checkout and launch control systems (CLCS). The major tasks include reliability and performance evaluation of Reliable Multicast (RM) package and fault tolerance analysis and design of dual redundant network architecture.
ERIC Educational Resources Information Center
Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J.
2014-01-01
Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…
Testing the Difference between Reliability Coefficients Alpha and Omega
ERIC Educational Resources Information Center
Deng, Lifang; Chan, Wai
2017-01-01
Reliable measurements are key to social science research. Multiple measures of reliability of the total score have been developed, including coefficient alpha, coefficient omega, the greatest lower bound reliability, and others. Among these, the coefficient alpha has been most widely used, and it is reported in nearly every study involving the…
The Reliability of Difference Scores in Populations and Samples
ERIC Educational Resources Information Center
Zimmerman, Donald W.
2009-01-01
This study was an investigation of the relation between the reliability of difference scores, considered as a parameter characterizing a population of examinees, and the reliability estimates obtained from random samples from the population. The parameters in familiar equations for the reliability of difference scores were redefined in such a way…
A reliability analysis of the revised competitiveness index.
Harris, Paul B; Houston, John M
2010-06-01
This study examined the reliability of the Revised Competitiveness Index by investigating the test-retest reliability, interitem reliability, and factor structure of the measure based on a sample of 280 undergraduates (200 women, 80 men) ranging in age from 18 to 28 years (M = 20.1, SD = 2.1). The findings indicate that the Revised Competitiveness Index has high test-retest reliability, high inter-item reliability, and a stable factor structure. The results support the assertion that the Revised Competitiveness Index assesses competitiveness as a stable trait rather than a dynamic state.
Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T
2012-08-01
An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there were very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign presenting with fair to moderate agreement respectively. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and the Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups is indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
Hughes, Michael; Tracey, Andrew; Bhushan, Monica; Chakravarty, Kuntal; Denton, Christopher P; Dubey, Shirish; Guiducci, Serena; Muir, Lindsay; Ong, Voon; Parker, Louise; Pauling, John D; Prabu, Athiveeraramapandian; Rogers, Christine; Roberts, Christopher; Herrick, Ariane L
2018-06-01
The reliability of clinician grading of systemic sclerosis-related digital ulcers has been reported to be poor to moderate at best, which has important implications for clinical trial design. The aim of this study was to examine the reliability of new proposed UK Scleroderma Study Group digital ulcer definitions among UK clinicians with an interest in systemic sclerosis. Raters graded (through a custom-built interface) 90 images (80 unique and 10 repeat) of a range of digital lesions collected from patients with systemic sclerosis. Lesions were graded on an ordinal scale of severity: 'no ulcer', 'healed ulcer' or 'digital ulcer'. A total of 23 clinicians - 18 rheumatologists, 3 dermatologists, 1 hand surgeon and 1 specialist rheumatology nurse - completed the study. A total of 2070 (1840 unique + 230 repeat) image gradings were obtained. For intra-rater reliability, across all images, the overall weighted kappa coefficient was high (0.71) and was moderate (0.55) when averaged across individual raters. Overall inter-rater reliability was poor (0.15). Although our proposed digital ulcer definitions had high intra-rater reliability, the overall inter-rater reliability was poor. Our study highlights the challenges of digital ulcer assessment by clinicians with an interest in systemic sclerosis and provides a number of useful insights for future clinical trial design. Further research is warranted to improve the reliability of digital ulcer definition/rating as an outcome measure in clinical trials, including examining the role for objective measurement techniques, and the development of digital ulcer patient-reported outcome measures.
Romli, Muhammad Hibatullah; Mackenzie, Lynette; Lovarini, Meryl; Tan, Maw Pin; Clemson, Lindy
2017-06-01
Falls can be a devastating issue for older people living in the community, including those living in Malaysia. Health professionals and community members have a responsibility to ensure that older people have a safe home environment to reduce the risk of falls. Using a standardised screening tool is beneficial to intervene early with this group. The Home Falls and Accidents Screening Tool (HOME FAST) should be considered for this purpose; however, its use in Malaysia has not been studied. Therefore, the aim of this study was to evaluate the interrater and test-retest reliability of the HOME FAST with multiple professionals in the Malaysian context. A cross-sectional design was used to evaluate interrater reliability where the HOME FAST was used simultaneously in the homes of older people by 2 raters and a prospective design was used to evaluate test-retest reliability with a separate group of older people at different times in their homes. Both studies took place in an urban area of Kuala Lumpur. Professionals from 9 professional backgrounds participated as raters in this study, and a group of 51 community older people were recruited for the interrater reliability study and another group of 30 for the test-retest reliability study. The overall agreement was moderate for interrater reliability and good for test-retest reliability. The HOME FAST was consistently rated by different professionals, and no bias was found among the multiple raters. The HOME FAST can be used with confidence by a variety of professionals across different settings. The HOME FAST can become a universal tool to screen for home hazards related to falls. © 2017 John Wiley & Sons, Ltd.
Empirical Recommendations for Improving the Stability of the Dot-Probe Task in Clinical Research
Price, Rebecca B.; Kuckertz, Jennie M.; Siegle, Greg J.; Ladouceur, Cecile D.; Silk, Jennifer S.; Ryan, Neal D.; Dahl, Ronald E.; Amir, Nader
2014-01-01
The dot-probe task has been widely used in research to produce an index of biased attention based on reaction times (RTs). Despite its popularity, very few published studies have examined psychometric properties of the task, including test-retest reliability, and no previous study has examined reliability in clinically anxious samples or systematically explored the effects of task design and analysis decisions on reliability. In the current analysis, we utilized dot-probe data from three studies where attention bias towards threat-related faces was assessed at multiple (≥5) timepoints. Two of the studies were similar (adults with Social Anxiety Disorder, similar design features) while one was much more disparate (pediatric healthy volunteers, distinct task design). We explored the effects of analysis choices (e.g., bias score calculation formula, methods for outlier handling) on reliability and searched for convergence of findings across the three studies. We found that, when considering the three studies concurrently, the most reliable RT bias index utilized data from dot-bottom trials, comparing congruent to incongruent trials, with rescaled outliers, particularly after averaging across more than one assessment point. Although reliability of RT bias indices was moderate to low under most circumstances, within-session variability in bias (attention bias variability; ABV), a recently proposed RT index, was more reliable across sessions. Several eyetracking-based indices of attention bias (available in the pediatric healthy sample only) showed reliability that matched the optimal RT index (ABV). On the basis of these findings, we make specific recommendations to researchers using the dot probe, particularly those wishing to investigate individual differences and/or single-patient applications. PMID:25419646
Ustün, B; Compton, W; Mager, D; Babor, T; Baiyewu, O; Chatterji, S; Cottler, L; Göğüş, A; Mavreas, V; Peters, L; Pull, C; Saunders, J; Smeets, R; Stipec, M R; Vrasti, R; Hasin, D; Room, R; Van den Brink, W; Regier, D; Blaine, J; Grant, B F; Sartorius, N
1997-09-25
The WHO Study on the reliability and validity of the alcohol and drug use disorder instruments in an international study which has taken place in centres in ten countries, aiming to test the reliability and validity of three diagnostic instruments for alcohol and drug use disorders: the Composite International Diagnostic Interview (CIDI), the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) and a special version of the Alcohol Use Disorder and Associated Disabilities Interview schedule-alcohol/drug-revised (AUDADIS-ADR). The purpose of the reliability and validity (R&V) study is to further develop the alcohol and drug sections of these instruments so that a range of substance-related diagnoses can be made in a systematic, consistent, and reliable way. The study focuses on new criteria proposed in the tenth revision of the International Classification of Diseases (ICD-10) and the fourth revision of the diagnostic and statistical manual of mental disorders (DSM-IV) for dependence, harmful use and abuse categories for alcohol and psychoactive substance use disorders. A systematic study including a scientifically rigorous measure of reliability (i.e. 1 week test-retest reliability) and validity (i.e. comparison between clinical and non-clinical measures) has been undertaken. Results have yielded useful information on reliability and validity of these instruments at diagnosis, criteria and question level. Overall the diagnostic concordance coefficients (kappa, kappa) were very good for dependence disorders (0.7-0.9), but were somewhat lower for the abuse and harmful use categories. The comparisons among instruments and independent clinical evaluations and debriefing interviews gave important information about possible sources of unreliability, and provided useful clues on the applicability and consistency of nosological concepts across cultures.
DiCesare, Christopher A.; Bates, Nathaniel A.; Barber Foss, Kim D.; Thomas, Staci M.; Wordeman, Samuel C.; Sugimoto, Dai; Roewer, Benjamin D.; Medina McKeon, Jennifer M.; Di Stasi, Stephanie; Noehren, Brian W.; Ford, Kevin R.; Kiefer, Adam W.; Hewett, Timothy E.; Myer, Gregory D.
2015-01-01
Background: Anterior cruciate ligament (ACL) injuries are physically and financially devastating but affect a relatively small percentage of the population. Prospective identification of risk factors for ACL injury necessitates a large sample size; therefore, study of this injury would benefit from a multicenter approach. Purpose: To determine the reliability of kinematic and kinetic measures of a single-leg cross drop task across 3 institutions. Study Design: Controlled laboratory study. Methods: Twenty-five female high school volleyball players participated in this study. Three-dimensional motion data of each participant performing the single-leg cross drop were collected at 3 institutions over a period of 4 weeks. Coefficients of multiple correlation were calculated to assess the reliability of kinematic and kinetic measures during the landing phase of the movement. Results: Between-centers reliability for kinematic waveforms in the frontal and sagittal planes was good, but moderate in the transverse plane. Between-centers reliability for kinetic waveforms was good in the sagittal, frontal, and transverse planes. Conclusion: Based on these findings, the single-leg cross drop task has moderate to good reliability of kinematic and kinetic measures across institutions after implementation of a standardized testing protocol. Clinical Relevance: Multicenter collaborations can increase study numbers and generalize results, which is beneficial for studies of relatively rare phenomena, such as ACL injury. An important step is to determine the reliability of risk assessments across institutions before a multicenter collaboration can be initiated. PMID:26779550
Uysal, Hilal; Ozcan, Şeyda
2011-06-01
Many new measuring devices have been developed so that broader psychometric measurements in the coronary artery disease, disease-specific health status measurements, and identification of the broader quality of life can be performed in the recent years. The study was intended to determine whether, and to what extent, MIDAS is a valid and reliable measurement to the patients suffering from myocardial infarction for the first time in Turkey. The research was conducted with the patients hospitalized and treated with myocardial infarction in the cardiology departments of 2 hospitals in Istanbul, Turkey, between 2007 and 2008. Psychometric evaluations of TR-MIDAS were used for validity studies; language validity, content validity, construct validity were examined. For reliability studies; the tool's internal consistency reliability, Cronbach's alpha reliability coefficient, and test-retest reliability were completed. The instrument's content validity index was determined to be "0.95". Principal component analysis revealed six factors with an eigenvalue >1.5. Cronbach's alpha was found to be 0.89 for total scale which was an acceptable value. The total's test-retest reliability was 0.51 (p<0.01). Data obtained at the end of the study supports that Turkish Myocardial Infarction Dimensional Assessment Scale is a valid and reliable instrument as a disease-specific scale to assess the patients' quality of life suffering from myocardial infarction in Turkey. Copyright © 2010 European Society of Cardiology. Published by Elsevier B.V. All rights reserved.
Reliability of conditioned pain modulation: a systematic review
Kennedy, Donna L.; Kemp, Harriet I.; Ridout, Deborah; Yarnitsky, David; Rice, Andrew S.C.
2016-01-01
Abstract A systematic literature review was undertaken to determine if conditioned pain modulation (CPM) is reliable. Longitudinal, English language observational studies of the repeatability of a CPM test paradigm in adult humans were included. Two independent reviewers assessed the risk of bias in 6 domains; study participation; study attrition; prognostic factor measurement; outcome measurement; confounding and analysis using the Quality in Prognosis Studies (QUIPS) critical assessment tool. Intraclass correlation coefficients (ICCs) less than 0.4 were considered to be poor; 0.4 and 0.59 to be fair; 0.6 and 0.75 good and greater than 0.75 excellent. Ten studies were included in the final review. Meta-analysis was not appropriate because of differences between studies. The intersession reliability of the CPM effect was investigated in 8 studies and reported as good (ICC = 0.6-0.75) in 3 studies and excellent (ICC > 0.75) in subgroups in 2 of those 3. The assessment of risk of bias demonstrated that reporting is not comprehensive for the description of sample demographics, recruitment strategy, and study attrition. The absence of blinding, a lack of control for confounding factors, and lack of standardisation in statistical analysis are common. Conditioned pain modulation is a reliable measure; however, the degree of reliability is heavily dependent on stimulation parameters and study methodology and this warrants consideration for investigators. The validation of CPM as a robust prognostic factor in experimental and clinical pain studies may be facilitated by improvements in the reporting of CPM reliability studies. PMID:27559835
Piqueras, Jose A; Martín-Vivar, María; Sandin, Bonifacio; San Luis, Concepción; Pineda, David
2017-08-15
Anxiety and depression are among the most common mental disorders during childhood and adolescence. Among the instruments for the brief screening assessment of symptoms of anxiety and depression, the Revised Child Anxiety and Depression Scale (RCADS) is one of the more widely used. Previous studies have demonstrated the reliability of the RCADS for different assessment settings and different versions. The aims of this study were to examine the mean reliability of the RCADS and the influence of the moderators on the RCADS reliability. We searched in EBSCO, PsycINFO, Google Scholar, Web of Science, and NCBI databases and other articles manually from lists of references of extracted articles. A total of 146 studies were included in our meta-analysis. The RCADS showed robust internal consistency reliability in different assessment settings, countries, and languages. We only found that reliability of the RCADS was significantly moderated by the version of RCADS. However, these differences in reliability between different versions of the RCADS were slight and can be due to the number of items. We did not examine factor structure, factorial invariance across gender, age, or country, and test-retest reliability of the RCADS. The RCADS is a reliable instrument for cross-cultural use, with the advantage of providing more information with a low number of items in the assessment of both anxiety and depression symptoms in children and adolescents. Copyright © 2017. Published by Elsevier B.V.
Reliability-based structural optimization: A proposed analytical-experimental study
NASA Technical Reports Server (NTRS)
Stroud, W. Jefferson; Nikolaidis, Efstratios
1993-01-01
An analytical and experimental study for assessing the potential of reliability-based structural optimization is proposed and described. In the study, competing designs obtained by deterministic and reliability-based optimization are compared. The experimental portion of the study is practical because the structure selected is a modular, actively and passively controlled truss that consists of many identical members, and because the competing designs are compared in terms of their dynamic performance and are not destroyed if failure occurs. The analytical portion of this study is illustrated on a 10-bar truss example. In the illustrative example, it is shown that reliability-based optimization can yield a design that is superior to an alternative design obtained by deterministic optimization. These analytical results provide motivation for the proposed study, which is underway.
Flight control electronics reliability/maintenance study
NASA Technical Reports Server (NTRS)
Dade, W. W.; Edwards, R. H.; Katt, G. T.; Mcclellan, K. L.; Shomber, H. A.
1977-01-01
Collection and analysis of data are reported that concern the reliability and maintenance experience of flight control system electronics currently in use on passenger carrying jet aircraft. Two airlines B-747 airplane fleets were analyzed to assess the component reliability, system functional reliability, and achieved availability of the CAT II configuration flight control system. Also assessed were the costs generated by this system in the categories of spare equipment, schedule irregularity, and line and shop maintenance. The results indicate that although there is a marked difference in the geographic location and route pattern between the airlines studied, there is a close similarity in the reliability and the maintenance costs associated with the flight control electronics.
The Yale-Brown Obsessive Compulsive Scale: A Reliability Generalization Meta-Analysis.
López-Pina, José Antonio; Sánchez-Meca, Julio; López-López, José Antonio; Marín-Martínez, Fulgencio; Núñez-Núñez, Rosa Maria; Rosa-Alcázar, Ana I; Gómez-Conesa, Antonia; Ferrer-Requena, Josefa
2015-10-01
The Yale-Brown Obsessive Compulsive Scale (Y-BOCS) is the most frequently applied test to assess obsessive compulsive symptoms. We conducted a reliability generalization meta-analysis on the Y-BOCS to estimate the average reliability, examine the variability among the reliability estimates, search for moderators, and propose a predictive model that researchers and clinicians can use to estimate the expected reliability of the Y-BOCS. We included studies where the Y-BOCS was applied to a sample of adults and reliability estimate was reported. Out of the 11,490 references located, 144 studies met the selection criteria. For the total scale, the mean reliability was 0.866 for coefficients alpha, 0.848 for test-retest correlations, and 0.922 for intraclass correlations. The moderator analyses led to a predictive model where the standard deviation of the total test and the target population (clinical vs. nonclinical) explained 38.6% of the total variability among coefficients alpha. Finally, clinical implications of the results are discussed. © The Author(s) 2014.
Larson, Tomas; Kerekes, Nóra; Selinus, Eva Norén; Lichtenstein, Paul; Gumpert, Clara Hellner; Anckarsäter, Henrik; Nilsson, Thomas; Lundström, Sebastian
2014-02-01
The Autism-Tics, AD/HD, and other Comorbidities (A-TAC) inventory is used in epidemiological research to assess neurodevelopmental problems and coexisting conditions. Although the A-TAC has been applied in various populations, data on retest reliability are limited. The objective of the present study was to present additional reliability data. The A-TAC was administered by lay assessors and was completed on two occasions by parents of 400 individual twins, with an average interval of 70 days between test sessions. Intra- and inter-rater reliability were analysed with intraclass correlations and Cohen's kappa. A-TAC showed excellent test-retest intraclass correlations for both autism spectrum disorder and attention deficit hyperactivity disorder (each at .84). Most modules in the A-TAC had intra- and inter-rater reliability intraclass correlation coefficients of > or = .60. Cohen's kappa indi- cated acceptable reliability. The current study provides statistical evidence that the A-TAC yields good test-retest reliability in a population-based cohort of children.
Scheduling for energy and reliability management on multiprocessor real-time systems
NASA Astrophysics Data System (ADS)
Qi, Xuan
Scheduling algorithms for multiprocessor real-time systems have been studied for years with many well-recognized algorithms proposed. However, it is still an evolving research area and many problems remain open due to their intrinsic complexities. With the emergence of multicore processors, it is necessary to re-investigate the scheduling problems and design/develop efficient algorithms for better system utilization, low scheduling overhead, high energy efficiency, and better system reliability. Focusing cluster schedulings with optimal global schedulers, we study the utilization bound and scheduling overhead for a class of cluster-optimal schedulers. Then, taking energy/power consumption into consideration, we developed energy-efficient scheduling algorithms for real-time systems, especially for the proliferating embedded systems with limited energy budget. As the commonly deployed energy-saving technique (e.g. dynamic voltage frequency scaling (DVFS)) will significantly affect system reliability, we study schedulers that have intelligent mechanisms to recuperate system reliability to satisfy the quality assurance requirements. Extensive simulation is conducted to evaluate the performance of the proposed algorithms on reduction of scheduling overhead, energy saving, and reliability improvement. The simulation results show that the proposed reliability-aware power management schemes could preserve the system reliability while still achieving substantial energy saving.
RELIABILITY AND VALIDITY OF SUBJECTIVE ASSESSMENT OF LUMBAR LORDOSIS IN CONVENTIONAL RADIOGRAPHY.
Ruhinda, E; Byanyima, R K; Mugerwa, H
2014-10-01
Reliability and validity studies of different lumbar curvature analysis and measurement techniques have been documented however there is limited literature on the reliability and validity of subjective visual analysis. Radiological assessment of lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. To ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. A blinded, repeated-measures diagnostic test was carried out on lumbar spine x-ray radiographs. Radiology Department at Joint Clinical Research Centre (JCRC), Mengo-Kampala-Uganda. Seventy (70) lateral lumbar x-ray films were used for this study and were obtained from the archive of JCRC radiology department at Butikiro house, Mengo-Kampala. Poor observer agreement, both inter- and intra-observer, with kappa values of 0.16 was found. Inter-observer agreement was poorer than intra-observer agreement. Kappa values significantly rose when the lumbar lordosis was clustered into four categories without grading each abnormality. The results confirm that subjective assessment of lumbar lordosis has low reliability and validity. Film quality has limited influence on the observer reliability. This study further shows that fewer scale categories of lordosis abnormalities produce better observer reliability.
Lange, Toni; Struyf, Filip; Schmitt, Jochen; Lützner, Jörg; Kopkow, Christian
2017-07-01
Systematic review. The aim of this systematic review was to summarize and evaluate intra- and interrater reliability research of physical examination tests used for the assessment of scapular dyskinesis. Scapular dyskinesis, defined as alteration of normal scapular kinematics, is described as a non-specific response to different shoulder pathologies. A systematic literature search was conducted in MEDLINE, EMBASE, AMED and PEDro until March 20th, 2015. Methodological quality was assessed with the Quality Appraisal of Reliability Studies (QAREL) by two independent reviewers. The search strategy revealed 3259 articles, of which 15 met the inclusion criteria. These studies evaluated the reliability of 41 test and test variations used for the assessment of scapular dyskinesis. This review identified a lack of high-quality studies evaluating intra- as well as interrater reliability of tests used for the assessment of scapular dyskinesis. In addition, reliability measures differed between included studies hindering proper cross-study comparisons. The effect of manual correction of the scapula on shoulder symptoms was evaluated in only one study, which is striking, since symptom alteration tests are used in routine care to guide further treatment. Thus, there is a strong need for further research in this area. Diagnosis, level 3a. Copyright © 2016. Published by Elsevier Ltd.
Reducing random measurement error in assessing postural load on the back in epidemiologic surveys.
Burdorf, A
1995-02-01
The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations have been presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.
A Reliability Generalization Meta-Analysis of Coefficient Alpha for the Maslach Burnout Inventory
ERIC Educational Resources Information Center
Wheeler, Denna L.; Vassar, Matt; Worley, Jody A.; Barnes, Laura L. B.
2011-01-01
The purpose of this study was to synthesize internal consistency reliability for the subscale scores on the Maslach Burnout Inventory (MBI). The authors addressed three research questions: (a) What is the mean subscale score reliability for the MBI across studies? (b) What factors are associated with observed variance in MBI subscale score…
Reliability of Total Test Scores When Considered as Ordinal Measurements
ERIC Educational Resources Information Center
Biswas, Ajoy Kumar
2006-01-01
This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
ERIC Educational Resources Information Center
Uzun, N. Bilge; Aktas, Mehtap; Asiret, Semih; Yormaz, Seha
2018-01-01
The goal of this study is to determine the reliability of the performance points of dentistry students regarding communication skills and to examine the scoring reliability by generalizability theory in balanced random and fixed facet (mixed design) data, considering also the interactions of student, rater and duty. The study group of the research…
ERIC Educational Resources Information Center
Nordness, Philip D.; Epstein, Michael H.; Cullinan, Douglas; Pierce, Corey D.
2014-01-01
The Emotional and Behavioral Screener (EBS) is a universal screening instrument designed to identify students whose excessive problem behaviors put them at risk of the education disability category of emotional disturbance (ED). This article reports findings from three studies that address the reliability and validity of the EBS. Studies 1 and 2…
The Validity and Reliability of the Mobbing Scale (MS)
ERIC Educational Resources Information Center
Yaman, Erkan
2009-01-01
The aim of this research is to develop the Mobbing Scale and examine its validity and reliability. The sample of the study consisted of 515 persons from Sakarya and Bursa. In this study, construct validity, internal consistency, test-retest reliability, and item analysis of the scale were examined. As a result of factor analysis for construct…
Validity and Reliability of the Arabic Token Test for Children
ERIC Educational Resources Information Center
Alkhamra, Rana A.; Al-Jazi, Aya B.
2016-01-01
Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…
A Pilot Study Examining the Test-Retest and Internal Consistency Reliability of the ABLLS-R
ERIC Educational Resources Information Center
Partington, James W.; Bailey, Autumn; Partington, Scott W.
2018-01-01
The literature contains a variety of assessment tools for measuring the skills of individuals with autism or other developmental delays, but most lack adequate empirical evidence supporting their reliability and validity. The current pilot study sought to examine the reliability of scores obtained from the Assessment of Basic Language and Learning…
Koontz, Alicia M; Lin, Yen-Sheng; Kankipati, Padmaja; Boninger, Michael L; Cooper, Rory A
2011-01-01
This study describes a new custom measurement system designed to investigate the biomechanics of sitting-pivot wheelchair transfers and assesses the reliability of selected biomechanical variables. Variables assessed include horizontal and vertical reaction forces underneath both hands and three-dimensional trunk, shoulder, and elbow range of motion. We examined the reliability of these measures between 5 consecutive transfer trials for 5 subjects with spinal cord injury and 12 nondisabled subjects while they performed a self-selected sitting pivot transfer from a wheelchair to a level bench. A majority of the biomechanical variables demonstrated moderate to excellent reliability (r > 0.6). The transfer measurement system recorded reliable and valid biomechanical data for future studies of sitting-pivot wheelchair transfers.We recommend a minimum of five transfer trials to obtain a reliable measure of transfer technique for future studies.
Interformat reliability of digital psychiatric self-report questionnaires: a systematic review.
Alfonsson, Sven; Maathz, Pernilla; Hursti, Timo
2014-12-03
Research on Internet-based interventions typically use digital versions of pen and paper self-report symptom scales. However, adaptation into the digital format could affect the psychometric properties of established self-report scales. Several studies have investigated differences between digital and pen and paper versions of instruments, but no systematic review of the results has yet been done. This review aims to assess the interformat reliability of self-report symptom scales used in digital or online psychotherapy research. Three databases (MEDLINE, Embase, and PsycINFO) were systematically reviewed for studies investigating the reliability between digital and pen and paper versions of psychiatric symptom scales. From a total of 1504 publications, 33 were included in the review, and interformat reliability of 40 different symptom scales was assessed. Significant differences in mean total scores between formats were found in 10 of 62 analyses. These differences were found in just a few studies, which indicates that the results were due to study effects and sample effects rather than unreliable instruments. The interformat reliability ranged from r=.35 to r=.99; however, the majority of instruments showed a strong correlation between format scores. The quality of the included studies varied, and several studies had insufficient power to detect small differences between formats. When digital versions of self-report symptom scales are compared to pen and paper versions, most scales show high interformat reliability. This supports the reliability of results obtained in psychotherapy research on the Internet and the comparability of the results to traditional psychotherapy research. There are, however, some instruments that consistently show low interformat reliability, suggesting that these conclusions cannot be generalized to all questionnaires. Most studies had at least some methodological issues with insufficient statistical power being the most common issue. Future studies should preferably provide information about the transformation of the instrument into digital format and the procedure for data collection in more detail.
ERIC Educational Resources Information Center
Dorn, Lorah D.; Sontag-Padilla, Lisa M.; Pabst, Stephanie; Tissot, Abbigail; Susman, Elizabeth J.
2013-01-01
Age at menarche is critical in research and clinical settings, yet there is a dearth of studies examining its reliability in adolescents. We examined age at menarche during adolescence, specifically, (a) average method reliability across 3 years, (b) test-retest reliability between time points and methods, (c) intraindividual variability of…
ERIC Educational Resources Information Center
Vacha-Haase, Tammi; Kogan, Lori R.; Tani, Crystal R.; Woodall, Renee A.
2001-01-01
Used reliability generalization to explore the variance of scores on 10 Minnesota Multiphasic Personality Inventory (MMPI) clinical scales drawing on 1,972 articles in the literature on the MMPI. Results highlight the premise that scores, not tests, are reliable or unreliable, and they show that study characteristics do influence scores on the…
ERIC Educational Resources Information Center
Pantzare, Anna Lind
2015-01-01
In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examines if teachers' ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality…
Psychometric Inferences from a Meta-Analysis of Reliability and Internal Consistency Coefficients
ERIC Educational Resources Information Center
Botella, Juan; Suero, Manuel; Gambara, Hilda
2010-01-01
A meta-analysis of the reliability of the scores from a specific test, also called reliability generalization, allows the quantitative synthesis of its properties from a set of studies. It is usually assumed that part of the variation in the reliability coefficients is due to some unknown and implicit mechanism that restricts and biases the…
ERIC Educational Resources Information Center
Jones, Corinne A.; Hoffman, Matthew R.; Geng, Zhixian; Abdelhalim, Suzan M.; Jiang, Jack J.; McCulloch, Timothy M.
2014-01-01
Purpose: The purpose of this study was to investigate inter- and intrarater reliability among expert users, novice users, and speech-language pathologists with a semiautomated high-resolution manometry analysis program. We hypothesized that all users would have high intrarater reliability and high interrater reliability. Method: Three expert…
Intra- and Interobserver Reliability of Three Classification Systems for Hallux Rigidus.
Dillard, Sarita; Schilero, Christina; Chiang, Sharon; Pham, Peter
2018-04-18
There are over ten classification systems currently used in the staging of hallux rigidus. This results in confusion and inconsistency with radiographic interpretation and treatment. The reliability of hallux rigidus classification systems has not yet been tested. The purpose of this study was to evaluate intra- and interobserver reliability using three commonly used classifications for hallux rigidus. Twenty-one plain radiograph sets were presented to ten ACFAS board-certified foot and ankle surgeons. Each physician classified each radiograph based on clinical experience and knowledge according to the Regnauld, Roukis, and Hattrup and Johnson classification systems. The two-way mixed single-measure consistency intraclass correlation was used to calculate intra- and interrater reliability. The intrarater reliability of individual sets for the Roukis and Hattrup and Johnson classification systems was "fair to good" (Roukis, 0.62±0.19; Hattrup and Johnson, 0.62±0.28), whereas the intrarater reliability of individual sets for the Regnauld system bordered between "fair to good" and "poor" (0.43±0.24). The interrater reliability of the mean classification was "excellent" for all three classification systems. Conclusions Reliable and reproducible classification systems are essential for treatment and prognostic implications in hallux rigidus. In our study, Roukis classification system had the best intrarater reliability. Although there are various classification systems for hallux rigidus, our results indicate that all three of these classification systems show reliability and reproducibility.
Impact of data source on travel time reliability assessment.
DOT National Transportation Integrated Search
2014-08-01
Travel time reliability measures are becoming an increasingly important input to the mobility and : congestion management studies. In the case of Maryland State Highway Administration, reliability : measures are key elements in the agencys Annual ...
Reliability studies of Integrated Modular Engine system designs
NASA Technical Reports Server (NTRS)
Hardy, Terry L.; Rapp, Douglas C.
1993-01-01
A study was performed to evaluate the reliability of Integrated Modular Engine (IME) concepts. Comparisons were made between networked IME systems and non-networked discrete systems using expander cycle configurations. Both redundant and non-redundant systems were analyzed. Binomial approximation and Markov analysis techniques were employed to evaluate total system reliability. In addition, Failure Modes and Effects Analyses (FMEA), Preliminary Hazard Analyses (PHA), and Fault Tree Analysis (FTA) were performed to allow detailed evaluation of the IME concept. A discussion of these system reliability concepts is also presented.
Reliability studies of integrated modular engine system designs
NASA Technical Reports Server (NTRS)
Hardy, Terry L.; Rapp, Douglas C.
1993-01-01
A study was performed to evaluate the reliability of Integrated Modular Engine (IME) concepts. Comparisons were made between networked IME systems and non-networked discrete systems using expander cycle configurations. Both redundant and non-redundant systems were analyzed. Binomial approximation and Markov analysis techniques were employed to evaluate total system reliability. In addition, Failure Modes and Effects Analyses (FMEA), Preliminary Hazard Analyses (PHA), and Fault Tree Analysis (FTA) were performed to allow detailed evaluation of the IME concept. A discussion of these system reliability concepts is also presented.
Reliability studies of integrated modular engine system designs
NASA Astrophysics Data System (ADS)
Hardy, Terry L.; Rapp, Douglas C.
1993-06-01
A study was performed to evaluate the reliability of Integrated Modular Engine (IME) concepts. Comparisons were made between networked IME systems and non-networked discrete systems using expander cycle configurations. Both redundant and non-redundant systems were analyzed. Binomial approximation and Markov analysis techniques were employed to evaluate total system reliability. In addition, Failure Modes and Effects Analyses (FMEA), Preliminary Hazard Analyses (PHA), and Fault Tree Analysis (FTA) were performed to allow detailed evaluation of the IME concept. A discussion of these system reliability concepts is also presented.
Reliability studies of Integrated Modular Engine system designs
NASA Astrophysics Data System (ADS)
Hardy, Terry L.; Rapp, Douglas C.
1993-06-01
A study was performed to evaluate the reliability of Integrated Modular Engine (IME) concepts. Comparisons were made between networked IME systems and non-networked discrete systems using expander cycle configurations. Both redundant and non-redundant systems were analyzed. Binomial approximation and Markov analysis techniques were employed to evaluate total system reliability. In addition, Failure Modes and Effects Analyses (FMEA), Preliminary Hazard Analyses (PHA), and Fault Tree Analysis (FTA) were performed to allow detailed evaluation of the IME concept. A discussion of these system reliability concepts is also presented.
Clinical assessment of scapular positioning in musicians: an intertester reliability study.
Struyf, Filip; Nijs, Jo; De Coninck, Kris; Giunta, Marco; Mottram, Sarah; Meeusen, Romain
2009-01-01
The reliability of the measurement of the distance between the posterior border of the acromion and the wall and the reliability of the modified lateral scapular slide test have not been studied. Overall, the reliability of the clinical tools used to assess scapular positioning has not been studied in musicians. To examine the intertester reliability of scapular observation and 2 clinical tests for the assessment of scapular positioning in musicians. Intertester reliability study. University research laboratory. Thirty healthy student musicians at a single university. Two assessors performed a standardized observation protocol, the measurement of the distance between the posterior border of the acromion and the wall, and the modified lateral scapular slide test. Each assessor was blinded to the other's findings. The intertester reliability coefficients (kappa) for the observation in relaxed position, during unloaded movement, and during loaded movement were 0.41, 0.63, and 0.36, respectively. The kappa values for the observation of tilting and winging at rest were 0.48 and 0.42, respectively; during unloaded movement, the kappa values were 0.52 and 0.78, respectively; and with a 1-kg load, the kappa values were 0.24 and 0.50, respectively. The intraclass correlation coefficient (ICC) of the measurement of the acromial distance was 0.72 in relaxed position and 0.75 with the participant actively retracting both shoulders. The ICCs for the modified lateral scapular slide test varied between 0.63 and 0.58. Our results demonstrated that the modified lateral scapular slide test was not a reliable tool to assess scapular positioning in these participants. Our data indicated that scapular observation in the relaxed position and during unloaded abduction in the frontal plane was a reliable assessment tool. The reliability of the measurement of the distance between the posterior border of the acromion and the wall in healthy musicians was moderate.
Clinical Assessment of Scapular Positioning in Musicians: An Intertester Reliability Study
Struyf, Filip; Nijs, Jo; De Coninck, Kris; Giunta, Marco; Mottram, Sarah; Meeusen, Romain
2009-01-01
Abstract Context: The reliability of the measurement of the distance between the posterior border of the acromion and the wall and the reliability of the modified lateral scapular slide test have not been studied. Overall, the reliability of the clinical tools used to assess scapular positioning has not been studied in musicians. Objective: To examine the intertester reliability of scapular observation and 2 clinical tests for the assessment of scapular positioning in musicians. Design: Intertester reliability study. Setting: University research laboratory. Patients or Other Participants: Thirty healthy student musicians at a single university. Main Outcome Measure(s): Two assessors performed a standardized observation protocol, the measurement of the distance between the posterior border of the acromion and the wall, and the modified lateral scapular slide test. Each assessor was blinded to the other's findings. Results: The intertester reliability coefficients (κ) for the observation in relaxed position, during unloaded movement, and during loaded movement were 0.41, 0.63, and 0.36, respectively. The κ values for the observation of tilting and winging at rest were 0.48 and 0.42, respectively; during unloaded movement, the κ values were 0.52 and 0.78, respectively; and with a 1-kg load, the κ values were 0.24 and 0.50, respectively. The intraclass correlation coefficient (ICC) of the measurement of the acromial distance was 0.72 in relaxed position and 0.75 with the participant actively retracting both shoulders. The ICCs for the modified lateral scapular slide test varied between 0.63 and 0.58. Conclusions: Our results demonstrated that the modified lateral scapular slide test was not a reliable tool to assess scapular positioning in these participants. Our data indicated that scapular observation in the relaxed position and during unloaded abduction in the frontal plane was a reliable assessment tool. The reliability of the measurement of the distance between the posterior border of the acromion and the wall in healthy musicians was moderate. PMID:19771291
Tabard-Fougère, Anne; Bonnefoy-Mazure, Alice; Hanquinet, Sylviane; Lascombes, Pierre; Armand, Stéphane; Dayer, Romain
2017-01-15
Test-retest study. This study aimed to evaluate the validity and reliability of rasterstereography in patients with adolescent idiopathic scoliosis (AIS) with a major curve Cobb angle (CA) between 10° and 40° for frontal, sagittal, and transverse parameters. Previous studies evaluating the validity and reliability of rasterstereography concluded that this technique had good accuracy compared with radiographs and a high intra- and interday reliability in healthy volunteers. To the best of our knowledge, the validity and reliability have not been assessed in AIS patients. Thirty-five adolescents with AIS (male = 13) aged 13.1 ± 2.0 years were included. To evaluate the validity of the scoliosis angle (SA) provided by rasterstereography, a comparison (t test, Pearson correlation) was performed with the CA obtained using 2D EOS® radiography (XR). Three rasterstereographic repeated measurements were independently performed by two operators on the same day (interrater reliability) and again by the first operator 1 week later (intrarater reliability). The variables of interest were the SA, lumbar lordosis, and thoracic kyphosis angle, trunk length, pelvic obliquity, and maximum, root mean square and amplitude of vertebral rotations. The data analyses used intraclass correlation coefficients (ICCs). The CA and SA were strongly correlated (R = 0.70) and were nonsignificantly different (P = 0.60). The intrarater reliability (same day: ICC [1, 1], n = 35; 1 week later: ICC [1, 3], n = 28) and interrater reliability (ICC [3, 3], n = 16) were globally excellent (ICC > 0.75) except for the assessment of pelvic obliquity. This study showed that the rasterstereographic system allows for the evaluation of AIS patients with a good validity compared with XR with an overall excellent intra- and interrater reliability. Based on these results, this automatic, fast, and noninvasive system can be used for monitoring the evolution of AIS in growing patients instead of repetitive radiographs, thereby reducing radiation exposure and decreasing costs. 4.
Inter-rater and intra-rater reliability of a movement control test in shoulder.
Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban
2017-07-01
Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two week intervals. An equal number of subjects with and without shoulder pain were tested by both the assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno- Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT) were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 & K = 0.65 respectively in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
Reliability of risk-adjusted outcomes for profiling hospital surgical quality.
Krell, Robert W; Hozain, Ahmed; Kao, Lillian S; Dimick, Justin B
2014-05-01
Quality improvement platforms commonly use risk-adjusted morbidity and mortality to profile hospital performance. However, given small hospital caseloads and low event rates for some procedures, it is unclear whether these outcomes reliably reflect hospital performance. To determine the reliability of risk-adjusted morbidity and mortality for hospital performance profiling using clinical registry data. A retrospective cohort study was conducted using data from the American College of Surgeons National Surgical Quality Improvement Program, 2009. Participants included all patients (N = 55,466) who underwent colon resection, pancreatic resection, laparoscopic gastric bypass, ventral hernia repair, abdominal aortic aneurysm repair, and lower extremity bypass. Outcomes included risk-adjusted overall morbidity, severe morbidity, and mortality. We assessed reliability (0-1 scale: 0, completely unreliable; and 1, perfectly reliable) for all 3 outcomes. We also quantified the number of hospitals meeting minimum acceptable reliability thresholds (>0.70, good reliability; and >0.50, fair reliability) for each outcome. For overall morbidity, the most common outcome studied, the mean reliability depended on sample size (ie, how high the hospital caseload was) and the event rate (ie, how frequently the outcome occurred). For example, mean reliability for overall morbidity was low for abdominal aortic aneurysm repair (reliability, 0.29; sample size, 25 cases per year; and event rate, 18.3%). In contrast, mean reliability for overall morbidity was higher for colon resection (reliability, 0.61; sample size, 114 cases per year; and event rate, 26.8%). Colon resection (37.7% of hospitals), pancreatic resection (7.1% of hospitals), and laparoscopic gastric bypass (11.5% of hospitals) were the only procedures for which any hospitals met a reliability threshold of 0.70 for overall morbidity. Because severe morbidity and mortality are less frequent outcomes, their mean reliability was lower, and even fewer hospitals met the thresholds for minimum reliability. Most commonly reported outcome measures have low reliability for differentiating hospital performance. This is especially important for clinical registries that sample rather than collect 100% of cases, which can limit hospital case accrual. Eliminating sampling to achieve the highest possible caseloads, adjusting for reliability, and using advanced modeling strategies (eg, hierarchical modeling) are necessary for clinical registries to increase their benchmarking reliability.
Lee, Ji-Hyun; Cynn, Heon-Seock; Choi, Woo-Jeong; Jeong, Hyo-Jung; Yoon, Tae-Lim
2016-05-01
The objective of this study was to introduce levator scapulae (LS) measurement using a caliper and the levator scapulae index (LSI) and to investigate intra- and interrater reliability of the LSI in subjects with and without scapular downward rotation syndrome (SDRS). Two raters measured LS length twice in 38 subjects (19 with SDRS and 19 without SDRS). For reliability testing, intraclass correlation coefficients (ICCs), standard error of measurement (SEM), and minimal detectable change (MDC) were calculated. Intrarater reliability analysis resulted with ICCs ranging from 0.94 to 0.98 in subjects with SDRS and 0.96 to 0.98 in subjects without SDRS. These results represented that intrarater reliability in both groups were excellent for measuring LS length with the LSI. Interrater reliability was good (ICC: 0.82) in subjects with SDRS; however, interrater reliability was moderate (ICC: 0.75) in subjects without SDRS. Additionally, SEM and MDC were 0.13% and 0.36% in subjects with SDRS and 0.35% and 0.97% in subjects without SDRS. In subjects with SDRS, low dispersion of the measurement errors and MDC were shown. This study suggested that the LSI is a reliable method to measure LS length and is more reliable for subjects with SDRS. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Shi, Yu-Fang; Ma, Yi-Yi; Song, Ping-Ping
2018-03-01
System Reliability Theory is a research hotspot of management science and system engineering in recent years, and construction reliability is useful for quantitative evaluation of project management level. According to reliability theory and target system of engineering project management, the defination of construction reliability appears. Based on fuzzy mathematics theory and language operator, value space of construction reliability is divided into seven fuzzy subsets and correspondingly, seven membership function and fuzzy evaluation intervals are got with the operation of language operator, which provides the basis of corresponding method and parameter for the evaluation of construction reliability. This method is proved to be scientific and reasonable for construction condition and an useful attempt for theory and method research of engineering project system reliability.
Examination of Anomalous World Experience: A Report on Reliability.
Conerty, Joseph; Skodlar, Borut; Pienkos, Elizabeth; Zadravek, Tina; Byrom, Greg; Sass, Louis
2017-01-01
The EAWE (Examination of Anomalous World Experience) is a newly developed, semi-structured interview that aims to capture anomalies of subjectivity, common in schizophrenia spectrum disorders, that pertain to experiences of the lived world, including space, time, people, language, atmosphere, and certain existential attitudes. By contrast, previous empirical studies of subjective experience in schizophrenia have focused largely on disturbances in self-experience. To assess the reliability of the EAWE, including internal consistency and interrater reliability. In the course of developing the EAWE, two distinct studies were conducted, one in the United States and the other in Slovenia. Thirteen patients diagnosed with schizophrenia spectrum or mood disorders were recruited for the US study. Fifteen such patients were recruited for the Slovenian study. Two live interviewers conducted the EAWE in the US. The Slovenian interviews were completed by one live interviewer with a second rater reviewing audiorecordings of the interview. Internal consistency and interrater reliability were calculated independently for each study, utilizing Cronbach's α, Spearman's ρ, and Cohen's κ. Each study yielded high internal consistency (Cronbach's α >0.82) and high interrater reliability for total EAWE scores (ρ > 0.83; average κ values were at least 0.78 for each study, with EAWE domain-specific κ not lower than 0.73). The EAWE, containing world-oriented inquiries into anomalies in subjective experience, has adequate reliability for use in a clinical or research setting. © 2017 S. Karger AG, Basel.
Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y
2002-05-01
The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.
Critically re-evaluating a common technique
Geisbush, Thomas; Jones, Lyell; Weiss, Michael; Mozaffar, Tahseen; Gronseth, Gary; Rutkove, Seward B.
2016-01-01
Objectives: (1) To assess the diagnostic accuracy of EMG in radiculopathy. (2) To evaluate the intrarater reliability and interrater reliability of EMG in radiculopathy. (3) To assess the presence of confirmation bias in EMG. Methods: Three experienced academic electromyographers interpreted 3 compact discs with 20 EMG videos (10 normal, 10 radiculopathy) in a blinded, standardized fashion without information regarding the nature of the study. The EMGs were interpreted 3 times (discs A, B, C) 1 month apart. Clinical information was provided only with disc C. Intrarater reliability was calculated by comparing interpretations in discs A and B, interrater reliability by comparing interpretation between reviewers. Confirmation bias was estimated by the difference in correct interpretations when clinical information was provided. Results: Sensitivity was similar to previous reports (77%, confidence interval [CI] 63%–90%); specificity was 71%, CI 56%–85%. Intrarater reliability was good (κ 0.61, 95% CI 0.41–0.81); interrater reliability was lower (κ 0.53, CI 0.35–0.71). There was no substantial confirmation bias when clinical information was provided (absolute difference in correct responses 2.2%, CI −13.3% to 17.7%); the study lacked precision to exclude moderate confirmation bias. Conclusions: This study supports that (1) serial EMG studies should be performed by the same electromyographer since intrarater reliability is better than interrater reliability; (2) knowledge of clinical information does not bias EMG interpretation substantially; (3) EMG has moderate diagnostic accuracy for radiculopathy with modest specificity and electromyographers should exercise caution interpreting mild abnormalities. Classification of evidence: This study provides Class III evidence that EMG has moderate diagnostic accuracy and specificity for radiculopathy. PMID:26701380
Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh
2015-05-01
The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at University Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts were identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regards to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83-1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass coefficient correlation range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It showed high internal consistency, thereby providing evidence that the CCEVI has moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.
NASA Astrophysics Data System (ADS)
Saini, K. K.; Sehgal, R. K.; Sethi, B. L.
2008-10-01
In this paper major reliability estimators are analyzed and there comparatively result are discussed. There strengths and weaknesses are evaluated in this case study. Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers. As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different occasions. Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater reliability estimates will be lower in value than the parallel forms and internal consistency ones because they involve measuring at different times or with different raters. Since reliability estimates are often used in statistical analyses of quasi-experimental designs.
Relating design and environmental variables to reliability
NASA Astrophysics Data System (ADS)
Kolarik, William J.; Landers, Thomas L.
The combination of space application and nuclear power source demands high reliability hardware. The possibilities of failure, either an inability to provide power or a catastrophic accident, must be minimized. Nuclear power experiences on the ground have led to highly sophisticated probabilistic risk assessment procedures, most of which require quantitative information to adequately assess such risks. In the area of hardware risk analysis, reliability information plays a key role. One of the lessons learned from the Three Mile Island experience is that thorough analyses of critical components are essential. Nuclear grade equipment shows some reliability advantages over commercial. However, no statistically significant difference has been found. A recent study pertaining to spacecraft electronics reliability, examined some 2500 malfunctions on more than 300 aircraft. The study classified the equipment failures into seven general categories. Design deficiencies and lack of environmental protection accounted for about half of all failures. Within each class, limited reliability modeling was performed using a Weibull failure model.
DiCesare, Christopher A; Bates, Nathaniel A; Barber Foss, Kim D; Thomas, Staci M; Wordeman, Samuel C; Sugimoto, Dai; Roewer, Benjamin D; Medina McKeon, Jennifer M; Di Stasi, Stephanie; Noehren, Brian W; Ford, Kevin R; Kiefer, Adam W; Hewett, Timothy E; Myer, Gregory D
2015-12-01
Anterior cruciate ligament (ACL) injuries are physically and financially devastating but affect a relatively small percentage of the population. Prospective identification of risk factors for ACL injury necessitates a large sample size; therefore, study of this injury would benefit from a multicenter approach. To determine the reliability of kinematic and kinetic measures of a single-leg cross drop task across 3 institutions. Controlled laboratory study. Twenty-five female high school volleyball players participated in this study. Three-dimensional motion data of each participant performing the single-leg cross drop were collected at 3 institutions over a period of 4 weeks. Coefficients of multiple correlation were calculated to assess the reliability of kinematic and kinetic measures during the landing phase of the movement. Between-centers reliability for kinematic waveforms in the frontal and sagittal planes was good, but moderate in the transverse plane. Between-centers reliability for kinetic waveforms was good in the sagittal, frontal, and transverse planes. Based on these findings, the single-leg cross drop task has moderate to good reliability of kinematic and kinetic measures across institutions after implementation of a standardized testing protocol. Multicenter collaborations can increase study numbers and generalize results, which is beneficial for studies of relatively rare phenomena, such as ACL injury. An important step is to determine the reliability of risk assessments across institutions before a multicenter collaboration can be initiated.
The reliability and validity of ultrasound to quantify muscles in older adults: a systematic review
Scafoglieri, Aldo; Jager‐Wittenaar, Harriët; Hobbelen, Johannes S.M.; van der Schans, Cees P.
2017-01-01
Abstract This review evaluates the reliability and validity of ultrasound to quantify muscles in older adults. The databases PubMed, Cochrane, and Cumulative Index to Nursing and Allied Health Literature were systematically searched for studies. In 17 studies, the reliability (n = 13) and validity (n = 8) of ultrasound to quantify muscles in community‐dwelling older adults (≥60 years) or a clinical population were evaluated. Four out of 13 reliability studies investigated both intra‐rater and inter‐rater reliability. Intraclass correlation coefficient (ICC) scores for reliability ranged from −0.26 to 1.00. The highest ICC scores were found for the vastus lateralis, rectus femoris, upper arm anterior, and the trunk (ICC = 0.72 to 1.000). All included validity studies found ICC scores ranging from 0.92 to 0.999. Two studies describing the validity of ultrasound to predict lean body mass showed good validity as compared with dual‐energy X‐ray absorptiometry (r 2 = 0.92 to 0.96). This systematic review shows that ultrasound is a reliable and valid tool for the assessment of muscle size in older adults. More high‐quality research is required to confirm these findings in both clinical and healthy populations. Furthermore, ultrasound assessment of small muscles needs further evaluation. Ultrasound to predict lean body mass is feasible; however, future research is required to validate prediction equations in older adults with varying function and health. PMID:28703496
NDE reliability and probability of detection (POD) evolution and paradigm shift
NASA Astrophysics Data System (ADS)
Singh, Surendra
2014-02-01
The subject of NDE Reliability and POD has gone through multiple phases since its humble beginning in the late 1960s. This was followed by several programs including the important one nicknamed "Have Cracks - Will Travel" or in short "Have Cracks" by Lockheed Georgia Company for US Air Force during 1974-1978. This and other studies ultimately led to a series of developments in the field of reliability and POD starting from the introduction of fracture mechanics and Damaged Tolerant Design (DTD) to statistical framework by Bernes and Hovey in 1981 for POD estimation to MIL-STD HDBK 1823 (1999) and 1823A (2009). During the last decade, various groups and researchers have further studied the reliability and POD using Model Assisted POD (MAPOD), Simulation Assisted POD (SAPOD), and applying Bayesian Statistics. All and each of these developments had one objective, i.e., improving accuracy of life prediction in components that to a large extent depends on the reliability and capability of NDE methods. Therefore, it is essential to have a reliable detection and sizing of large flaws in components. Currently, POD is used for studying reliability and capability of NDE methods, though POD data offers no absolute truth regarding NDE reliability, i.e., system capability, effects of flaw morphology, and quantifying the human factors. Furthermore, reliability and POD have been reported alike in meaning but POD is not NDE reliability. POD is a subset of the reliability that consists of six phases: 1) samples selection using DOE, 2) NDE equipment setup and calibration, 3) System Measurement Evaluation (SME) including Gage Repeatability &Reproducibility (Gage R&R) and Analysis Of Variance (ANOVA), 4) NDE system capability and electronic and physical saturation, 5) acquiring and fitting data to a model, and data analysis, and 6) POD estimation. This paper provides an overview of all major POD milestones for the last several decades and discuss rationale for using Integrated Computational Materials Engineering (ICME), MAPOD, SAPOD, and Bayesian statistics for studying controllable and non-controllable variables including human factors for estimating POD. Another objective is to list gaps between "hoped for" versus validated or fielded failed hardware.
The Reliability and Validity of Big Five Inventory Scores with African American College Students
ERIC Educational Resources Information Center
Worrell, Frank C.; Cross, William E., Jr.
2004-01-01
This article describes a study that examined the reliability and validity of scores on the Big Five Inventory (BFI; O. P. John, E. M. Donahue, & R. L. Kentle, 1991) in a sample of 336 African American college students. Results from the study indicated moderate reliability and structural validity for BFI scores. Additionally, BFI subscales had few…
ERIC Educational Resources Information Center
Ebuoh, Casmir N.; Ezeudu, S. A.
2015-01-01
The study investigated the effects of scoring by section, use of independent scorers and conventional patterns on scorer reliability in Biology essay tests. It was revealed from literature review that conventional pattern of scoring all items at a time in essay tests had been criticized for not being reliable. The study was true experimental study…
Assessing Reliability of Two Versions of Vocabulary Levels Tests in Iranian Context
ERIC Educational Resources Information Center
Bayazidi, Aso; Saeb, Fateme
2017-01-01
This study examined the equivalence and reliability of the two versions of the Vocabulary Levels Test in an Iranian context. This study was motivated by the fact that the Vocabulary Levels test is increasingly being used in Iran for both research and pedagogical purposes without having been checked for validity and reliability in this context. The…
Reliability of anthropometric measurements in European preschool children: the ToyBox-study.
De Miguel-Etayo, P; Mesana, M I; Cardon, G; De Bourdeaudhuij, I; Góźdź, M; Socha, P; Lateva, M; Iotova, V; Koletzko, B V; Duvinage, K; Androutsos, O; Manios, Y; Moreno, L A
2014-08-01
The ToyBox-study aims to develop and test an innovative and evidence-based obesity prevention programme for preschoolers in six European countries: Belgium, Bulgaria, Germany, Greece, Poland and Spain. In multicentre studies, anthropometric measurements using standardized procedures that minimize errors in the data collection are essential to maximize reliability of measurements. The aim of this paper is to describe the standardization process and reliability (intra- and inter-observer) of height, weight and waist circumference (WC) measurements in preschoolers. All technical procedures and devices were standardized and centralized training was given to the fieldworkers. At least seven children per country participated in the intra- and inter-observer reliability testing. Intra-observer technical error ranged from 0.00 to 0.03 kg for weight and from 0.07 to 0.20 cm for height, with the overall reliability being above 99%. A second training was organized for WC due to low reliability observed in the first training. Intra-observer technical error for WC ranged from 0.12 to 0.71 cm during the first training and from 0.05 to 1.11 cm during the second training, and reliability above 92% was achieved. Epidemiological surveys need standardized procedures and training of researchers to reduce measurement error. In the ToyBox-study, very good intra- and-inter-observer agreement was achieved for all anthropometric measurements performed. © 2014 World Obesity.
Aerts, Frank; Carrier, Kathy; Alwood, Becky
2016-01-01
Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the inter-rater reliability for a single rater scoring ICC (2,1) was .65 (95%, .60, .71) This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
Poulos, Natalie S; Pasch, Keryn E
2015-07-01
Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) were documented (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69-89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Poulos, Natalie S.; Pasch, Keryn E.
2015-01-01
Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) were documented (range=8–229 per school). Overall inter-rater reliability of the developed tool ranged from 69–89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. PMID:26022774
NEPP DDR Device Reliability FY13 Report
NASA Technical Reports Server (NTRS)
Guertin, Steven M.; Armbar, Mehran
2014-01-01
This document reports the status of the NEPP Double Data Rate (DDR) Device Reliability effort for FY2013. The task targeted general reliability of > 100 DDR2 devices from Hynix, Samsung, and Micron. Detailed characterization of some devices when stressed by several data storage patterns was studied, targeting ability of the data cells to store the different data patterns without refresh, highlighting the weakest bits. DDR2, Reliability, Data Retention, Temperature Stress, Test System Evaluation, General Reliability, IDD measurements, electronic parts, parts testing, microcircuits
Su, T A; Hoe, V C W
2008-12-01
Validity and reliability of the information relating to hand-transmitted vibration exposure and vibration-related health outcome are very important for case finding in hand-arm vibration syndrome (HAVS) studies. In a local HAVS study among a group of construction workers in Kuala Lumpur, Malaysia, a questionnaire translated into Malay was created based on the Hand-transmitted Vibration Health Surveillance--Initial Questionnaire and Clinical Assessment, from Vibration Injury Network. This study was conducted to determine the reliability of standardised questions in the questionnaire used in the study. 15 subjects were selected randomly from the sampling frame of the HAVS study. Test-retest reliability was conducted on all items contained in parts 1-6 of the questionnaire and clinical assessment form, with an interval of 13-14 days between the first and second administration. Kappa coefficient and percentage agreement were calculated for all standardised questions. The kappa coefficient and percentage agreement for all standardised questions varied from -0.174 to 1.000 and 66.7 to 100.0 percent, respectively. The kappa coefficient for important questions related to current vibratory tool usage, tingling, numbness and hand grip weakness were 0.714, 0.432, -0.077 and -0.120, respectively, while the percentage agreement for current vibratory tool usage, finger colour change, tingling, numbness and hand grip weakness were 85.7 percent, 92.8 percent, 79.5 percent, 85.7 percent and 71.4 percent, respectively. Intra-rater reliability on the extent of vibration exposure was good, with the intra-class correlation coefficient (95 percent confidence interval) ranging from 0.786 (0.334-0.931) to 0.975 (0.923-0.992). Critical questions on vascular, neurological and musculoskeletal symptoms of HAVS were found to be reliable. The history on the extent of vibration exposure revealed good reliability when explored by the investigator alone. This questionnaire is considered reliable to be used in the study of HAVS among construction workers working in a construction site.
ERIC Educational Resources Information Center
Smith, Stacey L.; Vannest, Kimberly J.; Davis, John L.
2011-01-01
The reliability of data is a critical issue in decision-making for practitioners in the school. Percent Agreement and Cohen's kappa are the two most widely reported indices of inter-rater reliability, however, a recent Monte Carlo study on the reliability of multi-category scales found other indices to be more trustworthy given the type of data…
O'Connor, S; McCaffrey, N; Whyte, E; Moran, K
2016-07-01
To adapt the trunk stability test to facilitate further sub-classification of higher levels of core stability in athletes for use as a screening tool. To establish the inter-tester and intra-tester reliability of this adapted core stability test. Reliability study. Collegiate athletic therapy facilities. Fifteen physically active male subjects (19.46 ± 0.63) free from any orthopaedic or neurological disorders were recruited from a convenience sample of collegiate students. The intraclass correlation coefficients (ICC) and 95% Confidence Intervals (CI) were computed to establish inter-tester and intra-tester reliability. Excellent ICC values were observed in the adapted core stability test for inter-tester reliability (0.97) and good to excellent intra-tester reliability (0.73-0.90). While the 95% CI were narrow for inter-tester reliability, Tester A and C 95% CI's were widely distributed compared to Tester B. The adapted core stability test developed in this study is a quick and simple field based test to administer that can further subdivide athletes with high levels of core stability. The test demonstrated high inter-tester and intra-tester reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Keller, Lisa A; Clauser, Brian E; Swanson, David B
2010-12-01
In recent years, demand for performance assessments has continued to grow. However, performance assessments are notorious for lower reliability, and in particular, low reliability resulting from task specificity. Since reliability analyses typically treat the performance tasks as randomly sampled from an infinite universe of tasks, these estimates of reliability may not be accurate. For tests built according to a table of specifications, tasks are randomly sampled from different strata (content domains, skill areas, etc.). If these strata remain fixed in the test construction process, ignoring this stratification in the reliability analysis results in an underestimate of "parallel forms" reliability, and an overestimate of the person-by-task component. This research explores the effect of representing and misrepresenting the stratification appropriately in estimation of reliability and the standard error of measurement. Both multivariate and univariate generalizability studies are reported. Results indicate that the proper specification of the analytic design is essential in yielding the proper information both about the generalizability of the assessment and the standard error of measurement. Further, illustrative D studies present the effect under a variety of situations and test designs. Additional benefits of multivariate generalizability theory in test design and evaluation are also discussed.
Skinner, Ian W; Hübscher, Markus; Moseley, G Lorimer; Lee, Hopin; Wand, Benedict M; Traeger, Adrian C; Gustin, Sylvia M; McAuley, James H
2017-08-15
Eyetracking is commonly used to investigate attentional bias. Although some studies have investigated the internal consistency of eyetracking, data are scarce on the test-retest reliability and agreement of eyetracking to investigate attentional bias. This study reports the test-retest reliability, measurement error, and internal consistency of 12 commonly used outcome measures thought to reflect the different components of attentional bias: overall attention, early attention, and late attention. Healthy participants completed a preferential-looking eyetracking task that involved the presentation of threatening (sensory words, general threat words, and affective words) and nonthreatening words. We used intraclass correlation coefficients (ICCs) to measure test-retest reliability (ICC > .70 indicates adequate reliability). The ICCs(2, 1) ranged from -.31 to .71. Reliability varied according to the outcome measure and threat word category. Sensory words had a lower mean ICC (.08) than either affective words (.32) or general threat words (.29). A longer exposure time was associated with higher test-retest reliability. All of the outcome measures, except second-run dwell time, demonstrated low measurement error (<6%). Most of the outcome measures reported high internal consistency (α > .93). Recommendations are discussed for improving the reliability of eyetracking tasks in future research.
Objective measurements of excess skin in post bariatric patients--inter-rater reliability.
Biörserud, Christina; Fagevik Olsén, Monika; Elander, Anna; Wiklund, Malin
2016-01-01
An ability to reliably assess excess skin after massive weight loss using well-described and transferrable methods is important. The aim of this trial was to evaluate inter-rater reliability of ptosis and circumference measurements in patients with excess skin after bariatric surgery. Twenty-five postbariatric patients were included in the study, and their excess skin was measured 18 months after surgery. A protocol was designed to measure excess skin in a standardised way. To evaluate the inter-rater reliability in the measuring protocol, all patients were measured twice, by a specialist nurse and a specialist physiotherapist. All circumference measurements on different body parts had an ICC > 0.9, indicating high reliability. Furthermore, all breast and abdominal ptosis measurements had high reliability. In contrast, visual evaluation of abdominal ptosis had poor reliability. Measurements of ptoses on different body parts had an ICC > 0.6. There were no systematic differences between the results of the two testers, except for measurements of the buttocks and maximal knee circumference. The measuring protocol presented in this study has high reliability and, therefore, represents a useful instrument to provide a consistent and objective assessment of excess skin in the postbariatric patient.
Reliability of the Test of Integrated Language and Literacy Skills (TILLS).
Mailend, Marja-Liisa; Plante, Elena; Anderson, Michele A; Applegate, E Brooks; Nelson, Nickola W
2016-07-01
As new standardized tests become commercially available, it is critical that clinicians have access to the information about a test's psychometric properties, including aspects of reliability. The purpose of the three studies reported in this article was to investigate the reliability of a new test, the Test of Integrated Language and Literacy Skills (TILLS), with consideration of both internal and external sources of measurement error. The TILLS was administered to children aged 6;0-18;11 years. The participants varied in terms of their language and literacy skills and included children with typical language development as well as those diagnosed with language or learning disability. The sample of children also varied in terms of their racial and socioeconomic backgrounds. Study 1 (N = 1056) assessed the internal consistency of TILLS calculating the coefficient omega for each subtest. Study 2 (N = 103) and Study 3 (N = 39) used the intra-class correlation coefficients to report on test-retest and inter-rater reliability respectively. The results indicate strong internal consistency and inter-rater reliability for all subtests of TILLS. The test-retest reliability was strong for all but one subtest, for which the intra-class correlation coefficient was in the acceptable range. This article provides clinicians with essential scientific information that supports the internal and external reliability of a new test of oral and written language skills, the TILLS. Information about reliability is critical for guiding the selection of an appropriate diagnostic tool amongst a number of options. © 2016 Royal College of Speech and Language Therapists.
Venkatraman, Vijay K; Gonzalez, Christopher E.; Landman, Bennett; Goh, Joshua; Reiter, David A.; An, Yang; Resnick, Susan M.
2017-01-01
Diffusion tensor imaging (DTI) measures are commonly used as imaging markers to investigate individual differences in relation to behavioral and health-related characteristics. However, the ability to detect reliable associations in cross-sectional or longitudinal studies is limited by the reliability of the diffusion measures. Several studies have examined reliability of diffusion measures within (i.e. intra-site) and across (i.e. inter-site) scanners with mixed results. Our study compares the test-retest reliability of diffusion measures within and across scanners and field strengths in cognitively normal older adults with a follow-up interval less than 2.25 years. Intra-class correlation (ICC) and coefficient of variation (CoV) of fractional anisotropy (FA) and mean diffusivity (MD) were evaluated in sixteen white matter and twenty-six gray matter bilateral regions. The ICC for intra-site reliability (0.32 to 0.96 for FA and 0.18 to 0.95 for MD in white matter regions; 0.27 to 0.89 for MD and 0.03 to 0.79 for FA in gray matter regions) and inter-site reliability (0.28 to 0.95 for FA in white matter regions, 0.02 to 0.86 for MD in gray matter regions) with longer follow-up intervals were similar to earlier studies using shorter follow-up intervals. The reliability of across field strengths comparisons was lower than intra- and inter-site reliability. Within and across scanner comparisons showed that diffusion measures were more stable in larger white matter regions (> 1500 mm3). For gray matter regions, the MD measure showed stability in specific regions and was not dependent on region size. Linear correction factor estimated from cross-sectional or longitudinal data improved the reliability across field strengths. Our findings indicate that investigations relating diffusion measures to external variables must consider variable reliability across the distinct regions of interest and that correction factors can be used to improve consistency of measurement across field strengths. An important result of this work is that inter-scanner and field strength effects can be partially mitigated with linear correction factors specific to regions of interest. These data-driven linear correction techniques can be applied in cross-sectional or longitudinal studies. PMID:26146196
Validity and Reliability Study of the Korean Tinetti Mobility Test for Parkinson's Disease.
Park, Jinse; Koh, Seong-Beom; Kim, Hee Jin; Oh, Eungseok; Kim, Joong-Seok; Yun, Ji Young; Kwon, Do-Young; Kim, Younsoo; Kim, Ji Seon; Kwon, Kyum-Yil; Park, Jeong-Ho; Youn, Jinyoung; Jang, Wooyoung
2018-01-01
Postural instability and gait disturbance are the cardinal symptoms associated with falling among patients with Parkinson's disease (PD). The Tinetti mobility test (TMT) is a well-established measurement tool used to predict falls among elderly people. However, the TMT has not been established or widely used among PD patients in Korea. The purpose of this study was to evaluate the reliability and validity of the Korean version of the TMT for PD patients. Twenty-four patients diagnosed with PD were enrolled in this study. For the interrater reliability test, thirteen clinicians scored the TMT after watching a video clip. We also used the test-retest method to determine intrarater reliability. For concurrent validation, the unified Parkinson's disease rating scale, Hoehn and Yahr staging, Berg Balance Scale, Timed-Up and Go test, 10-m walk test, and gait analysis by three-dimensional motion capture were also used. We analyzed receiver operating characteristic curve to predict falling. The interrater reliability and intrarater reliability of the Korean Tinetti balance scale were 0.97 and 0.98, respectively. The interrater reliability and intra-rater reliability of the Korean Tinetti gait scale were 0.94 and 0.96, respectively. The Korean TMT scores were significantly correlated with the other clinical scales and three-dimensional motion capture. The cutoff values for predicting falling were 14 points (balance subscale) and 10 points (gait subscale). We found that the Korean version of the TMT showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with PD.
Koontz, Alicia M.; Lin, Yen-Sheng; Kankipati, Padmaja; Boninger, Michael L.; Cooper, Rory A.
2017-01-01
This study describes a new custom measurement system designed to investigate the biomechanics of sitting-pivot wheelchair transfers and assesses the reliability of selected biomechanical variables. Variables assessed include horizontal and vertical reaction forces underneath both hands and three-dimensional trunk, shoulder, and elbow range of motion. We examined the reliability of these measures between 5 consecutive transfer trials for 5 subjects with spinal cord injury and 12 non-disabled subjects while they performed a self-selected sitting pivot transfer from a wheelchair to a level bench. A majority of the biomechanical variables demonstrated moderate to excellent reliability (r > 0.6). The transfer measurement system recorded reliable and valid biomechanical data for future studies of sitting-pivot wheelchair transfers. We recommend a minimum of five transfer trials to obtain a reliable measure of transfer technique for future studies. PMID:22068376
The effect of Web-based Braden Scale training on the reliability of Braden subscale ratings.
Magnan, Morris A; Maklebust, JoAnn
2009-01-01
The primary purpose of this study was to evaluate the effect of Web-based Braden Scale training on the reliability of Braden Scale subscale ratings made by nurses working in acute care hospitals. A secondary purpose was to describe the distribution of reliable Braden subscale ratings before and after Web-based Braden Scale training. Secondary analysis of data from a recently completed quasi-experimental, pretest-posttest, interrater reliability study. A convenience sample of RNs working at 3 Michigan medical centers voluntarily participated in the study. RN participants included nurses who used the Braden Scale regularly at their place of employment ("regular users") as well as nurses who did not use the Braden Scale at their place of employment ("new users"). Using a pretest-posttest, quasi-experimental design, pretest interrater reliability data were collected to identify the percentage of nurses making reliable Braden subscale assessments. Nurses then completed a Web-based Braden Scale training module after which posttest interrater reliability data were collected. The reliability of nurses' Braden subscale ratings was determined by examining the level of agreement/disagreement between ratings made by an RN and an "expert" rating the same patient. In total, 381 RN-to-expert dyads were available for analysis. During both the pretest and posttest periods, the percentage of reliable subscale ratings was highest for the activity subscale, lowest for the moisture subscale, and second lowest for the nutrition subscale. With Web-based Braden Scale training, the percentage of reliable Braden subscale ratings made by new users increased for all 6 subscales with statistically significant improvements in the percentage of reliable assessments made on 3 subscales: sensory-perception, moisture, and mobility. Training had virtually no effect on the percentage of reliable subscale ratings made by regular users of the Braden Scale. With Web-based Braden Scale training the percentage of nurses making reliable ratings increased for all 6 subscales, but this was true for new users only. Additional research is needed to identify educational approaches that effectively improve and sustain the reliability of subscale ratings among regular users of the Braden Scale. Moreover, special attention needs to be given to ensuring that all nurses working with the Braden Scale have a clear understanding of the intended meanings and correct approaches to rating moisture and nutrition subscales.
Reliability Stress-Strength Models for Dependent Observations with Applications in Clinical Trials
NASA Technical Reports Server (NTRS)
Kushary, Debashis; Kulkarni, Pandurang M.
1995-01-01
We consider the applications of stress-strength models in studies involving clinical trials. When studying the effects and side effects of certain procedures (treatments), it is often the case that observations are correlated due to subject effect, repeated measurements and observing many characteristics simultaneously. We develop maximum likelihood estimator (MLE) and uniform minimum variance unbiased estimator (UMVUE) of the reliability which in clinical trial studies could be considered as the chances of increased side effects due to a particular procedure compared to another. The results developed apply to both univariate and multivariate situations. Also, for the univariate situations we develop simple to use lower confidence bounds for the reliability. Further, we consider the cases when both stress and strength constitute time dependent processes. We define the future reliability and obtain methods of constructing lower confidence bounds for this reliability. Finally, we conduct simulation studies to evaluate all the procedures developed and also to compare the MLE and the UMVUE.
The Bahasa Melayu version of the Nursing Stress Scale among nurses: a reliability study in Malaysia.
Rosnawati, Muhamad Robat; Moe, Htay; Masilamani, Retneswari; Darus, A
2010-10-01
The Nursing Stress Scale (NSS) has been shown to be a valid and reliable instrument to assess occupational stressors among nurses. The NSS, which was previously used in the English version, was translated and back-translated into Bahasa Melayu. This study was conducted to assess the reliability of the Bahasa Melayu version of the NSS among nurses for future studies in this country. The reliability of the NSS was assessed after its readministration to 30 nurses with a 2-week interval. The Spearman coefficient was calculated to assess its stability. The internal consistency was measured through 4 measures: Cronbach's α, Spearman-Brown, Guttman split-half, and standardized item α coefficients. The total response rate was 70%. Test-retest reliability showed remarkable stability (Spearman's ρ exceeded .70). All 4 measures of internal consistency among items indicated a satisfactory level (coefficients in the range of .68 to .87). In conclusion, the Bahasa Melayu version of the NSS is a reliable and useful instrument for measuring the possible stressors at the workplace among nurses.
Reliability and Probabilistic Risk Assessment - How They Play Together
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Stutts, Richard G.; Zhaofeng, Huang
2015-01-01
PRA methodology is one of the probabilistic analysis methods that NASA brought from the nuclear industry to assess the risk of LOM, LOV and LOC for launch vehicles. PRA is a system scenario based risk assessment that uses a combination of fault trees, event trees, event sequence diagrams, and probability and statistical data to analyze the risk of a system, a process, or an activity. It is a process designed to answer three basic questions: What can go wrong? How likely is it? What is the severity of the degradation? Since 1986, NASA, along with industry partners, has conducted a number of PRA studies to predict the overall launch vehicles risks. Planning Research Corporation conducted the first of these studies in 1988. In 1995, Science Applications International Corporation (SAIC) conducted a comprehensive PRA study. In July 1996, NASA conducted a two-year study (October 1996 - September 1998) to develop a model that provided the overall Space Shuttle risk and estimates of risk changes due to proposed Space Shuttle upgrades. After the Columbia accident, NASA conducted a PRA on the Shuttle External Tank (ET) foam. This study was the most focused and extensive risk assessment that NASA has conducted in recent years. It used a dynamic, physics-based, integrated system analysis approach to understand the integrated system risk due to ET foam loss in flight. Most recently, a PRA for Ares I launch vehicle has been performed in support of the Constellation program. Reliability, on the other hand, addresses the loss of functions. In a broader sense, reliability engineering is a discipline that involves the application of engineering principles to the design and processing of products, both hardware and software, for meeting product reliability requirements or goals. It is a very broad design-support discipline. It has important interfaces with many other engineering disciplines. Reliability as a figure of merit (i.e. the metric) is the probability that an item will perform its intended function(s) for a specified mission profile. In general, the reliability metric can be calculated through the analyses using reliability demonstration and reliability prediction methodologies. Reliability analysis is very critical for understanding component failure mechanisms and in identifying reliability critical design and process drivers. The following sections discuss the PRA process and reliability engineering in detail and provide an application where reliability analysis and PRA were jointly used in a complementary manner to support a Space Shuttle flight risk assessment.
Gaur, Sonia; Harmon, Stephanie; Gupta, Rajan T; Margolis, Daniel J; Lay, Nathan; Mehralivand, Sherif; Merino, Maria J; Wood, Bradford J; Pinto, Peter A; Shih, Joanna H; Choyke, Peter L; Turkbey, Baris
2018-04-25
To determine independent contribution of each prostate multiparametric magnetic resonance imaging (mpMRI) sequence to cancer detection when read in isolation. Prostate mpMRI at 3-Tesla with endorectal coil from 45 patients (n = 30 prostatectomy cases, n = 15 controls with negative magnetic resonance imaging [MRI] or biopsy) were retrospectively interpreted. Sequences (T2-weighted [T2W] MRI, diffusion-weighted imaging [DWI], and dynamic contrast-enhanced [DCE] MRI; N = 135) were separately distributed to three radiologists at different institutions. Readers evaluated each sequence blinded to other mpMRI sequences. Findings were correlated to whole-mount pathology. Cancer detection sensitivity, positive predictive value for whole prostate (WP), transition zone, and peripheral zone were evaluated per sequence by reader, with reader concordance measured by index of specific agreement. Cancer detection rates (CDRs) were calculated for combinations of independently read sequences. 44 patients were evaluable (cases median prostate-specific antigen 6.83 [ range 1.95-51.13] ng/mL, age 62 [45-71] years; controls prostate-specific antigen 6.85 [2.4-10.87] ng/mL, age 65.5 [47-71] years). Readers had highest sensitivity on DWI (59%) vs T2W MRI (48%) and DCE (23%) in WP. DWI-only positivity (DWI+/T2W-/DCE-) achieved highest CDR in WP (38%), compared to T2W-only (CDR 24%) and DCE-only (CDR 8%). DWI+/T2W+/DCE- achieved CDR 80%, an added benefit of 56.4% from T2W-only and of 42% from DWI-only (P < .0001). All three sequences interpreted independently positive gave highest CDR of 90%. Reader agreement was moderate (index of specific agreement: T2W = 54%, DWI = 58%, DCE = 33%). When prostate mpMRI sequences are interpreted independently by multiple observers, DWI achieves highest sensitivity and CDR in transition zone and peripheral zone. T2W and DCE MRI both add value to detection; mpMRI achieves highest detection sensitivity when all three mpMRI sequences are positive. Published by Elsevier Inc.
Seo, Hyun-Ju; Kim, Soo Young; Lee, Yoon Jae; Jang, Bo-Hyoung; Park, Ji-Eun; Sheen, Seung-Soo; Hahn, Seo Kyung
2016-02-01
To develop a study Design Algorithm for Medical Literature on Intervention (DAMI) and test its interrater reliability, construct validity, and ease of use. We developed and then revised the DAMI to include detailed instructions. To test the DAMI's reliability, we used a purposive sample of 134 primary, mainly nonrandomized studies. We then compared the study designs as classified by the original authors and through the DAMI. Unweighted kappa statistics were computed to test interrater reliability and construct validity based on the level of agreement between the original and DAMI classifications. Assessment time was also recorded to evaluate ease of use. The DAMI includes 13 study designs, including experimental and observational studies of interventions and exposure. Both the interrater reliability (unweighted kappa = 0.67; 95% CI [0.64-0.75]) and construct validity (unweighted kappa = 0.63, 95% CI [0.52-0.67]) were substantial. Mean classification time using the DAMI was 4.08 ± 2.44 minutes (range, 0.51-10.92). The DAMI showed substantial interrater reliability and construct validity. Furthermore, given its ease of use, it could be used to accurately classify medical literature for systematic reviews of interventions although minimizing disagreement between authors of such reviews. Copyright © 2016 Elsevier Inc. All rights reserved.
An In vitro evaluation of the reliability of QR code denture labeling technique.
Poovannan, Sindhu; Jain, Ashish R; Krishnan, Cakku Jalliah Venkata; Chandran, Chitraa R
2016-01-01
Positive identification of the dead after accidents and disasters through labeled dentures plays a key role in forensic scenario. A number of denture labeling methods are available, and studies evaluating their reliability under drastic conditions are vital. This study was conducted to evaluate the reliability of QR (Quick Response) Code labeled at various depths in heat-cured acrylic blocks after acid treatment, heat treatment (burns), and fracture in forensics. It was an in vitro study. This study included 160 specimens of heat-cured acrylic blocks (1.8 cm × 1.8 cm) and these were divided into 4 groups (40 samples per group). QR Codes were incorporated in the samples using clear acrylic sheet and they were assessed for reliability under various depths, acid, heat, and fracture. Data were analyzed using Chi-square test, test of proportion. The QR Code inclusion technique was reliable under various depths of acrylic sheet, acid (sulfuric acid 99%, hydrochloric acid 40%) and heat (up to 370°C). Results were variable with fracture of QR Code labeled acrylic blocks. Within the limitations of the study, by analyzing the results, it was clearly indicated that the QR Code technique was reliable under various depths of acrylic sheet, acid, and heat (370°C). Effectiveness varied in fracture and depended on the level of distortion. This study thus suggests that QR Code is an effective and simpler denture labeling method.
Relevance and reliability of experimental data in human health risk assessment of pesticides.
Kaltenhäuser, Johanna; Kneuer, Carsten; Marx-Stoelting, Philip; Niemann, Lars; Schubert, Jens; Stein, Bernd; Solecki, Roland
2017-08-01
Evaluation of data relevance, reliability and contribution to uncertainty is crucial in regulatory health risk assessment if robust conclusions are to be drawn. Whether a specific study is used as key study, as additional information or not accepted depends in part on the criteria according to which its relevance and reliability are judged. In addition to GLP-compliant regulatory studies following OECD Test Guidelines, data from peer-reviewed scientific literature have to be evaluated in regulatory risk assessment of pesticide active substances. Publications should be taken into account if they are of acceptable relevance and reliability. Their contribution to the overall weight of evidence is influenced by factors including test organism, study design and statistical methods, as well as test item identification, documentation and reporting of results. Various reports make recommendations for improving the quality of risk assessments and different criteria catalogues have been published to support evaluation of data relevance and reliability. Their intention was to guide transparent decision making on the integration of the respective information into the regulatory process. This article describes an approach to assess the relevance and reliability of experimental data from guideline-compliant studies as well as from non-guideline studies published in the scientific literature in the specific context of uncertainty and risk assessment of pesticides. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Stroke and aphasia quality-of-life scale-39: Reliability and validity of the Turkish version.
Noyan-ErbaŞ, AyŞin; Toğram, Bülent
2016-10-01
The aim of this study was to adapt the stroke and aphasia quality-of-life scale-39 (SAQoL-39) to the Turkish language and carry out a reliability and validity study of the instrument in a group of patients with aphasia. The study was a descriptive study and contained three phases: adaptation of the SAQoL-39 to the Turkish language, administration of the scale to 30 aphasia patients and reliability and validity studies of the scale. Internal consistency was assessed with Cronbach's alpha and test-re-test reliability was explored (n = 14). The adaptation process was completed based on inter-rater agreement on the translated items and within the scope of final editing by the authors of the study. The SAQoL-39 in Turkish exhibited high test-re-test reliability (ICC =0.97) as well as acceptability with minimal missing data (0-1.4). This instrument exhibited high internal consistency (Cronbach's α = 0.70-0.97), domain-total correlations (r = 0.76-0.85) and inter-domain correlations (r = 0.40-0.68). The analysis shows that the Turkish version of SAQoL-39 is a scale that is highly acceptable, valid and reliable and can be easily used in evaluating the quality-of-life of Turkish people with aphasia.
Quinn, Amity E; Rosen, Rochelle K; McGeary, John E; Amoa, Francine; Kranzler, Henry R; Francazio, Sarah; McGarvey, Stephen T; Swift, Robert M
2014-01-01
The aims of this study were to develop a bilingual version of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA) in English and Samoan and determine the reliability of assessments of alcohol dependence in American Samoa. The study consisted of development and reliability-testing phases. In the development phase, the SSADDA alcohol module was translated and the translation was evaluated through cognitive interviews. In the reliability-testing phase, the bilingual SSADDA was administered to 40 ethnic Samoans, including a sub-sample of 26 individuals who were retested. Cognitive interviews indicated the initial translation was culturally and linguistically appropriate except items pertaining to alcohol tolerance, which were modified to reflect Samoan concepts. SSADDA reliability testing indicated diagnoses of DSM-III-R and DSM-IV alcohol dependence were reliable. Reliability varied by language of administration. The English/Samoan version of the SSADDA is appropriate for the diagnosis of DSM-III-R alcohol dependence, which may be useful in advancing research and public health efforts to address alcohol problems in American Samoa and the Western Pacific. The translation methods may inform researchers translating diagnostic and assessment tools into different languages and cultures. © The Author 2014. Medical Council on Alcohol and Oxford University Press. All rights reserved.
Multidisciplinary System Reliability Analysis
NASA Technical Reports Server (NTRS)
Mahadevan, Sankaran; Han, Song; Chamis, Christos C. (Technical Monitor)
2001-01-01
The objective of this study is to develop a new methodology for estimating the reliability of engineering systems that encompass multiple disciplines. The methodology is formulated in the context of the NESSUS probabilistic structural analysis code, developed under the leadership of NASA Glenn Research Center. The NESSUS code has been successfully applied to the reliability estimation of a variety of structural engineering systems. This study examines whether the features of NESSUS could be used to investigate the reliability of systems in other disciplines such as heat transfer, fluid mechanics, electrical circuits etc., without considerable programming effort specific to each discipline. In this study, the mechanical equivalence between system behavior models in different disciplines are investigated to achieve this objective. A new methodology is presented for the analysis of heat transfer, fluid flow, and electrical circuit problems using the structural analysis routines within NESSUS, by utilizing the equivalence between the computational quantities in different disciplines. This technique is integrated with the fast probability integration and system reliability techniques within the NESSUS code, to successfully compute the system reliability of multidisciplinary systems. Traditional as well as progressive failure analysis methods for system reliability estimation are demonstrated, through a numerical example of a heat exchanger system involving failure modes in structural, heat transfer and fluid flow disciplines.
Alberta infant motor scale: reliability and validity when used on preterm infants in Taiwan.
Jeng, S F; Yau, K I; Chen, L C; Hsiao, S F
2000-02-01
The goal of this study was to examine the reliability and validity of measurements obtained with the Alberta Infant Motor Scale (AIMS) for evaluation of preterm infants in Taiwan. Two independent groups of preterm infants were used to investigate the reliability (n=45) and validity (n=41) for the AIMS. In the reliability study, the AIMS was administered to the infants by a physical therapist, and infant performance was videotaped. The performance was then rescored by the same therapist and by 2 other therapists to examine the intrarater and interrater reliability. In the validity study, the AIMS and the Bayley Motor Scale were administered to the infants at 6 and 12 months of age to examine criterion-related validity. Intraclass correlation coefficients (ICCs) for intrarater and interrater reliability of measurements obtained with the AIMS were high (ICC=.97-.99). The AIMS scores correlated with the Bayley Motor Scale scores at 6 and 12 months (r=.78 and.90), although the AIMS scores at 6 months were only moderately predictive of the motor function at 12 months (r=.56). The results suggest that measurements obtained with the AIMS have acceptable reliability and concurrent validity but limited predictive value for evaluating preterm Taiwanese infants.
Multi-Disciplinary System Reliability Analysis
NASA Technical Reports Server (NTRS)
Mahadevan, Sankaran; Han, Song
1997-01-01
The objective of this study is to develop a new methodology for estimating the reliability of engineering systems that encompass multiple disciplines. The methodology is formulated in the context of the NESSUS probabilistic structural analysis code developed under the leadership of NASA Lewis Research Center. The NESSUS code has been successfully applied to the reliability estimation of a variety of structural engineering systems. This study examines whether the features of NESSUS could be used to investigate the reliability of systems in other disciplines such as heat transfer, fluid mechanics, electrical circuits etc., without considerable programming effort specific to each discipline. In this study, the mechanical equivalence between system behavior models in different disciplines are investigated to achieve this objective. A new methodology is presented for the analysis of heat transfer, fluid flow, and electrical circuit problems using the structural analysis routines within NESSUS, by utilizing the equivalence between the computational quantities in different disciplines. This technique is integrated with the fast probability integration and system reliability techniques within the NESSUS code, to successfully compute the system reliability of multi-disciplinary systems. Traditional as well as progressive failure analysis methods for system reliability estimation are demonstrated, through a numerical example of a heat exchanger system involving failure modes in structural, heat transfer and fluid flow disciplines.
Porter, Anna K; Wen, Fang; Herring, Amy H; Rodríguez, Daniel A; Messer, Lynne C; Laraia, Barbara A; Evenson, Kelly R
2018-06-01
Reliable and stable environmental audit instruments are needed to successfully identify the physical and social attributes that may influence physical activity. This study described the reliability and stability of the PIN3 environmental audit instrument in both urban and rural neighborhoods. Four randomly sampled road segments in and around a one-quarter mile buffer of participants' residences from the Pregnancy, Infection, and Nutrition (PIN3) study were rated twice, approximately 2 weeks apart. One year later, 253 of the year 1 sampled roads were re-audited. The instrument included 43 measures that resulted in 73 item scores for calculation of percent overall agreement, kappa statistics, and log-linear models. For same-day reliability, 81% of items had moderate to outstanding kappa statistics (kappas ≥ 0.4). Two-week reliability was slightly lower, with 77% of items having moderate to outstanding agreement using kappa statistics. One-year stability had 68% of items showing moderate to outstanding agreement using kappa statistics. The reliability of the audit measures was largely consistent when comparing urban to rural locations, with only 8% of items exhibiting significant differences (α < 0.05) by urbanicity. The PIN3 instrument is a reliable and stable audit tool for studies assessing neighborhood attributes in urban and rural environments.
Test-retest and interrater reliability of the functional lower extremity evaluation.
Haitz, Karyn; Shultz, Rebecca; Hodgins, Melissa; Matheson, Gordon O
2014-12-01
Repeated-measures clinical measurement reliability study. To establish the reliability and face validity of the Functional Lower Extremity Evaluation (FLEE). The FLEE is a 45-minute battery of 8 standardized functional performance tests that measures 3 components of lower extremity function: control, power, and endurance. The reliability and normative values for the FLEE in healthy athletes are unknown. A face validity survey for the FLEE was sent to sports medicine personnel to evaluate the level of importance and frequency of clinical usage of each test included in the FLEE. The FLEE was then administered and rated for 40 uninjured athletes. To assess test-retest reliability, each athlete was tested twice, 1 week apart, by the same rater. To assess interrater reliability, 3 raters scored each athlete during 1 of the testing sessions. Intraclass correlation coefficients were used to assess the test-retest and interrater reliability of each of the FLEE tests. In the face validity survey, the FLEE tests were rated as highly important by 58% to 71% of respondents but frequently used by only 26% to 45% of respondents. Interrater reliability intraclass correlation coefficients ranged from 0.83 to 1.00, and test-retest reliability ranged from 0.71 to 0.95. The FLEE tests are considered clinically important for assessing lower extremity function by sports medicine personnel but are underused. The FLEE also is a reliable assessment tool. Future studies are required to determine if use of the FLEE to make return-to-play decisions may reduce reinjury rates.
Large-scale-system effectiveness analysis. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patton, A.D.; Ayoub, A.K.; Foster, J.W.
1979-11-01
Objective of the research project has been the investigation and development of methods for calculating system reliability indices which have absolute, and measurable, significance to consumers. Such indices are a necessary prerequisite to any scheme for system optimization which includes the economic consequences of consumer service interruptions. A further area of investigation has been joint consideration of generation and transmission in reliability studies. Methods for finding or estimating the probability distributions of some measures of reliability performance have been developed. The application of modern Monte Carlo simulation methods to compute reliability indices in generating systems has been studied.
NASA Astrophysics Data System (ADS)
Gilmanshin, I. R.; Kirpichnikov, A. P.
2017-09-01
In the result of study of the algorithm of the functioning of the early detection module of excessive losses, it is proven the ability to model it by using absorbing Markov chains. The particular interest is in the study of probability characteristics of early detection module functioning algorithm of losses in order to identify the relationship of indicators of reliability of individual elements, or the probability of occurrence of certain events and the likelihood of transmission of reliable information. The identified relations during the analysis allow to set thresholds reliability characteristics of the system components.
A study of the longevity and operational reliability of Goddard Spacecraft, 1960-1980
NASA Technical Reports Server (NTRS)
Shockey, E. F.
1981-01-01
Compiled data regarding the design lives and lifetimes actually achieved by 104 orbiting satellites launched by the Goddard Spaceflight Center between the years 1960 and 1980 is analyzed. Historical trends over the entire 21 year period are reviewed, and the more recent data is subjected to an examination of several key parameters. An empirical reliability function is derived, and compared with various mathematical models. Data from related studies is also discussed. The results provide insight into the reliability history of Goddard spacecraft an guidance for estimating the reliability of future programs.
The Nordic concept of reactive psychosis--a multicenter reliability study.
Hansen, H; Dahl, A A; Bertelsen, A; Birket-Smith, M; von Knorring, L; Ottosson, J O; Pakaslahti, A; Retterstøl, N; Salvesen, C; Thorsteinsson, G
1992-07-01
Reactive psychosis is a common diagnosis in the Nordic countries (Norway, Sweden, Denmark, Finland and Iceland) and in several other parts of the world. In ICD-9 and DSM-III-R, the concept is defined more narrowly than in the Nordic tradition. In this study we examined the interrater reliability of the Nordic concept by the case-summary method between clinicians from 9 university departments in the Nordic countries. The results show that Nordic psychiatrists have a reasonably reliable concept of reactive psychosis, and that this psychosis can be diagnosed as reliably as schizophrenia and affective psychosis.
The reliability of in-training assessment when performance improvement is taken into account.
van Lohuizen, Mirjam T; Kuks, Jan B M; van Hell, Elisabeth A; Raat, A N; Stewart, Roy E; Cohen-Schotanus, Janke
2010-12-01
During in-training assessment students are frequently assessed over a longer period of time and therefore it can be expected that their performance will improve. We studied whether there really is a measurable performance improvement when students are assessed over an extended period of time and how this improvement affects the reliability of the overall judgement. In-training assessment results were obtained from 104 students on rotation at our university hospital or at one of the six affiliated hospitals. Generalisability theory was used in combination with multilevel analysis to obtain reliability coefficients and to estimate the number of assessments needed for reliable overall judgement, both including and excluding performance improvement. Students' clinical performance ratings improved significantly from a mean of 7.6 at the start to a mean of 7.8 at the end of their clerkship. When taking performance improvement into account, reliability coefficients were higher. The number of assessments needed to achieve a reliability of 0.80 or higher decreased from 17 to 11. Therefore, when studying reliability of in-training assessment, performance improvement should be considered.
NASA Technical Reports Server (NTRS)
Wilson, Larry W.
1989-01-01
The longterm goal of this research is to identify or create a model for use in analyzing the reliability of flight control software. The immediate tasks addressed are the creation of data useful to the study of software reliability and production of results pertinent to software reliability through the analysis of existing reliability models and data. The completed data creation portion of this research consists of a Generic Checkout System (GCS) design document created in cooperation with NASA and Research Triangle Institute (RTI) experimenters. This will lead to design and code reviews with the resulting product being one of the versions used in the Terminal Descent Experiment being conducted by the Systems Validations Methods Branch (SVMB) of NASA/Langley. An appended paper details an investigation of the Jelinski-Moranda and Geometric models for software reliability. The models were given data from a process that they have correctly simulated and asked to make predictions about the reliability of that process. It was found that either model will usually fail to make good predictions. These problems were attributed to randomness in the data and replication of data was recommended.
Keppler, Hannah; Dhooge, Ingeborg; Maes, Leen; D'haenens, Wendy; Bockstael, Annelies; Philips, Birgit; Swinnen, Freya; Vinck, Bart
2010-02-01
Knowledge regarding the variability of transient-evoked otoacoustic emissions (TEOAEs) and distortion product otoacoustic emissions (DPOAEs) is essential in clinical settings and improves their utility in monitoring hearing status over time. In the current study, TEOAEs and DPOAEs were measured with commercially available OAE-equipment in 56 normally-hearing ears during three sessions. Reliability was analysed for the retest measurement without probe-refitting, the immediate retest measurement with probe-refitting, and retest measurements after one hour and one week. The highest reliability was obtained in the retest measurement without probe-refitting, and decreased with increasing time-interval between measurements. For TEOAEs, the lowest reliability was seen at half-octave frequency bands 1.0 and 1.4 kHz; whereas for DPOAEs half-octave frequency band 8.0 kHz had also poor reliability. Higher primary tone level combination for DPOAEs yielded to a better reliability of DPOAE amplitudes. External environmental noise seemed to be the dominating noise source in normal-hearing subjects, decreasing the reliability of emission amplitudes especially in the low-frequency region.
Reliability in perceptual analysis of voice quality.
Bele, Irene Velsvik
2005-12-01
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.
Michels, Nele R M; Driessen, Erik W; Muijtjens, Arno M M; Van Gaal, Luc F; Bossaert, Leo L; De Winter, Benedicte Y
2009-12-01
A portfolio is used to mentor and assess students' clinical performance at the workplace. However, students and raters often perceive the portfolio as a time-consuming instrument. In this study, we investigated whether assessment during medical internship by a portfolio can combine reliability and feasibility. The domain-oriented reliability of 61 double-rated portfolios was measured, using a generalisability analysis with portfolio tasks and raters as sources of variation in measuring the performance of a student. We obtained reliability (Phi coefficient) of 0.87 with this internship portfolio containing 15 double-rated tasks. The generalisability analysis showed that an acceptable level of reliability (Phi = 0.80) was maintained when the amount of portfolio tasks was decreased to 13 or 9 using one and two raters, respectively. Our study shows that a portfolio can be a reliable method for the assessment of workplace learning. The possibility of reducing the amount of tasks or raters while maintaining a sufficient level of reliability suggests an increase in feasibility of portfolio use for both students and raters.
ERIC Educational Resources Information Center
Ghazali, Nor Hasnida Md
2016-01-01
A valid, reliable and practical instrument is needed to evaluate the implementation of the school-based assessment (SBA) system. The aim of this study is to develop and assess the validity and reliability of an instrument to measure the perception of teachers towards the SBA implementation in schools. The instrument is developed based on a…
Loeding, B L; Greenan, J P
1998-12-01
The study examined the validity and reliability of four assessments, with three instruments per domain. Domains included generalizable mathematics, communication, interpersonal relations, and reasoning skills. Participants were deaf, legally blind, or visually impaired students enrolled in vocational classes at residential secondary schools. The researchers estimated the internal consistency reliability, test-retest reliability, and construct validity correlations of three subinstruments: student self-ratings, teacher ratings, and performance assessments. The data suggest that these instruments are highly internally consistent measures of generalizable vocational skills. Four performance assessments have high-to-moderate test-retest reliability estimates, and were generally considered to possess acceptable validity and reliability.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
ERIC Educational Resources Information Center
Kim, Seonghoon; Feldt, Leonard S.
2010-01-01
The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient rho[subscript XX'] as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient. Another purpose is to examine relative performances of the IRT reliability statistics and two…
Kalichman, Leonid; Klindukhov, Alexander; Li, Ling; Linov, Lina
2016-11-01
A reliability and cross-sectional observational study. To introduce a scoring system for visible fat infiltration in paraspinal muscles; to evaluate intertester and intratester reliability of this system and its relationship with indices of muscle density; to evaluate the association between indices of paraspinal muscle degeneration and facet joint osteoarthritis. Current evidence suggests that the paraspinal muscles degeneration is associated with low back pain, facet joint osteoarthritis, spondylolisthesis, and degenerative disc disease. However, the evaluation of paraspinal muscles on computed tomography is not radiological routine, probably because of absence of simple and reliable indices of paraspinal degeneration. One hundred fifty consecutive computed tomography scans of the lower back (N=75) or abdomen (N=75) were evaluated. Mean radiographic density (in Hounsfield units) and SD of the density of multifidus and erector spinae were evaluated at the L4-L5 spinal level. A new index of muscle degeneration, radiographic density ratio=muscle density/SD of density, was calculated. To evaluate the visible fat infiltration in paraspinal muscles, we proposed a 3-graded scoring system. The prevalence of facet joint osteoarthritis was also evaluated. Intraclass correlation and κ statistics were used to evaluate inter-rater and intra-rater reliability. Logistic regression examined the association between paraspinal muscle indices and facet joint osteoarthritis. Intra-rater reliability for fat infiltration score (κ) ranged between 0.87 and 0.92; inter-rater reliability between 0.70 and 0.81. Intra-rater reliability (intraclass correlation) for mean density of paraspinal muscles ranged between 0.96 and 0.99, inter-rater reliability between 0.95 and 0.99; SD intra-rater reliability ranged between 0.82 and 0.91, inter-rater reliability between 0.80 and 0.89. Significant associations (P<0.01) were found between facet joint osteoarthritis, fat infiltration score, and radiographic density ratio. Two suggested indices of paraspinal muscle degeneration showed excellent reliability and were significantly associated with facet joint osteoarthritis. Additional studies are needed to evaluate the associations with other spinal degeneration features and low back pain.
Report: Studies Addressing EPA’s Organizational Structure
Report #2006-P-00029, August 16, 2006. The 13 studies, articles, publications, and reports we reviewed identified issues with cross-media management, regional offices, reliable information, and reliable science.
Berger, Aaron J; Momeni, Arash; Ladd, Amy L
2014-04-01
Trapeziometacarpal, or thumb carpometacarpal (CMC), arthritis is a common problem with a variety of treatment options. Although widely used, the Eaton radiographic staging system for CMC arthritis is of questionable clinical utility, as disease severity does not predictably correlate with symptoms or treatment recommendations. A possible reason for this is that the classification itself may not be reliable, but the literature on this has not, to our knowledge, been systematically reviewed. We therefore performed a systematic review to determine the intra- and interobserver reliability of the Eaton staging system. We systematically reviewed English-language studies published between 1973 and 2013 to assess the degree of intra- and interobserver reliability of the Eaton classification for determining the stage of trapeziometacarpal joint arthritis and pantrapezial arthritis based on plain radiographic imaging. Search engines included: PubMed, Scopus(®), and CINAHL. Four studies, which included a total of 163 patients, met our inclusion criteria and were evaluated. The level of evidence of the studies included in this analysis was determined using the Oxford Centre for Evidence Based Medicine Levels of Evidence Classification by two independent observers. A limited number of studies have been performed to assess intra- and interobserver reliability of the Eaton classification system. The four studies included were determined to be Level 3b. These studies collectively indicate that the Eaton classification demonstrates poor to fair interobserver reliability (kappa values: 0.11-0.56) and fair to moderate intraobserver reliability (kappa values: 0.54-0.657). Review of the literature demonstrates that radiographs assist in the assessment of CMC joint disease, but there is not a reliable system for classification of disease severity. Currently, diagnosis and treatment of thumb CMC arthritis are based on the surgeon's qualitative assessment combining history, physical examination, and radiographic evaluation. Inconsistent agreement using the current common radiographic classification system suggests a need for better radiographic tools to quantify disease severity.
ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda
2015-07-01
The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscles activity when supine lying and during two isometric endurance tests in subjects with and without Low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements at supine lying and the two isometric endurance tests were assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscles activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
Santelmann, Hanno; Franklin, Jeremy; Bußhoff, Jana; Baethge, Christopher
2016-10-01
Schizoaffective disorder is a common diagnosis in clinical practice but its nosological status has been subject to debate ever since it was conceptualized. Although it is key that diagnostic reliability is sufficient, schizoaffective disorder has been reported to have low interrater reliability. Evidence based on systematic review and meta-analysis methods, however, is lacking. Using a highly sensitive literature search in Medline, Embase, and PsycInfo we identified studies measuring the interrater reliability of schizoaffective disorder in comparison to schizophrenia, bipolar disorder, and unipolar disorder. Out of 4126 records screened we included 25 studies reporting on 7912 patients diagnosed by different raters. The interrater reliability of schizoaffective disorder was moderate (meta-analytic estimate of Cohen's kappa 0.57 [95% CI: 0.41-0.73]), and substantially lower than that of its main differential diagnoses (difference in kappa between 0.22 and 0.19). Although there was considerable heterogeneity, analyses revealed that the interrater reliability of schizoaffective disorder was consistently lower in the overwhelming majority of studies. The results remained robust in subgroup and sensitivity analyses (e.g., diagnostic manual used) as well as in meta-regressions (e.g., publication year) and analyses of publication bias. Clinically, the results highlight the particular importance of diagnostic re-evaluation in patients diagnosed with schizoaffective disorder. They also quantify a widely held clinical impression of lower interrater reliability and agree with earlier meta-analysis reporting low test-retest reliability. Copyright © 2016. Published by Elsevier B.V.
Reliability of TMS metrics in patients with chronic incomplete spinal cord injury.
Potter-Baker, K A; Janini, D P; Frost, F S; Chabra, P; Varnerin, N; Cunningham, D A; Sankarasubramanian, V; Plow, E B
2016-11-01
Test-retest reliability analysis in individuals with chronic incomplete spinal cord injury (iSCI). The purpose of this study was to examine the reliability of neurophysiological metrics acquired with transcranial magnetic stimulation (TMS) in individuals with chronic incomplete tetraplegia. Cleveland Clinic Foundation, Cleveland, Ohio, USA. TMS metrics of corticospinal excitability, output, inhibition and motor map distribution were collected in muscles with a higher MRC grade and muscles with a lower MRC grade on the more affected side of the body. Metrics denoting upper limb function were also collected. All metrics were collected at two sessions separated by a minimum of two weeks. Reliability between sessions was determined using Spearman's correlation coefficients and concordance correlation coefficients (CCCs). We found that TMS metrics that were acquired in higher MRC grade muscles were approximately two times more reliable than those collected in lower MRC grade muscles. TMS metrics of motor map output, however, demonstrated poor reliability regardless of muscle choice (P=0.34; CCC=0.51). Correlation analysis indicated that patients with more baseline impairment and/or those in a more chronic phase of iSCI demonstrated greater variability of metrics. In iSCI, reliability of TMS metrics varies depending on the muscle grade of the tested muscle. Variability is also influenced by factors such as baseline motor function and time post SCI. Future studies that use TMS metrics in longitudinal study designs to understand functional recovery should be cautious as choice of muscle and clinical characteristics can influence reliability.
Reliability of movement control tests in the lumbar spine
Luomajoki, Hannu; Kool, Jan; de Bruin, Eling D; Airaksinen, Olavi
2007-01-01
Background Movement control dysfunction [MCD] reduces active control of movements. Patients with MCD might form an important subgroup among patients with non specific low back pain. The diagnosis is based on the observation of active movements. Although widely used clinically, only a few studies have been performed to determine the test reliability. The aim of this study was to determine the inter- and intra-observer reliability of movement control dysfunction tests of the lumbar spine. Methods We videoed patients performing a standardized test battery consisting of 10 active movement tests for motor control in 27 patients with non specific low back pain and 13 patients with other diagnoses but without back pain. Four physiotherapists independently rated test performances as correct or incorrect per observation, blinded to all other patient information and to each other. The study was conducted in a private physiotherapy outpatient practice in Reinach, Switzerland. Kappa coefficients, percentage agreements and confidence intervals for inter- and intra-rater results were calculated. Results The kappa values for inter-tester reliability ranged between 0.24 – 0.71. Six tests out of ten showed a substantial reliability [k > 0.6]. Intra-tester reliability was between 0.51 – 0.96, all tests but one showed substantial reliability [k > 0.6]. Conclusion Physiotherapists were able to reliably rate most of the tests in this series of motor control tasks as being performed correctly or not, by viewing films of patients with and without back pain performing the task. PMID:17850669
Intraday and Interday Reliability of Ultra-Short-Term Heart Rate Variability in Rugby Union Players.
Nakamura, Fábio Y; Pereira, Lucas A; Esco, Michael R; Flatt, Andrew A; Moraes, José E; Cal Abad, Cesar C; Loturco, Irineu
2017-02-01
Nakamura, FY, Pereira, LA, Esco, MR, Flatt, AA, Moraes, JE, Cal Abad, CC, and Loturco, I. Intraday and interday reliability of ultra-short-term heart rate variability in rugby union players. J Strength Cond Res 31(2): 548-551, 2017-The aim of this study was to examine the intraday and interday reliability of ultra-short-term vagal-related heart rate variability (HRV) in elite rugby union players. Forty players from the Brazilian National Rugby Team volunteered to participate in this study. The natural log of the root mean square of successive RR interval differences (lnRMSSD) assessments were performed on 4 different days. The HRV was assessed twice (intraday reliability) on the first day and once per day on the following 3 days (interday reliability). The RR interval recordings were obtained from 2-minute recordings using a portable heart rate monitor. The relative reliability of intraday and interday lnRMSSD measures was analyzed using the intraclass correlation coefficient (ICC). The typical error of measurement (absolute reliability) of intraday and interday lnRMSSD assessments was analyzed using the coefficient of variation (CV). Both intraday (ICC = 0.96; CV = 3.99%) and interday (ICC = 0.90; CV = 7.65%) measures were highly reliable. The ultra-short-term lnRMSSD is a consistent measure for evaluating elite rugby union players, in both intraday and interday settings. This study provides further validity to using this shortened method in practical field conditions with highly trained team sports athletes.
Reliability analysis of a sensitive and independent stabilometry parameter set
Nagymáté, Gergely; Orlovits, Zsanett
2018-01-01
Recent studies have suggested reduced independent and sensitive parameter sets for stabilometry measurements based on correlation and variance analyses. However, the reliability of these recommended parameter sets has not been studied in the literature or not in every stance type used in stabilometry assessments, for example, single leg stances. The goal of this study is to evaluate the test-retest reliability of different time-based and frequency-based parameters that are calculated from the center of pressure (CoP) during bipedal and single leg stance for 30- and 60-second measurement intervals. Thirty healthy subjects performed repeated standing trials in a bipedal stance with eyes open and eyes closed conditions and in a single leg stance with eyes open for 60 seconds. A force distribution measuring plate was used to record the CoP. The reliability of the CoP parameters was characterized by using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), minimal detectable change (MDC), coefficient of variation (CV) and CV compliance rate (CVCR). Based on the ICC, SEM and MDC results, many parameters yielded fair to good reliability values, while the CoP path length yielded the highest reliability (smallest ICC > 0.67 (0.54–0.79), largest SEM% = 19.2%). Usually, frequency type parameters and extreme value parameters yielded poor reliability values. There were differences in the reliability of the maximum CoP velocity (better with 30 seconds) and mean power frequency (better with 60 seconds) parameters between the different sampling intervals. PMID:29664938
Reliability analysis of a sensitive and independent stabilometry parameter set.
Nagymáté, Gergely; Orlovits, Zsanett; Kiss, Rita M
2018-01-01
Recent studies have suggested reduced independent and sensitive parameter sets for stabilometry measurements based on correlation and variance analyses. However, the reliability of these recommended parameter sets has not been studied in the literature or not in every stance type used in stabilometry assessments, for example, single leg stances. The goal of this study is to evaluate the test-retest reliability of different time-based and frequency-based parameters that are calculated from the center of pressure (CoP) during bipedal and single leg stance for 30- and 60-second measurement intervals. Thirty healthy subjects performed repeated standing trials in a bipedal stance with eyes open and eyes closed conditions and in a single leg stance with eyes open for 60 seconds. A force distribution measuring plate was used to record the CoP. The reliability of the CoP parameters was characterized by using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), minimal detectable change (MDC), coefficient of variation (CV) and CV compliance rate (CVCR). Based on the ICC, SEM and MDC results, many parameters yielded fair to good reliability values, while the CoP path length yielded the highest reliability (smallest ICC > 0.67 (0.54-0.79), largest SEM% = 19.2%). Usually, frequency type parameters and extreme value parameters yielded poor reliability values. There were differences in the reliability of the maximum CoP velocity (better with 30 seconds) and mean power frequency (better with 60 seconds) parameters between the different sampling intervals.
Test-retest reliability of infant event related potentials evoked by faces.
Munsters, N M; van Ravenswaaij, H; van den Boomen, C; Kemner, C
2017-04-05
Reliable measures are required to draw meaningful conclusions regarding developmental changes in longitudinal studies. Little is known, however, about the test-retest reliability of face-sensitive event related potentials (ERPs), a frequently used neural measure in infants. The aim of the current study is to investigate the test-retest reliability of ERPs typically evoked by faces in 9-10 month-old infants. The infants (N=31) were presented with neutral, fearful and happy faces that contained only the lower or higher spatial frequency information. They were tested twice within two weeks. The present results show that the test-retest reliability of the face-sensitive ERP components is moderate (P400 and Nc) to substantial (N290). However, there is low test-retest reliability for the effects of the specific experimental manipulations (i.e. emotion and spatial frequency) on the face-sensitive ERPs. To conclude, in infants the face-sensitive ERP components (i.e. N290, P400 and Nc) show adequate test-retest reliability, but not the effects of emotion and spatial frequency on these ERP components. We propose that further research focuses on investigating elements that might increase the test-retest reliability, as adequate test-retest reliability is necessary to draw meaningful conclusions on individual developmental trajectories of the face-sensitive ERPs in infants. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
Scoping study on trends in the economic value of electricity reliability to the U.S. economy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eto, Joseph; Koomey, Jonathan; Lehman, Bryan
During the past three years, working with more than 150 organizations representing public and private stakeholders, EPRI has developed the Electricity Technology Roadmap. The Roadmap identifies several major strategic challenges that must be successfully addressed to ensure a sustainable future in which electricity continues to play an important role in economic growth. Articulation of these anticipated trends and challenges requires a detailed understanding of the role and importance of reliable electricity in different sectors of the economy. This report is intended to contribute to that understanding by analyzing key aspects of trends in the economic value of electricity reliability inmore » the U.S. economy. We first present a review of recent literature on electricity reliability costs. Next, we describe three distinct end-use approaches for tracking trends in reliability needs: (1) an analysis of the electricity-use requirements of office equipment in different commercial sectors; (2) an examination of the use of aggregate statistical indicators of industrial electricity use and economic activity to identify high reliability-requirement customer market segments; and (3) a case study of cleanrooms, which is a cross-cutting market segment known to have high reliability requirements. Finally, we present insurance industry perspectives on electricity reliability as an example of a financial tool for addressing customers' reliability needs.« less
González-Gil, E M; Mouratidou, T; Cardon, G; Androutsos, O; De Bourdeaudhuij, I; Góźdź, M; Usheva, N; Birnbaum, J; Manios, Y; Moreno, L A
2014-08-01
Reliable assessments of health-related behaviours are necessary for accurate evaluation on the efficiency of public health interventions. The aim of the current study was to examine the reliability of a self-administered primary caregivers questionnaire (PCQ) used in the ToyBox-intervention. The questionnaire consisted of six sections addressing sociodemographic and perinatal factors, water and beverages consumption, physical activity, snacking and sedentary behaviours. Parents/caregivers from six countries (Belgium, Bulgaria, Germany, Greece, Poland and Spain) were asked to complete the questionnaire twice within a 2-week interval. A total of 93 questionnaires were collected. Test-retest reliability was assessed using intra-class correlation coefficient (ICC). Reliability of the six questionnaire sections was assessed. A stronger agreement was observed in the questions addressing sociodemographic and perinatal factors as opposed to questions addressing behaviours. Findings showed that 92% of the ToyBox PCQ had a moderate-to-excellent test-retest reliability (defined as ICC values from 0.41 to 1) and less than 8% poor test-retest reliability (ICC < 0.40). Out of the total ICC values, 67% showed good-to-excellent reliability (ICC from 0.61 to 1). We conclude that the PCQ is a reliable tool to assess sociodemographic characteristics, perinatal factors and lifestyle behaviours of pre-school children and their families participating in the ToyBox-intervention. © 2014 World Obesity.
Night-to-night arousal variability and interscorer reliability of arousal measurements.
Loredo, J S; Clausen, J L; Ancoli-Israel, S; Dimsdale, J E
1999-11-01
Measurement of arousals from sleep is clinically important, however, their definition is not well standardized, and little data exist on reliability. The purpose of this study is to determine factors that affect arousal scoring reliability and night-to-night arousal variability. The night-to-night arousal variability and interscorer reliability was assessed in 20 subjects with and without obstructive sleep apnea undergoing attended polysomnography during two consecutive nights. Five definitions of arousal were studied, assessing duration of electroencephalographic (EEG) frequency changes, increases in electromyographic (EMG) activity and leg movement, association with respiratory events, as well as the American Sleep Disorders Association (ASDA) definition of arousals. NA. NA. NA. Interscorer reliability varied with the definition of arousal and ranged from an Intraclass correlation (ICC) of 0.19 to 0.92. Arousals that included increases in EMG activity or leg movement had the greatest reliability, especially when associated with respiratory events (ICC 0.76 to 0.92). The ASDA arousal definition had high interscorer reliability (ICC 0.84). Reliability was lowest for arousals consisting of EEG changes lasting <3 seconds (ICC 0.19 to 0.37). The within subjects night-to-night arousal variability was low for all arousal definitions In a heterogeneous population, interscorer arousal reliability is enhanced by increases in EMG activity, leg movements, and respiratory events and decreased by short duration EEG arousals. The arousal index night-to-night variability was low for all definitions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamachi La Commare, Kristina
Metrics for reliability, such as the frequency and duration of power interruptions, have been reported by electric utilities for many years. This study examines current utility practices for collecting and reporting electricity reliability information and discusses challenges that arise in assessing reliability because of differences among these practices. The study is based on reliability information for year 2006 reported by 123 utilities in 37 states representing over 60percent of total U.S. electricity sales. We quantify the effects that inconsistencies among current utility reporting practices have on comparisons of System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Indexmore » (SAIFI) reported by utilities. We recommend immediate adoption of IEEE Std. 1366-2003 as a consistent method for measuring and reporting reliability statistics.« less
Validity and Reliability of Accelerometers in Patients With COPD: A SYSTEMATIC REVIEW.
Gore, Shweta; Blackwood, Jennifer; Guyette, Mary; Alsalaheen, Bara
2018-05-01
Reduced physical activity is associated with poor prognosis in chronic obstructive pulmonary disease (COPD). Accelerometers have greatly improved quantification of physical activity by providing information on step counts, body positions, energy expenditure, and magnitude of force. The purpose of this systematic review was to compare the validity and reliability of accelerometers used in patients with COPD. An electronic database search of MEDLINE and CINAHL was performed. Study quality was assessed with the Strengthening the Reporting of Observational Studies in Epidemiology checklist while methodological quality was assessed using the modified Quality Appraisal Tool for Reliability Studies. The search yielded 5392 studies; 25 met inclusion criteria. The SenseWear Pro armband reported high criterion validity under controlled conditions (r = 0.75-0.93) and high reliability (ICC = 0.84-0.86) for step counts. The DynaPort MiniMod demonstrated highest concurrent validity for step count using both video and manual methods. Validity of the SenseWear Pro armband varied between studies especially in free-living conditions, slower walking speeds, and with addition of weights during gait. A high degree of variability was found in the outcomes used and statistical analyses performed between studies, indicating a need for further studies to measure reliability and validity of accelerometers in COPD. The SenseWear Pro armband is the most commonly used accelerometer in COPD, but measurement properties are limited by gait speed variability and assistive device use. DynaPort MiniMod and Stepwatch accelerometers demonstrated high validity in patients with COPD but lack reliability data.
Rater methodology for stroboscopy: a systematic review.
Bonilha, Heather Shaw; Focht, Kendrea L; Martin-Harris, Bonnie
2015-01-01
Laryngeal endoscopy with stroboscopy (LES) remains the clinical gold standard for assessing vocal fold function. LES is used to evaluate the efficacy of voice treatments in research studies and clinical practice. LES as a voice treatment outcome tool is only as good as the clinician interpreting the recordings. Research using LES as a treatment outcome measure should be evaluated based on rater methodology and reliability. The purpose of this literature review was to evaluate the rater-related methodology from studies that use stroboscopic findings as voice treatment outcome measures. Systematic literature review. Computerized journal databases were searched for relevant articles using terms: stroboscopy and treatment. Eligible articles were categorized and evaluated for the use of rater-related methodology, reporting of number of raters, types of raters, blinding, and rater reliability. Of the 738 articles reviewed, 80 articles met inclusion criteria. More than one-third of the studies included in the review did not report the number of raters who participated in the study. Eleven studies reported results of rater reliability analysis with only two studies reporting good inter- and intrarater reliability. The comparability and use of results from treatment studies that use LES are limited by a lack of rigor in rater methodology and variable, mostly poor, inter- and intrarater reliability. To improve our ability to evaluate and use the findings from voice treatment studies that use LES features as outcome measures, greater consistency of reporting rater methodology characteristics across studies and improved rater reliability is needed. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Martín-Rodríguez, Saúl; Loturco, Irineu; Hunter, Angus M; Rodríguez-Ruiz, David; Munguia-Izquierdo, Diego
2017-12-01
Martín-Rodríguez, S, Loturco, I, Hunter, AM, Rodríguez-Ruiz, D, and Munguia-Izquierdo, D. Reliability and measurement error of tensiomyography to assess mechanical muscle function: A systematic review. J Strength Cond Res 31(12): 3524-3536, 2017-Interest in studying mechanical skeletal muscle function through tensiomyography (TMG) has increased in recent years. This systematic review aimed to (a) report the reliability and measurement error of all TMG parameters (i.e., maximum radial displacement of the muscle belly [Dm], contraction time [Tc], delay time [Td], half-relaxation time [½ Tr], and sustained contraction time [Ts]) and (b) to provide critical reflection on how to perform accurate and appropriate measurements for informing clinicians, exercise professionals, and researchers. A comprehensive literature search was performed of the Pubmed, Scopus, Science Direct, and Cochrane databases up to July 2017. Eight studies were included in this systematic review. Meta-analysis could not be performed because of the low quality of the evidence of some studies evaluated. Overall, the review of the 9 studies involving 158 participants revealed high relative reliability (intraclass correlation coefficient [ICC]) for Dm (0.91-0.99); moderate-to-high ICC for Ts (0.80-0.96), Tc (0.70-0.98), and ½ Tr (0.77-0.93); and low-to-high ICC for Td (0.60-0.98), independently of the evaluated muscles. In addition, absolute reliability (coefficient of variation [CV]) was low for all TMG parameters except for ½ Tr (CV = >20%), whereas measurement error indexes were high for this parameter. In conclusion, this study indicates that 3 of the TMG parameters (Dm, Td, and Tc) are highly reliable, whereas ½ Tr demonstrate insufficient reliability, and thus should not be used in future studies.
Patterson, P Daniel; Weaver, Matthew D; Fabio, Anthony; Teasley, Ellen M; Renn, Megan L; Curtis, Brett R; Matthews, Margaret E; Kroemer, Andrew J; Xun, Xiaoshuang; Bizhanova, Zhadyra; Weiss, Patricia M; Sequeira, Denisse J; Coppler, Patrick J; Lang, Eddy S; Higgins, J Stephen
2018-02-15
This study sought to systematically search the literature to identify reliable and valid survey instruments for fatigue measurement in the Emergency Medical Services (EMS) occupational setting. A systematic review study design was used and searched six databases, including one website. The research question guiding the search was developed a priori and registered with the PROSPERO database of systematic reviews: "Are there reliable and valid instruments for measuring fatigue among EMS personnel?" (2016:CRD42016040097). The primary outcome of interest was criterion-related validity. Important outcomes of interest included reliability (e.g., internal consistency), and indicators of sensitivity and specificity. Members of the research team independently screened records from the databases. Full-text articles were evaluated by adapting the Bolster and Rourke system for categorizing findings of systematic reviews, and the rated data abstracted from the body of literature as favorable, unfavorable, mixed/inconclusive, or no impact. The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) methodology was used to evaluate the quality of evidence. The search strategy yielded 1,257 unique records. Thirty-four unique experimental and non-experimental studies were determined relevant following full-text review. Nineteen studies reported on the reliability and/or validity of ten different fatigue survey instruments. Eighteen different studies evaluated the reliability and/or validity of four different sleepiness survey instruments. None of the retained studies reported sensitivity or specificity. Evidence quality was rated as very low across all outcomes. In this systematic review, limited evidence of the reliability and validity of 14 different survey instruments to assess the fatigue and/or sleepiness status of EMS personnel and related shift worker groups was identified.
Santelmann, Hanno; Franklin, Jeremy; Bußhoff, Jana; Baethge, Christopher
2015-11-01
Schizoaffective disorder is a frequent diagnosis, and its reliability is subject to ongoing discussion. We compared the diagnostic reliability of schizoaffective disorder with its main differential diagnoses. We systematically searched Medline, Embase, and PsycInfo for all studies on the test-retest reliability of the diagnosis of schizoaffective disorder as compared with schizophrenia, bipolar disorder, and unipolar depression. We used meta-analytic methods to describe and compare Cohen's kappa as well as positive and negative agreement. In addition, multiple pre-specified and post hoc subgroup and sensitivity analyses were carried out. Out of 4,415 studies screened, 49 studies were included. Test-retest reliability of schizoaffective disorder was consistently lower than that of schizophrenia (in 39 out of 42 studies), bipolar disorder (27/33), and unipolar depression (29/35). The mean difference in kappa between schizoaffective disorder and the other diagnoses was approximately 0.2, and mean Cohen's kappa for schizoaffective disorder was 0.50 (95% confidence interval: 0.40-0.59). While findings were unequivocal and homogeneous for schizoaffective disorder's diagnostic reliability relative to its three main differential diagnoses (dichotomous: smaller versus larger), heterogeneity was substantial for continuous measures, even after subgroup and sensitivity analyses. In clinical practice and research, schizoaffective disorder's comparatively low diagnostic reliability should lead to increased efforts to correctly diagnose the disorder. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Partido, Brian B; Jones, Archie A; English, Dana L; Nguyen, Carol A; Jacks, Mary E
2015-02-01
Dental and dental hygiene faculty members often do not provide consistent instruction in the clinical environment, especially in tasks requiring clinical judgment. From previous efforts to calibrate faculty members in calculus detection using typodonts, researchers have suggested using human subjects and emerging technology to improve consistency in clinical instruction. The purpose of this pilot study was to determine if a dental endoscopy-assisted training program would improve intra- and interrater reliability of dental hygiene faculty members in calculus detection. Training included an ODU 11/12 explorer, typodonts, and dental endoscopy. A convenience sample of six participants was recruited from the dental hygiene faculty at a California community college, and a two-group randomized experimental design was utilized. Intra- and interrater reliability was measured before and after calibration training. Pretest and posttest Kappa averages of all participants were compared using repeated measures (split-plot) ANOVA to determine the effectiveness of the calibration training on intra- and interrater reliability. The results showed that both kinds of reliability significantly improved for all participants and the training group improved significantly in interrater reliability from pretest to posttest. Calibration training was beneficial to these dental hygiene faculty members, especially those beginning with less than full agreement. This study suggests that calculus detection calibration training utilizing dental endoscopy can effectively improve interrater reliability of dental and dental hygiene clinical educators. Future studies should include human subjects, involve more participants at multiple locations, and determine whether improved rater reliability can be sustained over time.
Yapali, Gökmen; Günel, Mintaze Kerem; Karahan, Sevilay
2012-05-15
The study design was cross-cultural adaptation and investigation of reliability and validity of the Copenhagen Neck Functional Disability Scale (CNFDS). The aim of this study was to translate the CNFDS into Turkish language and assess its reliability and validity among patients with neck pain in Turkish population. The CNFDS is a reliable and valid evaluation instrument for disability, but there is no published the Turkish version of the CNFDS. One hundred one subjects who had chronic neck pain were included in this study. The CNFDS, Neck Pain and Disability Scale, and visual analogue scale were administered to all subjects. For investigating test-retest reliability, correlation between CNFDS scores, applied at 1-week interval, intraclass correlation coefficient score for test-retest reliability was 0.86 (95% confidence interval = 0.679-0.935). There was no difference between test-retest scores (P < 0.001). For investigating concurrent validity, correlation between total score of the CNFDS and the mean visual analogue scale was r = 0.73 (P < 0.001). Concurrent validity of the CNFDS was very good. For investigating construct validity, correlation between total score of the CNFDS and the Neck Pain and Disability Scale was r = 0.78 (P < 0.001). Construct validity of the CNFDS was also very good. Our results suggest that the Turkish version of the CNFDS is a reliable and valid instrument for Turkish people.
Test-retest reliability of resting-state magnetoencephalography power in sensor and source space.
Martín-Buro, María Carmen; Garcés, Pilar; Maestú, Fernando
2016-01-01
Several studies have reported changes in spontaneous brain rhythms that could be used as clinical biomarkers or in the evaluation of neuropsychological and drug treatments in longitudinal studies using magnetoencephalography (MEG). There is an increasing necessity to use these measures in early diagnosis and pathology progression; however, there is a lack of studies addressing how reliable they are. Here, we provide the first test-retest reliability estimate of MEG power in resting-state at sensor and source space. In this study, we recorded 3 sessions of resting-state MEG activity from 24 healthy subjects with an interval of a week between each session. Power values were estimated at sensor and source space with beamforming for classical frequency bands: delta (2-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), low beta (13-20 Hz), high beta (20-30 Hz), and gamma (30-45 Hz). Then, test-retest reliability was evaluated using the intraclass correlation coefficient (ICC). We also evaluated the relation between source power and the within-subject variability. In general, ICC of theta, alpha, and low beta power was fairly high (ICC > 0.6) while in delta and gamma power was lower. In source space, fronto-posterior alpha, frontal beta, and medial temporal theta showed the most reliable profiles. Signal-to-noise ratio could be partially responsible for reliability as low signal intensity resulted in high within-subject variability, but also the inherent nature of some brain rhythms in resting-state might be driving these reliability patterns. In conclusion, our results described the reliability of MEG power estimates in each frequency band, which could be considered in disease characterization or clinical trials. © 2015 Wiley Periodicals, Inc.
Research on Novel Algorithms for Smart Grid Reliability Assessment and Economic Dispatch
NASA Astrophysics Data System (ADS)
Luo, Wenjin
In this dissertation, several studies of electric power system reliability and economy assessment methods are presented. To be more precise, several algorithms in evaluating power system reliability and economy are studied. Furthermore, two novel algorithms are applied to this field and their simulation results are compared with conventional results. As the electrical power system develops towards extra high voltage, remote distance, large capacity and regional networking, the application of a number of new technique equipments and the electric market system have be gradually established, and the results caused by power cut has become more and more serious. The electrical power system needs the highest possible reliability due to its complication and security. In this dissertation the Boolean logic Driven Markov Process (BDMP) method is studied and applied to evaluate power system reliability. This approach has several benefits. It allows complex dynamic models to be defined, while maintaining its easy readability as conventional methods. This method has been applied to evaluate IEEE reliability test system. The simulation results obtained are close to IEEE experimental data which means that it could be used for future study of the system reliability. Besides reliability, modern power system is expected to be more economic. This dissertation presents a novel evolutionary algorithm named as quantum evolutionary membrane algorithm (QEPS), which combines the concept and theory of quantum-inspired evolutionary algorithm and membrane computation, to solve the economic dispatch problem in renewable power system with on land and offshore wind farms. The case derived from real data is used for simulation tests. Another conventional evolutionary algorithm is also used to solve the same problem for comparison. The experimental results show that the proposed method is quick and accurate to obtain the optimal solution which is the minimum cost for electricity supplied by wind farm system.
FY12 End of Year Report for NEPP DDR2 Reliability
NASA Technical Reports Server (NTRS)
Guertin, Steven M.
2013-01-01
This document reports the status of the NASA Electronic Parts and Packaging (NEPP) Double Data Rate 2 (DDR2) Reliability effort for FY2012. The task expanded the focus of evaluating reliability effects targeted for device examination. FY11 work highlighted the need to test many more parts and to examine more operating conditions, in order to provide useful recommendations for NASA users of these devices. This year's efforts focused on development of test capabilities, particularly focusing on those that can be used to determine overall lot quality and identify outlier devices, and test methods that can be employed on components for flight use. Flight acceptance of components potentially includes considerable time for up-screening (though this time may not currently be used for much reliability testing). Manufacturers are much more knowledgeable about the relevant reliability mechanisms for each of their devices. We are not in a position to know what the appropriate reliability tests are for any given device, so although reliability testing could be focused for a given device, we are forced to perform a large campaign of reliability tests to identify devices with degraded reliability. With the available up-screening time for NASA parts, it is possible to run many device performance studies. This includes verification of basic datasheet characteristics. Furthermore, it is possible to perform significant pattern sensitivity studies. By doing these studies we can establish higher reliability of flight components. In order to develop these approaches, it is necessary to develop test capability that can identify reliability outliers. To do this we must test many devices to ensure outliers are in the sample, and we must develop characterization capability to measure many different parameters. For FY12 we increased capability for reliability characterization and sample size. We increased sample size this year by moving from loose devices to dual inline memory modules (DIMMs) with an approximate reduction of 20 to 50 times in terms of per device under test (DUT) cost. By increasing sample size we have improved our ability to characterize devices that may be considered reliability outliers. This report provides an update on the effort to improve DDR2 testing capability. Although focused on DDR2, the methods being used can be extended to DDR and DDR3 with relative ease.
de Vries, Nienke M; Staal, J Bart; Olde Rikkert, Marcel G M; Nijhuis-van der Sanden, Maria W G
2013-04-01
Physical activity is assumed to be important in the prevention and treatment of frailty. It is unclear, however, to what extent frailty can be influenced because instruments designed to assess frailty have not been validated as evaluative outcome instruments in clinical practice. The aims of this study were: (1) to develop a frailty index (i.e., the evaluative frailty index for physical activity [EFIP]) based on the method of deficit accumulation and (2) to test the clinimetric properties of the EFIP. The content of the EFIP was determined using a written Delphi procedure. Intrarater reliability, interrater reliability, and construct validity were determined in an observational study (n=24). Intrarater reliability and interrater reliability were calculated using Cohen kappa and intraclass correlation coefficients (ICCs). Construct validity was determined by correlating the score on the EFIP with those on the timed "up & go" test (TUG), the performance-oriented mobility assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Fifty items were included in the EFIP. Interrater reliability (Cohen kappa=0.72, ICC=.96) and intrarater reliability (Cohen kappa=0.77 and 0.80, ICC=.93 and .98) were good. As expected, a fair to moderate correlation with the TUG, POMA, and CIRS-G was found (.61, -.70, and .66, respectively). Reliability and validity of the EFIP have been tested in a small sample. These and other clinimetric properties, such as responsiveness, will be assessed or reassessed in a larger study population. The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice.
Burke, Shane M; Hwang, Steven W; Mehan, William A; Bedi, Harprit S; Ogbuji, Richard; Riesenburger, Ron I
2016-07-01
Cross-specialty inter-rater reliability has not been explicitly reported for imaging characteristics that are thought to be important in lumbar intervertebral disc degeneration. Sufficient cross-specialty reliability is an essential consideration if radiographic stratification of symptomatic patients to specific treatment modalities is to ever be realized. Therefore the purpose of this study was to directly compare the assessment of such characteristics between neurosurgeons and neuroradiologists. Sixty consecutive patients with a diagnosis of lumbago and appropriate imaging were selected for inclusion. Lumbar MRI were evaluated using the Tufts Degenerative Disc Classification by two neurosurgeons and two neuroradiologists. Inter-rater reliability was assessed using Cohen's κ values both within and between specialties. A sensitivity analysis was performed for a modified grading system, which excluded high intensity zones (HIZ), due to poor cross-specialty inter-rater reliability of HIZ between specialties. The reliability of HIZ between neurosurgeons and neuroradiologists was fair in two of the four cross-specialty comparisons in this study (neurosurgeon 1 versus both radiologists κ=0.364 and κ=0.290). Removing HIZ from the classification improved inter-rater reliability for all comparisons within and between specialties (0.465⩽κ⩽0.576). In addition, intra-rater reliability remained in the moderate to substantial range (0.523⩽κ⩽0.649). Given our findings and corroboration with previous studies, identification of HIZ seems to have a markedly variable reliability. Thus we recommend modification of the original Tufts Degenerative Disc Classification by removing HIZ in order to make the overall grade provided by this classification more reproducible when scored by practitioners of different training backgrounds. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kevern, Mark A.; Beecher, Michael; Rao, Smita
2014-01-01
Context: Athletes who participate in throwing and racket sports consistently demonstrate adaptive changes in glenohumeral-joint internal and external rotation in the dominant arm. Measurements of these motions have demonstrated excellent intrarater and poor interrater reliability. Objective: To determine intrarater reliability, interrater reliability, and standard error of measurement for shoulder internal rotation, external rotation, and total arc of motion using an inclinometer in 3 testing procedures in National Collegiate Athletic Association Division I baseball and softball athletes. Design: Cross-sectional study. Setting: Athletic department. Patients or Other Participants Thirty-eight players participated in the study. Shoulder internal rotation, external rotation, and total arc of motion were measured by 2 investigators in 3 test positions. The standard supine position was compared with a side-lying test position, as well as a supine test position without examiner overpressure. Results: Excellent intrarater reliability was noted for all 3 test positions and ranges of motion, with intraclass correlation coefficient values ranging from 0.93 to 0.99. Results for interrater reliability were less favorable. Reliability for internal rotation was highest in the side-lying position (0.68) and reliability for external rotation and total arc was highest in the supine-without-overpressure position (0.774 and 0.713, respectively). The supine-with-overpressure position yielded the lowest interrater reliability results in all positions. The side-lying position had the most consistent results, with very little variation among intraclass correlation coefficient values for the various test positions. Conclusions: The results of our study clearly indicate that the side-lying test procedure is of equal or greater value than the traditional supine-with-overpressure method. PMID:25188316
Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.
2012-01-01
The purpose of this article is to help researchers avoid common pitfalls associated with reliability including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it with focus on its impact on effect sizes. Second, we review how reliability is assessed with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data. PMID:22518107
Hu, Zhi-Jun; He, Jian; Zhao, Feng-Dong; Fang, Xiang-Qian; Zhou, Li-Na; Fan, Shun-Wu
2011-06-01
A reliability study was conducted. To estimate the intra- and intermeasurement errors in the measurements of functional cross-sectional area (FCSA), density, and T2 signal intensity of paraspinal muscles using computed tomography (CT) scan and magnetic resonance imaging (MRI). CT scan and MRI had been used widely to measure the cross-sectional area and degeneration of the back muscles in spine and muscle research. But there is still no systemic study to analyze the reliability of these measurements. This study measured the FCSA and fatty infiltration (density on CT scan and T2 signal intensity on MRI) of the paraspinal muscles at L3-L4, L4-L5, and L5-S1 in 29 patients with chronic low back pain. Two experienced musculoskeletal radiologists and one superior spine surgeon traced the region of interest twice within 3 weeks for measurement of the intra- and interobserver reliability. The intraclass correlation coefficients (ICCs) of the intra-reliability ranged from fair to excellent for FCSA, and good to excellent for fatty infiltration. The ICCs of the inter-reliability ranged from fair to excellent for FCSA, and good to excellent for fatty infiltration. There were no significant differences between CT scan and MRI in reliability results, except in the relative standard error of fatty infiltration measurement. The ICCs of the FCSA measurement between CT scan and MRI ranged from poor to good. The reliabilities of the CT scan and MRI for measuring the FCSA and fatty infiltration of the atrophied lumbar paraspinal muscles were acceptable. It was reliable for using uniform one image method for a single paraspinal muscle evaluation study. And the authors preferred to advise the MRI other than CT scan for paraspinal muscles measurements of FCSA and fatty infiltration.
The interrater reliability of DSM III in children.
Werry, J S; Methven, R J; Fitzpatrick, J; Dixon, H
1983-09-01
A total of 195 admissions to a child psychiatric inpatient unit were diagnosed independently by two to four clinicians on the basis of case presentations at the first ward-round after admission. The DSM III as a whole and the major categories were of high or acceptable reliability, though a few were clearly unreliable. The results are generally consistent with other studies. Unlike other studies, the subcategories were examined and found to vary widely in reliability both as a whole across the system and within parent major categories, throwing considerable doubt upon their utility. The results indicate the need both for improved diagnostic data-gathering techniques in child psychiatry and for more better-designed studies of reliability and, most necessarily, of validity.
Comparison of fMRI paradigms assessing visuospatial processing: Robustness and reproducibility
Herholz, Peer; Zimmermann, Kristin M.; Westermann, Stefan; Frässle, Stefan; Jansen, Andreas
2017-01-01
The development of brain imaging techniques, in particular functional magnetic resonance imaging (fMRI), made it possible to non-invasively study the hemispheric lateralization of cognitive brain functions in large cohorts. Comprehensive models of hemispheric lateralization are, however, still missing and should not only account for the hemispheric specialization of individual brain functions, but also for the interactions among different lateralized cognitive processes (e.g., language and visuospatial processing). This calls for robust and reliable paradigms to study hemispheric lateralization for various cognitive functions. While numerous reliable imaging paradigms have been developed for language, which represents the most prominent left-lateralized brain function, the reliability of imaging paradigms investigating typically right-lateralized brain functions, such as visuospatial processing, has received comparatively less attention. In the present study, we aimed to establish an fMRI paradigm that robustly and reliably identifies right-hemispheric activation evoked by visuospatial processing in individual subjects. In a first study, we therefore compared three frequently used paradigms for assessing visuospatial processing and evaluated their utility to robustly detect right-lateralized brain activity on a single-subject level. In a second study, we then assessed the test-retest reliability of the so-called Landmark task–the paradigm that yielded the most robust results in study 1. At the single-voxel level, we found poor reliability of the brain activation underlying visuospatial attention. This suggests that poor signal-to-noise ratios can become a limiting factor for test-retest reliability. This represents a common detriment of fMRI paradigms investigating visuospatial attention in general and therefore highlights the need for careful considerations of both the possibilities and limitations of the respective fMRI paradigm–in particular, when being interested in effects at the single-voxel level. Notably, however, when focusing on the reliability of measures of hemispheric lateralization (which was the main goal of study 2), we show that hemispheric dominance (quantified by the lateralization index, LI, with |LI| >0.4) of the evoked activation could be robustly determined in more than 62% and, if considering only two categories (i.e., left, right), in more than 93% of our subjects. Furthermore, the reliability of the lateralization strength (LI) was “fair” to “good”. In conclusion, our results suggest that the degree of right-hemispheric dominance during visuospatial processing can be reliably determined using the Landmark task, both at the group and single-subject level, while at the same time stressing the need for future refinements of experimental paradigms and more sophisticated fMRI data acquisition techniques. PMID:29059201
NASA Astrophysics Data System (ADS)
Huang, Yuxia; Mao, Mengchai; Zhang, Zong; Zhou, Hui; Zhao, Yang; Duan, Lian; Kreplin, Ute; Xiao, Xiang; Zhu, Chaozhe
2017-01-01
Functional near-infrared spectroscopy (fNIRS) is being increasingly applied to affective and social neuroscience research; however, the reliability of this method is still unclear. This study aimed to evaluate the test-retest reliability of the fNIRS-based prefrontal response to emotional stimuli. Twenty-six participants viewed unpleasant and neutral pictures, and were simultaneously scanned by fNIRS in two sessions three weeks apart. The reproducibility of the prefrontal activation map was evaluated at three spatial scales (mapwise, clusterwise, and channelwise) at both the group and individual levels. The influence of the time interval was also explored and comparisons were made between longer (intersession) and shorter (intrasession) time intervals. The reliabilities of the activation map at the group level for the mapwise (up to 0.88, the highest value appeared in the intersession assessment) and clusterwise scales (up to 0.91, the highest appeared in the intrasession assessment) were acceptable, indicating that fNIRS may be a reliable tool for emotion studies, especially for a group analysis and under larger spatial scales. However, it should be noted that the individual-level and the channelwise fNIRS prefrontal responses were not sufficiently stable. Future studies should investigate which factors influence reliability, as well as the validity of fNIRS used in emotion studies.
Cobb, Stephen C; Joshi, Mukta N; Pomeroy, Robin L
2016-12-01
In-vitro and invasive in-vivo studies have reported relatively independent motion in the medial and lateral forefoot segments during gait. However, most current surface-based models have not defined medial and lateral forefoot or midfoot segments. The purpose of the current study was to determine the reliability of a 7-segment foot model that includes medial and lateral midfoot and forefoot segments during walking gait. Three-dimensional positions of marker clusters located on the leg and 6 foot segments were tracked as 10 participants completed 5 walking trials. To examine the reliability of the foot model, coefficients of multiple correlation (CMC) were calculated across the trials for each participant. Three-dimensional stance time series and range of motion (ROM) during stance were also calculated for each functional articulation. CMCs for all of the functional articulations were ≥ 0.80. Overall, the rearfoot complex (leg-calcaneus segments) was the most reliable articulation and the medial midfoot complex (calcaneus-navicular segments) was the least reliable. With respect to ROM, reliability was greatest for plantarflexion/dorsiflexion and least for abduction/adduction. Further, the stance ROM and time-series patterns results between the current study and previous invasive in-vivo studies that have assessed actual bone motion were generally consistent.
Álvarez-Gallardo, Inmaculada C; Soriano-Maldonado, Alberto; Segura-Jiménez, Víctor; Carbonell-Baeza, Ana; Estévez-López, Fernando; McVeigh, Joseph G; Delgado-Fernández, Manuel; Ortega, Francisco B
2016-03-01
To examine the construct validity of the International FItness Scale (IFIS) (ie, self-reported fitness) against objectively measured physical fitness in women with fibromyalgia and in healthy women; and to study the test-retest reliability of the IFIS in women with fibromyalgia. Cross-sectional study. Fibromyalgia patient support groups. Women with fibromyalgia (n=413) and healthy women (controls) (n=195) for validity purposes and women with fibromyalgia (n=101) for the reliability study. The total sample was N=709. Not applicable. Fitness level was both self-reported (IFIS) and measured using performance-based fitness tests. For the reliability study the IFIS was completed on 2 occasions, 1 week apart. Women with fibromyalgia who reported average fitness had better measured fitness than those reporting very poor fitness (all P<.001, except 6-minute walk test where P<.05), with similar trends observed in healthy control women. The test-retest reliability of the IFIS, as measured by the average weighted κ, was .45. The IFIS was able to identify women with fibromyalgia who had very low fitness and distinguish them from those with higher fitness levels. Furthermore, the IFIS was moderately reliable in women with fibromyalgia. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Best Practices for Reliable and Robust Spacecraft Structures
NASA Technical Reports Server (NTRS)
Raju, Ivatury S.; Murthy, P. L. N.; Patel, Naresh R.; Bonacuse, Peter J.; Elliott, Kenny B.; Gordon, S. A.; Gyekenyesi, J. P.; Daso, E. O.; Aggarwal, P.; Tillman, R. F.
2007-01-01
A study was undertaken to capture the best practices for the development of reliable and robust spacecraft structures for NASA s next generation cargo and crewed launch vehicles. In this study, the NASA heritage programs such as Mercury, Gemini, Apollo, and the Space Shuttle program were examined. A series of lessons learned during the NASA and DoD heritage programs are captured. The processes that "make the right structural system" are examined along with the processes to "make the structural system right". The impact of technology advancements in materials and analysis and testing methods on reliability and robustness of spacecraft structures is studied. The best practices and lessons learned are extracted from these studies. Since the first human space flight, the best practices for reliable and robust spacecraft structures appear to be well established, understood, and articulated by each generation of designers and engineers. However, these best practices apparently have not always been followed. When the best practices are ignored or short cuts are taken, risks accumulate, and reliability suffers. Thus program managers need to be vigilant of circumstances and situations that tend to violate best practices. Adherence to the best practices may help develop spacecraft systems with high reliability and robustness against certain anomalies and unforeseen events.
Bunford, Nora; Kinney, Kerry L; Michael, Jamie; Klumpp, Heide
2017-07-03
Accumulating data from fMRI studies implicate the rostral anterior cingulate cortex (rACC) in inhibition of attention to threat distractors that compete with task-relevant goals for processing resources. However, little data is available on the reliability of rACC activation. Our aim in the current study was to examine test-retest reliability of rACC activation over a 12-week period, in the context of a validated emotional interference paradigm that varied in perceptual load. During functional MRI, 23 healthy volunteers completed a task involving a target letter in a string of identical letters (low load) or in a string of mixed letters (high load) superimposed on angry, fearful, and neutral face distractors. Intraclass correlation coefficients (ICCs) indicated that under low, but not high perceptual load, rACC activation to fearful vs. neutral distractors was moderately reliable. Conversely, regardless of perceptual load, rACC activation to angry vs. neutral distractors was not reliable. Regarding behavioral performance, ICCs indicated that accuracy was not reliable regardless of distractor type or perceptual load. Although reaction time (RT) was similarly not reliable regardless of distractor type under low perceptual load, RT to angry vs. neutral distractors and to fearful vs. neutral distractors was reliable under high perceptual load. Together, results indicate the test-retest reliability of rACC activation and corresponding behavioral performance are context dependent; reliability of the former varies as a function of distractor type and level of cognitive demand, whereas reliability of the latter depends on behavioral index (accuracy vs. RT) and level of cognitive demand but not distractor type. Copyright © 2017 Elsevier Inc. All rights reserved.
Ganasegeran, Kurubaran; Selvaraj, Kamaraj; Rashid, Abdul
2017-08-01
The six item Confusion, Hubbub and Order Scale (CHAOS-6) has been validated as a reliable tool to measure levels of household disorder. We aimed to investigate the goodness of fit and reliability of a new Malay version of the CHAOS-6. The original English version of the CHAOS-6 underwent forward-backward translation into the Malay language. The finalised Malay version was administered to 105 myocardial infarction survivors in a Malaysian cardiac health facility. We performed confirmatory factor analyses (CFAs) using structural equation modelling. A path diagram and fit statistics were yielded to determine the Malay version's validity. Composite reliability was tested to determine the scale's reliability. All 105 myocardial infarction survivors participated in the study. The CFA yielded a six-item, one-factor model with excellent fit statistics. Composite reliability for the single factor CHAOS-6 was 0.65, confirming that the scale is reliable for Malay speakers. The Malay version of the CHAOS-6 was reliable and showed the best fit statistics for our study sample. We thus offer a simple, brief, validated, reliable and novel instrument to measure chaos, the Skala Kecelaruan, Keriuhan & Tertib Terubahsuai (CHAOS-6) , for the Malaysian population.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patton, A.D.; Ayoub, A.K.; Singh, C.
1982-07-01
Existing methods for generating capacity reliability evaluation do not explicitly recognize a number of operating considerations which may have important effects in system reliability performance. Thus, current methods may yield estimates of system reliability which differ appreciably from actual observed reliability. Further, current methods offer no means of accurately studying or evaluating alternatives which may differ in one or more operating considerations. Operating considerations which are considered to be important in generating capacity reliability evaluation include: unit duty cycles as influenced by load cycle shape, reliability performance of other units, unit commitment policy, and operating reserve policy; unit start-up failuresmore » distinct from unit running failures; unit start-up times; and unit outage postponability and the management of postponable outages. A detailed Monte Carlo simulation computer model called GENESIS and two analytical models called OPCON and OPPLAN have been developed which are capable of incorporating the effects of many operating considerations including those noted above. These computer models have been used to study a variety of actual and synthetic systems and are available from EPRI. The new models are shown to produce system reliability indices which differ appreciably from index values computed using traditional models which do not recognize operating considerations.« less
Interexaminer reliability in physical examination of patients with low back pain.
Strender, L E; Sjöblom, A; Sundell, K; Ludwig, R; Taube, A
1997-04-01
Seventy-one patients with low back pain were examined by two physiotherapists (50 patients) and two physicians (21 patients). The two physiotherapists had worked together for many years, but the two physicians had not. The interexaminer reliability of the clinical tests included in the physical examination was evaluated. To evaluate the interexaminer reliability of clinical tests used in the physical examination of patients with low back pain under ideal circumstances, which was the case for the physiotherapists. Numerous clinical tests are used in the evaluation of patients with low back pain. To reach the correct diagnosis, only tests with an acceptable validity and reliability should be used. Previous studies have mainly shown low reliability. It is important that clinical tests not be rejected because of low reliability caused by differences between examiners in performance of the examination and in their definition of normal results. Two examiners, either two physiotherapists or two physicians, independently examined patients with low back pain. In approximately half of the clinical tests studied, an acceptable reliability was demonstrated. On the basis of the physiotherapists series, the reliability was acceptable for a number of clinical tests that are used in the evaluation of patients with low back pain. The results suggest that clinical tests should be standardized to a much higher degree than they are today.
Ganasegeran, Kurubaran; Selvaraj, Kamaraj; Rashid, Abdul
2017-01-01
Background The six item Confusion, Hubbub and Order Scale (CHAOS-6) has been validated as a reliable tool to measure levels of household disorder. We aimed to investigate the goodness of fit and reliability of a new Malay version of the CHAOS-6. Methods The original English version of the CHAOS-6 underwent forward-backward translation into the Malay language. The finalised Malay version was administered to 105 myocardial infarction survivors in a Malaysian cardiac health facility. We performed confirmatory factor analyses (CFAs) using structural equation modelling. A path diagram and fit statistics were yielded to determine the Malay version’s validity. Composite reliability was tested to determine the scale’s reliability. Results All 105 myocardial infarction survivors participated in the study. The CFA yielded a six-item, one-factor model with excellent fit statistics. Composite reliability for the single factor CHAOS-6 was 0.65, confirming that the scale is reliable for Malay speakers. Conclusion The Malay version of the CHAOS-6 was reliable and showed the best fit statistics for our study sample. We thus offer a simple, brief, validated, reliable and novel instrument to measure chaos, the Skala Kecelaruan, Keriuhan & Tertib Terubahsuai (CHAOS-6), for the Malaysian population. PMID:28951688
TCOPPE school environmental audit tool: assessing safety and walkability of school environments.
Lee, Chanam; Kim, Hyung Jin; Dowdy, Diane M; Hoelscher, Deanna M; Ory, Marcia G
2013-09-01
Several environmental audit instruments have been developed for assessing streets, parks and trails, but none for schools. This paper introduces a school audit tool that includes 3 subcomponents: 1) street audit, 2) school site audit, and 3) map audit. It presents the conceptual basis and the development process of this instrument, and the methods and results of the reliability assessments. Reliability tests were conducted by 2 trained auditors on 12 study schools (high-low income and urban-suburban-rural settings). Kappa statistics (categorical, factual items) and ICC (Likert-scale, perceptual items) were used to assess a) interrater, b) test-retest, and c) peak vs. off-peak hour reliability tests. For the interrater reliability test, the average Kappa was 0.839 and the ICC was 0.602. For the test-retest reliability, the average Kappa was 0.903 and the ICC was 0.774. The peak-off peak reliability was 0.801. Rural schools showed the most consistent results in the peak-off peak and test-retest assessments. For interrater tests, urban schools showed the highest ICC, and rural schools showed the highest Kappa. Most items achieved moderate to high levels of reliabilities in all study schools. With proper training, this audit can be used to assess school environments reliably for research, outreach, and policy-support purposes.
Lange, Toni; Matthijs, Omer; Jain, Nitin B; Schmitt, Jochen; Lützner, Jörg; Kopkow, Christian
2017-03-01
Shoulder pain in the general population is common and to identify the aetiology of shoulder pain, history, motion and muscle testing, and physical examination tests are usually performed. The aim of this systematic review was to summarise and evaluate intrarater and inter-rater reliability of physical examination tests in the diagnosis of shoulder pathologies. A comprehensive systematic literature search was conducted using MEDLINE, EMBASE, Allied and Complementary Medicine Database (AMED) and Physiotherapy Evidence Database (PEDro) through 20 March 2015. Methodological quality was assessed using the Quality Appraisal of Reliability Studies (QAREL) tool by 2 independent reviewers. The search strategy revealed 3259 articles, of which 18 finally met the inclusion criteria. These studies evaluated the reliability of 62 test and test variations used for the specific physical examination tests for the diagnosis of shoulder pathologies. Methodological quality ranged from 2 to 7 positive criteria of the 11 items of the QAREL tool. This review identified a lack of high-quality studies evaluating inter-rater as well as intrarater reliability of specific physical examination tests for the diagnosis of shoulder pathologies. In addition, reliability measures differed between included studies hindering proper cross-study comparisons. PROSPERO CRD42014009018. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Do you see what I see? Mobile eye-tracker contextual analysis and inter-rater reliability.
Stuart, S; Hunt, D; Nell, J; Godfrey, A; Hausdorff, J M; Rochester, L; Alcock, L
2018-02-01
Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson's disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye-movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation location was manually identified by two raters (DH, JN), who classified the locations. Cohen's kappa correlation coefficients determined the inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. The inter-rater reliability for classifying the fixation location was high for both PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), which indicated a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1). whether transient arousals can be identified and assessed reliably in infants and (2). whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappa's, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered as noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal. In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
Chapter 15: Reliability of Wind Turbines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sheng, Shuangwen; O'Connor, Ryan
The global wind industry has witnessed exciting developments in recent years. The future will be even brighter with further reductions in capital and operation and maintenance costs, which can be accomplished with improved turbine reliability, especially when turbines are installed offshore. One opportunity for the industry to improve wind turbine reliability is through the exploration of reliability engineering life data analysis based on readily available data or maintenance records collected at typical wind plants. If adopted and conducted appropriately, these analyses can quickly save operation and maintenance costs in a potentially impactful manner. This chapter discusses wind turbine reliability bymore » highlighting the methodology of reliability engineering life data analysis. It first briefly discusses fundamentals for wind turbine reliability and the current industry status. Then, the reliability engineering method for life analysis, including data collection, model development, and forecasting, is presented in detail and illustrated through two case studies. The chapter concludes with some remarks on potential opportunities to improve wind turbine reliability. An owner and operator's perspective is taken and mechanical components are used to exemplify the potential benefits of reliability engineering analysis to improve wind turbine reliability and availability.« less
Reliability and Agreement in Student Ratings of the Class Environment
ERIC Educational Resources Information Center
Nelson, Peter M.; Christ, Theodore J.
2016-01-01
The current study estimated the reliability and agreement of student ratings of the classroom environment obtained using the Responsive Environmental Assessment for Classroom Teaching (REACT; Christ, Nelson, & Demers, 2012; Nelson, Demers, & Christ, 2014). Coefficient alpha, class-level reliability, and class agreement indices were…
Test Assembly Implications for Providing Reliable and Valid Subscores
ERIC Educational Resources Information Center
Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J.
2017-01-01
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Effect of knee angle on neuromuscular assessment of plantar flexor muscles: A reliability study
Cornu, Christophe; Jubeau, Marc
2018-01-01
Introduction This study aimed to determine the intra- and inter-session reliability of neuromuscular assessment of plantar flexor (PF) muscles at three knee angles. Methods Twelve young adults were tested for three knee angles (90°, 30° and 0°) and at three time points separated by 1 hour (intra-session) and 7 days (inter-session). Electrical (H reflex, M wave) and mechanical (evoked and maximal voluntary torque, activation level) parameters were measured on the PF muscles. Intraclass correlation coefficients (ICC) and coefficients of variation were calculated to determine intra- and inter-session reliability. Results The mechanical measurements presented excellent (ICC>0.75) intra- and inter-session reliabilities regardless of the knee angle considered. The reliability of electrical measurements was better for the 90° knee angle compared to the 0° and 30° angles. Conclusions Changes in the knee angle may influence the reliability of neuromuscular assessments, which indicates the importance of considering the knee angle to collect consistent outcomes on the PF muscles. PMID:29596480
Reliability and Validity of the Dyadic Observed Communication Scale (DOCS).
Hadley, Wendy; Stewart, Angela; Hunter, Heather L; Affleck, Katelyn; Donenberg, Geri; Diclemente, Ralph; Brown, Larry K
2013-02-01
We evaluated the reliability and validity of the Dyadic Observed Communication Scale (DOCS) coding scheme, which was developed to capture a range of communication components between parents and adolescents. Adolescents and their caregivers were recruited from mental health facilities for participation in a large, multi-site family-based HIV prevention intervention study. Seventy-one dyads were randomly selected from the larger study sample and coded using the DOCS at baseline. Preliminary validity and reliability of the DOCS was examined using various methods, such as comparing results to self-report measures and examining interrater reliability. Results suggest that the DOCS is a reliable and valid measure of observed communication among parent-adolescent dyads that captures both verbal and nonverbal communication behaviors that are typical intervention targets. The DOCS is a viable coding scheme for use by researchers and clinicians examining parent-adolescent communication. Coders can be trained to reliably capture individual and dyadic components of communication for parents and adolescents and this complex information can be obtained relatively quickly.
Between-day reliability of the trapezius muscle H-reflex and M-wave.
Vangsgaard, Steffen; Hansen, Ernst A; Madeleine, Pascal
2015-12-01
The aim of this study was to investigate the between-day reliability of the trapezius muscle H-reflex and M-wave. Sixteen healthy subjects were studied on 2 consecutive days. Trapezius muscle H-reflexes were evoked by electrical stimulation of the C3/4 cervical nerves; M-waves were evoked by electrical stimulation of the accessory nerve. Relative reliability was estimated by intraclass correlation coefficients (ICC2,1 ). Absolute reliability was estimated by computing the standard error of measurement (SEM) and the smallest real difference (SRD). Bland-Altman plots were constructed to detect any systematic bias. Variables showed substantial to excellent relative reliability (ICC = 0.70-0.99). The relative SEM ranged from 1.4% to 34.8%; relative SRD ranged from 3.8% to 96.5%. No systematic bias was present in the data. The amplitude and latency of the trapezius muscle H-reflex and M-wave in healthy young subjects can be measured reliably across days. © 2015 Wiley Periodicals, Inc.
Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.
Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A
2007-01-01
The objective of the study is to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version and adapt it cross-culturally, and to measure its inter-rater and intrarater reliability. This cross sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each inter-rater and intra-rater reliability, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreements by the five pair of interviewers at point one and two were 0.86, and intrarater reliability by the five interviewers on the seven-item questionnaire at poinone and two was 0.88, as measured by kappa coefficient. The translated Malay version of RQ demonstrated an almost perfect inter-rater and intra-rater reliability and further validation such as sensitivity and specificity analysis of this translated questionnaire is highly recommended.
Proposed reliability cost model
NASA Technical Reports Server (NTRS)
Delionback, L. M.
1973-01-01
The research investigations which were involved in the study include: cost analysis/allocation, reliability and product assurance, forecasting methodology, systems analysis, and model-building. This is a classic example of an interdisciplinary problem, since the model-building requirements include the need for understanding and communication between technical disciplines on one hand, and the financial/accounting skill categories on the other. The systems approach is utilized within this context to establish a clearer and more objective relationship between reliability assurance and the subcategories (or subelements) that provide, or reenforce, the reliability assurance for a system. Subcategories are further subdivided as illustrated by a tree diagram. The reliability assurance elements can be seen to be potential alternative strategies, or approaches, depending on the specific goals/objectives of the trade studies. The scope was limited to the establishment of a proposed reliability cost-model format. The model format/approach is dependent upon the use of a series of subsystem-oriented CER's and sometimes possible CTR's, in devising a suitable cost-effective policy.
Holm, Søren; Hofmann, Bjørn
2017-10-01
A precondition for reducing scientific misconduct is evidence about scientists' attitudes. We need reliable survey instruments, and this study investigates the reliability of Kalichman's "Survey 2: research misconduct" questionnaire. The study is a post hoc analysis of data from three surveys among biomedical doctoral students in Scandinavia (2010-2015). We perform reliability analysis, and exploratory and confirmatory factor analysis using a split-sample design as a partial validation. The results indicate that a reliable 13-item scale can be formed (Cronbach's α = .705), and factor analysis indicates that there are four reliable subscales each tapping a different construct: (a) general attitude to misconduct (α = .768), (b) attitude to personal misconduct (α = .784), (c) attitude to whistleblowing (α = .841), and (d) attitude to blameworthiness/punishment (α = .877). A full validation of the questionnaire requires further research. We, nevertheless, hope that the results will facilitate the increased use of the questionnaire in research.
NASA Astrophysics Data System (ADS)
Chaitusaney, Surachai; Yokoyama, Akihiko
In distribution system, Distributed Generation (DG) is expected to improve the system reliability as its backup generation. However, DG contribution in fault current may cause the loss of the existing protection coordination, e.g. recloser-fuse coordination and breaker-breaker coordination. This problem can drastically deteriorate the system reliability, and it is more serious and complicated when there are several DG sources in the system. Hence, the above conflict in reliability aspect unavoidably needs a detailed investigation before the installation or enhancement of DG is done. The model of composite DG fault current is proposed to find the threshold beyond which existing protection coordination is lost. Cases of protection miscoordination are described, together with their consequences. Since a distribution system may be tied with another system, the issues of tie line and on-site DG are integrated into this study. Reliability indices are evaluated and compared in the distribution reliability test system RBTS Bus 2.
Lesage, A D; Cyr, M; Toupin, J; Cormier, H; Valiquette, C
1991-01-01
Interview questionnaires offer more validity than self-administered format in exploring psychopathological or psychosocial phenomena of interest in psychiatric research. If used, special care needs to be paid to interviewers' training and ensuring that they maintain their reliability. No widespread training standards exist and each schedule may carry its own procedure. Our aims are to indicate how we trained interviewers with the French version of the Present State Examination (Wing, Cooper and Sartorius, 1974) and how we checked and kept acceptable interraters reliability during one study. We will provide data on the interraters reliability during the training and the study, as well as the test-retest reliability. These results will be used to support some guidelines when using this sort of psychiatric research questionnaires in order to ensure comparability both within the study and between studies.
Beardsley, Chris; Egerton, Tim; Skinner, Brendon
2016-01-01
Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
Tracking reliability for space cabin-borne equipment in development by Crow model.
Chen, J D; Jiao, S J; Sun, H L
2001-12-01
Objective. To study and track the reliability growth of manned spaceflight cabin-borne equipment in the course of its development. Method. A new technique of reliability growth estimation and prediction, which is composed of the Crow model and test data conversion (TDC) method was used. Result. The estimation and prediction value of the reliability growth conformed to its expectations. Conclusion. The method could dynamically estimate and predict the reliability of the equipment by making full use of various test information in the course of its development. It offered not only a possibility of tracking the equipment reliability growth, but also the reference for quality control in manned spaceflight cabin-borne equipment design and development process.
The Reliability and Validity of the Computerized Double Inclinometer in Measuring Lumbar Mobility
MacDermid, Joy Christine; Arumugam, Vanitha; Vincent, Joshua Israel; Carroll, Krista L
2014-01-01
Study Design : Repeated measures reliability/validity study. Objectives : To determine the concurrent validity, test-retest, inter-rater and intra-rater reliability of lumbar flexion and extension measurements using the Tracker M.E. computerized dual inclinometer (CDI) in comparison to the modified-modified Schober (MMS) Summary of Background : Numerous studies have evaluated the reliability and validity of the various methods of measuring spinal motion, but the results are inconsistent. Differences in equipment and techniques make it difficult to correlate results. Methods : Twenty subjects with back pain and twenty without back pain were selected through convenience sampling. Two examiners measured sagittal plane lumbar range of motion for each subject. Two separate tests with the CDI and one test with the MMS were conducted. Each test consisted of three trials. Instrument and examiner order was randomly assigned. Intra-class correlations (ICCs 2, 2 and 2, 2) and Pearson correlation coefficients (r) were used to calculate reliability and concurrent validity respectively. Results : Intra-trial reliability was high to very high for both the CDI (ICCs 0.85 - 0.96) and MMS (ICCs 0.84 - 0.98). However, the reliability was poor to moderate, when the CDI unit had to be repositioned either by the same rate (ICCs 0.16 - 0.59) or a different rater (ICCs 0.45 - 0.52). Inter-rater reliability for the MMS was moderate to high (ICCs 0.75 - 0.82) which bettered the moderate correlation obtained for the CDI (ICCs 0.45 - 0.52). Correlations between the CDI and MMS were poor for flexion (0.32; p<0.05) and poor to moderate (-0.42 - -0.51; p<0.05) for extension measurements. Conclusion : When using the CDI, an average of subsequent tests is required to obtain moderate reliability. The MMS was highly reliable than the CDI. The MMS and the CDI measure lumbar movement on a different metric that are not highly related to each other. PMID:25352928
Ringdal, Kjetil G; Skaga, Nils Oddvar; Steen, Petter Andreas; Hestnes, Morten; Laake, Petter; Jones, J Mary; Lossius, Hans Morten
2013-01-01
Pre-injury comorbidities can influence the outcomes of severely injured patients. Pre-injury comorbidity status, graded according to the American Society of Anesthesiologists Physical Status (ASA-PS) classification system, is an independent predictor of survival in trauma patients and is recommended as a comorbidity score in the Utstein Trauma Template for Uniform Reporting of Data. Little is known about the reliability of pre-injury ASA-PS scores. The objective of this study was to examine whether the pre-injury ASA-PS system was a reliable scale for grading comorbidity in trauma patients. Nineteen Norwegian trauma registry coders were invited to participate in a reliability study in which 50 real but anonymised patient medical records were distributed. Reliability was analysed using quadratic weighted kappa (κ(w)) analysis with 95% CI as the primary outcome measure and unweighted kappa (κ) analysis, which included unknown values, as a secondary outcome measure. Fifteen of the invitees responded to the invitation, and ten participated. We found moderate (κ(w)=0.77 [95% CI: 0.64-0.87]) to substantial (κ(w)=0.95 [95% CI: 0.89-0.99]) rater-against-reference standard reliability using κ(w) and fair (κ=0.46 [95% CI: 0.29-0.64]) to substantial (κ=0.83 [95% CI: 0.68-0.94]) reliability using κ. The inter-rater reliability ranged from moderate (κ(w)=0.66 [95% CI: 0.45-0.81]) to substantial (κ(w)=0.96 [95% CI: 0.88-1.00]) for κ(w) and from slight (κ=0.36 [95% CI: 0.21-0.54]) to moderate (κ=0.75 [95% CI: 0.62-0.89]) for κ. The rater-against-reference standard reliability varied from moderate to substantial for the primary outcome measure and from fair to substantial for the secondary outcome measure. The study findings indicate that the pre-injury ASA-PS scale is a reliable score for classifying comorbidity in trauma patients. Copyright © 2012 Elsevier Ltd. All rights reserved.
The reliability of cause-of-death coding in The Netherlands.
Harteloh, Peter; de Bruin, Kim; Kardaun, Jan
2010-08-01
Cause-of-death statistics are a major source of information for epidemiological research or policy decisions. Information on the reliability of these statistics is important for interpreting trends in time or differences between populations. Variations in coding the underlying cause of death could hinder the attribution of observed differences to determinants of health. Therefore we studied the reliability of cause-of-death statistics in The Netherlands. We performed a double coding study. Death certificates from the month of May 2005 were coded again in 2007. Each death certificate was coded manually by four coders. Reliability was measured by calculating agreement between coders (intercoder agreement) and by calculating the consistency of each individual coder in time (intracoder agreement). Our analysis covered an amount of 10,833 death certificates. The intercoder agreement of four coders on the underlying cause of death was 78%. In 2.2% of the cases coders agreed on a change of the code assigned in 2005. The (mean) intracoder agreement of four coders was 89%. Agreement was associated with the specificity of the ICD-10 code (chapter, three digits, four digits), the age of the deceased, the number of coders and the number of diseases reported on the death certificate. The reliability of cause-of-death statistics turned out to be high (>90%) for major causes of death such as cancers and acute myocardial infarction. For chronic diseases, such as diabetes and renal insufficiency, reliability was low (<70%). The reliability of cause-of-death statistics varies by ICD-10 code/chapter. A statistical office should provide coders with (additional) rules for coding diseases with a low reliability and evaluate these rules regularly. Users of cause-of-death statistics should exercise caution when interpreting causes of death with a low reliability. Studies of reliability should take into account the number of coders involved and the number of codes on a death certificate.
2014-01-01
Background A balance test provides important information such as the standard to judge an individual’s functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Methods Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). Results The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. Conclusion The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769
Park, Dae-Sung; Lee, GyuChang
2014-06-10
A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment.
Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Desmeules, François
2016-12-01
Clinicians often rely on physical examination tests to guide them in the diagnostic process of knee disorders. However, reliability of these tests is often overlooked and may influence the consistency of results and overall diagnostic validity. Therefore, the objective of this study was to systematically review evidence on the reliability of physical examination tests for the diagnosis of knee disorders. A structured literature search was conducted in databases up to January 2016. Included studies needed to report reliability measures of at least one physical test for any knee disorder. Methodological quality was evaluated using the QAREL checklist. A qualitative synthesis of the evidence was performed. Thirty-three studies were included with a mean QAREL score of 5.5 ± 0.5. Based on low to moderate quality evidence, the Thessaly test for meniscal injuries reached moderate inter-rater reliability (k = 0.54). Based on moderate to excellent quality evidence, the Lachman for anterior cruciate ligament injuries reached moderate to excellent inter-rater reliability (k = 0.42 to 0.81). Based on low to moderate quality evidence, the Tibiofemoral Crepitus, Joint Line and Patellofemoral Pain/Tenderness, Bony Enlargement and Joint Pain on Movement tests for knee osteoarthritis reached fair to excellent inter-rater reliability (k = 0.29 to 0.93). Based on low to moderate quality evidence, the Lateral Glide, Lateral Tilt, Lateral Pull and Quality of Movement tests for patellofemoral pain reached moderate to good inter-rater reliability (k = 0.49 to 0.73). Many physical tests appear to reach good inter-rater reliability, but this is based on low-quality and conflicting evidence. High-quality research is required to evaluate the reliability of knee physical examination tests. Copyright © 2016 Elsevier Ltd. All rights reserved.
Interhemispheric Inhibition Measurement Reliability in Stroke: A Pilot Study
Cassidy, Jessica M.; Chu, Haitao; Chen, Mo; Kimberley, Teresa J.; Carey, James R.
2016-01-01
Objective Reliable transcranial magnetic stimulation (TMS) measures for probing corticomotor excitability are important when assessing the physiological effects of non-invasive brain stimulation. The primary objective of this study was to examine test-retest reliability of an interhemispheric inhibition (IHI) index measurement in stroke. Materials and Methods Ten subjects with chronic stroke (≥ 6 months) completed two IHI testing sessions per week for three weeks (six testing sessions total). A single investigator measured IHI in the contra- to-ipsilesional primary motor cortex direction and in the opposite direction using bilateral paired-pulse TMS. Weekly sessions were separated by 24 hours with a 1-week washout period separating testing weeks. To determine if motor-evoked potential (MEP) quantification method affected measurement reliability, IHI indices computed from both MEP amplitude and area responses were found. Reliability was assessed with two-way, mixed intraclass correlation coefficients (ICC(3,k)). Standard error of measurement and minimal detectable difference statistics were also determined. Results With the exception of the initial testing week, IHI indices measured in the contra-to-ipsilesional hemisphere direction demonstrated moderate to excellent reliability (ICC = 0.725 – 0.913). Ipsi-to-contralesional IHI indices depicted poor or invalid reliability estimates throughout the three-week testing duration (ICC= −1.153 – 0.105). The overlap of ICC 95% confidence intervals suggested that IHI indices using MEP amplitude vs. area measures did not differ with respect to reliability. Conclusions IHI indices demonstrated varying magnitudes of reliability irrespective of MEP quantification method. Several strategies for improving IHI index measurement reliability are discussed. PMID:27333364
Multisite Reliability of Cognitive BOLD Data
Brown, Gregory G.; Mathalon, Daniel H.; Stern, Hal; Ford, Judith; Mueller, Bryon; Greve, Douglas N.; McCarthy, Gregory; Voyvodic, Jim; Glover, Gary; Diaz, Michele; Yetter, Elizabeth; Burak Ozyurt, I.; Jorgensen, Kasper W.; Wible, Cynthia G.; Turner, Jessica A.; Thompson, Wesley K.; Potkin, Steven G.
2010-01-01
Investigators perform multi-site functional magnetic resonance imaging studies to increase statistical power, to enhance generalizability, and to improve the likelihood of sampling relevant subgroups. Yet undesired site variation in imaging methods could off-set these potential advantages. We used variance components analysis to investigate sources of variation in the blood oxygen level dependent (BOLD) signal across four 3T magnets in voxelwise and region of interest (ROI) analyses. Eighteen participants traveled to four magnet sites to complete eight runs of a working memory task involving emotional or neutral distraction. Person variance was more than 10 times larger than site variance for five of six ROIs studied. Person-by-site interactions, however, contributed sizable unwanted variance to the total. Averaging over runs increased between-site reliability, with many voxels showing good to excellent between-site reliability when eight runs were averaged and regions of interest showing fair to good reliability. Between-site reliability depended on the specific functional contrast analyzed in addition to the number of runs averaged. Although median effect size was correlated with between-site reliability, dissociations were observed for many voxels. Brain regions where the pooled effect size was large but between-site reliability was poor were associated with reduced individual differences. Brain regions where the pooled effect size was small but between-site reliability was excellent were associated with a balance of participants who displayed consistently positive or consistently negative BOLD responses. Although between-site reliability of BOLD data can be good to excellent, acquiring highly reliable data requires robust activation paradigms, ongoing quality assurance, and careful experimental control. PMID:20932915
Balaguier, Romain; Madeleine, Pascal; Vuillerme, Nicolas
2016-01-01
The assessment of pressure pain threshold (PPT) provides a quantitative value related to the mechanical sensitivity to pain of deep structures. Although excellent reliability of PPT has been reported in numerous anatomical locations, its absolute and relative reliability in the lower back region remains to be determined. Because of the high prevalence of low back pain in the general population and because low back pain is one of the leading causes of disability in industrialized countries, assessing pressure pain thresholds over the low back is particularly of interest. The purpose of this study study was (1) to evaluate the intra- and inter- absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals and (2) to determine the number of trial required to ensure reliable PPT measurements. Fifteen asymptomatic subjects were included in this study. PPTs were assessed among 14 anatomical locations in the low back region over two sessions separated by one hour interval. For the two sessions, three PPT assessments were performed on each location. Reliability was assessed computing intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for all possible combinations between trials and sessions. Bland-Altman plots were also generated to assess potential bias in the dataset. Relative reliability for both intra- and inter- session was almost perfect with ICC ranged from 0.85 to 0.99. With respect to the intra-session, no statistical difference was reported for ICCs and SEM regardless of the conducted comparisons between trials. Conversely, for inter-session, ICCs and SEM values were significantly larger when two consecutive PPT measurements were used for data analysis. No significant difference was observed for the comparison between two consecutive measurements and three measurements. Excellent relative and absolute reliabilities were reported for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of two or three consecutive PPT measurements, as usually proposed in the literature, or with only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using two consecutive measurements to obtain higher short term absolute reliability.
NDE reliability and probability of detection (POD) evolution and paradigm shift
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Surendra
2014-02-18
The subject of NDE Reliability and POD has gone through multiple phases since its humble beginning in the late 1960s. This was followed by several programs including the important one nicknamed “Have Cracks – Will Travel” or in short “Have Cracks” by Lockheed Georgia Company for US Air Force during 1974–1978. This and other studies ultimately led to a series of developments in the field of reliability and POD starting from the introduction of fracture mechanics and Damaged Tolerant Design (DTD) to statistical framework by Bernes and Hovey in 1981 for POD estimation to MIL-STD HDBK 1823 (1999) and 1823Amore » (2009). During the last decade, various groups and researchers have further studied the reliability and POD using Model Assisted POD (MAPOD), Simulation Assisted POD (SAPOD), and applying Bayesian Statistics. All and each of these developments had one objective, i.e., improving accuracy of life prediction in components that to a large extent depends on the reliability and capability of NDE methods. Therefore, it is essential to have a reliable detection and sizing of large flaws in components. Currently, POD is used for studying reliability and capability of NDE methods, though POD data offers no absolute truth regarding NDE reliability, i.e., system capability, effects of flaw morphology, and quantifying the human factors. Furthermore, reliability and POD have been reported alike in meaning but POD is not NDE reliability. POD is a subset of the reliability that consists of six phases: 1) samples selection using DOE, 2) NDE equipment setup and calibration, 3) System Measurement Evaluation (SME) including Gage Repeatability and Reproducibility (Gage R and R) and Analysis Of Variance (ANOVA), 4) NDE system capability and electronic and physical saturation, 5) acquiring and fitting data to a model, and data analysis, and 6) POD estimation. This paper provides an overview of all major POD milestones for the last several decades and discuss rationale for using Integrated Computational Materials Engineering (ICME), MAPOD, SAPOD, and Bayesian statistics for studying controllable and non-controllable variables including human factors for estimating POD. Another objective is to list gaps between “hoped for” versus validated or fielded failed hardware.« less
NASA Technical Reports Server (NTRS)
Feldstein, J. F.
1977-01-01
Failure data from 16 commercial spacecraft were analyzed to evaluate failure trends, reliability growth, and effectiveness of tests. It was shown that the test programs were highly effective in ensuring a high level of in-orbit reliability. There was only a single catastrophic problem in 44 years of in-orbit operation on 12 spacecraft. The results also indicate that in-orbit failure rates are highly correlated with unit and systems test failure rates. The data suggest that test effectiveness estimates can be used to guide the content of a test program to ensure that in-orbit reliability goals are achieved.
Reliability analysis of interdependent lattices
NASA Astrophysics Data System (ADS)
Limiao, Zhang; Daqing, Li; Pengju, Qin; Bowen, Fu; Yinan, Jiang; Zio, Enrico; Rui, Kang
2016-06-01
Network reliability analysis has drawn much attention recently due to the risks of catastrophic damage in networked infrastructures. These infrastructures are dependent on each other as a result of various interactions. However, most of the reliability analyses of these interdependent networks do not consider spatial constraints, which are found important for robustness of infrastructures including power grid and transport systems. Here we study the reliability properties of interdependent lattices with different ranges of spatial constraints. Our study shows that interdependent lattices with strong spatial constraints are more resilient than interdependent Erdös-Rényi networks. There exists an intermediate range of spatial constraints, at which the interdependent lattices have minimal resilience.
Study of Fuze Structure and Reliability Design Based on the Direct Search Method
NASA Astrophysics Data System (ADS)
Lin, Zhang; Ning, Wang
2017-03-01
Redundant design is one of the important methods to improve the reliability of the system, but mutual coupling of multiple factors is often involved in the design. In my study, Direct Search Method is introduced into the optimum redundancy configuration for design optimization, in which, the reliability, cost, structural weight and other factors can be taken into account simultaneously, and the redundant allocation and reliability design of aircraft critical system are computed. The results show that this method is convenient and workable, and applicable to the redundancy configurations and optimization of various designs upon appropriate modifications. And this method has a good practical value.
Clinical instruments: reliability and validity critical appraisal.
Brink, Yolandi; Louw, Quinette A
2012-12-01
RATIONALE, AIM AND OBJECTIVES: There is a lack of health care practitioners using objective clinical tools with sound psychometric properties. There is also a need for researchers to improve their reporting of the validity and reliability results of these clinical tools. Therefore, to promote the use of valid and reliable tools or tests for clinical evaluation, this paper reports on the development of a critical appraisal tool to assess the psychometric properties of objective clinical tools. A five-step process was followed to develop the new critical appraisal tool: (1) preliminary conceptual decisions; (2) defining key concepts; (3) item generation; (4) assessment of face validity; and (5) formulation of the final tool. The new critical appraisal tool consists of 13 items, of which five items relate to both validity and reliability studies, four items to validity studies only and four items to reliability studies. The 13 items could be scored as 'yes', 'no' or 'not applicable'. This critical appraisal tool will aid both the health care practitioner to critically appraise the relevant literature and researchers to improve the quality of reporting of the validity and reliability of objective clinical tools. © 2011 Blackwell Publishing Ltd.
Reliability of temporal summation and diffuse noxious inhibitory control
Cathcart, Stuart; Winefield, Anthony H; Rolan, Paul; Lushington, Kurt
2009-01-01
BACKGROUND: The test-retest reliability of temporal summation (TS) and diffuse noxious inhibitory control (DNIC) has not been reported to date. Establishing such reliability would support the possibility of future experimental studies examining factors affecting TS and DNIC. Similarly, the use of manual algometry to induce TS, or an occlusion cuff to induce DNIC of TS to mechanical stimuli, has not been reported to date. Such devices may offer a simpler method than current techniques for inducing TS and DNIC, affording assessment at more anatomical locations and in more varied research settings. METHOD: The present study assessed the test-retest reliability of TS and DNIC using the above techniques. Sex differences on these measures were also investigated. RESULTS: Repeated measures ANOVA indicated successful induction of TS and DNIC, with no significant differences across test-retest occasions. Sex effects were not significant for any measure or interaction. Intraclass correlations indicated high test-retest reliability for all measures; however, there was large interindividual variation between test and retest measurements. CONCLUSION: The present results indicate acceptable within-session test-retest reliability of TS and DNIC. The results support the possibility of future experimental studies examining factors affecting TS and DNIC. PMID:20011713
Nazary-Moghadam, Salman; Zeinalzadeh, Afsaneh; Salavati, Mahyar; Almasi, Simin; Negahban, Hossein
2017-01-01
The aim of the present study was to culturally adapt and evaluate reliability and validity of Health Assessment Questionnaire-Disability Index (HAQ-DI) in Iranian patients with rheumatoid arthritis (RA). 234 patients with RA for validation study, Eighty-six participants for reliability study. Test-retest relative reliability and internal consistency of Persian version of HAQ-DI were examined by intraclass correlation coefficient (ICC) and Cronbach's alpha, respectively. Additionally, HAQ-DI construct validity (Spearman's correlation) was examined using Persian version of Short-Form 36 Health survey (SF-36), activity and severity parameters. Persian version of HAQ-DI total score showed excellent test-retest reliability (ICC = 0.98) and internal consistency (Cronbach's alpha = 0.95). Spearman's correlations between the total PHAQ-DI score and activity and severity parameters were above 0.55. Correlation between PHAQ-DI and SF-36 Physical Health were higher as compared with SF-36 Mental Health. Persian version of HAQ-DI is a reliable and valid culturally-adapted instrument in order to measure functional limitations in Iranian people with RA. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bosakova, Lucia; Kolarcik, Peter; Bobakova, Daniela; Sulcova, Martina; Van Dijk, Jitse P; Reijneveld, Sijmen A; Geckova, Andrea Madarasova
2016-04-01
Participation in organized activities is related with a range of positive outcomes, but the way such participation is measured has not been scrutinized. Test-retest reliability as an important indicator of a scale's reliability has been assessed rarely and for "The scale of participation in organized activities" lacks completely. This test-retest study is based on the Health Behaviour in School-aged Children study and is consistent with its methodology. We obtained data from 353 Czech (51.9 % boys) and 227 Slovak (52.9 % boys) primary school pupils, grades five and nine, who participated in this study in 2013. We used Cohen's kappa statistic and single measures of the intraclass correlation coefficient to estimate the test-retest reliability of all selected items in the sample, stratified by gender, age and country. We mostly observed a large correlation between the test and retest in all of the examined variables (κ ranged from 0.46 to 0.68). Test-retest reliability of the sum score of individual items showed substantial agreement (ICC = 0.64). The scale of participation in organized activities has an acceptable level of agreement, indicating good reliability.
Ertuğ, Nurcan
2018-06-01
The aim of this study was to determine the validity and reliability of the Turkish version of the V-scale, which measures nurses' attitudes towards vital signs monitoring in the detection of clinical deterioration. This validity and reliability study was conducted at a tertiary hospital in Ankara, Turkey, in 2016. A total of 169 ward nurses participated in the study. Exploratory factor analysis, Cronbach's alpha coefficient, and the intraclass correlation coefficient were used to determine the validity and reliability of the scale. A 5-factor, 16-item scale explained 60.823% of the total variance according to the validity analysis. Our version matched the original scale in terms of the number of items and factor structure. Cronbach's alpha coefficient of the Turkish version of the V-scale was 0.764. The test-retest reliability results were 0.855 for the overall intraclass correlation coefficient, and the t-test result was P > 0.05. The V-scale is a reliable and valid instrument to measure Turkish nurses' attitudes towards vital signs monitoring in the detection of clinical deterioration. © 2018 John Wiley & Sons Australia, Ltd.
An In vitro evaluation of the reliability of QR code denture labeling technique
Poovannan, Sindhu; Jain, Ashish R.; Krishnan, Cakku Jalliah Venkata; Chandran, Chitraa R.
2016-01-01
Statement of Problem: Positive identification of the dead after accidents and disasters through labeled dentures plays a key role in forensic scenario. A number of denture labeling methods are available, and studies evaluating their reliability under drastic conditions are vital. Aim: This study was conducted to evaluate the reliability of QR (Quick Response) Code labeled at various depths in heat-cured acrylic blocks after acid treatment, heat treatment (burns), and fracture in forensics. It was an in vitro study. Materials and Methods: This study included 160 specimens of heat-cured acrylic blocks (1.8 cm × 1.8 cm) and these were divided into 4 groups (40 samples per group). QR Codes were incorporated in the samples using clear acrylic sheet and they were assessed for reliability under various depths, acid, heat, and fracture. Data were analyzed using Chi-square test, test of proportion. Results: The QR Code inclusion technique was reliable under various depths of acrylic sheet, acid (sulfuric acid 99%, hydrochloric acid 40%) and heat (up to 370°C). Results were variable with fracture of QR Code labeled acrylic blocks. Conclusion: Within the limitations of the study, by analyzing the results, it was clearly indicated that the QR Code technique was reliable under various depths of acrylic sheet, acid, and heat (370°C). Effectiveness varied in fracture and depended on the level of distortion. This study thus suggests that QR Code is an effective and simpler denture labeling method. PMID:28123284
Sources of medicine information and their reliability evaluated by medicine users.
Närhi, Ulla
2007-12-01
To study the medicine users' sources of medicine information and the perceived reliability of these sources in different age groups. A computer-aided telephone interview (CATI) to Finnish consumers (n = 1,004). Those respondents (n = 714) who reported using any prescription or self-medication medicines more than once a month were included in the study. The respondents were interviewed about their use of sources of medicine information during the previous 6 months. The reliability of sources in different age groups was estimated using a 4-point scale: very reliable, somewhat reliable, somewhat unreliable and very unreliable. The respondents also had the option of being unable to make an appraisal. A proportion of respondents reporting using the source, number of mentioned sources and their reliability evaluated by respondents. About half of the respondents in each age group mentioned two to four sources. The most common sources of information were Patient Information Leaflets (PILs) (74%), doctors (68%) and pharmacists (60%). Next came television (40%), newspapers and magazines (40%), drug advertisements (32%), nurses (28%), drug information leaflets (27%), relatives and friends (24%), medicine guides and books (22%) and the Internet (20%). There was a significant difference between age groups in reporting the Internet as a source of medicine information (15-34-year-old respondents reported the greatest Internet use). The three most reliable sources in every age group were reported to be PILs, doctors and pharmacists. Nurses, drug regulatory authorities, drug information leaflets and medicine guides and books were considered next most reliable. Relatives and friends, television, newspapers and magazines were considered the least reliable. The respondents were most uncertain about the reliability of the Internet, patient organisations and telephone services. There was a significant difference between age groups in evaluating the reliability of telephone services (15-34-year-olds found them more reliable). Medicine users reported receiving medicine information from many sources. The most commonly used sources were perceived as the most reliable, but their reliability did not seem to depend on age. The counsellors should take into account that patients have many sources of medicine information, with varying validity.
The Typical General Aviation Aircraft
NASA Technical Reports Server (NTRS)
Turnbull, Andrew
1999-01-01
The reliability of General Aviation aircraft is unknown. In order to "assist the development of future GA reliability and safety requirements", a reliability study needs to be performed. Before any studies on General Aviation aircraft reliability begins, a definition of a typical aircraft that encompasses most of the general aviation characteristics needs to be defined. In this report, not only is the typical general aviation aircraft defined for the purpose of the follow-on reliability study, but it is also separated, or "sifted" into several different categories where individual analysis can be performed on the reasonably independent systems. In this study, the typical General Aviation aircraft is a four-place, single engine piston, all aluminum fixed-wing certified aircraft with a fixed tricycle landing gear and a cable operated flight control system. The system breakdown of a GA aircraft "sifts" the aircraft systems and components into five categories: Powerplant, Airframe, Aircraft Control Systems, Cockpit Instrumentation Systems, and the Electrical Systems. This breakdown was performed along the lines of a failure of the system. Any component that caused a system to fail was considered a part of that system.
2011-01-01
Background The aim of this study was to develop a child-specific classification system for long bone fractures and to examine its reliability and validity on the basis of a prospective multicentre study. Methods Using the sequentially developed classification system, three samples of between 30 and 185 paediatric limb fractures from a pool of 2308 fractures documented in two multicenter studies were analysed in a blinded fashion by eight orthopaedic surgeons, on a total of 5 occasions. Intra- and interobserver reliability and accuracy were calculated. Results The reliability improved with successive simplification of the classification. The final version resulted in an overall interobserver agreement of κ = 0.71 with no significant difference between experienced and less experienced raters. Conclusions In conclusion, the evaluation of the newly proposed classification system resulted in a reliable and routinely applicable system, for which training in its proper use may further improve the reliability. It can be recommended as a useful tool for clinical practice and offers the option for developing treatment recommendations and outcome predictions in the future. PMID:21548939
Cha, Young Joo; Lee, Jae Jin; Kim, Do Hyun; You, Joshua Sung H
2017-10-23
Core stabilization plays an important role in the regulation of postural stability. To overcome shortcomings associated with pain and severe core instability during conventional core stabilization tests, we recently developed the dynamic neuromuscular stabilization-based heel sliding (DNS-HS) test. The purpose of this study was to establish the criterion validity and test-retest reliability of the novel DNS-HS test. Twenty young adults with core instability completed both the bilateral straight leg lowering test (BSLLT) and DNS-HS test for the criterion validity study and repeated the DNS-HS test for the test-retest reliability study. Criterion validity was determined by comparing hip joint angle data that were obtained from BSLLT and DNS-HS measures. The test-retest reliability was determined by comparing hip joint angle data. Criterion validity was (ICC2,3) = 0.700 (p< 0.05), suggesting a good relationship between the two core stability measures. Test-retest reliability was (ICC3,3) = 0.953 (p< 0.05), indicating excellent consistency between the repeated DNS-HS measurements. Criterion validity data demonstrated a good relationship between the gold standard BSLLT and DNS-HS core stability measures. Test-retest reliability data suggests that DNS-HS core stability was a reliable test for core stability. Clinically, the DNS-HS test is useful to objectively quantify core instability and allow early detection and evaluation.
McCreesh, Karen M; Anjum, Shakeel; Crotty, James M; Lewis, Jeremy S
2016-01-01
Rotator cuff (RC) tendinopathy has been widely ascribed to impingement of the supraspinatus tendon (SsT) in the subacromial space, measured as the acromiohumeral distance (AHD). Ultrasound (US) is suitable for measuring AHD and SsT thickness, but few reliability studies have been carried out in symptomatic populations, and interrater reliability is unconfirmed. This study aimed to examine the intrarater and interrater reliability of US measurements of AHD and SsT thickness in asymptomatic control subjects and patients with RC tendinopathy. Seventy participants were recruited and grouped as healthy controls (n = 25) and RC tendinopathy (n = 45). Repeated US measurements of AHD and SsT thickness were obtained by one rater in both groups and by two raters in the RC tendinopathy group. Intrarater and interrater reliability coefficients were excellent for both measurements (intraclass correlation > 0.92), but the intrarater reliability was superior. The minimal detectable change values in the symptomatic group were 0.7 mm for AHD and 0.6 mm for SsT thickness for a single experienced examiner; the values rose to 1.2 mm and 1.3 mm, respectively, for the pair of examiners. The results support the reliability of US for the measurement of AHD and SsT thickness in patients with symptomatic RC tendinopathy and provide minimal detectable change values for use in future research studies. © 2015 Wiley Periodicals, Inc.
Dreessen, L; Arntz, A
1998-01-01
The short-interval test-retest interrater reliability of the Structured Clinical Interview for DSM-III-R personality disorders (SCID-II) was studied in a psychotherapy outpatient group whose main complaint was mostly an Axis I anxiety disorder. Using a test-retest approach to assess interrater reliability, three sources of variance were taken into account (rater variance in the elicitation and interpretation of information and patient variance across interviews). Base rate requirements were established before calculating reliability coefficients. On the whole, interrater agreement on the SCID-II was found to be satisfactory, except for the histrionic personality traits. This is the first study that has estimated short-interval test-retest interrater reliability of the SCID-II in outpatients, and also the first that has studied single SCID-II traits and dimensional diagnoses. The results found support the use of the SCID-II as a diagnostic instrument for clinical and research purposes.
Validation of a new classification system for skin tears.
LeBlanc, Kimberly; Baranoski, Sharon; Holloway, Samantha; Langemo, Diane
2013-06-01
The aim of this study was to validate and establish reliability of the International Skin Tear classification system. A consensus panel of 12 internationally recognized key opinion leaders convened in 2011 to establish consensus statements on the prevention, prediction, assessment, and treatment of skin tears. Subsequently, a new skin tear classification system was proposed. The system was then tested for interrater and intrarater reliability between the experts before being tested more widely on a sample of 327 individuals from the United States, Canada, and Europe. The results of the study indicated a substantial level of agreement for the expert panel (Fleiss κ = 0.619; 2-month follow-up = 0.653). Intrarater reliability was high (Cohen κ = 0.877). Interrater reliability was moderate (Fleiss κ = 0.555) for healthcare professionals (n = 303) and fair for non-health professionals (Fleiss κ = 0.338; n = 24). This international study established the reliability and validity of a new classification system for skin tears.
Jung, Kyoung-Sim; Jung, Jin-Hwa; In, Tae-Sung; Cho, Hwi-Young
2016-09-01
[Purpose] The purpose of this study was to establish the reliability and validity of the Short Musculoskeletal Function Assessment questionnaire, which was translated into Korean, for patients with musculoskeletal disorder. [Subjects and Methods] Fifty-five subjects (26 males and 29 females) with musculoskeletal diseases participated in the study. The Short Musculoskeletal Function Assessment questionnaire focuses on a limited range of physical functions and includes a dysfunction index and a bother index. Reliability was determined using the intraclass correlation coefficient, and validity was examined by correlating short musculoskeletal function assessment scores with the 36-item Short-Form Health Survey (SF-36) score. [Results] The reliability was 0.97 for the dysfunction index and 0.94 for the bother index. Validity was established by comparison with Korean version of the SF-36. [Conclusion] This study demonstrated that the Korean version of the Short Musculoskeletal Function Assessment questionnaire is a reliable and valid instrument for the assessment of musculoskeletal disorders.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies
Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry
2017-01-01
Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations.Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ−0.10). Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies’ generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Conclusions Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. PMID:28122727
Validity and Reliability of the School Physical Activity Environment Questionnaire
ERIC Educational Resources Information Center
Martin, Jeffrey J.; McCaughtry, Nate; Flory, Sara; Murphy, Anne; Wisdom, Kimberlydawn
2011-01-01
The goal of the current study was to establish the factor validity of the Questionnaire Assessing School Physical Activity Environment (Robertson-Wilson, Levesque, & Holden, 2007) using confirmatory factor analysis procedures. Another goal was to establish internal reliability and test-retest reliability. The confirmatory factor analysis…
O'Grady, Michael G; Dusing, Stacey C
2015-01-01
Play is vital for development. Infants and children learn through play. Traditional standardized developmental tests measure whether a child performs individual skills within controlled environments. Play-based assessments can measure skill performance during natural, child-driven play. The purpose of this study was to systematically review reliability, validity, and responsiveness of all play-based assessments that quantify motor and cognitive skills in children from birth to 36 months of age. Studies were identified from a literature search using PubMed, ERIC, CINAHL, and PsycINFO databases and the reference lists of included papers. Included studies investigated reliability, validity, or responsiveness of play-based assessments that measured motor and cognitive skills for children to 36 months of age. Two reviewers independently screened 40 studies for eligibility and inclusion. The reviewers independently extracted reliability, validity, and responsiveness data. They examined measurement properties and methodological quality of the included studies. Four current play-based assessment tools were identified in 8 included studies. Each play-based assessment tool measured motor and cognitive skills in a different way during play. Interrater reliability correlations ranged from .86 to .98 for motor development and from .23 to .90 for cognitive development. Test-retest reliability correlations ranged from .88 to .95 for motor development and from .45 to .91 for cognitive development. Structural validity correlations ranged from .62 to .90 for motor development and from .42 to .93 for cognitive development. One study assessed responsiveness to change in motor development. Most studies had small and poorly described samples. Lack of transparency in data management and statistical analysis was common. Play-based assessments have potential to be reliable and valid tools to assess cognitive and motor skills, but higher-quality research is needed. Psychometric properties should be considered for each play-based assessment before it is used in clinical and research practice. © 2015 American Physical Therapy Association.
Reliability and validity of the McDonald Play Inventory.
McDonald, Ann E; Vigen, Cheryl
2012-01-01
This study examined the ability of a two-part self-report instrument, the McDonald Play Inventory, to reliably and validly measure the play activities and play styles of 7- to 11-yr-old children and to discriminate between the play of neurotypical children and children with known learning and developmental disabilities. A total of 124 children ages 7-11 recruited from a sample of convenience and a subsample of 17 parents participated in this study. Reliability estimates yielded moderate correlations for internal consistency, total test intercorrelations, and test-retest reliability. Validity estimates were established for content and construct validity. The results suggest that a self-report instrument yields reliable and valid measures of a child's perceived play performance and discriminates between the play of children with and without disabilities. Copyright © 2012 by the American Occupational Therapy Association, Inc.
Reliability modelling and analysis of thermal MEMS
NASA Astrophysics Data System (ADS)
Muratet, Sylvaine; Lavu, Srikanth; Fourniols, Jean-Yves; Bell, George; Desmulliez, Marc P. Y.
2006-04-01
This paper presents a MEMS reliability study methodology based on the novel concept of 'virtual prototyping'. This methodology can be used for the development of reliable sensors or actuators and also to characterize their behaviour in specific use conditions and applications. The methodology is demonstrated on the U-shaped micro electro thermal actuator used as test vehicle. To demonstrate this approach, a 'virtual prototype' has been developed with the modeling tools MatLab and VHDL-AMS. A best practice FMEA (Failure Mode and Effect Analysis) is applied on the thermal MEMS to investigate and assess the failure mechanisms. Reliability study is performed by injecting the identified defaults into the 'virtual prototype'. The reliability characterization methodology predicts the evolution of the behavior of these MEMS as a function of the number of cycles of operation and specific operational conditions.
Spaan, Suzanne; Pronk, Anjoeka; Koch, Holger M; Jusko, Todd A; Jaddoe, Vincent W V; Shaw, Pamela A; Tiemeier, Henning M; Hofman, Albert; Pierik, Frank H; Longnecker, Matthew P
2015-05-01
The widespread use of organophosphate (OP) pesticides has resulted in ubiquitous exposure in humans, primarily through their diet. Exposure to OP pesticides may have adverse health effects, including neurobehavioral deficits in children. The optimal design of new studies requires data on the reliability of urinary measures of exposure. In the present study, urinary concentrations of six dialkyl phosphate (DAP) metabolites, the main urinary metabolites of OP pesticides, were determined in 120 pregnant women participating in the Generation R Study in Rotterdam. Intra-class correlation coefficients (ICCs) across serial urine specimens taken at <18, 18-25, and >25 weeks of pregnancy were determined to assess reliability. Geometric mean total DAP metabolite concentrations were 229 (GSD 2.2), 240 (GSD 2.1), and 224 (GSD 2.2) nmol/g creatinine across the three periods of gestation. Metabolite concentrations from the serial urine specimens in general correlated moderately. The ICCs for the six DAP metabolites ranged from 0.14 to 0.38 (0.30 for total DAPs), indicating weak to moderate reliability. Although the DAP metabolite levels observed in this study are slightly higher and slightly more correlated than in previous studies, the low to moderate reliability indicates a high degree of within-person variability, which presents challenges for designing well-powered epidemiological studies.
Mayo, Ann M
2015-01-01
It is important for CNSs and other APNs to consider the reliability and validity of instruments chosen for clinical practice, evidence-based practice projects, or research studies. Psychometric testing uses specific research methods to evaluate the amount of error associated with any particular instrument. Reliability estimates explain more about how well the instrument is designed, whereas validity estimates explain more about scores that are produced by the instrument. An instrument may be architecturally sound overall (reliable), but the same instrument may not be valid. For example, if a specific group does not understand certain well-constructed items, then the instrument does not produce valid scores when used with that group. Many instrument developers may conduct reliability testing only once, yet continue validity testing in different populations over many years. All CNSs should be advocating for the use of reliable instruments that produce valid results. Clinical nurse specialists may find themselves in situations where reliability and validity estimates for some instruments that are being utilized are unknown. In such cases, CNSs should engage key stakeholders to sponsor nursing researchers to pursue this most important work.
Reliabilities of mental rotation tasks: limits to the assessment of individual differences.
Hirschfeld, Gerrit; Thielsch, Meinald T; Zernikow, Boris
2013-01-01
Mental rotation tasks with objects and body parts as targets are widely used in cognitive neuropsychology. Even though these tasks are well established to study between-groups differences, the reliability on an individual level is largely unknown. We present a systematic study on the internal consistency and test-retest reliability of individual differences in mental rotation tasks comparing different target types and orders of presentations. In total n = 99 participants (n = 63 for the retest) completed the mental rotation tasks with hands, feet, faces, and cars as targets. Different target types were presented in either randomly mixed blocks or blocks of homogeneous targets. Across all target types, the consistency (split-half reliability) and stability (test-retest reliabilities) were good or acceptable both for intercepts and slopes. At the level of individual targets, only intercepts showed acceptable reliabilities. Blocked presentations resulted in significantly faster and numerically more consistent and stable responses. Mental rotation tasks-especially in blocked variants-can be used to reliably assess individual differences in global processing speed. However, the assessment of the theoretically important slope parameter for individual targets requires further adaptations to mental rotation tests.
Great apes are sensitive to prior reliability of an informant in a gaze following task.
Schmid, Benjamin; Karg, Katja; Perner, Josef; Tomasello, Michael
2017-01-01
Social animals frequently rely on information from other individuals. This can be costly in case the other individual is mistaken or even deceptive. Human infants below 4 years of age show proficiency in their reliance on differently reliable informants. They can infer the reliability of an informant from few interactions and use that assessment in later interactions with the same informant in a different context. To explore whether great apes share that ability, in our study we confronted great apes with a reliable or unreliable informant in an object choice task, to see whether that would in a subsequent task affect their gaze following behaviour in response to the same informant. In our study, prior reliability of the informant and habituation during the gaze following task affected both great apes' automatic gaze following response and their more deliberate response of gaze following behind barriers. As habituation is very context specific, it is unlikely that habituation in the reliability task affected the gaze following task. Rather it seems that apes employ a reliability tracking strategy that results in a general avoidance of additional information from an unreliable informant.
We need more replication research - A case for test-retest reliability.
Leppink, Jimmie; Pérez-Fuster, Patricia
2017-06-01
Following debates in psychology on the importance of replication research, we have also started to see pleas for a more prominent role for replication research in medical education. To enable replication research, it is of paramount importance to carefully study the reliability of the instruments we use. Cronbach's alpha has been the most widely used estimator of reliability in the field of medical education, notably as some kind of quality label of test or questionnaire scores based on multiple items or of the reliability of assessment across exam stations. However, as this narrative review outlines, Cronbach's alpha or alternative reliability statistics may complement but not replace psychometric methods such as factor analysis. Moreover, multiple-item measurements should be preferred above single-item measurements, and when using single-item measurements, coefficients as Cronbach's alpha should not be interpreted as indicators of the reliability of a single item when that item is administered after fundamentally different activities, such as learning tasks that differ in content. Finally, if we want to follow up on recent pleas for more replication research, we have to start studying the test-retest reliability of the instruments we use.
2014-01-01
Background Patient-reported outcome validation needs to achieve validity and reliability standards. Among reliability analysis parameters, test-retest reliability is an important psychometric property. Retested patients must be in a clinically stable condition. This is particularly problematic in palliative care (PC) settings because advanced cancer patients are prone to a faster rate of clinical deterioration. The aim of this study was to evaluate the methods by which multi-symptom and health-related qualities of life (HRQoL) based on patient-reported outcomes (PROs) have been validated in oncological PC settings with regards to test-retest reliability. Methods A systematic search of PubMed (1966 to June 2013), EMBASE (1980 to June 2013), PsychInfo (1806 to June 2013), CINAHL (1980 to June 2013), and SCIELO (1998 to June 2013), and specific PRO databases was performed. Studies were included if they described a set of validation studies. Studies were included if they described a set of validation studies for an instrument developed to measure multi-symptom or multidimensional HRQoL in advanced cancer patients under PC. The COSMIN checklist was used to rate the methodological quality of the study designs. Results We identified 89 validation studies from 746 potentially relevant articles. From those 89 articles, 31 measured test-retest reliability and were included in this review. Upon critical analysis of the overall quality of the criteria used to determine the test-retest reliability, 6 (19.4%), 17 (54.8%), and 8 (25.8%) of these articles were rated as good, fair, or poor, respectively, and no article was classified as excellent. Multi-symptom instruments were retested over a shortened interval when compared to the HRQoL instruments (median values 24 hours and 168 hours, respectively; p = 0.001). Validation studies that included objective confirmation of clinical stability in their design yielded better results for the test-retest analysis with regard to both pain and global HRQoL scores (p < 0.05). The quality of the statistical analysis and its description were of great concern. Conclusion Test-retest reliability has been infrequently and poorly evaluated. The confirmation of clinical stability was an important factor in our analysis, and we suggest that special attention be focused on clinical stability when designing a PRO validation study that includes advanced cancer patients under PC. PMID:24447633
Kim, Dohyun; Ahn, So-Yeon; Kim, Junyoung; Park, Sung-Ho
2017-07-01
Since 2007, the FDI World Dental Federation (FDI) criteria have been used for the clinical evaluation of dental restorations. However, the reliability of the FDI criteria has not been sufficiently addressed. The purpose of this study was to assess and compare the interrater and intrarater reliability of the FDI criteria by evaluating posterior tooth-colored restorations photographically. A total of 160 clinical photographs of posterior tooth-colored restorations were evaluated independently by 5 raters with 9 of the FDI criteria suitable for photographic evaluation. The raters recorded the score of each restoration by using 5 grades, and the score was dichotomized into the clinical evaluation scores. After 1 month, 2 of the raters reevaluated the same set of 160 photographs in random order. To estimate the interrater reliability among the 5 raters, the proportion of agreement was calculated, and the Fleiss multirater kappa statistic was used. For the intrarater reliability, the proportion of agreement was calculated, and the Cohen standard kappa statistic was used for each of the 2 raters. The interrater proportion of agreement was 0.41 to 0.57, and the kappa value was 0.09 to 0.39. Overall, the intrarater reliability was higher than the interrater reliability, and rater 1 demonstrated higher intrarater reliability than rater 2. The proportion of agreement and kappa values increased when the 5 scores were dichotomized. The reliability was relatively lower for the esthetic properties compared with the functional or biological properties. Within the limitations of this study, the FDI criteria presented slight to fair interrater reliability and fair to excellent intrarater reliability in the photographic evaluation of posterior tooth-colored restorations. The reliability was improved by simplifying the evaluation scores. Copyright © 2016 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
Morrison, Melanie A.; Churchill, Nathan W.; Cusimano, Michael D.; Schweizer, Tom A.; Das, Sunit; Graham, Simon J.
2016-01-01
Background Functional magnetic resonance imaging (fMRI) continues to develop as a clinical tool for patients with brain cancer, offering data that may directly influence surgical decisions. Unfortunately, routine integration of preoperative fMRI has been limited by concerns about reliability. Many pertinent studies have been undertaken involving healthy controls, but work involving brain tumor patients has been limited. To develop fMRI fully as a clinical tool, it will be critical to examine these reliability issues among patients with brain tumors. The present work is the first to extensively characterize differences in activation map quality between brain tumor patients and healthy controls, including the effects of tumor grade and the chosen behavioral testing paradigm on reliability outcomes. Method Test-retest data were collected for a group of low-grade (n = 6) and high-grade glioma (n = 6) patients, and for matched healthy controls (n = 12), who performed motor and language tasks during a single fMRI session. Reliability was characterized by the spatial overlap and displacement of brain activity clusters, BOLD signal stability, and the laterality index. Significance testing was performed to assess differences in reliability between the patients and controls, and low-grade and high-grade patients; as well as between different fMRI testing paradigms. Results There were few significant differences in fMRI reliability measures between patients and controls. Reliability was significantly lower when comparing high-grade tumor patients to controls, or to low-grade tumor patients. The motor task produced more reliable activation patterns than the language tasks, as did the rhyming task in comparison to the phonemic fluency task. Conclusion In low-grade glioma patients, fMRI data are as reliable as healthy control subjects. For high-grade glioma patients, further investigation is required to determine the underlying causes of reduced reliability. To maximize reliability outcomes, testing paradigms should be carefully selected to generate robust activation patterns. PMID:26894279
Rahman, Azriani Ab; Mohamad, Norsarwany; Imran, Musa Kamarul; Ibrahim, Wan Pauzi Wan; Othman, Azizah; Aziz, Aniza Abd; Harith, Sakinah; Ibrahim, Mohd Ismail; Ariffin, Nor Hashimah; Van Rostenberghe, Hans
2011-01-01
Background: No previous study has assessed the impact of childhood disability on parents and family in the context of Malaysia, and no instrument to measure this impact has previously been available. The objective of this cross-sectional study was to determine the reliability of a Malay version of the PedsQL™ Family Impact Module that measures the impact of children with disabilities (CWD) on their parents and family in a Malaysian context. Methods: The study was conducted in 2009. The questionnaire was translated forward and backward before it was administered to 44 caregivers of CWD to determine the internal consistency reliability. The test for Cronbach’s alpha was performed. Results: The internal consistency reliability was good. The Cronbach’s alpha for all domains was above 0.7, ranging from 0.73 to 0.895. Conclusion: The Malay version of the PedsQL™ Family Impact Module showed evidence of good internal consistency reliability. However, future studies with a larger sample size are necessary before the module can be recommended as a tool to measure the impact of disability on Malay-speaking Malaysian families. PMID:22589674
NASA Astrophysics Data System (ADS)
Taylor, John R.; Stolz, Christopher J.
1993-08-01
Laser system performance and reliability depends on the related performance and reliability of the optical components which define the cavity and transport subsystems. High-average-power and long transport lengths impose specific requirements on component performance. The complexity of the manufacturing process for optical components requires a high degree of process control and verification. Qualification has proven effective in ensuring confidence in the procurement process for these optical components. Issues related to component reliability have been studied and provide useful information to better understand the long term performance and reliability of the laser system.
NASA Astrophysics Data System (ADS)
Taylor, J. R.; Stolz, C. J.
1992-12-01
Laser system performance and reliability depends on the related performance and reliability of the optical components which define the cavity and transport subsystems. High-average-power and long transport lengths impose specific requirements on component performance. The complexity of the manufacturing process for optical components requires a high degree of process control and verification. Qualification has proven effective in ensuring confidence in the procurement process for these optical components. Issues related to component reliability have been studied and provide useful information to better understand the long term performance and reliability of the laser system.
Gandhi, Namita S; Baker, Mark E; Goenka, Ajit H; Bullen, Jennifer A; Obuchowski, Nancy A; Remer, Erick M; Coppa, Christopher P; Einstein, David; Feldman, Myra K; Kanmaniraja, Devaraju; Purysko, Andrei S; Vahdat, Noushin; Primak, Andrew N; Karim, Wadih; Herts, Brian R
2016-08-01
Purpose To compare the diagnostic accuracy and image quality of computed tomographic (CT) enterographic images obtained at half dose and reconstructed with filtered back projection (FBP) and sinogram-affirmed iterative reconstruction (SAFIRE) with those of full-dose CT enterographic images reconstructed with FBP for active inflammatory terminal or neoterminal ileal Crohn disease. Materials and Methods This retrospective study was compliant with HIPAA and approved by the institutional review board. The requirement to obtain informed consent was waived. Ninety subjects (45 with active terminal ileal Crohn disease and 45 without Crohn disease) underwent CT enterography with a dual-source CT unit. The reference standard for confirmation of active Crohn disease was active terminal ileal Crohn disease based on ileocolonoscopy or established Crohn disease and imaging features of active terminal ileal Crohn disease. Data from both tubes were reconstructed with FBP (100% exposure); data from the primary tube (50% exposure) were reconstructed with FBP and SAFIRE strengths 3 and 4, yielding four datasets per CT enterographic examination. The mean volume CT dose index (CTDIvol) and size-specific dose estimate (SSDE) at full dose were 13.1 mGy (median, 7.36 mGy) and 15.9 mGy (median, 13.06 mGy), respectively, and those at half dose were 6.55 mGy (median, 3.68 mGy) and 7.95 mGy (median, 6.5 mGy). Images were subjectively evaluated by eight radiologists for quality and diagnostic confidence for Crohn disease. Areas under the receiver operating characteristic curves (AUCs) were estimated, and the multireader, multicase analysis of variance method was used to compare reconstruction methods on the basis of a noninferiority margin of 0.05. Results The mean AUCs with half-dose scans (FBP, 0.908; SAFIRE 3, 0.935; SAFIRE 4, 0.924) were noninferior to the mean AUC with full-dose FBP scans (0.908; P < .003). The proportion of images with inferior quality was significantly higher with all half-dose reconstructions than with full-dose FBP (mean proportion: 0.117 for half-dose FBP, 0.054 for half-dose SAFIRE 3, 0.054 for half-dose SAFIRE 4, and 0.017 for full-dose FBP; P < .001). Conclusion The diagnostic accuracy of half-dose CT enterography with FBP and SAFIRE is statistically noninferior to that of full-dose CT enterography for active inflammatory terminal ileal Crohn disease, despite an inferior subjective image quality. (©) RSNA, 2016 Online supplemental material is available for this article.
A Comparison of Two Methods of Determining Interrater Reliability
ERIC Educational Resources Information Center
Fleming, Judith A.; Taylor, Janeen McCracken; Carran, Deborah
2004-01-01
This article offers an alternative methodology for practitioners and researchers to use in establishing interrater reliability for testing purposes. The majority of studies on interrater reliability use a traditional methodology where by two raters are compared using a Pearson product-moment correlation. This traditional method of estimating…
Reliability of Test Scores in Nonparametric Item Response Theory.
ERIC Educational Resources Information Center
Sijtsma, Klaas; Molenaar, Ivo W.
1987-01-01
Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four "classical" lower bounds to reliability. (Author/JAZ)
16 CFR 260.5 - Interpretation and substantiation of environmental marketing claims.
Code of Federal Regulations, 2011 CFR
2011-01-01
... reasonable basis substantiating the claim. A reasonable basis consists of competent and reliable evidence. In... reliable scientific evidence, defined as tests, analyses, research, studies or other evidence based on the... qualified to do so, using procedures generally accepted in the profession to yield accurate and reliable...
77 FR 53877 - Commission Information Collection Activities (FERC-715); Comment Request; Extension
Federal Register 2010, 2011, 2012, 2013, 2014
2012-09-04
...; A detailed description of the transmission planning reliability criteria used to evaluate system... reliability criteria are applied and the steps taken in performing transmission planning studies); and A... reliability criteria using its stated assessment practices. The FERC-715 enables the Commission to use the...
ERIC Educational Resources Information Center
Roberts, Clare; Pratt, Chris
1988-01-01
The study evaluated the psychometric properties of reliability and construct validity of the Attitude Toward Mainstreaming Scale (ATMS) in an Australian context. It was concluded that the scale is both reliable and factorially valid in an Australian context. (Author/DB)
Reliability and Validity of the Sexual Pressure Scale for Women-Revised
Jones, Rachel; Gulick, Elsie
2008-01-01
Sexual pressure among young urban women represents adherence to gender stereotypical expectations to engage in sex. Revision of the original 5-factor Sexual Pressure Scale was undertaken in two studies to improve reliabilities in two of the five factors. In Study 1 the reliability of the Sexual Pressure Scale for Women-Revised (SPSW-R) was tested, and principal components analysis was performed in a sample of 325 young, urban women. A parsimonious 18-item, 4-factor model explained 61% of the variance. In Study 2 the theory underlying sexual pressure was supported by confirmatory factor analysis using structural equation modeling in a sample of 181 women. Reliabilities of the SPSW-R total and subscales were very satisfactory, suggesting it may be used in intervention research. PMID:18666222
The validation of Huffaz Intelligence Test (HIT)
NASA Astrophysics Data System (ADS)
Rahim, Mohd Azrin Mohammad; Ahmad, Tahir; Awang, Siti Rahmah; Safar, Ajmain
2017-08-01
In general, a hafiz who can memorize the Quran has many specialties especially in respect to their academic performances. In this study, the theory of multiple intelligences introduced by Howard Gardner is embedded in a developed psychometric instrument, namely Huffaz Intelligence Test (HIT). This paper presents the validation and the reliability of HIT of some tahfiz students in Malaysia Islamic schools. A pilot study was conducted involving 87 huffaz who were randomly selected to answer the items in HIT. The analysis method used includes Partial Least Square (PLS) on reliability, convergence and discriminant validation. The study has validated nine intelligences. The findings also indicated that the composite reliabilities for the nine types of intelligences are greater than 0.8. Thus, the HIT is a valid and reliable instrument to measure the multiple intelligences among huffaz.
The Children's Play Therapy Instrument (CPTI). Description, development, and reliability studies.
Kernberg, P F; Chazan, S E; Normandin, L
1998-01-01
The Children's Play Therapy Instrument (CPTI), its development, and reliability studies are described. The CPTI is a new instrument to examine a child's play activity in individual psychotherapy. Three independent raters used the CPTI to rate eight videotaped play therapy vignettes. Results were compared with the authors' consensual scores from a preliminary study. Generally good to excellent levels of interrater reliability were obtained for the independent raters on intraclass correlation coefficients for ordinal categories of the CPTI. Likewise, kappa levels were acceptable to excellent for nominal categories of the scale. The CPTI holds promise to become a reliable measure of play activity in child psychotherapy. Further research is needed to assess discriminant validity of the CPTI for use as a diagnostic tool and as a measure of process and outcome.
Agreement, the F-Measure, and Reliability in Information Retrieval
Hripcsak, George; Rothschild, Adam S.
2005-01-01
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the κ statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts and that κ approaches these measures as the number of negative cases grows large. Positive specific agreement—or the equivalent F-measure—may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies. PMID:15684123
2nd Generation Reusable Launch Vehicle (2G RLV). Revised
NASA Technical Reports Server (NTRS)
Matlock, Steve; Sides, Steve; Kmiec, Tom; Arbogast, Tim; Mayers, Tom; Doehnert, Bill
2001-01-01
This is a revised final report and addresses all of the work performed on this program. Specifically, it covers vehicle architecture background, definition of six baseline engine cycles, reliability baseline (space shuttle main engine QRAS), and component level reliability/performance/cost for the six baseline cycles, and selection of 3 cycles for further study. This report further addresses technology improvement selection and component level reliability/performance/cost for the three cycles selected for further study, as well as risk reduction plans, and recommendation for future studies.
An, Hyeong Su; Moon, Won-Jin; Ryu, Jae-Kyun; Park, Ju Yeon; Yun, Won Sung; Choi, Jin Woo; Jahng, Geon-Ho; Park, Jang-Yeon
2017-12-01
This prospective multi-center study aimed to evaluate the inter-vendor and test-retest reliabilities of resting-state functional magnetic resonance imaging (RS-fMRI) by assessing the temporal signal-to-noise ratio (tSNR) and functional connectivity. Study included 10 healthy subjects and each subject was scanned using three 3T MR scanners (GE Signa HDxt, Siemens Skyra, and Philips Achieva) in two sessions. The tSNR was calculated from the time course data. Inter-vendor and test-retest reliabilities were assessed with intra-class correlation coefficients (ICCs) derived from variant component analysis. Independent component analysis was performed to identify the connectivity of the default-mode network (DMN). In result, the tSNR for the DMN was not significantly different among the GE, Philips, and Siemens scanners (P=0.638). In terms of vendor differences, the inter-vendor reliability was good (ICC=0.774). Regarding the test-retest reliability, the GE scanner showed excellent correlation (ICC=0.961), while the Philips (ICC=0.671) and Siemens (ICC=0.726) scanners showed relatively good correlation. The DMN pattern of the subjects between the two sessions for each scanner and between three scanners showed the identical patterns of functional connectivity. The inter-vendor and test-retest reliabilities of RS-fMRI using different 3T MR scanners are good. Thus, we suggest that RS-fMRI could be used in multicenter imaging studies as a reliable imaging marker. Copyright © 2017 Elsevier Inc. All rights reserved.
López-Pascual, Juan; Cáceres, Magda Liliana; De Rosario, Helios; Page, Álvaro
2016-02-08
The reliability of joint rotation measurements is an issue of major interest, especially in clinical applications. The effect of instrumental errors and soft tissue artifacts on the variability of human motion measures is well known, but the influence of the representation of joint motion has not yet been studied. The aim of the study was to compare the within-subject reliability of three rotation formalisms for the calculation of the shoulder elevation joint angles. Five repetitions of humeral elevation in the scapular plane of 27 healthy subjects were recorded using a stereophotogrammetry system. The humerothoracic joint angles were calculated using the YX'Y" and XZ'Y" Euler angle sequences and the attitude vector. A within-subject repeatability study was performed for the three representations. ICC, SEM and CV were the indices used to estimate the error in the calculation of the angle amplitudes and the angular waveforms with each method. Excellent results were obtained in all representations for the main angle (elevation), but there were remarkable differences for axial rotation and plane of elevation. The YX'Y" sequence generally had the poorest reliability in the secondary angles. The XZ'Y' sequence proved to be the most reliable representation of axial rotation, whereas the attitude vector had the highest reliability in the plane of elevation. These results highlight the importance of selecting the method used to describe the joint motion when within-subjects reliability is an important issue of the experiment. This may be of particular importance when the secondary angles of motions are being studied. Copyright © 2016 Elsevier Ltd. All rights reserved.
Becker, Anne E.; Roberts, Andrea L.; Perloe, Alexandra; Bainivualiku, Asenaca; Richards, Lauren K.; Gilman, Stephen E.; Striegel-Moore, Ruth H.
2010-01-01
Objective The Global School-based Student Health Survey (GSHS) is an assessment for adolescent health risk behaviors and exposures, supported by the World Health Organization. Although already widely implemented—and intended for youth assessment across diverse ethnic and national contexts—no reliability data have yet been reported for GSHS-based assessment in any ethnicity or country-specific population. This study reports test-retest reliability for GSHS content adapted for a female adolescent ethnic Fijian study sample in Fiji. Design We adapted and translated GSHS content to assess health risk behaviors as part of a larger study investigating the impact of social transition on ethnic Fijian secondary schoolgirls in Fiji. In order to evaluate the performance of this measure for our ethnic Fijian study sample (n=523), we examined its test-retest reliability with kappa coefficients, % agreement, and prevalence estimates in a sub-sample (n=81). Reliability among strata defined by topic, age, and language was also examined. Results Average agreement between test and retest was 77%, and average Cohen's kappa was 0.47. Mean kappas for questions from core modules about alcohol use, tobacco use, and sexual behavior were substantial, and higher than those for modules relating to other risk behaviors. Conclusions Although test-retest reliability of responses within this country-specific version of GSHS content was substantial in several topical domains for this ethnic Fijian sample, only fair reliability for the module assessing dietary behaviors and other individual items suggests that population-specific psychometric evaluation is essential to interpreting language and country-specific GSHS data. PMID:20234961
Helmerhorst, Hendrik J F; Brage, Søren; Warren, Janet; Besson, Herve; Ekelund, Ulf
2012-08-31
Physical inactivity is one of the four leading risk factors for global mortality. Accurate measurement of physical activity (PA) and in particular by physical activity questionnaires (PAQs) remains a challenge. The aim of this paper is to provide an updated systematic review of the reliability and validity characteristics of existing and more recently developed PAQs and to quantitatively compare the performance between existing and newly developed PAQs.A literature search of electronic databases was performed for studies assessing reliability and validity data of PAQs using an objective criterion measurement of PA between January 1997 and December 2011. Articles meeting the inclusion criteria were screened and data were extracted to provide a systematic overview of measurement properties. Due to differences in reported outcomes and criterion methods a quantitative meta-analysis was not possible.In total, 31 studies testing 34 newly developed PAQs, and 65 studies examining 96 existing PAQs were included. Very few PAQs showed good results on both reliability and validity. Median reliability correlation coefficients were 0.62-0.71 for existing, and 0.74-0.76 for new PAQs. Median validity coefficients ranged from 0.30-0.39 for existing, and from 0.25-0.41 for new PAQs.Although the majority of PAQs appear to have acceptable reliability, the validity is moderate at best. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity. Future PAQ studies should include measures of absolute validity and the error structure of the instrument.
Chen, Hong-Lin; Cao, Ying-Juan; Zhang, Wei; Wang, Jing; Huai, Bao-Sha
2017-02-01
The inter-rater reliability of Braden Scale is not so good. We modified the Braden(ALB) scale by defining nutrition subscale based on serum albumin, then assessed it's the validity and reliability in hospital patients. We designed a retrospective study for validity analysis, and a prospective study for reliability analysis. Receiver operating curve (ROC) and area under the curve (AUC) were used to evaluate the predictive validity. Intra-class correlation coefficient (ICC) was used to investigate the inter-rater reliability. Two thousand five hundred twenty-five patients were included for validity analysis, 76 patients (3.0%) developed pressure ulcer. Positive correlation was found between serum albumin and nutrition score in Braden scale (Spearman's coefficient 0.2203, P<0.0001). The AUCs for Braden scale and Braden(ALB) scale predicting pressure ulcer risk were 0.813 (95% CI 0.797-0.828; P<0.0001), and 0.859 (95% CI 0.845-0.872; P<0.0001), respectively. The Braden(ALB) scale was even more valid than the Braden scale (z=1.860, P=0.0628). In different age subgroups, the Braden(ALB) scale seems also more valid than the original Braden scale, but no statistically significant differences were found (P>0.05). The inter-rater reliability study showed the ICC-value for nutrition increased 45.9%, and increased 4.3% for total score. The Braden(ALB) scale has similar validity compared with the original Braden scale for in hospital patients. However, the inter-rater reliability was significantly increased. Copyright © 2016 Elsevier Inc. All rights reserved.
2012-01-01
Physical inactivity is one of the four leading risk factors for global mortality. Accurate measurement of physical activity (PA) and in particular by physical activity questionnaires (PAQs) remains a challenge. The aim of this paper is to provide an updated systematic review of the reliability and validity characteristics of existing and more recently developed PAQs and to quantitatively compare the performance between existing and newly developed PAQs. A literature search of electronic databases was performed for studies assessing reliability and validity data of PAQs using an objective criterion measurement of PA between January 1997 and December 2011. Articles meeting the inclusion criteria were screened and data were extracted to provide a systematic overview of measurement properties. Due to differences in reported outcomes and criterion methods a quantitative meta-analysis was not possible. In total, 31 studies testing 34 newly developed PAQs, and 65 studies examining 96 existing PAQs were included. Very few PAQs showed good results on both reliability and validity. Median reliability correlation coefficients were 0.62–0.71 for existing, and 0.74–0.76 for new PAQs. Median validity coefficients ranged from 0.30–0.39 for existing, and from 0.25–0.41 for new PAQs. Although the majority of PAQs appear to have acceptable reliability, the validity is moderate at best. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity. Future PAQ studies should include measures of absolute validity and the error structure of the instrument. PMID:22938557
Brolin, Rosita; Rask, Mikael; Syrén, Susanne; Brunt, David Arthur
2013-10-01
The aim of this study was to investigate the reliability and validity of a questionnaire for studying satisfaction with housing and housing support for people with psychiatric disabilities. Most items were gathered from English language questionnaires. These were translated and adapted to a Swedish context and items concerning housing support were added. Two studies were conducted. The first, a test-retest reliability analysis, was performed in a pilot study with 53 participants; in the second study, which had 370 participants, a five factor solution with good internal consistency emerged. Further development of the questionnaire is discussed.
Lower Bounds to the Reliabilities of Factor Score Estimators.
Hessen, David J
2016-10-06
Under the general common factor model, the reliabilities of factor score estimators might be of more interest than the reliability of the total score (the unweighted sum of item scores). In this paper, lower bounds to the reliabilities of Thurstone's factor score estimators, Bartlett's factor score estimators, and McDonald's factor score estimators are derived and conditions are given under which these lower bounds are equal. The relative performance of the derived lower bounds is studied using classic example data sets. The results show that estimates of the lower bounds to the reliabilities of Thurstone's factor score estimators are greater than or equal to the estimates of the lower bounds to the reliabilities of Bartlett's and McDonald's factor score estimators.
Dontje, Manon L; Dall, Philippa M; Skelton, Dawn A; Gill, Jason M R; Chastin, Sebastien F M
2018-01-01
Prolonged sedentary behaviour (SB) is associated with poor health. It is unclear which SB measure is most appropriate for interventions and population surveillance to measure and interpret change in behaviour in older adults. The aims of this study: to examine the relative and absolute reliability, Minimal Detectable Change (MDC) and responsiveness to change of subjective and objective methods of measuring SB in older adults and give recommendations of use for different study designs. SB of 18 older adults (aged 71 (IQR 7) years) was assessed using a systematic set of six subjective tools, derived from the TAxonomy of Self report Sedentary behaviour Tools (TASST), and one objective tool (activPAL3c), over 14 days. Relative reliability (Intra Class Correlation coefficients-ICC), absolute reliability (SEM), MDC, and the relative responsiveness (Cohen's d effect size (ES) and Guyatt's Responsiveness coefficient (GR)) were calculated for each of the different tools and ranked for different study designs. ICC ranged from 0.414 to 0.946, SEM from 36.03 to 137.01 min, MDC from 1.66 to 8.42 hours, ES from 0.017 to 0.259 and GR from 0.024 to 0.485. Objective average day per week measurement ranked as most responsive in a clinical practice setting, whereas a one day measurement ranked highest in quasi-experimental, longitudinal and controlled trial study designs. TV viewing-Previous Week Recall (PWR) ranked as most responsive subjective measure in all study designs. The reliability, Minimal Detectable Change and responsiveness to change of subjective and objective methods of measuring SB is context dependent. Although TV viewing-PWR is the more reliable and responsive subjective method in most situations, it may have limitations as a reliable measure of total SB. Results of this study can be used to guide choice of tools for detecting change in sedentary behaviour in older adults in the contexts of population surveillance, intervention evaluation and individual care.
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
In FY01 we learned that hardware reliability models need substantial changes to account for differences in software, thus making software reliability measurements more effective, accurate, and easier to apply. These reliability models are generally based on familiar distributions or parametric methods. An obvious question is 'What new statistical and probability models can be developed using non-parametric and distribution-free methods instead of the traditional parametric method?" Two approaches to software reliability engineering appear somewhat promising. The first study, begin in FY01, is based in hardware reliability, a very well established science that has many aspects that can be applied to software. This research effort has investigated mathematical aspects of hardware reliability and has identified those applicable to software. Currently the research effort is applying and testing these approaches to software reliability measurement, These parametric models require much project data that may be difficult to apply and interpret. Projects at GSFC are often complex in both technology and schedules. Assessing and estimating reliability of the final system is extremely difficult when various subsystems are tested and completed long before others. Parametric and distribution free techniques may offer a new and accurate way of modeling failure time and other project data to provide earlier and more accurate estimates of system reliability.
The reliability of four widely used patellar height ratios.
van Duijvenbode, Dennis; Stavenuiter, Michel; Burger, Bart; van Dijke, Cees; Spermon, Jacco; Hoozemans, Marco
2016-03-01
The objective of this study was to evaluate the inter-observer reliability and the intra-observer reliability of four patellar height ratios: Insall-Salvati (IS), modified Insall-Salvati (MIS), Blackburne-Peel (BP) and Caton-Deschamps (CD). The patellar height ratios were assessed by four independent examiners using weight-bearing lateral knee radiographs in 30° flexion. Intra-class correlation coefficients and Fleiss' kappa's were determined. The inter-observer reliability was excellent for the IS and moderate for the other ratios. When the ratio values were categorized, the inter-observer reliability was strong for the IS, moderate for the MIS and BP, and poor for the CD. The intra-observer reliability was excellent for the IS, MIS and CD, and strong for the BP. When the ratio values were categorized, the intra-observer reliability was strong for the IS and MIS, and moderate for the other ratios. Although the IS showed best reliability, we advise to use the MIS as it showed the second best reliability but is, according to the literature, associated with better validity.
Greer, Matthew D; Shih, Joanna H; Lay, Nathan; Barrett, Tristan; Kayat Bittencourt, Leonardo; Borofsky, Samuel; Kabakus, Ismail M; Law, Yan Mee; Marko, Jamie; Shebel, Haytham; Mertan, Francesca V; Merino, Maria J; Wood, Bradford J; Pinto, Peter A; Summers, Ronald M; Choyke, Peter L; Turkbey, Baris
2017-12-01
Purpose To validate the dominant pulse sequence paradigm and limited role of dynamic contrast material-enhanced magnetic resonance (MR) imaging in the Prostate Imaging Reporting and Data System (PI-RADS) version 2 for prostate multiparametric MR imaging by using data from a multireader study. Materials and Methods This HIPAA-compliant retrospective interpretation of prospectively acquired data was approved by the local ethics committee. Patients were treatment-naïve with endorectal coil 3-T multiparametric MR imaging. A total of 163 patients were evaluated, 110 with prostatectomy after multiparametric MR imaging and 53 with negative multiparametric MR imaging and systematic biopsy findings. Nine radiologists participated in this study and interpreted images in 58 patients, on average (range, 56-60 patients). Lesions were detected with PI-RADS version 2 and were compared with whole-mount prostatectomy findings. Probability of cancer detection for overall, T2-weighted, and diffusion-weighted (DW) imaging PI-RADS scores was calculated in the peripheral zone (PZ) and transition zone (TZ) by using generalized estimating equations. To determine dominant pulse sequence and benefit of dynamic contrast-enhanced (DCE) imaging, odds ratios (ORs) were calculated as the ratio of odds of cancer of two consecutive scores by logistic regression. Results A total of 654 lesions (420 in the PZ) were detected. The probability of cancer detection for PI-RADS category 2, 3, 4, and 5 lesions was 15.7%, 33.1%, 70.5%, and 90.7%, respectively. DW imaging outperformed T2-weighted imaging in the PZ (OR, 3.49 vs 2.45; P = .008). T2-weighted imaging performed better but did not clearly outperform DW imaging in the TZ (OR, 4.79 vs 3.77; P = .494). Lesions classified as PI-RADS category 3 at DW MR imaging and as positive at DCE imaging in the PZ showed a higher probability of cancer detection than did DCE-negative PI-RADS category 3 lesions (67.8% vs 40.0%, P = .02). The addition of DCE imaging to DW imaging in the PZ was beneficial (OR, 2.0; P = .027), with an increase in the probability of cancer detection of 15.7%, 16.0%, and 9.2% for PI-RADS category 2, 3, and 4 lesions, respectively. Conclusion DW imaging outperforms T2-weighted imaging in the PZ; T2-weighted imaging did not show a significant difference when compared with DW imaging in the TZ by PI-RADS version 2 criteria. The addition of DCE imaging to DW imaging scores in the PZ yields meaningful improvements in probability of cancer detection. © RSNA, 2017 An earlier incorrect version of this article appeared online. This article was corrected on July 27, 2017. Online supplemental material is available for this article.
Temporal reliability and lateralization of the resting-state language network.
Zhu, Linlin; Fan, Yang; Zou, Qihong; Wang, Jue; Gao, Jia-Hong; Niu, Zhendong
2014-01-01
The neural processing loop of language is complex but highly associated with Broca's and Wernicke's areas. The left dominance of these two areas was the earliest observation of brain asymmetry. It was demonstrated that the language network and its functional asymmetry during resting state were reproducible across institutions. However, the temporal reliability of resting-state language network and its functional asymmetry are still short of knowledge. In this study, we established a seed-based resting-state functional connectivity analysis of language network with seed regions located at Broca's and Wernicke's areas, and investigated temporal reliability of language network and its functional asymmetry. The language network was found to be temporally reliable in both short- and long-term. In the aspect of functional asymmetry, the Broca's area was found to be left lateralized, while the Wernicke's area is mainly right lateralized. Functional asymmetry of these two areas revealed high short- and long-term reliability as well. In addition, the impact of global signal regression (GSR) on reliability of the resting-state language network was investigated, and our results demonstrated that GSR had negligible effect on the temporal reliability of the resting-state language network. Our study provided methodology basis for future cross-culture and clinical researches of resting-state language network and suggested priority of adopting seed-based functional connectivity for its high reliability.
Temporal Reliability and Lateralization of the Resting-State Language Network
Zou, Qihong; Wang, Jue; Gao, Jia-Hong; Niu, Zhendong
2014-01-01
The neural processing loop of language is complex but highly associated with Broca's and Wernicke's areas. The left dominance of these two areas was the earliest observation of brain asymmetry. It was demonstrated that the language network and its functional asymmetry during resting state were reproducible across institutions. However, the temporal reliability of resting-state language network and its functional asymmetry are still short of knowledge. In this study, we established a seed-based resting-state functional connectivity analysis of language network with seed regions located at Broca's and Wernicke's areas, and investigated temporal reliability of language network and its functional asymmetry. The language network was found to be temporally reliable in both short- and long-term. In the aspect of functional asymmetry, the Broca's area was found to be left lateralized, while the Wernicke's area is mainly right lateralized. Functional asymmetry of these two areas revealed high short- and long-term reliability as well. In addition, the impact of global signal regression (GSR) on reliability of the resting-state language network was investigated, and our results demonstrated that GSR had negligible effect on the temporal reliability of the resting-state language network. Our study provided methodology basis for future cross-culture and clinical researches of resting-state language network and suggested priority of adopting seed-based functional connectivity for its high reliability. PMID:24475058
Reliability of heart rate measures during walking before and after running maximal efforts.
Boullosa, D A; Barros, E S; del Rosso, S; Nakamura, F Y; Leicht, A S
2014-11-01
Previous studies on HR recovery (HRR) measures have utilized the supine and the seated postures. However, the most common recovery mode in sport and clinical settings after running exercise is active walking. The aim of the current study was to examine the reliability of HR measures during walking (4 km · h(-1)) before and following a maximal test. Twelve endurance athletes performed an incremental running test on 2 days separated by 48 h. Absolute (coefficient of variation, CV, %) and relative [Intraclass correlation coefficient, (ICC)] reliability of time domain and non-linear measures of HR variability (HRV) from 3 min recordings, and HRR parameters over 5 min were assessed. Moderate to very high reliability was identified for most HRV indices with short-term components of time domain and non-linear HRV measures demonstrating the greatest reliability before (CV: 12-22%; ICC: 0.73-0.92) and after exercise (CV: 14-32%; ICC: 0.78-0.91). Most HRR indices and parameters of HRR kinetics demonstrated high to very high reliability with HR values at a given point and the asymptotic value of HR being the most reliable (CV: 2.5-10.6%; ICC: 0.81-0.97). These findings demonstrate these measures as reliable tools for the assessment of autonomic control of HR during walking before and after maximal efforts. © Georg Thieme Verlag KG Stuttgart · New York.
Probabilistic fatigue methodology for six nines reliability
NASA Technical Reports Server (NTRS)
Everett, R. A., Jr.; Bartlett, F. D., Jr.; Elber, Wolf
1990-01-01
Fleet readiness and flight safety strongly depend on the degree of reliability that can be designed into rotorcraft flight critical components. The current U.S. Army fatigue life specification for new rotorcraft is the so-called six nines reliability, or a probability of failure of one in a million. The progress of a round robin which was established by the American Helicopter Society (AHS) Subcommittee for Fatigue and Damage Tolerance is reviewed to investigate reliability-based fatigue methodology. The participants in this cooperative effort are in the U.S. Army Aviation Systems Command (AVSCOM) and the rotorcraft industry. One phase of the joint activity examined fatigue reliability under uniquely defined conditions for which only one answer was correct. The other phases were set up to learn how the different industry methods in defining fatigue strength affected the mean fatigue life and reliability calculations. Hence, constant amplitude and spectrum fatigue test data were provided so that each participant could perform their standard fatigue life analysis. As a result of this round robin, the probabilistic logic which includes both fatigue strength and spectrum loading variability in developing a consistant reliability analysis was established. In this first study, the reliability analysis was limited to the linear cumulative damage approach. However, it is expected that superior fatigue life prediction methods will ultimately be developed through this open AHS forum. To that end, these preliminary results were useful in identifying some topics for additional study.
Reliability Analysis for AFTI-F16 SRFCS Using ASSIST and SURE
NASA Technical Reports Server (NTRS)
Wu, N. Eva
2001-01-01
This paper reports the results of a study on reliability analysis of an AFTI-16 Self-Repairing Flight Control System (SRFCS) using software tools SURE (Semi-Markov Unreliability Range Evaluator and ASSIST (Abstract Semi-Markov Specification Interface to the SURE Tool). The purpose of the study is to investigate the potential utility of the software tools in the ongoing effort of the NASA Aviation Safety Program, where the class of systems must be extended beyond the originally intended serving class of electronic digital processors. The study concludes that SURE and ASSIST are applicable to reliability, analysis of flight control systems. They are especially efficient for sensitivity analysis that quantifies the dependence of system reliability on model parameters. The study also confirms an earlier finding on the dominant role of a parameter called a failure coverage. The paper will remark on issues related to the improvement of coverage and the optimization of redundancy level.
Alzyoud, Sukaina; Veeranki, Sreenivas P.; Kheirallah, Khalid A.; Shotar, Ali M.; Pbert, Lori
2016-01-01
Introduction: Waterpipe use among adolescents has been increasing progressively. Yet no studies were reported to assess the validity and reliability of nicotine dependence scale. The current study aims to assess the validity and reliability of an Arabic version of the modified Waterpipe Tolerance Questionnaire WTQ among school-going adolescent waterpipe users. Methods: In a cross-sectional study conducted in Jordan, information on waterpipe use among 333 school-going adolescents aged 11-18 years was obtained using the Arabic version of the WTQ. An exploratory factor analysis and correlation matrices were conducted to assess validity and reliability of the WTQ. Results: The WTQ had a 0.73 alpha of internal consistency indicating moderate level of reliability. The scale showed multidimensionality with items loading on two factors, namely waterpipe consumption and morning smoking. Conclusion: This study report nicotine dependence level among school-going adolescents who identify themselves as waterpipe users using the WTQ. PMID:26383198
Big data analytics for the Future Circular Collider reliability and availability studies
NASA Astrophysics Data System (ADS)
Begy, Volodimir; Apollonio, Andrea; Gutleber, Johannes; Martin-Marquez, Manuel; Niemi, Arto; Penttinen, Jussi-Pekka; Rogova, Elena; Romero-Marin, Antonio; Sollander, Peter
2017-10-01
Responding to the European Strategy for Particle Physics update 2013, the Future Circular Collider study explores scenarios of circular frontier colliders for the post-LHC era. One branch of the study assesses industrial approaches to model and simulate the reliability and availability of the entire particle collider complex based on the continuous monitoring of CERN’s accelerator complex operation. The modelling is based on an in-depth study of the CERN injector chain and LHC, and is carried out as a cooperative effort with the HL-LHC project. The work so far has revealed that a major challenge is obtaining accelerator monitoring and operational data with sufficient quality, to automate the data quality annotation and calculation of reliability distribution functions for systems, subsystems and components where needed. A flexible data management and analytics environment that permits integrating the heterogeneous data sources, the domain-specific data quality management algorithms and the reliability modelling and simulation suite is a key enabler to complete this accelerator operation study. This paper describes the Big Data infrastructure and analytics ecosystem that has been put in operation at CERN, serving as the foundation on which reliability and availability analysis and simulations can be built. This contribution focuses on data infrastructure and data management aspects and presents case studies chosen for its validation.
Reliability of the ECHOWS Tool for Assessment of Patient Interviewing Skills.
Boissonnault, Jill S; Evans, Kerrie; Tuttle, Neil; Hetzel, Scott J; Boissonnault, William G
2016-04-01
History taking is an important component of patient/client management. Assessment of student history-taking competency can be achieved via a standardized tool. The ECHOWS tool has been shown to be valid with modest intrarater reliability in a previous study but did not demonstrate sufficient power to definitively prove its stability. The purposes of this study were: (1) to assess the reliability of the ECHOWS tool for student assessment of patient interviewing skills and (2) to determine whether the tool discerns between novice and experienced skill levels. A reliability and construct validity assessment was conducted. Three faculty members from the United States and Australia scored videotaped histories from standardized patients taken by students and experienced clinicians from each of these countries. The tapes were scored twice, 3 to 6 weeks apart. Reliability was assessed using interclass correlation coefficients (ICCs) and repeated measures. Analysis of variance models assessed the ability of the tool to discern between novice and experienced skill levels. The ECHOWS tool showed excellent intrarater reliability (ICC [3,1]=.74-.89) and good interrater reliability (ICC [2,1]=.55) as a whole. The summary of performance (S) section showed poor interrater reliability (ICC [2,1]=.27). There was no statistical difference in performance on the tool between novice and experienced clinicians. A possible ceiling effect may occur when standardized patients are not coached to provide complex and obtuse responses to interviewer questions. Variation in familiarity with the ECHOWS tool and in use of the online training may have influenced scoring of the S section. The ECHOWS tool demonstrates excellent intrarater reliability and moderate interrater reliability. Sufficient training with the tool prior to student assessment is recommended. The S section must evolve in order to provide a more discerning measure of interviewing skills. © 2016 American Physical Therapy Association.
Leifker, Feea R.; Patterson, Thomas L.; Bowie, Christopher R.; Mausbach, Brent T.; Harvey, Philip D.
2010-01-01
Performance-based measures of the ability to perform social and everyday living skills are being more widely used to assess functional capacity in people with serious mental illnesses such as schizophrenia and bipolar disorder. Since they are also being used as outcome measures in pharmacological and cognitive remediation studies aimed at cognitive impairments in schizophrenia, understanding their measurement properties and potential sensitivity to change is important. In this study, the test-retest reliability, practice effects, and reliable change indices of two different performance-based functional capacity measures, the UCSD Performance-based skills assessment (UPSA) and Social skills performance assessment (SSPA) were examined over several different retest intervals in two different samples of people with schizophrenia (n’s=238 and 116) and a healthy comparison sample (n=109). These psychometric properties were compared to those of a neuropsychological assessment battery. Test-retest reliabilities of the long form of the UPSA ranged from r=.63 to r=.80 over follow-up periods up to 36 months in people with schizophrenia, while brief UPSA reliabilities ranged from r=.66 to r=.81. Test-retest reliability of the NP performance scores ranged from r=.77 to r=.79. Test-retest reliabilities of the UPSA were lower in healthy controls, while NP performance was slightly more reliable. SSPA test-retest reliability was lower. Practice effect sizes ranged from .05 to .16 for the UPSA and .07 to .19 for the NP assessment in patients, with HC having more practice effects. Reliable change intervals were consistent across NP and both FC measures, indicating equal potential for detection of change. These performance-based measures of functional capacity appear to have similar potential to be sensitive to change compared to NP performance in people with schizophrenia. PMID:20399613
How Many Sleep Diary Entries Are Needed to Reliably Estimate Adolescent Sleep?
Arora, Teresa; Gradisar, Michael; Taheri, Shahrad; Carskadon, Mary A.
2017-01-01
Abstract Study Objectives: To investigate (1) how many nights of sleep diary entries are required for reliable estimates of five sleep-related outcomes (bedtime, wake time, sleep onset latency [SOL], sleep duration, and wake after sleep onset [WASO]) and (2) the test–retest reliability of sleep diary estimates of school night sleep across 12 weeks. Methods: Data were drawn from four adolescent samples (Australia [n = 385], Qatar [n = 245], United Kingdom [n = 770], and United States [n = 366]), who provided 1766 eligible sleep diary weeks for reliability analyses. We performed reliability analyses for each cohort using complete data (7 days), one to five school nights, and one to two weekend nights. We also performed test–retest reliability analyses on 12-week sleep diary data available from a subgroup of 55 US adolescents. Results: Intraclass correlation coefficients for bedtime, SOL, and sleep duration indicated good-to-excellent reliability from five weekday nights of sleep diary entries across all adolescent cohorts. Four school nights was sufficient for wake times in the Australian and UK samples, but not the US or Qatari samples. Only Australian adolescents showed good reliability for two weekend nights of bedtime reports; estimates of SOL were adequate for UK adolescents based on two weekend nights. WASO was not reliably estimated using 1 week of sleep diaries. We observed excellent test–rest reliability across 12 weeks of sleep diary data in a subsample of US adolescents. Conclusion: We recommend at least five weekday nights of sleep dairy entries to be made when studying adolescent bedtimes, SOL, and sleep duration. Adolescent sleep patterns were stable across 12 consecutive school weeks. PMID:28199718
Prospective patients rate practice factors: development of a questionnaire.
St Louis, Brian Lingg; Firestone, Allen R; Johnston, William; Shanker, Shiva; Vig, Katherine W L
2011-02-01
The importance that prospective patients place on practice characteristics when choosing an orthodontic practice has not been extensively reported. The objective of this research was to develop a valid and reliable questionnaire to address the relative importance of orthodontic office and doctor characteristics for prospective patients or parents of child patients during the initial orthodontic office consultation. An initial questionnaire, based on published literature, was field-tested on 16 subjects to assess its validity. Based on the field test, the questionnaire was modified and tested for reliability by using a test-retest method. The questionnaire covered the following areas: doctor, office, staff, and finances. The reliability study included 2 groups of subjects: 12 consecutive prospective adult patients and 41 consecutive parents of prospective child patients. The questionnaires consisted of 43 and 50 questions for the adult patients and the parents of patients, respectively. The subjects rated the importance of practice characteristics in their selection of an orthodontic practice using a 100-mm visual analog scale anchored at "not important at all" and "most important." Reliability was analyzed by using the intraclass correlation coefficient (ICC). Summary scores of all 53 subjects showed excellent reliability (ICC, 0.88; range, 0.61-1.0). Summary scores of all 50 questions showed acceptable reliability (ICC, 0.70; range, 0.45-0.88). Twenty-one questions had excellent reliability (ICC, >.75), and 29 questions had fair-to-good reliability (ICC, 0.41-0.75). No questions showed poor reliability (ICC, <0.4). The pilot study data indicated that the overall reliability of the questionnaire is acceptable. Copyright © 2011 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
Yang, Nan; Waddington, Gordon; Adams, Roger; Han, Jia
2018-05-01
Quantitative assessments of handedness and footedness are often required in studies of human cognition and behaviour, yet no reliable Chinese versions of commonly used handedness and footedness questionnaires are available. Accordingly, the objective of the present study was to translate the Edinburgh Handedness Inventory (EHI) and the Waterloo Footedness Questionnaire-Revised (WFQ-R) into Mandarin Chinese and to evaluate the reliability and validity of these translated versions in healthy Chinese people. In the first stage of the study, Chinese versions of the EHI and WFQ-R were produced from a process of translation, back translation and examination, with necessary cultural adaptations. The second stage involved determining the reliability and validity of the translated EHI and WFQ-R for the Chinese population. One hundred and ten Chinese participants were tested online, and the results showed that the Cronbach's alpha coefficient of internal consistency was 0.877 for the translated EHI and 0.855 for the translated WFQ-R. Another 170 Chinese participants were tested and re-tested after a 30-day interval. The intra-class correlation coefficients showed high reliability, 0.898 for the translated EHI and 0.869 for the translated WFQ-R. This preliminary validation study found the translated versions to be reliable and valid tools for assessing handedness and footedness in this population.
The reliability of a simplified water displacement instrument: a method for measuring arm volume.
Sagen, Ase; Kåresen, Rolf; Risberg, May Arna
2005-01-01
To present a new water displacement measurement, the Simplified Water Displacement Instrument (SWDI), and to evaluate its intra- and intertester reliability. Reliability design. Hospital setting. Fifty-six healthy people were studied. Intratester reliability was evaluated once a week for 4 weeks in 20 women and 10 men. Intertester reliability was assessed by 2 physical therapists in 26 people. Not applicable. Coefficients of variation (CVs) and intraclass correlation coefficients (ICCs). The intratester reliability showed a CV range of 2.2% to 2.6% and an ICC range of .98 to .99. The intertester reliability showed a CV of 1.3% and an ICC of .99. There was a significant increase in arm volume in men compared with women. There were no significant differences in changes in volume over the 4 weeks. There was a significant greater right arm volume (3.3%) among the right-handed subjects (P<.001). Both intra- and intertester reliability were satisfactory for the SWDI.
Schiffman, Eric L; Truelove, Edmond L; Ohrbach, Richard; Anderson, Gary C; John, Mike T; List, Thomas; Look, John O
2010-01-01
The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. The aim of this article is to provide an overview of the project's methodology, descriptive statistics, and data for the study participant sample. This article also details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. The Axis I reference standards were based on the consensus of two criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion examination reliability was also assessed within study sites. Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas > or = 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion examiner agreement with reference standards was excellent (k > or = 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods.
Rubio-Ochoa, J; Benítez-Martínez, J; Lluch, E; Santacruz-Zaragozá, S; Gómez-Contreras, P; Cook, C E
2016-02-01
It has been suggested that differential diagnosis of headaches should consist of a robust subjective examination and a detailed physical examination of the cervical spine. Cervicogenic headache (CGH) is a form of headache that involves referred pain from the neck. To our knowledge, no studies have summarized the reliability and diagnostic accuracy of physical examination tests for CGH. The aim of this study was to summarize the reliability and diagnostic accuracy of physical examination tests used to diagnose CGH. A systematic review following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines was performed in four electronic databases (MEDLINE, Web of Science, Embase and Scopus). Full text reports concerning physical tests for the diagnosis of CGH which reported the clinometric properties for assessment of CGH, were included and screened for methodological quality. Quality Appraisal for Reliability Studies (QAREL) and Quality Assessment of Studies of Diagnostic Accuracy (QUADAS-2) scores were completed to assess article quality. Eight articles were retrieved for quality assessment and data extraction. Studies investigating diagnostic reliability of physical examination tests for CGH scored poorer on methodological quality (higher risk of bias) than those of diagnostic accuracy. There is sufficient evidence showing high levels of reliability and diagnostic accuracy of the selected physical examination tests for the diagnosis of CGH. The cervical flexion-rotation test (CFRT) exhibited both the highest reliability and the strongest diagnostic accuracy for the diagnosis of CGH. Copyright © 2015 Elsevier Ltd. All rights reserved.
Lagarde, Marloes L J; Kamalski, Digna M A; van den Engel-Hoek, Lenie
2016-02-01
To systematically review the available evidence for the reliability and validity of cervical auscultation in diagnosing the several aspects of dysphagia in adults and children suffering from dysphagia. Medline (PubMed), Embase and the Cochrane Library databases. The systematic review was carried out applying the steps of the PRISMA-statement. The methodological quality of the included studies were evaluated using the Dutch 'Cochrane checklist for diagnostic accuracy studies'. A total of 90 articles were identified through the search strategy, and after applying the inclusion and exclusion criteria, six articles were included in this review. In the six studies, 197 patients were assessed with cervical auscultation. Two of the six articles were considered to be of 'good' quality and three studies were of 'moderate' quality. One article was excluded because of a 'poor' methodological quality. Sensitivity ranges from 23%-94% and specificity ranges from 50%-74%. Inter-rater reliability was 'poor' or 'fair' in all studies. The intra-rater reliability shows a wide variance among speech language therapists. In this systematic review, conflicting evidence is found for the validity of cervical auscultation. The reliability of cervical auscultation is insufficient when used as a stand-alone tool in the diagnosis of dysphagia in adults. There is no available evidence for the validity and reliability of cervical auscultation in children. Cervical auscultation should not be used as a stand-alone instrument to diagnose dysphagia. © The Author(s) 2015.
Harris, Joshua D; Erickson, Brandon J; Cvetanovich, Gregory L; Abrams, Geoffrey D; McCormick, Frank M; Gupta, Anil K; Verma, Nikhil N; Bach, Bernard R; Cole, Brian J
2014-02-01
Condition-specific questionnaires are important components in evaluation of outcomes of surgical interventions. No condition-specific study methodological quality questionnaire exists for evaluation of outcomes of articular cartilage surgery in the knee. To develop a reliable and valid knee articular cartilage-specific study methodological quality questionnaire. Cross-sectional study. A stepwise, a priori-designed framework was created for development of a novel questionnaire. Relevant items to the topic were identified and extracted from a recent systematic review of 194 investigations of knee articular cartilage surgery. In addition, relevant items from existing generic study methodological quality questionnaires were identified. Items for a preliminary questionnaire were generated. Redundant and irrelevant items were eliminated, and acceptable items modified. The instrument was pretested and items weighed. The instrument, the MARK score (Methodological quality of ARticular cartilage studies of the Knee), was tested for validity (criterion validity) and reliability (inter- and intraobserver). A 19-item, 3-domain MARK score was developed. The 100-point scale score demonstrated face validity (focus group of 8 orthopaedic surgeons) and criterion validity (strong correlation to Cochrane Quality Assessment score and Modified Coleman Methodology Score). Interobserver reliability for the overall score was good (intraclass correlation coefficient [ICC], 0.842), and for all individual items of the MARK score, acceptable to perfect (ICC, 0.70-1.000). Intraobserver reliability ICC assessed over a 3-week interval was strong for 2 reviewers (≥0.90). The MARK score is a valid and reliable knee articular cartilage condition-specific study methodological quality instrument. This condition-specific questionnaire may be used to evaluate the quality of studies reporting outcomes of articular cartilage surgery in the knee.
Bravo, G; Bragança, S; Arezes, P M; Molenbroek, J F M; Castellucci, H I
2018-05-22
Despite offering many benefits, direct manual anthropometric measurement method can be problematic due to their vulnerability to measurement errors. The purpose of this literature review was to determine, whether or not the currently published anthropometric studies of school children, related to ergonomics, mentioned or evaluated the variables precision, reliability or accuracy in the direct manual measurement method. Two bibliographic databases, and the bibliographic references of all the selected papers were used for finding relevant published papers in the fields considered in this study. Forty-six (46) studies met the criteria previously defined for this literature review. However, only ten (10) studies mentioned at least one of the analyzed variables, and none has evaluated all of them. Only reliability was assessed by three papers. Moreover, in what regards the factors that affect precision, reliability and accuracy, the reviewed papers presented large differences. This was particularly clear in the instruments used for the measurements, which were not consistent throughout the studies. Additionally, it was also clear that there was a lack of information regarding the evaluators' training and procedures for anthropometric data collection, which are assumed to be the most important issues that affect precision, reliability and accuracy. Based on the review of the literature, it was possible to conclude that the considered anthropometric studies had not focused their attention to the analysis of precision, reliability and accuracy of the manual measurement methods. Hence, and with the aim of avoiding measurement errors and misleading data, anthropometric studies should put more efforts and care on testing measurement error and defining the procedures used to collect anthropometric data.
A simulation model for risk assessment of turbine wheels
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Hage, Richard T.
1991-01-01
A simulation model has been successfully developed to evaluate the risk of the Space Shuttle auxiliary power unit (APU) turbine wheels for a specific inspection policy. Besides being an effective tool for risk/reliability evaluation, the simulation model also allows the analyst to study the trade-offs between wheel reliability, wheel life, inspection interval, and rejection crack size. For example, in the APU application, sensitivity analysis results showed that the wheel life limit has the least effect on wheel reliability when compared to the effect of the inspection interval and the rejection crack size. In summary, the simulation model developed represents a flexible tool to predict turbine wheel reliability and study the risk under different inspection policies.
Validity and Reliability of Farsi Version of Youth Sport Environment Questionnaire
Eshghi, Mohammad Ali; Kordi, Ramin; Memari, Amir Hossein; Ghaziasgar, Ahmad; Mansournia, Mohammad-Ali; Zamani Sani, Seyed Hojjat
2015-01-01
The Youth Sport Environment Questionnaire (YSEQ) had been developed from Group Environment Questionnaire, a well-known measure of team cohesion. The aim of this study was to adapt and examine the reliability and validity of the Farsi version of the YSEQ. This version was completed by 455 athletes aged 13–17 years. Results of confirmatory factor analysis indicated that two-factor solution showed a good fit to the data. The results also revealed that the Farsi YSEQ showed high internal consistency, test-retest reliability, and good concurrent validity. This study indicated that the Farsi version of the YSEQ is a valid and reliable measure to assess team cohesion in sport setting. PMID:26464900
A simulation model for risk assessment of turbine wheels
NASA Astrophysics Data System (ADS)
Safie, Fayssal M.; Hage, Richard T.
A simulation model has been successfully developed to evaluate the risk of the Space Shuttle auxiliary power unit (APU) turbine wheels for a specific inspection policy. Besides being an effective tool for risk/reliability evaluation, the simulation model also allows the analyst to study the trade-offs between wheel reliability, wheel life, inspection interval, and rejection crack size. For example, in the APU application, sensitivity analysis results showed that the wheel life limit has the least effect on wheel reliability when compared to the effect of the inspection interval and the rejection crack size. In summary, the simulation model developed represents a flexible tool to predict turbine wheel reliability and study the risk under different inspection policies.
Test-retest reliability of myofascial trigger point detection in hip and thigh areas.
Rozenfeld, E; Finestone, A S; Moran, U; Damri, E; Kalichman, L
2017-10-01
Myofascial trigger points (MTrP's) are a primary source of pain in patients with musculoskeletal disorders. Nevertheless, they are frequently underdiagnosed. Reliable MTrP palpation is the necessary for their diagnosis and treatment. The few studies that have looked for intra-tester reliability of MTrPs detection in upper body, provide preliminary evidence that MTrP palpation is reliable. Reliability tests for MTrP palpation on the lower limb have not yet been performed. To evaluate inter- and intra-tester reliability of MTrP recognition in hip and thigh muscles. Reliability study. 21 patients (15 males and 6 females, mean age 21.1 years) referred to the physical therapy clinic, 10 with knee or hip pain and 11 with pain in an upper limb, low back, shin or ankle. Two experienced physical therapists performed the examinations, blinded to the subjects' identity, medical condition and results of the previous MTrP evaluation. Each subject was evaluated four times, twice by each examiner in a random order. Dichotomous findings included a palpable taut band, tenderness, referred pain, and relevance of referred pain to patient's complaint. Based on these, diagnosis of latent MTrP's or active MTrP's was established. The evaluation was performed on both legs and included a total of 16 locations in the following muscles: rectus femoris (proximal), vastus medialis (middle and distal), vastus lateralis (middle and distal) and gluteus medius (anterior, posterior and distal). Inter- and intra-tester reliability (Cohen's kappa (κ)) values for single sites ranged from -0.25 to 0.77. Median intra-tester reliability was 0.45 and 0.46 for latent and active MTrP's, and median inter-tester reliability was 0.51 and 0.64 for latent and active MTrPs, respectively. The examination of the distal vastus medialis was most reliable for latent and active MTrP's (intra-tester k = 0.27-0.77, inter-tester k = 0.77 and intra-tester k = 0.53-0.72, inter-tester k = 0.72, correspondingly). Inter- and intra-tester reliability of active and latent MTrP evaluation was moderate to substantial. Palpation evaluation can be used for clinical diagnosis of MTrP's in the hip and thigh muscles. This study provides evidence that MTrP palpation is a moderately reliable diagnostic tool in the hip and thigh muscles and can be used in clinical practice and research. Copyright © 2017 Elsevier Ltd. All rights reserved.
Test-retest reliability of an fMRI paradigm for studies of cardiovascular reactivity.
Sheu, Lei K; Jennings, J Richard; Gianaros, Peter J
2012-07-01
We examined the reliability of measures of fMRI, subjective, and cardiovascular reactions to standardized versions of a Stroop color-word task and a multisource interference task. A sample of 14 men and 12 women (30-49 years old) completed the tasks on two occasions, separated by a median of 88 days. The reliability of fMRI BOLD signal changes in brain areas engaged by the tasks was moderate, and aggregating fMRI BOLD signal changes across the tasks improved test-retest reliability metrics. These metrics included voxel-wise intraclass correlation coefficients (ICCs) and overlap ratio statistics. Task-aggregated ratings of subjective arousal, valence, and control, as well as cardiovascular reactions evoked by the tasks showed ICCs of 0.57 to 0.87 (ps < .001), indicating moderate-to-strong reliability. These findings support using these tasks as a battery for fMRI studies of cardiovascular reactivity. Copyright © 2012 Society for Psychophysiological Research.
Clinimetric properties of the alberta infant motor scale in infants born preterm.
Pin, Tamis W; de Valle, Katy; Eldridge, Bev; Galea, Mary P
2010-01-01
The Alberta Infant Motor Scale (AIMS) is a standardized motor assessment for young infants. This study aimed to examine the reliability of the AIMS in a group of infants born at or before 29 weeks of gestation. Fifty-nine infants born preterm were recruited. Two experienced pediatric physical therapists participated in this reliability study. Infants were assessed at 4, 8, 12, and 18 months corrected age (CA). Intrarater reliability was high (intraclass correlation coefficient [ICC] > or =0.99). The ICC for interrater reliability varied from 0.85 to 0.97. The ICC was low at 4 and 18 months CA. The AIMS is reliable in evaluating motor development in infants born preterm. Clinicians should be cautious about using the AIMS in infants at very young ages and those approaching independent ambulation. Accurate placement of the window on a movement repertoire is crucial. Attention is required when using the AIMS in infants developing atypically.
Mohseni Bandpei, Mohammad A; Rahmani, Nahid; Majdoleslam, Basir; Abdollahi, Iraj; Ali, Shabnam Shah; Ahmad, Ashfaq
2014-09-01
The purpose of this study was to review the literature to determine whether surface electromyography (EMG) is a reliable tool to assess paraspinal muscle fatigue in healthy subjects and in patients with low back pain (LBP). A literature search for the period of 2000 to 2012 was performed, using PubMed, ProQuest, Science Direct, EMBASE, OVID, CINAHL, and MEDLINE databases. Electromyography, reliability, median frequency, paraspinal muscle, endurance, low back pain, and muscle fatigue were used as keywords. The literature search yielded 178 studies using the above keywords. Twelve articles were selected according to the inclusion criteria of the study. In 7 of the 12 studies, the surface EMG was only applied in healthy subjects, and in 5 studies, the reliability of surface EMG was investigated in patients with LBP or a comparison with a control group. In all of these studies, median frequency was shown to be a reliable EMG parameter to assess paraspinal muscles fatigue. There was a wide variation among studies in terms of methodology, surface EMG parameters, electrode location, procedure, and homogeneity of the study population. The results suggest that there seems to be a convincing body of evidence to support the merit of surface EMG in the assessment of paraspinal muscle fatigue in healthy subject and in patients with LBP. Copyright © 2014 National University of Health Sciences. Published by Elsevier Inc. All rights reserved.
de Albuquerque, Priscila Maria Nascimento Martins; de Alencar, Geisa Guimarães; de Oliveira, Daniela Araújo; de Siqueira, Gisela Rocha
2018-01-01
The aim of this study was to examine and interpret the concordance, accuracy, and reliability of photogrammetric protocols available in the literature for evaluating cervical lordosis in an adult population aged 18 to 59 years. A systematic search of 6 electronic databases (MEDLINE via PubMed, LILACS, CINAHL, Scopus, ScienceDirect, and Web of Science) located studies that assessed the reliability and/or concordance and/or accuracy of photogrammetric protocols for evaluating cervical lordosis, compared with radiography. Articles published through April 2016 were selected. Two independent reviewers used a critical appraisal tool (QUADAS and QAREL) to assess the quality of the selected studies. Two studies were included in the review and had high levels of reliability (intraclass correlation coefficient: 0.974-0.98). Only 1 study assessed the concordance between the methods, which was calculated using Pearson's correlation coefficient. To date, the accuracy of photogrammetry has not been investigated thoroughly. We encountered no study in the literature that investigated the accuracy of photogrammetry in diagnosing hyperlordosis of cervical spine. However, both current studies report high levels of intra- and interrater reliability. To increase the level of evidence of photogrammetry in the evaluation of cervical lordosis, it is necessary to conduct further studies using a larger sample to increase the external validity of the findings. Copyright © 2018. Published by Elsevier Inc.
Cerin, Ester; Sit, Cindy H P; Huang, Ya-Jun; Barnett, Anthony; Macfarlane, Duncan J; Wong, Stephen S H
2014-06-06
Physical activity and sedentary behaviour are important contributors to adolescents' health. These behaviours may be affected by the school and neighbourhood built environments. However, current evidence on such effects is mainly limited to Western countries. The International Physical Activity and the Environment Network (IPEN)-Adolescent study aims to examine associations of the built environment with adolescent physical activity and sedentary behaviour across five continents.We report on the repeatability of measures of in-school and out-of school physical activity, plus measures of out-of-school sedentary and travel behaviours adopted by the IPEN - Adolescent study and adapted for Chinese-speaking Hong Kong adolescents participating in the international Healthy environments and active living in teenagers-(Hong Kong) [iHealt(H)] study, which is part of IPEN-Adolescent. Items gauging in-school physical activity and out-of-school physical activity, and out-of-school sedentary and travel behaviours developed for the IPEN - Adolescent study were translated from English into Chinese, adapted, and pilot tested. Sixty-eight Chinese-speaking 12-17 year old secondary school students (36 boys; 32 girls) residing in areas of Hong Kong differing in transport-related walkability were recruited. They self-completed the survey items twice, 8-16 days apart. Test-retest reliability was assessed for the whole sample and by gender using one-way random effects intra-class correlation coefficients (ICC). Test-retest reliability of items with restricted variability was assessed using percentage agreement. Overall test-retest reliability of items and scales was moderate to excellent (ICC = 0.47-0.92). Items with restricted variability in responses had a high percentage agreement (92%-100%). Test-retest reliability was similar in girls and boys, with the exception of daily hours of homework (reliability higher in girls) and number of school-based sports teams or after-school physical activity classes (reliability higher in boys). The translated and adapted self-report measures of physical activity, sedentary and travel behaviours used in the iHealt(H) study are sufficiently reliable. Levels of reliability are comparable or slightly higher than those observed for the original measures.
Moore, Martha; Barker, Karen
2017-09-11
The four square step test (FSST) was first validated in healthy older adults to provide a measure of dynamic standing balance and mobility. The FSST has since been used in a variety of patient populations. The purpose of this systematic review is to determine the validity and reliability of the FSST in these different adult patient populations. The literature search was conducted to highlight all the studies that measured validity and reliability of the FSST. Six electronic databases were searched including AMED, CINAHL, MEDLINE, PEDro, Web of Science and Google Scholar. Grey literature was also searched for any documents relevant to the review. Two independent reviewers carried out study selection and quality assessment. The methodological quality was assessed using the QUADAS-2 tool, which is a validated tool for the quality assessment of diagnostic accuracy studies, and the COSMIN four-point checklist, which contains standards for evaluating reliability studies on the measurement properties of health instruments. Fifteen studies were reviewed studying community-dwelling older adults, Parkinson's disease, Huntington's disease, multiple sclerosis, vestibular disorders, post stroke, post unilateral transtibial amputation, knee pain and hip osteoarthritis. Three of the studies were of moderate methodological quality scoring low in risk of bias and applicability for all domains in the QUADAS-2 tool. Three studies scored "fair" on the COSMIN four-point checklist for the reliability components. The concurrent validity of the FSST was measured in nine of the studies with moderate to strong correlations being found. Excellent Intraclass Correlation Coefficients were found between physiotherapists carrying out the tests (ICC = .99) with good to excellent test-retest reliability shown in nine of the studies (ICC = .73-.98). The FSST may be an effective and valid tool for measuring dynamic balance and a participants' falls risk. It has been shown to have strong correlations with other measures of balance and mobility with good reliability shown in a number of populations. However, the quality of the papers reviewed was variable with key factors, such as sample size and test set up, needing to be addressed before the tool can be confidently used in these specified populations.
A study on the real-time reliability of on-board equipment of train control system
NASA Astrophysics Data System (ADS)
Zhang, Yong; Li, Shiwei
2018-05-01
Real-time reliability evaluation is conducive to establishing a condition based maintenance system for the purpose of guaranteeing continuous train operation. According to the inherent characteristics of the on-board equipment, the connotation of reliability evaluation of on-board equipment is defined and the evaluation index of real-time reliability is provided in this paper. From the perspective of methodology and practical application, the real-time reliability of the on-board equipment is discussed in detail, and the method of evaluating the realtime reliability of on-board equipment at component level based on Hidden Markov Model (HMM) is proposed. In this method the performance degradation data is used directly to realize the accurate perception of the hidden state transition process of on-board equipment, which can achieve a better description of the real-time reliability of the equipment.
Electrical service reliability: the customer perspective
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samsa, M.E.; Hub, K.A.; Krohm, G.C.
1978-09-01
Electric-utility-system reliability criteria have traditionally been established as a matter of utility policy or through long-term engineering practice, generally with no supportive customer cost/benefit analysis as justification. This report presents results of an initial study of the customer perspective toward electric-utility-system reliability, based on critical review of over 20 previous and ongoing efforts to quantify the customer's value of reliable electric service. A possible structure of customer classifications is suggested as a reasonable level of disaggregation for further investigation of customer value, and these groups are characterized in terms of their electricity use patterns. The values that customers assign tomore » reliability are discussed in terms of internal and external cost components. A list of options for effecting changes in customer service reliability is set forth, and some of the many policy issues that could alter customer-service reliability are identified.« less
Creating Highly Reliable Accountable Care Organizations.
Vogus, Timothy J; Singer, Sara J
2016-12-01
Accountable Care Organizations' (ACOs) pursuit of the triple aim of higher quality, lower cost, and improved population health has met with mixed results. To improve the design and implementation of ACOs we look to organizations that manage similarly complex, dynamic, and tightly coupled conditions while sustaining exceptional performance known as high-reliability organizations. We describe the key processes through which organizations achieve reliability, the leadership and organizational practices that enable it, and the role that professionals can play when charged with enacting it. Specifically, we present concrete practices and processes from health care organizations pursuing high-reliability and from early ACOs to illustrate how the triple aim may be met by cultivating mindful organizing, practicing reliability-enhancing leadership, and identifying and supporting reliability professionals. We conclude by proposing a set of research questions to advance the study of ACOs and high-reliability research. © The Author(s) 2016.
Cook, David A; Reed, Darcy A
2015-08-01
The Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale-Education (NOS-E) were developed to appraise methodological quality in medical education research. The study objective was to evaluate the interrater reliability, normative scores, and between-instrument correlation for these two instruments. In 2014, the authors searched PubMed and Google for articles using the MERSQI or NOS-E. They obtained or extracted data for interrater reliability-using the intraclass correlation coefficient (ICC)-and normative scores. They calculated between-scale correlation using Spearman rho. Each instrument contains items concerning sampling, controlling for confounders, and integrity of outcomes. Interrater reliability for overall scores ranged from 0.68 to 0.95. Interrater reliability was "substantial" or better (ICC > 0.60) for nearly all domain-specific items on both instruments. Most instances of low interrater reliability were associated with restriction of range, and raw agreement was usually good. Across 26 studies evaluating published research, the median overall MERSQI score was 11.3 (range 8.9-15.1, of possible 18). Across six studies, the median overall NOS-E score was 3.22 (range 2.08-3.82, of possible 6). Overall MERSQI and NOS-E scores correlated reasonably well (rho 0.49-0.72). The MERSQI and NOS-E are useful, reliable, complementary tools for appraising methodological quality of medical education research. Interpretation and use of their scores should focus on item-specific codes rather than overall scores. Normative scores should be used for relative rather than absolute judgments because different research questions require different study designs.
Ratter, Julia; Radlinger, Lorenz; Lucas, Cees
2014-09-01
Are submaximal and maximal exercise tests reliable, valid and acceptable in people with chronic pain, fibromyalgia and fatigue disorders? Systematic review of studies of the psychometric properties of exercise tests. People older than 18 years with chronic pain, fibromyalgia and chronic fatigue disorders. Studies of the measurement properties of tests of physical capacity in people with chronic pain, fibromyalgia or chronic fatigue disorders were included. Studies were required to report: reliability coefficients (intraclass correlation coefficient, alpha reliability coefficient, limits of agreements and Bland-Altman plots); validity coefficients (intraclass correlation coefficient, Spearman's correlation, Kendal T coefficient, Pearson's correlation); or dropout rates. Fourteen studies were eligible: none had low risk of bias, 10 had unclear risk of bias and four had high risk of bias. The included studies evaluated: Åstrand test; modified Åstrand test; Lean body mass-based Åstrand test; submaximal bicycle ergometer test following another protocol other than Åstrand test; 2-km walk test; 5-minute, 6-minute and 10-minute walk tests; shuttle walk test; and modified symptom-limited Bruce treadmill test. None of the studies assessed maximal exercise tests. Where they had been tested, reliability and validity were generally high. Dropout rates were generally acceptable. The 2-km walk test was not recommended in fibromyalgia. Moderate evidence was found for reliability, validity and acceptability of submaximal exercise tests in patients with chronic pain, fibromyalgia or chronic fatigue. There is no evidence about maximal exercise tests in patients with chronic pain, fibromyalgia and chronic fatigue. Copyright © 2014. Published by Elsevier B.V.
Test-Retest Reliability of a Survey to Measure Transport-Related Physical Activity in Adults
ERIC Educational Resources Information Center
Badland, Hannah; Schofield, Grant
2006-01-01
The present research details test-retest reliability of a newly developed, telephone-administered TPA survey for adults. This instrument examines barriers, perceptions, and current travel behaviors to place of work/study and local convenience shops. Demonstrated test-retest reliability of the Active Friendly Environments-Transport-Related Physical…
A Latent Class Approach to Estimating Test-Score Reliability
ERIC Educational Resources Information Center
van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas
2011-01-01
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
Binge Eating Disorder: Reliability and Validity of a New Diagnostic Category.
ERIC Educational Resources Information Center
Brody, Michelle L.; And Others
1994-01-01
Examined reliability and validity of binge eating disorder (BED), proposed for inclusion in Diagnostic and Statistical Manual of Mental Disorders (DSM), fourth edition. Interrater reliability of BED diagnosis compared favorably with that of most diagnoses in DSM revised third edition. Study comparing obese individuals with and without BED and…
Is School-Based Height and Weight Screening of Elementary Students Private and Reliable?
ERIC Educational Resources Information Center
Stoddard, Sarah A.; Kubik, Martha Y.; Skay, Carol
2008-01-01
The Institute of Medicine recommends school-based body mass index (BMI) screening as an obesity prevention strategy. While school nurses have provided height/weight screening for years, little has been published describing measurement reliability or process. This study evaluated the reliability of height/weight measures collected by school nurses…
Test-Retest Reliability and Predictive Validity of the Implicit Association Test in Children
ERIC Educational Resources Information Center
Rae, James R.; Olson, Kristina R.
2018-01-01
The Implicit Association Test (IAT) is increasingly used in developmental research despite minimal evidence of whether children's IAT scores are reliable across time or predictive of behavior. When test-retest reliability and predictive validity have been assessed, the results have been mixed, and because these studies have differed on many…
A Note on Structural Equation Modeling Estimates of Reliability
ERIC Educational Resources Information Center
Yang, Yanyun; Green, Samuel B.
2010-01-01
Reliability can be estimated using structural equation modeling (SEM). Two potential problems with this approach are that estimates may be unstable with small sample sizes and biased with misspecified models. A Monte Carlo study was conducted to investigate the quality of SEM estimates of reliability by themselves and relative to coefficient…
ERIC Educational Resources Information Center
Worrell, Frank C.; Mello, Zena R.
2007-01-01
In this study, the authors examined the reliability, structural validity, and concurrent validity of Zimbardo Time Perspective Inventory (ZTPI) scores in a group of 815 academically talented adolescents. Reliability estimates of the purported factors' scores were in the low to moderate range. Exploratory factor analysis supported a five-factor…
The Interplay of Surface Mount Solder Joint Quality and Reliability of Low Volume SMAs
NASA Technical Reports Server (NTRS)
Ghaffarian, R.
1997-01-01
Spacecraft electronics including those used at the Jet Propulsion Laboratory (JPL), demand production of highly reliable assemblies. JPL has recently completed an extensive study, funded by NASA's code Q, of the interplay between manufacturing defects and reliability of ball grid array (BGA) and surface mount electronic components.
ERIC Educational Resources Information Center
Setzer, J. Carl; He, Yi
2009-01-01
Reliability Analysis for the Internationally Administered 2002 Series GED (General Educational Development) Tests Reliability refers to the consistency, or stability, of test scores when the authors administer the measurement procedure repeatedly to groups of examinees (American Educational Research Association [AERA], American Psychological…
Reliability of the Test of Integrated Language and Literacy Skills (TILLS)
ERIC Educational Resources Information Center
Mailend, Marja-Liisa; Plante, Elena; Anderson, Michele A.; Applegate, E. Brooks; Nelson, Nickola W.
2016-01-01
Background: As new standardized tests become commercially available, it is critical that clinicians have access to the information about a test's psychometric properties, including aspects of reliability. Aims: The purpose of the three studies reported in this article was to investigate the reliability of a new test, the Test of Integrated…
Assessing the Reliability of Student Evaluations of Teaching: Choosing the Right Coefficient
ERIC Educational Resources Information Center
Morley, Donald
2014-01-01
Many of the studies used to support the claim that student evaluations of teaching are reliable measures of teaching effectiveness have frequently calculated inappropriate reliability coefficients. This paper points to three coefficients that would be appropriate depending on if student evaluations were used for formative or summative purposes.…
ERIC Educational Resources Information Center
Md Desa, Zairul Nor Deana
2012-01-01
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, the multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscores reliability, and subscores classification. Both the compensatory and partially compensatory MIRT…
ERIC Educational Resources Information Center
Arce-Ferrer, Alvaro J.; Castillo, Irene Borges
2007-01-01
The use of face-to-face interviews is controversial for college admissions decisions in light of the lack of availability of validity and reliability evidence for most college admission processes. This study investigated reliability and incremental predictive validity of a face-to-face postgraduate college admission interview with a sample of…
Reliability and precision of pellet-group counts for estimating landscape-level deer density
David S. deCalesta
2013-01-01
This study provides hitherto unavailable methodology for reliably and precisely estimating deer density within forested landscapes, enabling quantitative rather than qualitative deer management. Reliability and precision of the deer pellet-group technique were evaluated in 1 small and 2 large forested landscapes. Density estimates, adjusted to reflect deer harvest and...
ERIC Educational Resources Information Center
Erford, Bradley T.; Alsamadi, Silvana C.
2012-01-01
Score reliability and validity of parent responses concerning their 10- to 17-year-old students were analyzed using the Screening Test for Emotional Problems-Parent Report (STEP-P), which assesses a variety of emotional problems classified under the Individuals with Disabilities Education Improvement Act. Score reliability, convergent, and…
An Integrated Approach to Establish Validity and Reliability of Reading Tests
ERIC Educational Resources Information Center
Razi, Salim
2012-01-01
This study presents the processes of developing and establishing reliability and validity of a reading test by administering an integrative approach as conventional reliability and validity measures superficially reveals the difficulty of a reading test. In this respect, analysing vocabulary frequency of the test is regarded as a more eligible way…
ERIC Educational Resources Information Center
Klesius, Janell P.; Homan, Susan P.
1985-01-01
The article reviews validity and reliability studies on the informal reading inventory, a diagnostic instrument to identify reading grade-level placement and strengths and weaknesses in work recognition and comprehension. Gives suggestions to improve the validity and reliability of existing inventories and to evaluate them in newly published…
Absolute and Relative Reliability of Percentage of Syllables Stuttered and Severity Rating Scales
ERIC Educational Resources Information Center
Karimi, Hamid; O'Brian, Sue; Onslow, Mark; Jones, Mark
2014-01-01
Purpose: Percentage of syllables stuttered (%SS) and severity rating (SR) scales are measures in common use to quantify stuttering severity and its changes during basic and clinical research conditions. However, their reliability has not been assessed with indices measuring both relative and absolute reliability. This study was designed to provide…
Wang, Wei; Huang, Li; Liang, Xuedong
2018-01-06
This paper investigates the reliability of complex emergency logistics networks, as reliability is crucial to reducing environmental and public health losses in post-accident emergency rescues. Such networks' statistical characteristics are analyzed first. After the connected reliability and evaluation indices for complex emergency logistics networks are effectively defined, simulation analyses of network reliability are conducted under two different attack modes using a particular emergency logistics network as an example. The simulation analyses obtain the varying trends in emergency supply times and the ratio of effective nodes and validates the effects of network characteristics and different types of attacks on network reliability. The results demonstrate that this emergency logistics network is both a small-world and a scale-free network. When facing random attacks, the emergency logistics network steadily changes, whereas it is very fragile when facing selective attacks. Therefore, special attention should be paid to the protection of supply nodes and nodes with high connectivity. The simulation method provides a new tool for studying emergency logistics networks and a reference for similar studies.
Cann, A P; Connolly, M; Ruuska, R; MacNeil, M; Birmingham, T B; Vandervoort, A A; Callaghan, J P
2008-04-01
Despite the ongoing health problem of repetitive strain injuries, there are few tools currently available for ergonomic applications evaluating cumulative loading that have well-documented evidence of reliability and validity. The purpose of this study was to determine the inter-rater reliability of a posture matching based analysis tool (3DMatch, University of Waterloo) for predicting cumulative and peak spinal loads. A total of 30 food service workers were each videotaped for a 1-h period while performing typical work activities and a single work task was randomly selected from each for analysis by two raters. Inter-rater reliability was determined using intraclass correlation coefficients (ICC) model 2,1 and standard errors of measurement for cumulative and peak spinal and shoulder loading variables across all subjects. Overall, 85.5% of variables had moderate to excellent inter-rater reliability, with ICCs ranging from 0.30-0.99 for all cumulative and peak loading variables. 3DMatch was found to be a reliable ergonomic tool when more than one rater is involved.
On the Simulation-Based Reliability of Complex Emergency Logistics Networks in Post-Accident Rescues
Wang, Wei; Huang, Li; Liang, Xuedong
2018-01-01
This paper investigates the reliability of complex emergency logistics networks, as reliability is crucial to reducing environmental and public health losses in post-accident emergency rescues. Such networks’ statistical characteristics are analyzed first. After the connected reliability and evaluation indices for complex emergency logistics networks are effectively defined, simulation analyses of network reliability are conducted under two different attack modes using a particular emergency logistics network as an example. The simulation analyses obtain the varying trends in emergency supply times and the ratio of effective nodes and validates the effects of network characteristics and different types of attacks on network reliability. The results demonstrate that this emergency logistics network is both a small-world and a scale-free network. When facing random attacks, the emergency logistics network steadily changes, whereas it is very fragile when facing selective attacks. Therefore, special attention should be paid to the protection of supply nodes and nodes with high connectivity. The simulation method provides a new tool for studying emergency logistics networks and a reference for similar studies. PMID:29316614
[Santa Claus is perceived as reliable and friendly: results of the Danish Christmas 2013 survey].
Amin, Faisal Mohammad; West, Anders Sode; Jørgensen, Carina Sleiborg; Simonsen, Sofie Amalie; Lindberg, Ulrich; Tranum-Jensen, Jørgen; Hougaard, Anders
2013-12-02
Several studies have indicated that the population in general perceives doctors as reliable. In the present study perceptions of reliability and kindness attributed to another socially significant archetype, Santa Claus, have been comparatively examined in relation to the doctor. In all, 52 randomly chosen participants were shown a film, where a narrator dressed either as Santa Claus or as a doctor tells an identical story. Structured interviews were then used to assess the subjects' perceptions of reliability and kindness in relation to the narrator's appearance. We found a strong inclination for Santa Claus being perceived as friendlier than the doctor (p = 0.053). However, there was no significant difference in the perception of reliability between Santa Claus and the doctor (p = 0.524). The positive associations attributed to Santa Claus probably cause that he is perceived friendlier than the doctor who may be associated with more serious and unpleasant memories of illness and suffering. Surprisingly, and despite him being an imaginary person, Santa Claus was assessed as being as reliable as the doctor.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies.
Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry; Kunz, Regina
2017-01-25
To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Systematic review and narrative synthesis of reproducibility studies. Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. : Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ-0.10). Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies' generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Javanshir, Khodabakhsh; Mohseni-Bandpei, Mohammad Ali; Rezasoltani, Asghar; Amiri, Mohsen; Rahgozar, Mehdi
2011-01-01
In this study, the reliability of the longus colli muscle (LCM) size was assessed in a relaxed state by a real time ultrasonography (US) device in a group of healthy subjects and a group of patients with chronic neck pain. Fifteen healthy subjects (19-41 years old) and 10 patients with chronic neck pain (27-44 years old) were recruited for the purpose of this study. LCM size was measured at the level of thyroid cartilage. Two images were taken on the same day with an hour interval to assess the within day reliability and the third image was taken 1 week later to determine between days reliability. Cross sectional area (CSA), anterior posterior dimension (APD), and lateral dimension (LD) were measured each time. The shape ratio was calculated as LD/APD. Intraclass correlation coefficients (ICC) and standard error of measurement (SEM) were computed for data analysis. The ICC of left and right CSA for within day and between days reliability in healthy subjects were (0.90, 0.93) and (0.85, 0.82), respectively. The ICC of left and right CSA for within day and between days reliability in patients with neck pain were (0.86, 0.82) and (0.76, 0.81), respectively. The results indicated that US could be used as a reliable tool to measure the LCM dimensions in healthy subjects and patients with chronic neck pain. Copyright © 2009 Elsevier Ltd. All rights reserved.
Meirte, J; Moortgat, P; Truijen, S; Maertens, K; Lafaire, C; De Cuyper, L; Hubens, G; Van Daele, U
2015-09-01
Burn scars are frequently accompanied with sensory deficits often remaining present months or even years after injury. Clinimetric properties of assessment tools remain understudied within burn literature. Tactile sense of touch can be examined with the touch pressure threshold (TPT) method using the Semmes Weinstein monofilament test (SWMT). There is in recent research no consensus on the exact measurement procedure when using the SWMT. The aim of this paper was to determine the interrater and intrarater reliability of TPT within burn scars and healthy controls using the 'ascending descending' measurement procedure. We used the newly developed guidelines for reporting reliability and agreement studies (GRRAS) as a basis to report this reliability study. In total 36 individuals were tested; a healthy control group and a scar group. The interrater reliability was excellent in the scar group (ICC=0.908/SEM=0.21) and fair to good in the control group (ICC=0.731/SEM=0.12). In the scar group intrarater ICC value was excellent (ICC=0.822/SEM=0.33). Within the control group also an excellent intrarater reliability (ICC=0.807/SEM=0.27) was found. In conclusion this study shows that the SWMT with the 'ascending descending' measurement procedure is a feasible and reliable objective measure to evaluate TPT in (older) upper extremities burn scars as well as in healthy skin. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Nascimento-Ferreira, Marcus Vinícius; De Moraes, Augusto César Ferreira; Toazza-Oliveira, Paulo Vinícius; Forjaz, Claudia L M; Aristizabal, Juan Carlos; Santaliesra-Pasías, Alba M; Lepera, Candela; Nascimento-Junior, Walter Viana; Skapino, Estela; Delgado, Carlos Alberto; Moreno, Luis Alberto; Carvalho, Heráclito Barbosa
2018-03-01
The objective of this article is to test the reliability and validity of the new and innovative physical activity (PA) questionnaire. Subsamples from the South American Youth/Child Cardiovascular and Environment Study (SAYCARE) study were included to examine its reliability (children: n = 161; adolescents: n = 177) and validity (children: n = 82; adolescents: n = 60). The questionnaire consists of three dimensions of PA (leisure, active commuting, and school) performed during the last week. To assess its validity, the subjects wore accelerometers for at least 3 days and 8 h/d (at least one weekend day). The reliability was analyzed by correlation coefficients. In addition, Bland-Altman analysis and a multilevel regression were applied to estimate the measurement bias, limits of agreement, and influence of contextual variables. In children, the questionnaire showed consistent reliability (ρ = 0.56) and moderate validity (ρ = 0.46), and the contextual variable variance explained 43.0% with -22.9 min/d bias. In adolescents, the reliability was higher (ρ = 0.76) and the validity was almost excellent (ρ = 0.88), with 66.7% of the variance explained by city level with 16.0 min/d PA bias. The SAYCARE PA questionnaire shows acceptable (in children) to strong (in adolescents) reliability and strong validity in the measurement of PA in the pediatric population from low- to middle-income countries. © 2018 The Obesity Society.
McCool, Megan E.; Wahl, Josepha; Schlecht, Inga; Apfelbacher, Christian
2015-01-01
Patients actively seek information about how to cope with their health problems, but the quality of the information available varies. A number of instruments have been developed to assess the quality of patient information, primarily though in English. Little is known about the reliability of these instruments when applied to patient information in German. The objective of our study was to investigate and compare the reliability of two validated instruments, DISCERN and EQIP, in order to determine which of these instruments is better suited for a further study pertaining to the quality of information available to German patients with eczema. Two independent raters evaluated a random sample of 20 informational brochures in German. All the brochures addressed eczema as a disorder and/or therapy options and care. Intra-rater and inter-rater reliability were assessed by calculating intra-class correlation coefficients, agreement was tested with weighted kappas, and the correlation of the raters’ scores for each instrument was measured with Pearson’s correlation coefficient. DISCERN demonstrated substantial intra- and inter-rater reliability. It also showed slightly better agreement than EQIP. There was a strong correlation of the raters’ scores for both instruments. The findings of this study support the reliability of both DISCERN and EQIP. However, based on the results of the inter-rater reliability, agreement and correlation analyses, we consider DISCERN to be the more precise tool for our project on patient information concerning the treatment and care of eczema. PMID:26440612
McCool, Megan E; Wahl, Josepha; Schlecht, Inga; Apfelbacher, Christian
2015-01-01
Patients actively seek information about how to cope with their health problems, but the quality of the information available varies. A number of instruments have been developed to assess the quality of patient information, primarily though in English. Little is known about the reliability of these instruments when applied to patient information in German. The objective of our study was to investigate and compare the reliability of two validated instruments, DISCERN and EQIP, in order to determine which of these instruments is better suited for a further study pertaining to the quality of information available to German patients with eczema. Two independent raters evaluated a random sample of 20 informational brochures in German. All the brochures addressed eczema as a disorder and/or therapy options and care. Intra-rater and inter-rater reliability were assessed by calculating intra-class correlation coefficients, agreement was tested with weighted kappas, and the correlation of the raters' scores for each instrument was measured with Pearson's correlation coefficient. DISCERN demonstrated substantial intra- and inter-rater reliability. It also showed slightly better agreement than EQIP. There was a strong correlation of the raters' scores for both instruments. The findings of this study support the reliability of both DISCERN and EQIP. However, based on the results of the inter-rater reliability, agreement and correlation analyses, we consider DISCERN to be the more precise tool for our project on patient information concerning the treatment and care of eczema.
Reliability Issues and Solutions in Flexible Electronics Under Mechanical Fatigue
NASA Astrophysics Data System (ADS)
Yi, Seol-Min; Choi, In-Suk; Kim, Byoung-Joon; Joo, Young-Chang
2018-07-01
Flexible devices are of significant interest due to their potential expansion of the application of smart devices into various fields, such as energy harvesting, biological applications and consumer electronics. Due to the mechanically dynamic operations of flexible electronics, their mechanical reliability must be thoroughly investigated to understand their failure mechanisms and lifetimes. Reliability issue caused by bending fatigue, one of the typical operational limitations of flexible electronics, has been studied using various test methodologies; however, electromechanical evaluations which are essential to assess the reliability of electronic devices for flexible applications had not been investigated because the testing method was not established. By employing the in situ bending fatigue test, we has studied the failure mechanism for various conditions and parameters, such as bending strain, fatigue area, film thickness, and lateral dimensions. Moreover, various methods for improving the bending reliability have been developed based on the failure mechanism. Nanostructures such as holes, pores, wires and composites of nanoparticles and nanotubes have been suggested for better reliability. Flexible devices were also investigated to find the potential failures initiated by complex structures under bending fatigue strain. In this review, the recent advances in test methodology, mechanism studies, and practical applications are introduced. Additionally, perspectives including the future advance to stretchable electronics are discussed based on the current achievements in research.
Li, Qi; Yu, Hongtao; Wu, Yan; Gao, Ning
2016-08-26
The integration of multiple sensory inputs is essential for perception of the external world. The spatial factor is a fundamental property of multisensory audiovisual integration. Previous studies of the spatial constraints on bimodal audiovisual integration have mainly focused on the spatial congruity of audiovisual information. However, the effect of spatial reliability within audiovisual information on bimodal audiovisual integration remains unclear. In this study, we used event-related potentials (ERPs) to examine the effect of spatial reliability of task-irrelevant sounds on audiovisual integration. Three relevant ERP components emerged: the first at 140-200ms over a wide central area, the second at 280-320ms over the fronto-central area, and a third at 380-440ms over the parieto-occipital area. Our results demonstrate that ERP amplitudes elicited by audiovisual stimuli with reliable spatial relationships are larger than those elicited by stimuli with inconsistent spatial relationships. In addition, we hypothesized that spatial reliability within an audiovisual stimulus enhances feedback projections to the primary visual cortex from multisensory integration regions. Overall, our findings suggest that the spatial linking of visual and auditory information depends on spatial reliability within an audiovisual stimulus and occurs at a relatively late stage of processing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Dental hygiene faculty calibration in the evaluation of calculus detection.
Garland, Kandis V; Newell, Kathleen J
2009-03-01
The purpose of this pilot study was to explore the impact of faculty calibration training on intra- and interrater reliability regarding calculus detection. After IRB approval, twelve dental hygiene faculty members were recruited from a pool of twenty-two for voluntary participation and randomized into two groups. All subjects provided two pre- and two posttest scorings of calculus deposits on each of three typodonts by recording yes or no indicating if they detected calculus. Accuracy and consistency of calculus detection were evaluated using an answer key. The experimental group received three two-hour training sessions to practice a prescribed exploring sequence and technique for calculus detection. Participants immediately corrected their answers, received feedback from the trainer, and reconciled missed areas. Intra- and interrater reliability (pre- and posttest) was determined using Cohen's Kappa and compared between groups using repeated measures (split-plot) ANOVA. The groups did not differ from pre- to posttraining (intrarater reliability p=0.64; interrater reliability p=0.20). Training had no effect on reliability levels for simulated calculus detection in this study. Recommendations for future studies of faculty calibration when evaluating students include using patients for assessing rater reliability, employing larger samples at multiple sites, and assessing the impact on students' attitudes and learning outcomes.
NASA Astrophysics Data System (ADS)
Franck, Bas A. M.; Dreschler, Wouter A.; Lyzenga, Johannes
2004-12-01
In this study we investigated the reliability and convergence characteristics of an adaptive multidirectional pattern search procedure, relative to a nonadaptive multidirectional pattern search procedure. The procedure was designed to optimize three speech-processing strategies. These comprise noise reduction, spectral enhancement, and spectral lift. The search is based on a paired-comparison paradigm, in which subjects evaluated the listening comfort of speech-in-noise fragments. The procedural and nonprocedural factors that influence the reliability and convergence of the procedure are studied using various test conditions. The test conditions combine different tests, initial settings, background noise types, and step size configurations. Seven normal hearing subjects participated in this study. The results indicate that the reliability of the optimization strategy may benefit from the use of an adaptive step size. Decreasing the step size increases accuracy, while increasing the step size can be beneficial to create clear perceptual differences in the comparisons. The reliability also depends on starting point, stop criterion, step size constraints, background noise, algorithms used, as well as the presence of drifting cues and suboptimal settings. There appears to be a trade-off between reliability and convergence, i.e., when the step size is enlarged the reliability improves, but the convergence deteriorates. .
Reliability Issues and Solutions in Flexible Electronics Under Mechanical Fatigue
NASA Astrophysics Data System (ADS)
Yi, Seol-Min; Choi, In-Suk; Kim, Byoung-Joon; Joo, Young-Chang
2018-03-01
Flexible devices are of significant interest due to their potential expansion of the application of smart devices into various fields, such as energy harvesting, biological applications and consumer electronics. Due to the mechanically dynamic operations of flexible electronics, their mechanical reliability must be thoroughly investigated to understand their failure mechanisms and lifetimes. Reliability issue caused by bending fatigue, one of the typical operational limitations of flexible electronics, has been studied using various test methodologies; however, electromechanical evaluations which are essential to assess the reliability of electronic devices for flexible applications had not been investigated because the testing method was not established. By employing the in situ bending fatigue test, we has studied the failure mechanism for various conditions and parameters, such as bending strain, fatigue area, film thickness, and lateral dimensions. Moreover, various methods for improving the bending reliability have been developed based on the failure mechanism. Nanostructures such as holes, pores, wires and composites of nanoparticles and nanotubes have been suggested for better reliability. Flexible devices were also investigated to find the potential failures initiated by complex structures under bending fatigue strain. In this review, the recent advances in test methodology, mechanism studies, and practical applications are introduced. Additionally, perspectives including the future advance to stretchable electronics are discussed based on the current achievements in research.
A proposed method to investigate reliability throughout a questionnaire.
Wentzel-Larsen, Tore; Norekvål, Tone M; Ulvik, Bjørg; Nygård, Ottar; Pripp, Are H
2011-10-05
Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of intraclass correlation coefficient (ICC). The method was also applied on a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure--to assessing if respondents provide only a random answer or one based on a substantial cognitive effort. Scales from different factor structures of Jalowiec Coping Scale had different effect on this awareness measure. Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales.
Yildirim, Yücel; Ergin, Gülbin
2013-01-01
Fatigue is primarily a subjective experience and self-report is the most common approach used to measure fatigue. Numerous self-report instruments have been developed to measure fatigue. Unfortunately, each of these measures was tailored for the situation in which fatigue was studied. Therefore, the aim of this study was to determine the reliability and validity of the Turkish language version of the Multidimensional Assessment of Fatigue Scale (MAF-T) in chronic musculoskeletal physical therapy patients. The MAF-T was supplied by the MAPI Research Institute, and 69 chronic musculoskeletal physical therapy patients were evaluated. To validate MAF-T, all participants completed the MAF-T and Short Form-36 (SF-36). The MAF was administered again one week later to assess test-retest reliability. Using Cronbach α, the internal consistency reliability of the MAF-T was 0.90, the Intraclass Correlation Coefficient (ICC) reliability was 0.96. Item-discriminant validity was calculated between r=0.14 and r=0.82. The correlations between the total scores of the MAF-T scale and the subscale scores of SF-36 were negative and significant (p< 0.01). The MAF-T is a valid and reliable scale for assessing fatigue in chronic musculoskeletal physical therapy patients.
Wise, Frances M; Harris, Darren W; Olver, John H
2017-01-01
Considerable research has been undertaken in evaluating the DASS-21 in a variety of clinical populations, but studies of the instrument's psychometric adequacy in healthcare professionals is lacking. This study aimed to establish and improve the construct validity and reliability of the DASS-21 in a cohort of Australian health professionals. 343 rehabilitation health professionals completed the DASS-21, along with a demographic questionnaire. Principal components analysis was performed to identify potential factors in the DASS-21. Factors were interpreted against theoretical constructs underlying the instrument. Items loading on separate factors were then subjected to reliability analysis to determine internal consistency of subscales. Items that demonstrated poor fit, or loaded onto more than one factor, were deleted to maximise the reliability of each subscale. Principal components analysis identified three dimensions (depression, anxiety, stress) in a modified version of the DASS-21 (renamed DASS-14), with appropriate construct validity and good reliability (a=0.73 to 0.88). The three dimensions accounted for over 62% of variance between items. The modified DASS-14 scale is a more parsimonious measure of depression, anxiety, and stress, with acceptable reliability and construct validity, in rehabilitation health professionals and is appropriate for use in studies of similar populations.
Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Halek, Margareta
2016-02-01
For people with dementia, the concept of quality of life (Qol) reflects the disease's impact on the whole person. Thus, Qol is an increasingly used outcome measure in dementia research. This systematic review was performed to identify available dementia-specific Qol measurements and to assess the quality of linguistic validations and reliability studies of these measurements (PROSPERO 2013: CRD42014008725). The MEDLINE, CINAHL, EMBASE, PsycINFO, and Cochrane Methodology Register databases were systematically searched without any date restrictions. Forward and backward citation tracking were performed on the basis of selected articles. A total of 70 articles addressing 19 dementia-specific Qol measurements were identified; nine measurements were adapted to nonorigin countries. The quality of the linguistic validations varied from insufficient to good. Internal consistency was the most frequently tested reliability property. Most of the reliability studies lacked internal validity. Qol measurements for dementia are insufficiently linguistic validated and not well tested for reliability. None of the identified measurements can be recommended without further research. The application of international guidelines and quality criteria is strongly recommended for the performance of linguistic validations and reliability studies of dementia-specific Qol measurements. Copyright © 2016 Elsevier Inc. All rights reserved.
Anderson, Donald D; Segal, Neil A; Kern, Andrew M; Nevitt, Michael C; Torner, James C; Lynch, John A
2012-01-01
Recent findings suggest that contact stress is a potent predictor of subsequent symptomatic osteoarthritis development in the knee. However, much larger numbers of knees (likely on the order of hundreds, if not thousands) need to be reliably analyzed to achieve the statistical power necessary to clarify this relationship. This study assessed the reliability of new semiautomated computational methods for estimating contact stress in knees from large population-based cohorts. Ten knees of subjects from the Multicenter Osteoarthritis Study were included. Bone surfaces were manually segmented from sequential 1.0 Tesla magnetic resonance imaging slices by three individuals on two nonconsecutive days. Four individuals then registered the resulting bone surfaces to corresponding bone edges on weight-bearing radiographs, using a semi-automated algorithm. Discrete element analysis methods were used to estimate contact stress distributions for each knee. Segmentation and registration reliabilities (day-to-day and interrater) for peak and mean medial and lateral tibiofemoral contact stress were assessed with Shrout-Fleiss intraclass correlation coefficients (ICCs). The segmentation and registration steps of the modeling approach were found to have excellent day-to-day (ICC 0.93-0.99) and good inter-rater reliability (0.84-0.97). This approach for estimating compartment-specific tibiofemoral contact stress appears to be sufficiently reliable for use in large population-based cohorts.
Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan
2008-04-01
In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogenous sample and the recommended kw statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data was analysed using the kw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower than acceptable levels of kw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of kw statistic. Conclusions point to the interpretation of kw in tandem with percentage agreement scores. Suggestions that kw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.
Day-to-day reliability of gait characteristics in rats.
Raffalt, Peter C; Nielsen, Louise R; Madsen, Stefan; Munk Højberg, Laurits; Pingel, Jessica; Nielsen, Jens Bo; Wienecke, Jacob; Alkjær, Tine
2018-04-27
The purpose of the present study was to determine the day-to-day reliability in stride characteristics in rats during treadmill walking obtained with two-dimensional (2D) motion capture. Kinematics were recorded from 26 adult rats during walking at 8 m/min, 12 m/min and 16 m/min on two separate days. Stride length, stride time, contact time, swing time and hip, knee and ankle joint range of motion were extracted from 15 strides. The relative reliability was assessed using intra-class correlation coefficients (ICC(1,1)) and (ICC(3,1)). The absolute reliability was determined using measurement error (ME). Across walking speeds, the relative reliability ranged from fair to good (ICCs between 0.4 and 0.75). The ME was below 91 mm for strides lengths, below 55 ms for the temporal stride variables and below 6.4° for the joint angle range of motion. In general, the results indicated an acceptable day-to-day reliability of the gait pattern parameters observed in rats during treadmill walking. The results of the present study may serve as a reference material that can help future intervention studies on rat gait characteristics both with respect to the selection of outcome measures and in the interpretation of the results. Copyright © 2018 Elsevier Ltd. All rights reserved.