Schmidt, Frank L; Le, Huy; Ilies, Remus
2003-06-01
On the basis of an empirical study of measures of constructs from the cognitive domain, the personality domain, and the domain of affective traits, the authors of this study examine the implications of transient measurement error for the measurement of frequently studied individual differences variables. The authors clarify relevant reliability concepts as they relate to transient error and present a procedure for estimating the coefficient of equivalence and stability (L. J. Cronbach, 1947), the only classical reliability coefficient that assesses all 3 major sources of measurement error (random response, transient, and specific factor errors). The authors conclude that transient error exists in all 3 trait domains and is especially large in the domain of affective traits. Their findings indicate that the nearly universal use of the coefficient of equivalence (Cronbach's alpha; L. J. Cronbach, 1951), which fails to assess transient error, leads to overestimates of reliability and undercorrections for biases due to measurement error.
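For orientation, the coefficient of equivalence mentioned above (Cronbach's alpha) can be computed from a persons-by-items score matrix. The sketch below uses the generic textbook formula; it is not the authors' procedure for estimating the coefficient of equivalence and stability, and the function name and sample scores are illustrative assumptions.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha for a list of rows (one row per person, one column per item)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across persons.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: 5 persons, 3 items (not data from the study).
data = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4], [5, 5, 5]]
alpha = cronbach_alpha(data)
```

Because alpha is computed from a single occasion, it cannot reflect occasion-to-occasion (transient) error, which is the limitation the abstract points to.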
ERIC Educational Resources Information Center
Schumacker, Randall E.; Smith, Everett V., Jr.
2007-01-01
Measurement error is a common theme in classical measurement models used in testing and assessment. In classical measurement models, the definition of measurement error and the subsequent reliability coefficients differ on the basis of the test administration design. Internal consistency reliability specifies error due primarily to poor item…
Maassen, Gerard H
2010-08-01
In this Journal, Lewis and colleagues introduced a new Reliable Change Index (RCI(WSD)), which incorporated the within-subject standard deviation (WSD) of a repeated measurement design as the standard error. In this note, two opposite errors in using WSD this way are demonstrated. First, being the standard error of measurement of only a single assessment makes WSD too small when practice effects are absent. Then, too many individuals will be designated reliably changed. Second, WSD can grow unlimitedly to the extent that differential practice effects occur. This can even make RCI(WSD) unable to detect any reliable change.
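One reading of the first point above can be sketched numerically. The classical reliable change index divides the observed change by the standard error of a difference, which is √2 times the standard error of measurement (SEM) of a single assessment when the two errors are independent with equal variance. Dividing by the single-assessment SEM alone yields a larger index. The classical form and the numbers below are assumptions for illustration; they are not taken from the note.

```python
import math

def rci_classic(pre, post, sem):
    """Change divided by the standard error of a difference, sqrt(2) * SEM."""
    return (post - pre) / (math.sqrt(2.0) * sem)

def rci_single_sem(pre, post, sem):
    """Change divided by the SEM of a single assessment (too small a denominator)."""
    return (post - pre) / sem

# Hypothetical values: a 6-point change with SEM = 3.
classic = rci_classic(50, 56, 3.0)      # about 1.41, below the usual 1.96 cutoff
inflated = rci_single_sem(50, 56, 3.0)  # 2.0, above the 1.96 cutoff
```

With these numbers the same change is flagged as reliable only when the smaller denominator is used.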
Pruitt, Sandi L; Jeffe, Donna B; Yan, Yan; Schootman, Mario
2012-04-01
Limited psychometric research has examined the reliability of self-reported measures of neighbourhood conditions, the effect of measurement error on associations between neighbourhood conditions and health, and potential differences in the reliabilities between neighbourhood strata (urban vs rural and low vs high poverty). We assessed overall and stratified reliability of self-reported perceived neighbourhood conditions using five scales (social and physical disorder, social control, social cohesion, fear) and four single items (multidimensional neighbouring). We also assessed measurement error-corrected associations of these conditions with self-rated health. Using random-digit dialling, 367 women without breast cancer (matched controls from a larger study) were interviewed twice, 2-3 weeks apart. Test-retest (intraclass correlation coefficients (ICC)/weighted κ) and internal consistency reliability (Cronbach's α) were assessed. Differences in reliability across neighbourhood strata were tested using bootstrap methods. Regression calibration corrected estimates for measurement error. All measures demonstrated satisfactory internal consistency (α ≥ 0.70) and either moderate (ICC/κ=0.41-0.60) or substantial (ICC/κ=0.61-0.80) test-retest reliability in the full sample. Internal consistency did not differ by neighbourhood strata. Test-retest reliability was significantly lower among rural (vs urban) residents for two scales (social control, physical disorder) and two multidimensional neighbouring items; test-retest reliability was higher for physical disorder and lower for one multidimensional neighbouring item among the high (vs low) poverty strata. After measurement error correction, the magnitude of associations between neighbourhood conditions and self-rated health were larger, particularly in the rural population. Research is needed to develop and test reliable measures of perceived neighbourhood conditions relevant to the health of rural populations.
Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.
2012-01-01
The purpose of this article is to help researchers avoid common pitfalls associated with reliability including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it with focus on its impact on effect sizes. Second, we review how reliability is assessed with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data. PMID:22518107
Reliability of a Longitudinal Sequence of Scale Ratings
ERIC Educational Resources Information Center
Laenen, Annouschka; Alonso, Ariel; Molenberghs, Geert; Vangeneugden, Tony
2009-01-01
Reliability captures the influence of error on a measurement and, in the classical setting, is defined as one minus the ratio of the error variance to the total variance. Laenen, Alonso, and Molenberghs ("Psychometrika" 73:443-448, 2007) proposed an axiomatic definition of reliability and introduced the R[subscript T] coefficient, a measure of…
Hanson, Lisa C; Taylor, Nicholas F; McBurney, Helen
2016-09-01
To determine the retest reliability of the 10m incremental shuttle walk test (ISWT) in a mixed cardiac rehabilitation population. Participants completed two 10m ISWTs in a single session in a repeated measures study. Ten participants completed a third 10m ISWT as part of a pilot study. Hospital physiotherapy department. 62 adults aged a mean of 68 years (SD 10) referred to a cardiac rehabilitation program. Retest reliability of the 10m ISWT expressed as relative reliability and measurement error. Relative reliability was expressed in a ratio in the form of an intraclass correlation coefficient (ICC) and measurement error in the form of the standard error of measurement (SEM) and 95% confidence intervals for the group and individual. There was a high level of relative reliability over the two walks with an ICC of .99. The SEMagreement was 17m, and a change of at least 23m for the group and 54m for the individual would be required to be 95% confident of exceeding measurement error. The 10m ISWT demonstrated good retest reliability and is sufficiently reliable to be applied in practice in this population without the use of a practice test. Copyright © 2015 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
Mieritz, Rune M; Bronfort, Gert; Jakobsen, Markus D; Aagaard, Per; Hartvigsen, Jan
2014-09-01
A basic premise for any instrument measuring spinal motion is that reliable outcomes can be obtained on a relevant sample under standardized conditions. The purpose of this study was to assess the overall reliability and measurement error of regional spinal sagittal plane motion in patients with chronic low back pain (LBP), and then to evaluate the influence of body mass index, examiner, gender, stability of pain, and pain distribution on reliability and measurement error. This study comprises a test-retest design separated by 7 to 14 days. The patient cohort consisted of 220 individuals with chronic LBP. Kinematics of the lumbar spine were sampled during standardized spinal extension-flexion testing using a 6-df instrumented spatial linkage system. Test-retest reliability and measurement error were evaluated using intraclass correlation coefficients (ICC(1,1)) and Bland-Altman limits of agreement (LOAs). The overall test-retest reliability (ICC(1,1)) for various motion parameters ranged from 0.51 to 0.70, and relatively wide LOAs were observed for all parameters. Reliability measures in patient subgroups (ICC(1,1)) ranged between 0.34 and 0.77. In general, greater ICC(1,1) coefficients and smaller LOAs were found in subgroups with patients examined by the same examiner, patients with a stable pain level, patients with a body mass index below 30 kg/m², patients who were men, and patients in the Quebec Task Force classifications Group 1. This study shows that sagittal plane kinematic data from patients with chronic LBP may be sufficiently reliable in measurements of groups of patients. However, because of the large LOAs, this test procedure appears unusable at the individual patient level. Furthermore, reliability and measurement error vary substantially among subgroups of patients. Copyright © 2014 Elsevier Inc. All rights reserved.
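Bland-Altman limits of agreement, as used in the abstract above, are conventionally the mean test-retest difference plus or minus 1.96 standard deviations of the differences. The following is a minimal sketch under that conventional definition; the function name and the paired values are hypothetical and are not the study's data.

```python
from statistics import mean, stdev

def limits_of_agreement(test, retest, z=1.96):
    """Return (lower limit, bias, upper limit) for paired test-retest measurements."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)   # systematic difference between sessions
    sd = stdev(diffs)    # sample SD of the differences
    return bias - z * sd, bias, bias + z * sd

# Hypothetical range-of-motion values (degrees) from two sessions.
lower, bias, upper = limits_of_agreement([10, 12, 14, 16], [11, 12, 15, 18])
```

Wide limits relative to the size of a clinically meaningful change are what make a measure hard to use for individual patients even when group-level reliability is acceptable.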
Reliable absolute analog code retrieval approach for 3D measurement
NASA Astrophysics Data System (ADS)
Yu, Shuang; Zhang, Jing; Yu, Xiaoyang; Sun, Xiaoming; Wu, Haibin; Chen, Deyun
2017-11-01
The wrapped phase of phase-shifting approach can be unwrapped by using Gray code, but both the wrapped phase error and Gray code decoding error can result in period jump error, which will lead to gross measurement error. Therefore, this paper presents a reliable absolute analog code retrieval approach. The combination of unequal-period Gray code and phase shifting patterns at high frequencies are used to obtain high-frequency absolute analog code, and at low frequencies, the same unequal-period combination patterns are used to obtain the low-frequency absolute analog code. Next, the difference between the two absolute analog codes was employed to eliminate period jump errors, and a reliable unwrapped result can be obtained. Error analysis was used to determine the applicable conditions, and this approach was verified through theoretical analysis. The proposed approach was further verified experimentally. Theoretical analysis and experimental results demonstrate that the proposed approach can perform reliable analog code unwrapping.
Reliability of anthropometric measurements in European preschool children: the ToyBox-study.
De Miguel-Etayo, P; Mesana, M I; Cardon, G; De Bourdeaudhuij, I; Góźdź, M; Socha, P; Lateva, M; Iotova, V; Koletzko, B V; Duvinage, K; Androutsos, O; Manios, Y; Moreno, L A
2014-08-01
The ToyBox-study aims to develop and test an innovative and evidence-based obesity prevention programme for preschoolers in six European countries: Belgium, Bulgaria, Germany, Greece, Poland and Spain. In multicentre studies, anthropometric measurements using standardized procedures that minimize errors in the data collection are essential to maximize reliability of measurements. The aim of this paper is to describe the standardization process and reliability (intra- and inter-observer) of height, weight and waist circumference (WC) measurements in preschoolers. All technical procedures and devices were standardized and centralized training was given to the fieldworkers. At least seven children per country participated in the intra- and inter-observer reliability testing. Intra-observer technical error ranged from 0.00 to 0.03 kg for weight and from 0.07 to 0.20 cm for height, with the overall reliability being above 99%. A second training was organized for WC due to low reliability observed in the first training. Intra-observer technical error for WC ranged from 0.12 to 0.71 cm during the first training and from 0.05 to 1.11 cm during the second training, and reliability above 92% was achieved. Epidemiological surveys need standardized procedures and training of researchers to reduce measurement error. In the ToyBox-study, very good intra- and-inter-observer agreement was achieved for all anthropometric measurements performed. © 2014 World Obesity.
A method of bias correction for maximal reliability with dichotomous measures.
Penev, Spiridon; Raykov, Tenko
2010-02-01
This paper is concerned with the reliability of weighted combinations of a given set of dichotomous measures. Maximal reliability for such measures has been discussed in the past, but the pertinent estimator exhibits a considerable bias and mean squared error for moderate sample sizes. We examine this bias, propose a procedure for bias correction, and develop a more accurate asymptotic confidence interval for the resulting estimator. In most empirically relevant cases, the bias correction and mean squared error correction can be performed simultaneously. We propose an approximate (asymptotic) confidence interval for the maximal reliability coefficient, discuss the implementation of this estimator, and investigate the mean squared error of the associated asymptotic approximation. We illustrate the proposed methods using a numerical example.
Intraobserver reliability of contact pachymetry in children.
Weise, Katherine K; Kaminski, Brett; Melia, Michele; Repka, Michael X; Bradfield, Yasmin S; Davitt, Bradley V; Johnson, David A; Kraker, Raymond T; Manny, Ruth E; Matta, Noelle S; Schloff, Susan
2013-04-01
Central corneal thickness (CCT) is an important measurement in the treatment and management of pediatric glaucoma and potentially of refractive error, but data regarding reliability of CCT measurement in children are limited. The purpose of this study was to evaluate the reliability of CCT measurement with the use of handheld contact pachymetry in children. We conducted a multicenter intraobserver test-retest reliability study of more than 3,400 healthy eyes in children aged from newborn to 17 years by using a handheld contact pachymeter (Pachmate DGH55; DGH Technology Inc, Exton, PA) in 2 clinical settings--with the use of topical anesthesia in the office and with the patient under general anesthesia in a surgical facility. The overall standard error of measurement, including only measurements with standard deviation ≤5 μm, was 8 μm; the corresponding coefficient of repeatability, or limits within which 95% of test-retest differences fell, was ±22.3 μm. However, standard error of measurement increased as CCT increased, from 6.8 μm for CCT less than 525 μm, to 12.9 μm for CCT 625 μm and greater. The standard error of measurement including measurements with standard deviation >5 μm was 10.5 μm. Age, sex, race/ethnicity group, and examination setting did not influence the magnitude of test-retest differences. CCT measurement reliability in children via the Pachmate DGH55 handheld contact pachymeter is similar to that reported for adults. Because thicker CCT measurements are less reliable than thinner measurements, a second measure may be helpful when the first exceeds 575 μm. Reliability is also improved by disregarding measurements with instrument-reported standard deviations >5 μm. Copyright © 2013 American Association for Pediatric Ophthalmology and Strabismus. Published by Mosby, Inc. All rights reserved.
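The reported coefficient of repeatability is close to what the common relation CR = 1.96 × √2 × SEM gives. The abstract does not state which formula was used, so that relation is an assumption here, and the SEM of 8 μm is a rounded figure.

```python
import math

def coefficient_of_repeatability(sem, z=1.96):
    """Half-width expected to contain about 95% of test-retest differences."""
    return z * math.sqrt(2.0) * sem

# SEM of 8 micrometres, as rounded in the abstract.
cr = coefficient_of_repeatability(8.0)  # roughly 22.2, near the reported 22.3
```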
ERIC Educational Resources Information Center
Raykov, Tenko; Penev, Spiridon
2006-01-01
Unlike a substantial part of reliability literature in the past, this article is concerned with weighted combinations of a given set of congeneric measures with uncorrelated errors. The relationship between maximal coefficient alpha and maximal reliability for such composites is initially dealt with, and it is shown that the former is a lower…
ERIC Educational Resources Information Center
Schretlen, David; And Others
1994-01-01
Composite reliability and standard errors of measurement were computed for prorated Verbal, Performance, and Full-Scale intelligence quotient (IQ) scores from a seven-subtest short form of the Wechsler Adult Intelligence Scale-Revised. Results with 1,880 adults (standardization sample) indicate that this form is as reliable as the complete test.…
Acute Respiratory Distress Syndrome Measurement Error. Potential Effect on Clinical Study Results
Cooke, Colin R.; Iwashyna, Theodore J.; Hofer, Timothy P.
2016-01-01
Rationale: Identifying patients with acute respiratory distress syndrome (ARDS) is a recognized challenge. Experts often have only moderate agreement when applying the clinical definition of ARDS to patients. However, no study has fully examined the implications of low reliability measurement of ARDS on clinical studies. Objectives: To investigate how the degree of variability in ARDS measurement commonly reported in clinical studies affects study power, the accuracy of treatment effect estimates, and the measured strength of risk factor associations. Methods: We examined the effect of ARDS measurement error in randomized clinical trials (RCTs) of ARDS-specific treatments and cohort studies using simulations. We varied the reliability of ARDS diagnosis, quantified as the interobserver reliability (κ-statistic) between two reviewers. In RCT simulations, patients identified as having ARDS were enrolled, and when measurement error was present, patients without ARDS could be enrolled. In cohort studies, risk factors as potential predictors were analyzed using reviewer-identified ARDS as the outcome variable. Measurements and Main Results: Lower reliability measurement of ARDS during patient enrollment in RCTs seriously degraded study power. Holding effect size constant, the sample size necessary to attain adequate statistical power increased by more than 50% as reliability declined, although the result was sensitive to ARDS prevalence. In a 1,400-patient clinical trial, the sample size necessary to maintain similar statistical power increased to over 1,900 when reliability declined from perfect to substantial (κ = 0.72). Lower reliability measurement diminished the apparent effectiveness of an ARDS-specific treatment from a 15.2% (95% confidence interval, 9.4–20.9%) absolute risk reduction in mortality to 10.9% (95% confidence interval, 4.7–16.2%) when reliability declined to moderate (κ = 0.51). In cohort studies, the effect on risk factor associations was similar. 
Conclusions: ARDS measurement error can seriously degrade statistical power and effect size estimates of clinical studies. The reliability of ARDS measurement warrants careful attention in future ARDS clinical studies. PMID:27159648
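The κ-statistic used above to quantify interobserver reliability can be illustrated with the standard two-rater Cohen's kappa: observed agreement corrected for the agreement expected by chance. This is a generic sketch, not the simulation code of the study; the reviewer labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("ratings must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / n**2
    # Undefined (division by zero) if both raters always use one identical label.
    return (observed - expected) / (1 - expected)

# Hypothetical ARDS (1) / no ARDS (0) calls by two reviewers on 10 patients.
reviewer_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reviewer_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(reviewer_a, reviewer_b)
```

Here the reviewers agree on 8 of 10 patients, yet kappa is only 0.6 because half of that agreement would be expected by chance.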
Reliability and Validity Assessment of a Linear Position Transducer
Garnacho-Castaño, Manuel V.; López-Lastra, Silvia; Maté-Muñoz, José L.
2015-01-01
The objectives of the study were to determine the validity and reliability of peak velocity (PV), average velocity (AV), peak power (PP) and average power (AP) measurements made using a linear position transducer. Validity was assessed by comparing measurements simultaneously obtained using the Tendo Weightlifting Analyzer System and T-Force Dynamic Measurement System (Ergotech, Murcia, Spain) during two resistance exercises, bench press (BP) and full back squat (BS), performed by 71 trained male subjects. For the reliability study, a further 32 men completed both lifts using the Tendo Weightlifting Analyzer System in two identical testing sessions one week apart (session 1 vs. session 2). Intraclass correlation coefficients (ICCs) indicating the validity of the Tendo Weightlifting Analyzer System were high, with values ranging from 0.853 to 0.989. Systematic biases and random errors were low to moderate for almost all variables, being higher in the case of PP (bias ±157.56 W; error ±131.84 W). Proportional biases were identified for almost all variables. Test-retest reliability was strong with ICCs ranging from 0.922 to 0.988. Reliability results also showed minimal systematic biases and random errors, which were only significant for PP (bias -19.19 W; error ±67.57 W). Only PV recorded in the BS showed no significant proportional bias. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and estimating power in resistance exercises. The low biases and random errors observed here (mainly AV, AP) make this device a useful tool for monitoring resistance training. Key points: This study determined the validity and reliability of peak velocity, average velocity, peak power and average power measurements made using a linear position transducer. The Tendo Weightlifting Analyzer System emerged as a reliable system for measuring movement velocity and power. PMID:25729300
Gómez-Cabello, Alba; Vicente-Rodríguez, Germán; Albers, Ulrike; Mata, Esmeralda; Rodriguez-Marroyo, Jose A.; Olivares, Pedro R.; Gusi, Narcis; Villa, Gerardo; Aznar, Susana; Gonzalez-Gross, Marcela; Casajús, Jose A.; Ara, Ignacio
2012-01-01
Background The elderly EXERNET multi-centre study aims to collect normative anthropometric data for old functionally independent adults living in Spain. Purpose To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. Materials and Methods A total of 98 elderly participants from five different regions took part in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) took part in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. Results For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. Conclusion The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in an elderly population. PMID:22860013
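The technical error of measurement and the percentage reliability quoted above are commonly computed as TEM = √(Σd²/2n) for duplicate measurements and R = 1 − TEM²/SD², where SD² is the total between-subject variance. The abstract does not spell out its formulas, so these conventional definitions, the function names, and the sample heights below are assumptions.

```python
import math
from statistics import variance

def technical_error(first, second):
    """TEM for duplicate measurements of the same subjects."""
    n = len(first)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))

def reliability_percent(first, second):
    """Share of total variance not attributable to measurement error, in percent."""
    tem = technical_error(first, second)
    total_var = variance(first + second)  # pooled variance across all measurements
    return 100 * (1 - tem ** 2 / total_var)

# Hypothetical duplicate height measurements (cm) on four children.
m1 = [100.0, 105.0, 110.0, 95.0]
m2 = [100.2, 104.8, 110.0, 95.2]
tem = technical_error(m1, m2)
rel = reliability_percent(m1, m2)
```

Because R depends on the spread of the sample, the same TEM yields a lower reliability in a more homogeneous group.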
ERIC Educational Resources Information Center
Nicewander, W. Alan
2018-01-01
Spearman's correction for attenuation (measurement error) corrects a correlation coefficient for measurement errors in either-or-both of two variables, and follows from the assumptions of classical test theory. Spearman's equation removes all measurement error from a correlation coefficient which translates into "increasing the reliability of…
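Spearman's correction divides the observed correlation by the square root of the product of the two reliabilities. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction: observed correlation over sqrt of the reliabilities."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: observed r = .30, reliabilities .80 and .70.
corrected = disattenuate(0.30, 0.80, 0.70)  # about 0.40
```

Passing a reliability of 1.0 for one variable corrects for error in the other variable only.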
Superior model for fault tolerance computation in designing nano-sized circuit systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, N. S. S., E-mail: narinderjit@petronas.com.my; Muthuvalu, M. S., E-mail: msmuthuvalu@gmail.com; Asirvadam, V. S., E-mail: vijanth-sagayan@petronas.com.my
2014-10-24
As CMOS technology scales nano-metrically, reliability turns out to be a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate reliability of desired nano-electronic circuits. The process of computing reliability becomes very troublesome and time consuming as the computational complexity builds up with the desired circuit size. Therefore, being able to measure reliability instantly and superiorly is fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper firstly looks into the development of an automated reliability evaluation tool based on the generalization of Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed up the task of reliability analysis for a very large number of nano-electronic circuits. Secondly, by using the developed automated tool, the paper explores a comparative study involving reliability computation and evaluation by PGM and BDEC models for different implementations of same functionality circuits. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same functionality circuits with respect to gate error increases, the reliability measure by BDEC tends to be lower than the reliability measure by PGM. The lesser reliability measure by BDEC is well explained in this paper using the distribution of different signal input patterns over time for same functionality circuits. Simulation results conclude that the reliability measure by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero, and the probability of error on signal lines.
Pain, Liza A M; Baker, Ross; Sohail, Qazi Zain; Richardson, Denyse; Zabjek, Karl; Mogk, Jeremy P M; Agur, Anne M R
2018-03-23
Altered three-dimensional (3D) joint kinematics can contribute to shoulder pathology, including post-stroke shoulder pain. Reliable assessment methods enable comparative studies between asymptomatic shoulders of healthy subjects and painful shoulders of post-stroke subjects, and could inform treatment planning for post-stroke shoulder pain. The study purpose was to establish intra-rater test-retest reliability and within-subject repeatability of a palpation/digitization protocol, which assesses 3D clavicular/scapular/humeral rotations, in asymptomatic and painful post-stroke shoulders. Repeated measurements of 3D clavicular/scapular/humeral joint/segment rotations were obtained using palpation/digitization in 32 asymptomatic and six painful post-stroke shoulders during four reaching postures (rest/flexion/abduction/external rotation). Intra-class correlation coefficients (ICCs), standard error of the measurement and 95% confidence intervals were calculated. All ICC values indicated high to very high test-retest reliability (≥0.70), with lower reliability for scapular anterior/posterior tilt during external rotation in asymptomatic subjects, and scapular medial/lateral rotation, humeral horizontal abduction/adduction and axial rotation during abduction in post-stroke subjects. All standard error of measurement values demonstrated within-subject repeatability error ≤5° for all clavicular/scapular/humeral joint/segment rotations (asymptomatic ≤3.75°; post-stroke ≤5.0°), except for humeral axial rotation (asymptomatic ≤5°; post-stroke ≤15°). This noninvasive, clinically feasible palpation/digitization protocol was reliable and repeatable in asymptomatic shoulders, and in a smaller sample of painful post-stroke shoulders. Implications for Rehabilitation In the clinical setting, a reliable and repeatable noninvasive method for assessment of three-dimensional (3D) clavicular/scapular/humeral joint orientation and range of motion (ROM) is currently required. 
The established reliability and repeatability of this proposed palpation/digitization protocol will enable comparative 3D ROM studies between asymptomatic and post-stroke shoulders, which will further inform treatment planning. Intra-rater test-retest repeatability, which is measured by the standard error of the measure, indicates the range of error associated with a single test measure. Therefore, clinicians can use the standard error of the measure to determine the "true" differences between pre-treatment and post-treatment test scores.
Measurement-based reliability/performability models
NASA Technical Reports Server (NTRS)
Hsueh, Mei-Chen
1987-01-01
Measurement-based models based on real error-data collected on a multiprocessor system are described. Model development from the raw error-data to the estimation of cumulative reward is also described. A workload/reliability model is developed based on low-level error and resource usage data collected on an IBM 3081 system during its normal operation in order to evaluate the resource usage/error/recovery process in a large mainframe system. Thus, both normal and erroneous behavior of the system are modeled. The results provide an understanding of the different types of errors and recovery processes. The measured data show that the holding times in key operational and error states are not simple exponentials and that a semi-Markov process is necessary to model the system behavior. A sensitivity analysis is performed to investigate the significance of using a semi-Markov process, as opposed to a Markov process, to model the measured system.
The reliability of knee joint position testing using electrogoniometry
Piriyaprasarth, Pagamas; Morris, Meg E; Winter, Adele; Bialocerkowski, Andrea E
2008-01-01
Background The current investigation examined the inter- and intra-tester reliability of knee joint angle measurements using a flexible Penny and Giles Biometric® electrogoniometer. The clinical utility of electrogoniometry was also addressed. Methods The first study examined the inter- and intra-tester reliability of measurements of knee joint angles in supine, sitting and standing in 35 healthy adults. The second study evaluated inter-tester and intra-tester reliability of knee joint angle measurements in standing and after walking 10 metres in 20 healthy adults, using an enhanced measurement protocol with a more detailed electrogoniometer attachment procedure. Both inter-tester reliability studies involved two testers. Results In the first study, inter-tester reliability (ICC[2,10]) ranged from 0.58–0.71 in supine, 0.68–0.79 in sitting and 0.57–0.80 in standing. The standard error of measurement between testers was less than 3.55° and the limits of agreement ranged from -12.51° to 12.21°. Reliability coefficients for intra-tester reliability (ICC[3,10]) ranged from 0.75–0.76 in supine, 0.86–0.87 in sitting and 0.87–0.88 in standing. The standard error of measurement for repeated measures by the same tester was less than 1.7° and the limits of agreement ranged from -8.13° to 7.90°. The second study showed that using a more detailed electrogoniometer attachment protocol reduced the error of measurement between testers to 0.5°. Conclusion Using a standardised protocol, reliable measures of knee joint angles can be gained in standing, supine and sitting by using a flexible goniometer. PMID:18211714
Correcting Coefficient Alpha for Correlated Errors: Is [alpha][K] a Lower Bound to Reliability?
ERIC Educational Resources Information Center
Rae, Gordon
2006-01-01
When errors of measurement are positively correlated, coefficient alpha may overestimate the "true" reliability of a composite. To reduce this inflation bias, Komaroff (1997) has proposed an adjusted alpha coefficient, ak. This article shows that ak is only guaranteed to be a lower bound to reliability if the latter does not include correlated…
Measurement error: Implications for diagnosis and discrepancy models of developmental dyslexia.
Cotton, Sue M; Crewther, David P; Crewther, Sheila G
2005-08-01
The diagnosis of developmental dyslexia (DD) is reliant on a discrepancy between intellectual functioning and reading achievement. Discrepancy-based formulae have frequently been employed to establish the significance of the difference between 'intelligence' and 'actual' reading achievement. These formulae, however, often fail to take into consideration test reliability and the error associated with a single test score. This paper provides an illustration of the potential effects that test reliability and measurement error can have on the diagnosis of dyslexia, with particular reference to discrepancy models. The roles of reliability and standard error of measurement (SEM) in classic test theory are also briefly reviewed. This is followed by illustrations of how SEM and test reliability can aid with the interpretation of a simple discrepancy-based formula of DD. It is proposed that a lack of consideration of test theory in the use of discrepancy-based models of DD can lead to misdiagnosis (both false positives and false negatives). Further, misdiagnosis in research samples affects reproducibility and generalizability of findings. This in turn, may explain current inconsistencies in research on the perceptual, sensory, and motor correlates of dyslexia.
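The role of the SEM in interpreting a discrepancy can be sketched with the classical test theory relation SEM = SD × √(1 − reliability) and, for two scores on the same scale with independent errors, a standard error of the difference equal to √(SEM₁² + SEM₂²). The reliabilities and scores below are hypothetical and this is not the specific formula examined in the paper.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * math.sqrt(1.0 - reliability)

def discrepancy_is_reliable(score_a, score_b, sem_a, sem_b, z=1.96):
    """True if the difference exceeds z standard errors of the difference."""
    se_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return abs(score_a - score_b) > z * se_diff

# Hypothetical standard scores (mean 100, SD 15): IQ r = .95, reading r = .90.
sem_iq = sem(15.0, 0.95)       # about 3.35
sem_reading = sem(15.0, 0.90)  # about 4.74
flag_10 = discrepancy_is_reliable(100, 90, sem_iq, sem_reading)  # 10-point gap
flag_15 = discrepancy_is_reliable(100, 85, sem_iq, sem_reading)  # 15-point gap
```

Under these assumed reliabilities a 10-point gap falls within measurement error while a 15-point gap does not, which illustrates how ignoring the SEM can produce both false positives and false negatives.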
Langarika-Rocafort, Argia; Emparanza, José Ignacio; Aramendi, José F; Castellano, Julen; Calleja-González, Julio
2017-01-01
To examine the intra-observer reliability and agreement between five methods of measurement for dorsiflexion during the Weight Bearing Dorsiflexion Lunge Test (WBLT) and to assess the degree of agreement between three methods in female athletes. Repeated measurements study design. Volleyball club. Twenty-five volleyball players. Dorsiflexion was evaluated using five methods: heel-wall distance, first toe-wall distance, inclinometer at tibia, inclinometer at Achilles tendon and the dorsiflexion angle obtained by a simple trigonometric function. For the statistical analysis, agreement was studied using the Bland-Altman method, the Standard Error of Measurement and the Minimum Detectable Change. Reliability analysis was performed using the Intraclass Correlation Coefficient. Measurement methods using the inclinometer had more than 6° of measurement error. The angle calculated by trigonometric function had 3.28° error. Inclinometer-based methods had ICC values < 0.90. Distance-based methods and the trigonometric angle measurement had ICC values > 0.90. Concerning the agreement between methods, bias ranged from 1.93° to 14.42° and random error from 4.24° to 7.96°. To assess dorsiflexion (DF) angle in the WBLT, the angle calculated by a trigonometric function is the most repeatable method. The methods of measurement cannot be used interchangeably. Copyright © 2016 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Harshman, Jordan; Yezierski, Ellen
2016-01-01
Determining the error of measurement is a necessity for researchers engaged in bench chemistry, chemistry education research (CER), and a multitude of other fields. Discussions regarding what constructs measurement error entails and how to best measure them have occurred, but the critiques about traditional measures have yielded few alternatives.…
Intra-rater reliability of hallux flexor strength measures using the Nintendo Wii Balance Board.
Quek, June; Treleaven, Julia; Brauer, Sandra G; O'Leary, Shaun; Clark, Ross A
2015-01-01
The purpose of this study was to investigate the intra-rater reliability of a new method in combination with the Nintendo Wii Balance Board (NWBB) to measure the strength of hallux flexor muscle. Thirty healthy individuals (age: 34.9 ± 12.9 years, height: 170.4 ± 10.5 cm, weight: 69.3 ± 15.3 kg, female = 15) participated. Repeated testing was completed within 7 days. Participants performed strength testing in sitting using a wooden platform in combination with the NWBB. This new method was set up to selectively recruit an intrinsic muscle of the foot, specifically the flexor hallucis brevis muscle. Statistical analysis was performed using intra-class coefficients and ordinary least product analysis. To estimate measurement error, standard error of measurement (SEM), minimal detectable change (MDC) and percentage error were calculated. Results indicate excellent intra-rater reliability (ICC = 0.982, CI = 0.96-0.99) with an absence of systematic bias. SEM, MDC and percentage error value were 0.5, 1.4 and 12 % respectively. This study demonstrates that a new method in combination with the NWBB application is reliable to measure hallux flexor strength and has potential to be used for future research and clinical application.
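The abstract above does not state its formulas, but the standard error of measurement (SEM) and minimal detectable change (MDC) are commonly computed as SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM. The sketch below assumes those conventional definitions; the function names are hypothetical. Under that assumption, the reported SEM of 0.5 reproduces the reported MDC of 1.4.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%, 1.645 for 90%).

    The sqrt(2) factor accounts for measurement error in both the test and the retest.
    """
    return z * math.sqrt(2.0) * sem


# SEM reported in the abstract above; the result is about 1.39, i.e. 1.4 after rounding.
print(round(minimal_detectable_change(0.5), 2))
```

The same relation can be used as a quick consistency check on any abstract that reports both SEM and MDC.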
Regression dilution bias: tools for correction methods and sample size calculation.
Berglund, Lars
2012-08-01
Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
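The slope correction described above can be illustrated with the classical approach: divide the observed slope by the reliability ratio (true-score variance over observed variance) estimated from duplicate measurements. This is a minimal sketch under the classical additive error model, not the software the article provides; all function names and numbers are hypothetical.

```python
def reliability_ratio(first, second):
    """Estimate the reliability ratio from two measurements per individual.

    Uses one-way ANOVA variance components:
    ratio = between-subject variance / (between-subject + within-subject variance).
    """
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need at least two individuals measured twice")
    means = [(a + b) / 2.0 for a, b in zip(first, second)]
    grand = sum(means) / n
    within_ms = sum((a - b) ** 2 for a, b in zip(first, second)) / (2.0 * n)
    between_ms = 2.0 * sum((m - grand) ** 2 for m in means) / (n - 1)
    between_var = max((between_ms - within_ms) / 2.0, 0.0)
    return between_var / (between_var + within_ms)


def corrected_slope(observed_slope, ratio):
    """Undo regression dilution: the observed slope is attenuated by the ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("reliability ratio must lie in (0, 1]")
    return observed_slope / ratio


# Hypothetical example: an observed slope of 0.4 with a reliability ratio of 0.8.
print(corrected_slope(0.4, 0.8))
```

The correction always moves the slope away from zero, which is why it is not appropriate when the goal is prediction from the error-prone measurement itself.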
Skinner, Ian W; Hübscher, Markus; Moseley, G Lorimer; Lee, Hopin; Wand, Benedict M; Traeger, Adrian C; Gustin, Sylvia M; McAuley, James H
2017-08-15
Eyetracking is commonly used to investigate attentional bias. Although some studies have investigated the internal consistency of eyetracking, data are scarce on the test-retest reliability and agreement of eyetracking to investigate attentional bias. This study reports the test-retest reliability, measurement error, and internal consistency of 12 commonly used outcome measures thought to reflect the different components of attentional bias: overall attention, early attention, and late attention. Healthy participants completed a preferential-looking eyetracking task that involved the presentation of threatening (sensory words, general threat words, and affective words) and nonthreatening words. We used intraclass correlation coefficients (ICCs) to measure test-retest reliability (ICC > .70 indicates adequate reliability). The ICCs(2, 1) ranged from -.31 to .71. Reliability varied according to the outcome measure and threat word category. Sensory words had a lower mean ICC (.08) than either affective words (.32) or general threat words (.29). A longer exposure time was associated with higher test-retest reliability. All of the outcome measures, except second-run dwell time, demonstrated low measurement error (<6%). Most of the outcome measures reported high internal consistency (α > .93). Recommendations are discussed for improving the reliability of eyetracking tasks in future research.
Martín-Rodríguez, Saúl; Loturco, Irineu; Hunter, Angus M; Rodríguez-Ruiz, David; Munguia-Izquierdo, Diego
2017-12-01
Martín-Rodríguez, S, Loturco, I, Hunter, AM, Rodríguez-Ruiz, D, and Munguia-Izquierdo, D. Reliability and measurement error of tensiomyography to assess mechanical muscle function: A systematic review. J Strength Cond Res 31(12): 3524-3536, 2017. Interest in studying mechanical skeletal muscle function through tensiomyography (TMG) has increased in recent years. This systematic review aimed to (a) report the reliability and measurement error of all TMG parameters (i.e., maximum radial displacement of the muscle belly [Dm], contraction time [Tc], delay time [Td], half-relaxation time [½ Tr], and sustained contraction time [Ts]) and (b) provide critical reflection on how to perform accurate and appropriate measurements for informing clinicians, exercise professionals, and researchers. A comprehensive literature search was performed of the Pubmed, Scopus, Science Direct, and Cochrane databases up to July 2017. Eight studies were included in this systematic review. Meta-analysis could not be performed because of the low quality of the evidence of some studies evaluated. Overall, the review of the 9 studies involving 158 participants revealed high relative reliability (intraclass correlation coefficient [ICC]) for Dm (0.91-0.99); moderate-to-high ICC for Ts (0.80-0.96), Tc (0.70-0.98), and ½ Tr (0.77-0.93); and low-to-high ICC for Td (0.60-0.98), independently of the evaluated muscles. In addition, absolute reliability (coefficient of variation [CV]) was low for all TMG parameters except for ½ Tr (CV > 20%), whereas measurement error indexes were high for this parameter. In conclusion, this study indicates that 3 of the TMG parameters (Dm, Td, and Tc) are highly reliable, whereas ½ Tr demonstrates insufficient reliability and thus should not be used in future studies.
Rosenblum, Uri; Melzer, Itshak
2017-01-01
About 90% of people with multiple sclerosis (PwMS) have gait instability and 50% fall. Reliable and clinically feasible methods of gait instability assessment are needed. The study investigated the reliability and validity of the Narrow Path Walking Test (NPWT) under single-task (ST) and dual-task (DT) conditions for PwMS. Thirty PwMS performed the NPWT on 2 different occasions, a week apart. Number of Steps, Trial Time, Trial Velocity, Step Length, Number of Step Errors, Number of Cognitive Task Errors, and Number of Balance Losses were measured. Intraclass correlation coefficients (ICC2,1) were calculated from the average values of NPWT parameters. Absolute reliability was quantified from standard error of measurement (SEM) and smallest real difference (SRD). Concurrent validity of NPWT with Functional Reach Test, Four Square Step Test (FSST), 12-item Multiple Sclerosis Walking Scale (MSWS-12), and 2 Minute Walking Test (2MWT) was determined using partial correlations. Intraclass correlation coefficients (ICCs) for most NPWT parameters during ST and DT ranged from 0.46-0.94 and 0.55-0.95, respectively. The highest relative reliability was found for Number of Step Errors (ICC = 0.94 and 0.93, for ST and DT, respectively) and Trial Velocity (ICC = 0.83 and 0.86, for ST and DT, respectively). Absolute reliability was high for Number of Step Errors in ST (SEM % = 19.53%) and DT (SEM % = 18.14%) and low for Trial Velocity in ST (SEM % = 6.88%) and DT (SEM % = 7.29%). Significant correlations for Number of Step Errors and Trial Velocity were found with FSST, MSWS-12, and 2MWT. In PwMS performing the NPWT, Number of Step Errors and Trial Velocity were highly reliable parameters. Based on correlations with other measures of gait instability, Number of Step Errors was the most valid parameter of dynamic balance under the conditions of our test. Video Abstract available for more insights from the authors (see Supplemental Digital Content 1, available at: http://links.lww.com/JNPT/A159).
de Oliveira, Valéria M A; Pitangui, Ana C R; Nascimento, Vinícius Y S; da Silva, Hítalo A; Dos Passos, Muana H P; de Araújo, Rodrigo C
2017-02-01
The Closed Kinetic Chain Upper Extremity Stability Test (CKCUEST) has been proposed as an option to assess upper limb function and stability; however, there are few studies that support the use of this test in adolescents. The purpose of the present study was to investigate the intersession reliability and agreement of three CKCUEST scores in adolescents and establish clinimetric values for this test. Study design: test-retest reliability. Twenty-five healthy adolescents of both sexes were evaluated. The subjects performed the CKCUEST twice, with an interval of one week between the tests. An intraclass correlation coefficient (ICC3,3) two-way mixed model with a 95% confidence interval was utilized to determine intersession reliability. A Bland-Altman graph was plotted to analyze the agreement between assessments. The presence of systematic error was evaluated by a one-sample t test. The difference between the evaluation and reevaluation was observed using a paired-sample t test. The level of significance was set at 0.05. Standard errors of measurement and minimum detectable changes were calculated. The intersession reliability values for the average touches score, normalized score, and power score were 0.68, 0.68 and 0.87, the standard errors of measurement were 2.17, 1.35 and 6.49, and the minimal detectable changes were 6.01, 3.74 and 17.98, respectively. The presence of systematic error (p < 0.014), the significant difference between the measurements (p < 0.05), and the analysis of the Bland-Altman graph indicate that the CKCUEST is a discordant test with moderate to excellent reliability when used with adolescents. The CKCUEST is a measurement with moderate to excellent reliability for adolescents. Level of evidence: 2b.
Habets, Bas; Staal, J Bart; Tijssen, Marsha; van Cingel, Robert
2018-01-10
To determine the intrarater reliability of the Humac NORM isokinetic dynamometer for concentric and eccentric strength tests of knee and shoulder muscles. 54 participants (50% female, average age 20.9 ± 3.1 years) performed concentric and eccentric strength measures of the knee extensors and flexors, and the shoulder internal and external rotators on two different Humac NORM isokinetic dynamometers, which were situated at two different centers. The knee extensors and flexors were tested concentrically at 60° and 180°/s, and eccentrically at 60°/s. Concentric strength of the shoulder internal and external rotators, and eccentric strength of the external rotators were measured at 60° and 120°/s. We calculated intraclass correlation coefficients (ICCs), standard error of measurement, standard error of measurement expressed as a %, and the smallest detectable change to determine reliability and measurement error. ICCs for the knee tests ranged from 0.74 to 0.89, whereas ICC values for the shoulder tests ranged from 0.72 to 0.94. Measurement error was highest for the concentric test of the knee extensors and lowest for the concentric test of shoulder external rotators.
Saka, Masayuki; Yamauchi, Hiroki; Hoshi, Kenji; Yoshioka, Toru; Hamada, Hidetoshi; Gamada, Kazuyoshi
2015-05-01
Humeral retroversion is defined as the orientation of the humeral head relative to the distal humerus. Because none of the previous methods used to measure humeral retroversion strictly follow this definition, values obtained by these techniques vary and may be biased by morphologic variations of the humerus. The purpose of this study was 2-fold: to validate a method to define the axis of the distal humerus with a virtual cylinder and to establish the reliability of 3-dimensional (3D) measurement of humeral retroversion by this cylinder fitting method. Humeral retroversion in 14 baseball players (28 humeri) was measured by the 3D cylinder fitting method. The root mean square error was calculated to compare values obtained by a single tester and by 2 different testers using the embedded coordinate system. To establish the reliability, intraclass correlation coefficient (ICC) and precision (standard error of measurement [SEM]) were calculated. The root mean square errors for the humeral coordinate system were <1.0 mm/1.0° for comparison of all translations/rotations obtained by a single tester and <1.0 mm/2.0° for comparison obtained by 2 different testers. Assessment of reliability and precision of the 3D measurement of retroversion yielded an intratester ICC of 0.99 (SEM, 1.0°) and intertester ICC of 0.96 (SEM, 2.8°). The error in measurements obtained by a distal humerus cylinder fitting method was small enough not to affect retroversion measurement. The 3D measurement of retroversion by this method provides excellent intratester and intertester reliability. Copyright © 2015 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
Seligman, Sarah C; Giovannetti, Tania; Sestito, John; Libon, David J
2014-01-01
Mild functional difficulties have been associated with early cognitive decline in older adults and increased risk for conversion to dementia in mild cognitive impairment, but our understanding of this decline has been limited by a dearth of objective methods. This study evaluated the reliability and validity of a new system to code subtle errors on an established performance-based measure of everyday action and described preliminary findings within the context of a theoretical model of action disruption. Here 45 older adults completed the Naturalistic Action Test (NAT) and neuropsychological measures. NAT performance was coded for overt errors, and subtle action difficulties were scored using a novel coding system. An inter-rater reliability coefficient was calculated. Validity of the coding system was assessed using a repeated-measures ANOVA with NAT task (simple versus complex) and error type (overt versus subtle) as within-group factors. Correlation/regression analyses were conducted among overt NAT errors, subtle NAT errors, and neuropsychological variables. The coding of subtle action errors was reliable and valid, and episodic memory breakdown predicted subtle action disruption. Results suggest that the NAT can be useful in objectively assessing subtle functional decline. Treatments targeting episodic memory may be most effective in addressing early functional impairment in older age.
NASA Astrophysics Data System (ADS)
Kim, K.-h.; Oh, T.-s.; Park, K.-r.; Lee, J. H.; Ghim, Y.-c.
2017-11-01
One factor determining the reliability of measurements of electron temperature using a Thomson scattering (TS) system is transmittance of the optical bandpass filters in polychromators. We investigate the system performance as a function of electron temperature to determine reliable range of measurements for a given set of the optical bandpass filters. We show that such a reliability, i.e., both bias and random errors, can be obtained by building a forward model of the KSTAR TS system to generate synthetic TS data with the prescribed electron temperature and density profiles. The prescribed profiles are compared with the estimated ones to quantify both bias and random errors.
The reliability of three devices used for measuring vertical jump height.
Nuzzo, James L; Anning, Jonathan H; Scharfenberg, Jessica M
2011-09-01
The purpose of this investigation was to assess the intrasession and intersession reliability of the Vertec, Just Jump System, and Myotest for measuring countermovement vertical jump (CMJ) height. Forty male and 39 female university students completed 3 maximal-effort CMJs during 2 testing sessions, which were separated by 24-48 hours. The height of the CMJ was measured from all 3 devices simultaneously. Systematic error, relative reliability, absolute reliability, and heteroscedasticity were assessed for each device. Systematic error across the 3 CMJ trials was observed within both sessions for males and females, and this was most frequently observed when the CMJ height was measured by the Vertec. No systematic error was discovered across the 2 testing sessions when the maximum CMJ heights from the 2 sessions were compared. In males, the Myotest demonstrated the best intrasession reliability (intraclass correlation coefficient [ICC] = 0.95; SEM = 1.5 cm; coefficient of variation [CV] = 3.3%) and intersession reliability (ICC = 0.88; SEM = 2.4 cm; CV = 5.3%; limits of agreement = -0.08 ± 4.06 cm). Similarly, in females, the Myotest demonstrated the best intrasession reliability (ICC = 0.91; SEM = 1.4 cm; CV = 4.5%) and intersession reliability (ICC = 0.92; SEM = 1.3 cm; CV = 4.1%; limits of agreement = 0.33 ± 3.53 cm). Additional analysis revealed that heteroscedasticity was present in the CMJ when measured from all 3 devices, indicating that better jumpers demonstrate greater fluctuations in CMJ scores across testing sessions. To attain reliable CMJ height measurements, practitioners are encouraged to familiarize athletes with the CMJ technique and then allow the athletes to complete numerous repetitions until performance plateaus, particularly if the Vertec is being used.
Tucker, Neil; Reid, Duncan; McNair, Peter
2007-01-01
The slump test is a tool to assess the mechanosensitivity of the neuromeningeal structures within the vertebral canal. While some studies have investigated the reliability of aspects of this test within the same day, few have assessed the reliability across days. Therefore, the purpose of this pilot study was to investigate reliability when measuring active knee extension range of motion (AROM) in a modified slump test position within trials on a single day and across days. Ten male and ten female asymptomatic subjects, ages 20–49 (mean age 30.1, SD 6.4) participated in the study. Knee extension AROM in a modified slump position with the cervical spine in a flexed position and then in an extended position was measured via three trials on two separate days. Across three trials, knee extension AROM increased significantly with a mean magnitude of 2° within days for both cervical spine positions (P<0.05). The findings showed that there was no statistically significant difference in knee extension AROM measurements across days (P>0.05). The intraclass correlation coefficients for the mean of the three trials across days were 0.96 (lower limit 95% CI: 0.90) with the cervical spine flexed and 0.93 (lower limit 95% CI: 0.83) with cervical extension. Measurement error was calculated by way of the typical error and 95% limits of agreement, and visually represented in Bland and Altman plots. The typical error for the cervical flexed and extended positions averaged across trials was 2.6° and 3.3°, respectively. The limits of agreement were narrow, and the Bland and Altman plots also showed minimal bias in the joint angles across days with a random distribution of errors across the range of measured angles. This study demonstrated that knee extension AROM could be reliably measured across days in subjects without pathology and that the measurement error was acceptable. Implications of variability over multiple trials are discussed. 
The modified set-up for the test using the Kincom dynamometer and elevated thigh position may be useful to clinical researchers in determining the mechanosensitivity of the nervous system. PMID:19066666
Lee, Ji-Hyun; Cynn, Heon-Seock; Choi, Woo-Jeong; Jeong, Hyo-Jung; Yoon, Tae-Lim
2016-05-01
The objective of this study was to introduce levator scapulae (LS) measurement using a caliper and the levator scapulae index (LSI) and to investigate intra- and interrater reliability of the LSI in subjects with and without scapular downward rotation syndrome (SDRS). Two raters measured LS length twice in 38 subjects (19 with SDRS and 19 without SDRS). For reliability testing, intraclass correlation coefficients (ICCs), standard error of measurement (SEM), and minimal detectable change (MDC) were calculated. Intrarater reliability analysis yielded ICCs ranging from 0.94 to 0.98 in subjects with SDRS and 0.96 to 0.98 in subjects without SDRS. These results indicated that intrarater reliability in both groups was excellent for measuring LS length with the LSI. Interrater reliability was good (ICC: 0.82) in subjects with SDRS; however, interrater reliability was moderate (ICC: 0.75) in subjects without SDRS. Additionally, SEM and MDC were 0.13% and 0.36% in subjects with SDRS and 0.35% and 0.97% in subjects without SDRS. In subjects with SDRS, the dispersion of the measurement errors and the MDC were low. This study suggested that the LSI is a reliable method to measure LS length and is more reliable for subjects with SDRS. Copyright © 2015 Elsevier Ltd. All rights reserved.
Bragança, Sara; Arezes, Pedro; Carvalho, Miguel; Ashdown, Susan P; Castellucci, Ignacio; Leão, Celina
2018-01-01
Collecting anthropometric data for real-life applications demands a high degree of precision and reliability. It is important to test new equipment that will be used for data collection. Objective: Compare two anthropometric data gathering techniques - manual methods and a Kinect-based 3D body scanner - to understand which of them gives more precise and reliable results. The data was collected using a measuring tape and a Kinect-based 3D body scanner. It was evaluated in terms of precision by considering the regular and relative Technical Error of Measurement and in terms of reliability by using the Intraclass Correlation Coefficient, Reliability Coefficient, Standard Error of Measurement and Coefficient of Variation. The results obtained showed that both methods presented better results for reliability than for precision. Both methods showed relatively good results for these two variables, however, manual methods had better results for some body measurements. Despite being considered sufficiently precise and reliable for certain applications (e.g. apparel industry), the 3D scanner tested showed, for almost every anthropometric measurement, a different result than the manual technique. Many companies design their products based on data obtained from 3D scanners, hence, understanding the precision and reliability of the equipment used is essential to obtain feasible results.
Baert, Isabel A C; Lluch, Enrique; Struyf, Thomas; Peeters, Greta; Van Oosterwijck, Sophie; Tuynman, Joanna; Rufai, Salim; Struyf, Filip
2018-06-01
The therapeutic value of proprioceptive-based exercises in knee osteoarthritis (KOA) management warrants investigation of proprioceptive testing methods easily accessible in clinical practice. To estimate inter- and intrarater reliability of the knee joint position sense (KJPS) test and knee force sense (KFS) test in subjects with and without KOA. Cross-sectional test-retest design. Two blinded raters independently performed repeated measures of the KJPS and KFS test, using an analogue inclinometer and handheld dynamometer, respectively, in eight KOA patients (12 symptomatic knees) and 26 healthy controls (52 asymptomatic knees). Intraclass correlation coefficients (ICCs; model 2,1), standard error of measurement (SEM) and minimal detectable change with 95% confidence bounds (MDC95) were calculated. For KJPS, results showed good to excellent test-retest agreement (ICCs 0.70-0.95 in KOA patients; ICCs 0.65-0.85 in healthy controls). A 2° measurement error (SEM 1°) was reported when measuring KJPS in multiple test positions and calculating mean repositioning error. When testing KOA patients pre and post therapy, a repositioning error larger than 4° (MDC95) is needed to consider true change. Measuring KFS using handheld dynamometry showed poor to fair interrater and poor to excellent intrarater reliability in subjects with and without KOA. Measuring KJPS in multiple test positions using an analogue inclinometer and calculating mean repositioning error is reliable and can be used in clinical practice. We do not recommend the use of the KFS test to clinicians. Further research is required to establish diagnostic accuracy and validity of our KJPS test in larger knee pain populations. Copyright © 2017 Elsevier Ltd. All rights reserved.
Mehta, Saurabh P; George, Hannah R; Goering, Christian A; Shafer, Danielle R; Koester, Alan; Novotny, Steven
2017-11-01
Clinical measurement study. The push-off test (POT) was recently conceived and found to be reliable and valid for assessing weight bearing through injured wrist or elbow. However, further research with a larger sample can lend credence to the preliminary findings supporting the use of the POT. This study examined the interrater reliability, construct validity, and measurement error for the POT in patients with wrist conditions. Participants with musculoskeletal (MSK) wrist conditions were recruited. Performance on the POT, grip strength, and isometric strength of the wrist extensors were assessed. The shortened version of the Disabilities of the Arm, Shoulder and Hand and numeric pain rating scale were completed. The intraclass correlation coefficient assessed interrater reliability of the POT. Pearson correlation coefficients (r) examined the concurrent relationships between the POT and other measures. The standard error of measurement and the minimal detectable change at 90% confidence interval were assessed as measurement error and index of true change for the POT. A total of 50 participants with different elbow or wrist conditions (age: 48.1 ± 16.6 years) were included in this study. The results of this study strongly supported the interrater reliability (intraclass correlation coefficient: 0.96 and 0.93 for the affected and unaffected sides, respectively) of the POT in patients with wrist MSK conditions. The POT showed convergent relationships with the grip strength on the injured side (r = 0.89) and the wrist extensor strength (r = 0.7). The POT showed a small standard error of measurement (1.9 kg). The minimal detectable change at 90% confidence interval for the POT was 4.4 kg for the sample. This study provides additional evidence to support the reliability and validity of the POT. This is the first study that provides the values for the measurement error and true change on the POT scores in patients with wrist MSK conditions. 
Further research should examine the responsiveness and discriminant validity of the POT in patients with wrist conditions. Copyright © 2017 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
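The two measurement-error figures in the abstract above are consistent with the conventional relation between the standard error of measurement and the minimal detectable change at 90% confidence. The abstract does not state the formula used, so the following worked check is an assumption based on the usual definition, with z = 1.645 for a two-sided 90% level:

```latex
\[
\mathrm{MDC}_{90} = z_{0.90}\,\sqrt{2}\,\mathrm{SEM}
                  = 1.645 \times \sqrt{2} \times 1.9\ \text{kg}
                  \approx 4.4\ \text{kg}
\]
```

Under this reading, a change in push-off force smaller than about 4.4 kg cannot be distinguished from measurement error at the 90% level.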
Ammann-Reiffer, Corinne; Bastiaenen, Caroline H G; de Bie, Rob A; van Hedel, Hubertus J A
2014-08-01
Sound measurement properties of outcome tools are essential when evaluating outcomes of an intervention, in clinical practice and in research. The purpose of this study was to review the evidence on reliability, measurement error, and responsiveness of measures of gait function in children with neuromuscular diagnoses. The MEDLINE, CINAHL, EMBASE, and PsycINFO databases were searched up to June 15, 2012. Studies evaluating reliability, measurement error, or responsiveness of measures of gait function in 1- to 18-year-old children and youth with neuromuscular diagnoses were included. Quality of the studies was independently rated by 2 raters using a modified COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist. Studies with a fair quality rating or better were considered for best evidence synthesis. Regarding the methodological quality, 32 out of 35 reliability studies, all of the 13 measurement error studies, and 5 out of 10 responsiveness studies were of fair or good quality. Best evidence synthesis revealed moderate to strong evidence for reliability for several measures in children and youth with cerebral palsy (CP) but was limited or unknown in other diagnoses. The Functional Mobility Scale (FMS) and the Gross Motor Function Measure (GMFM) dimension E showed limited positive evidence for responsiveness in children with CP, but it was unknown or controversial in other diagnoses. No information was reported on the minimal important change; thus, evidence on measurement error remained undetermined. As studies on validity were not included in the review, a comprehensive appraisal of the best available gait-related outcome measure per diagnosis is not possible. There is moderate to strong evidence on reliability for several measures of gait function in children and youth with CP, whereas evidence on responsiveness exists only for the FMS and the GMFM dimension E. © 2014 American Physical Therapy Association.
Reliability of Total Test Scores When Considered as Ordinal Measurements
ERIC Educational Resources Information Center
Biswas, Ajoy Kumar
2006-01-01
This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
Impact of Measurement Error on Statistical Power: Review of an Old Paradox.
ERIC Educational Resources Information Center
Williams, Richard H.; And Others
1995-01-01
The paradox that a Student t-test based on pretest-posttest differences can attain its greatest power when the difference score reliability is zero was explained by demonstrating that power is not a mathematical function of reliability unless either true score variance or error score variance is constant. (SLD)
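One way to see the resolution is the following sketch, written under a simple assumed model rather than as the authors' own derivation. Let each difference score be D = Δ + E, where Δ is the true change (mean μ_Δ, variance σ_Δ²) and E is independent error (variance σ_E²). The symbols, and the use of the paired t-test noncentrality parameter δ for n subjects, are assumptions introduced here for illustration.

```latex
\rho_D = \frac{\sigma_\Delta^2}{\sigma_\Delta^2 + \sigma_E^2},
\qquad
\delta = \frac{\mu_\Delta \sqrt{n}}{\sqrt{\sigma_\Delta^2 + \sigma_E^2}}
```

With σ_E² held fixed, shrinking σ_Δ² toward zero lowers ρ_D toward 0 while raising δ, so power is greatest when reliability is zero. With σ_Δ² held fixed, shrinking σ_E² raises both ρ_D and δ. Power is thus tied to reliability only once one of the two variances is held constant, and the direction depends on which one.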
Gajewski, Byron J.; Lee, Robert; Dunton, Nancy
2012-01-01
Data Envelopment Analysis (DEA) is the most commonly used approach for evaluating healthcare efficiency (Hollingsworth, 2008), but a long-standing concern is that DEA assumes that data are measured without error. This is quite unlikely, and DEA and other efficiency analysis techniques may yield biased efficiency estimates if it is not realized (Gajewski, Lee, Bott, Piamjariyakul and Taunton, 2009; Ruggiero, 2004). We propose to address measurement error systematically using a Bayesian method (Bayesian DEA). We will apply Bayesian DEA to data from the National Database of Nursing Quality Indicators® (NDNQI®) to estimate nursing units’ efficiency. Several external reliability studies inform the posterior distribution of the measurement error on the DEA variables. We will discuss the case of generalizing the approach to situations where an external reliability study is not feasible. PMID:23328796
Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A
2016-12-01
Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within-day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (ICC_P: 0.86-0.98; ICC_S: 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
A proposed method to investigate reliability throughout a questionnaire.
Wentzel-Larsen, Tore; Norekvål, Tone M; Ulvik, Bjørg; Nygård, Ottar; Pripp, Are H
2011-10-05
Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of intraclass correlation coefficient (ICC). The method was also applied to a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure, used to assess whether respondents provide only a random answer or one based on a substantial cognitive effort. Scales from different factor structures of the Jalowiec Coping Scale had different effects on this awareness measure. Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales.
Leach, Julia M; Mancini, Martina; Peterka, Robert J; Hayes, Tamara L; Horak, Fay B
2014-09-29
The Nintendo Wii balance board (WBB) has generated significant interest in its application as a postural control measurement device in both the clinical and (basic, clinical, and rehabilitation) research domains. Although the WBB has been proposed as an alternative to the "gold standard" laboratory-grade force plate, additional research is necessary before the WBB can be considered a valid and reliable center of pressure (CoP) measurement device. In this study, we used the WBB and a laboratory-grade AMTI force plate (AFP) to simultaneously measure the CoP displacement of a controlled dynamic load, which has not been done before. A one-dimensional inverted pendulum was displaced at several different displacement angles and load heights to simulate a variety of postural sway amplitudes and frequencies (<1 Hz). Twelve WBBs were tested to address the issue of inter-device variability. There was a significant effect of sway amplitude, frequency, and direction on the WBB's CoP measurement error, with an increase in error as both sway amplitude and frequency increased and a significantly greater error in the mediolateral (ML) (compared to the anteroposterior (AP)) sway direction. There was no difference in error across the 12 WBBs, supporting low inter-device variability. A linear calibration procedure was then implemented to correct the WBB's CoP signals and reduce measurement error. There was a significant effect of calibration on the WBB's CoP signal accuracy, with a significant reduction in CoP measurement error (quantified by root-mean-squared error) from 2-6 mm (before calibration) to 0.5-2 mm (after calibration). WBB-based CoP signal calibration also significantly reduced the percent error in derived (time-domain) CoP sway measures, from -10.5% (before calibration) to -0.05% (after calibration) (percent errors averaged across all sway measures and in both sway directions).
In this study, we characterized the WBB's CoP measurement error under controlled, dynamic conditions and implemented a linear calibration procedure for WBB CoP signals that is recommended to reduce CoP measurement error and provide more reliable estimates of time-domain CoP measures. Despite our promising results, additional work is necessary to understand how our findings translate to the clinical and rehabilitation research domains. Once the WBB's CoP measurement error is fully characterized in human postural sway (which differs from our simulated postural sway in both amplitude and frequency content), it may be used to measure CoP displacement in situations where lower accuracy and precision is acceptable.
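The calibration idea described above can be illustrated with a minimal sketch: fit a straight line that maps device readings onto reference readings, then compare root-mean-squared error before and after. The data below are synthetic, the gain, offset, and noise values are hypothetical, and the function names are introduced here for illustration; this is not the authors' calibration code.

```python
import numpy as np


def rmse(a, b):
    """Root-mean-squared error between two equal-length signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def fit_linear_calibration(device, reference):
    """Least-squares line mapping device readings onto reference readings."""
    slope, intercept = np.polyfit(device, reference, 1)
    return float(slope), float(intercept)


# Synthetic CoP displacement in mm (hypothetical values, not study data).
rng = np.random.default_rng(1)
reference = np.linspace(-20.0, 20.0, 200)
# Assumed device behaviour: gain error, constant offset, and small noise.
device = 0.9 * reference + 1.5 + rng.normal(0.0, 0.3, reference.size)

slope, intercept = fit_linear_calibration(device, reference)
calibrated = slope * device + intercept

rmse_before = rmse(device, reference)
rmse_after = rmse(calibrated, reference)
print(f"RMSE before: {rmse_before:.2f} mm, after: {rmse_after:.2f} mm")
```

Because the uncalibrated signal corresponds to the special case slope = 1, intercept = 0, an in-sample least-squares fit cannot make the RMSE worse; a fair evaluation would apply the fitted coefficients to held-out trials.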
Leach, Julia M.; Mancini, Martina; Peterka, Robert J.; Hayes, Tamara L.; Horak, Fay B.
2014-01-01
The Nintendo Wii balance board (WBB) has generated significant interest in its application as a postural control measurement device in both the clinical and (basic, clinical, and rehabilitation) research domains. Although the WBB has been proposed as an alternative to the “gold standard” laboratory-grade force plate, additional research is necessary before the WBB can be considered a valid and reliable center of pressure (CoP) measurement device. In this study, we used the WBB and a laboratory-grade AMTI force plate (AFP) to simultaneously measure the CoP displacement of a controlled dynamic load, which has not been done before. A one-dimensional inverted pendulum was displaced at several different displacement angles and load heights to simulate a variety of postural sway amplitudes and frequencies (<1 Hz). Twelve WBBs were tested to address the issue of inter-device variability. There was a significant effect of sway amplitude, frequency, and direction on the WBB's CoP measurement error, with an increase in error as both sway amplitude and frequency increased and a significantly greater error in the mediolateral (ML) (compared to the anteroposterior (AP)) sway direction. There was no difference in error across the 12 WBBs, supporting low inter-device variability. A linear calibration procedure was then implemented to correct the WBB's CoP signals and reduce measurement error. There was a significant effect of calibration on the WBB's CoP signal accuracy, with a significant reduction in CoP measurement error (quantified by root-mean-squared error) from 2–6 mm (before calibration) to 0.5–2 mm (after calibration). WBB-based CoP signal calibration also significantly reduced the percent error in derived (time-domain) CoP sway measures, from −10.5% (before calibration) to −0.05% (after calibration) (percent errors averaged across all sway measures and in both sway directions).
In this study, we characterized the WBB's CoP measurement error under controlled, dynamic conditions and implemented a linear calibration procedure for WBB CoP signals that is recommended to reduce CoP measurement error and provide more reliable estimates of time-domain CoP measures. Despite our promising results, additional work is necessary to understand how our findings translate to the clinical and rehabilitation research domains. Once the WBB's CoP measurement error is fully characterized in human postural sway (which differs from our simulated postural sway in both amplitude and frequency content), it may be used to measure CoP displacement in situations where lower accuracy and precision is acceptable. PMID:25268919
Black, Anne C; Serowik, Kristin L; Ablondi, Karen M; Rosen, Marc I
2013-01-01
The need for accurate and reliable information about income and resources available to individuals with psychiatric disabilities is critical for the assessment of need and evaluation of programs designed to alleviate financial hardship or affect finance allocation. Measurement of finances is ubiquitous in studies of economics, poverty, and social services. However, evidence has demonstrated that these measures often contain error. We compare the 1-week test-retest reliability of income and finance data from 24 adult psychiatric outpatients using assessment-as-usual (AAU) and a new instrument, the Timeline Historical Review of Income and Financial Transactions (THRIFT). Reliability estimates obtained with the THRIFT for Income (0.77), Expenses (0.91), and Debt (0.99) domains were significantly better than those obtained with AAU. Reliability estimates for Balance did not differ. THRIFT reduced measurement error and provided more reliable information than AAU for assessment of personal finances in psychiatric patients receiving Social Security benefits. The instrument also may be useful with other low-income groups.
Psychometric Evaluation of the Brachial Assessment Tool Part 1: Reproducibility.
Hill, Bridget; Williams, Gavin; Olver, John; Ferris, Scott; Bialocerkowski, Andrea
2018-04-01
To evaluate reproducibility (reliability and agreement) of the Brachial Assessment Tool (BrAT), a new patient-reported outcome measure for adults with traumatic brachial plexus injury (BPI). Prospective repeated-measure design. Outpatient clinics. Adults with confirmed traumatic BPI (N=43; age range, 19-82y). People with BPI completed the 31-item 4-response BrAT twice, 2 weeks apart. Results for the 3 subscales and summed score were compared at time 1 and time 2 to determine reliability, including systematic differences using paired t tests, test retest using intraclass correlation coefficient model 1,1 (ICC(1,1)), and internal consistency using Cronbach α. Agreement parameters included standard error of measurement, minimal detectable change, and limits of agreement. BrAT. Test-retest reliability was excellent (ICC(1,1)=.90-.97). Internal consistency was high (Cronbach α=.90-.98). Measurement error was relatively low (standard error of measurement range, 3.1-8.8). A change of >4 for subscale 1, >6 for subscale 2, >4 for subscale 3, and >10 for the summed score is indicative of change over and above measurement error. Limits of agreement ranged from ±4.4 (subscale 3) to 11.61 (summed score). These findings support the use of the BrAT as a reproducible patient-reported outcome measure for adults with traumatic BPI with evidence of appropriate reliability and agreement for both individual and group comparisons. Further psychometric testing is required to establish the construct validity and responsiveness of the BrAT. Copyright © 2017 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
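For readers unfamiliar with the ICC model 1,1 named in this abstract, the following is a minimal sketch of the usual one-way random-effects, single-measurement estimator computed from between-subject and within-subject mean squares. The function name and the example scores are hypothetical; the abstract does not describe the software or exact computation the authors used.

```python
import numpy as np


def icc_1_1(scores):
    """ICC(1,1): one-way random effects, single measurement.

    `scores` is an (n_subjects, k_occasions) array with no missing values.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    subject_means = x.mean(axis=1)
    # Between-subject mean square (n - 1 degrees of freedom).
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    # Within-subject mean square (n * (k - 1) degrees of freedom).
    msw = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))


# Hypothetical test-retest scores for four respondents (time 1, time 2).
example = [[10, 11], [20, 19], [30, 31], [40, 40]]
print(round(icc_1_1(example), 3))
```

The one-way model treats any systematic shift between occasions as error, which is why a paired t test for systematic differences is often reported alongside it.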
A General Approach for Estimating Scale Score Reliability for Panel Survey Data
ERIC Educational Resources Information Center
Biemer, Paul P.; Christ, Sharon L.; Wiesen, Christopher A.
2009-01-01
Scale score measures are ubiquitous in the psychological literature and can be used as both dependent and independent variables in data analysis. Poor reliability of scale score measures leads to inflated standard errors and/or biased estimates, particularly in multivariate analysis. Reliability estimation is usually an integral step to assess…
3D measurement using combined Gray code and dual-frequency phase-shifting approach
NASA Astrophysics Data System (ADS)
Yu, Shuang; Zhang, Jing; Yu, Xiaoyang; Sun, Xiaoming; Wu, Haibin; Liu, Xin
2018-04-01
The combined Gray code and phase-shifting approach is a commonly used 3D measurement technique. In this technique, an error that equals integer multiples of the phase-shifted fringe period, i.e. period jump error, often exists in the absolute analog code, which can lead to gross measurement errors. To overcome this problem, the present paper proposes 3D measurement using a combined Gray code and dual-frequency phase-shifting approach. Based on 3D measurement using the combined Gray code and phase-shifting approach, one set of low-frequency phase-shifted fringe patterns with an odd-numbered multiple of the original phase-shifted fringe period is added. Thus, the absolute analog code measured value can be obtained by the combined Gray code and phase-shifting approach, and the low-frequency absolute analog code measured value can also be obtained by adding low-frequency phase-shifted fringe patterns. Then, the corrected absolute analog code measured value can be obtained by correcting the former by the latter, and the period jump errors can be eliminated, resulting in reliable analog code unwrapping. For the proposed approach, we established its measurement model, analyzed its measurement principle, expounded the mechanism of eliminating period jump errors by error analysis, and determined its applicable conditions. Theoretical analysis and experimental results show that the proposed approach can effectively eliminate period jump errors, reliably perform analog code unwrapping, and improve the measurement accuracy.
Eechaute, Christophe; Vaes, Peter; Duquet, William; Van Gheluwe, Bart
2007-01-01
Sudden ankle inversion tests have been used to investigate whether the onset of peroneal muscle activity is delayed in patients with chronically unstable ankle joints. Before interpreting test results of latency times in patients with chronic ankle instability and healthy subjects, the reliability of these measures must be first demonstrated. To investigate the test-retest reliability of variables measured during a sudden ankle inversion movement in standing subjects with healthy ankle joints. Validation study. Research laboratory. 15 subjects with healthy ankle joints (30 ankles). Subjects stood on an ankle inversion platform with both feet tightly fixed to independently moveable trapdoors. An unexpected sudden ankle inversion of 50 degrees was imposed. We measured latency and motor response times and electromechanical delay of the peroneus longus muscle, along with the time and angular position of the first and second decelerating moments, the mean and maximum inversion speed, and the total inversion time. Correlation coefficients and standard error of measurements were calculated. Intraclass correlation coefficients ranged from 0.17 for the electromechanical delay of the peroneus longus muscle (standard error of measurement = 2.7 milliseconds) to 0.89 for the maximum inversion speed (standard error of measurement = 34.8 milliseconds). The reliability of the latency and motor response times of the peroneus longus muscle, the time of the first and second decelerating moments, and the mean and maximum inversion speed was acceptable in subjects with healthy ankle joints and supports the investigation of the reliability of these measures in subjects with chronic ankle instability. The lower reliability of the electromechanical delay of the peroneus longus muscle and the angular positions of both decelerating moments calls the use of these variables into question.
Bravo, G; Bragança, S; Arezes, P M; Molenbroek, J F M; Castellucci, H I
2018-05-22
Despite offering many benefits, direct manual anthropometric measurement methods can be problematic because of their vulnerability to measurement errors. The purpose of this literature review was to determine whether the currently published ergonomics-related anthropometric studies of school children mentioned or evaluated the precision, reliability, or accuracy of the direct manual measurement method. Two bibliographic databases and the bibliographic references of all the selected papers were used to find relevant published papers in the fields considered in this study. Forty-six (46) studies met the criteria previously defined for this literature review. However, only ten (10) studies mentioned at least one of the analyzed variables, and none evaluated all of them. Only reliability was assessed by three papers. Moreover, the reviewed papers differed widely in the factors that affect precision, reliability and accuracy. This was particularly clear in the instruments used for the measurements, which were not consistent throughout the studies. Additionally, there was a lack of information regarding the evaluators' training and procedures for anthropometric data collection, which are assumed to be the most important issues that affect precision, reliability and accuracy. Based on the review of the literature, it was possible to conclude that the considered anthropometric studies had not focused their attention on the analysis of precision, reliability and accuracy of the manual measurement methods. Hence, with the aim of avoiding measurement errors and misleading data, anthropometric studies should put more effort and care into testing measurement error and defining the procedures used to collect anthropometric data.
Romero-Franco, Natalia; Montaño-Munuera, Juan Antonio; Fernández-Domínguez, Juan Carlos; Jiménez-Reyes, Pedro
2017-12-18
New methods are being validated to easily evaluate the knee joint position sense (JPS) due to its role in sports movement and the risk of injury. However, no studies to date have considered the open kinetic chain (OKC) technique, despite the biomechanical differences compared to closed kinetic chain movements. To analyze the validity and reliability of a digital inclinometer to measure the knee JPS in the OKC movement. The validity, inter-tester and intra-tester reliability of a digital inclinometer for measuring knee JPS were evaluated. Sports research laboratory. Eighteen athletes (11 males and 7 females; 28.4 ± 6.6 years; 71.9 ± 14.0 kg; 1.77 ± 0.09 m; 22.8 ± 3.2 kg/m²) voluntarily participated in this study. Absolute angular error (AAE), relative angular error (RAE) and variable angular error (VAE) of knee JPS in an OKC. Intraclass correlation coefficient (ICC) and standard error of the mean (SEM) were calculated to determine the validity and reliability of the inclinometer. Data showed excellent validity of the inclinometer to obtain proprioceptive errors compared to the video analysis in JPS tasks (AAE: ICC = 0.981, SEM = 0.08; RAE: ICC = 0.974, SEM = 0.12; VAE: ICC = 0.973, SEM = 0.07). Inter-tester reliability was also excellent for all the proprioceptive errors (AAE: ICC = 0.967, SEM = 0.04; RAE: ICC = 0.974, SEM = 0.03; VAE: ICC = 0.939, SEM = 0.08). Similar results were obtained for intra-tester reliability (AAE: ICC = 0.861, SEM = 0.1; RAE: ICC = 0.894, SEM = 0.1; VAE: ICC = 0.700, SEM = 0.2). The digital inclinometer is a valid and reliable method to assess the knee JPS in OKC. Sport professionals may evaluate the knee JPS to monitor its deterioration during training or improvements throughout the rehabilitation process.
Conditional Standard Errors of Measurement for Scale Scores.
ERIC Educational Resources Information Center
Kolen, Michael J.; And Others
1992-01-01
A procedure is described for estimating the reliability and conditional standard errors of measurement of scale scores incorporating the discrete transformation of raw scores to scale scores. The method is illustrated using a strong true score model, and practical applications are described. (SLD)
Alsalaheen, Bara; Haines, Jamie; Yorke, Amy; Broglio, Steven P
2015-12-01
To examine the reliability, convergent, and discriminant validity of the limits of stability (LOS) test to assess dynamic postural stability in adolescents using a portable forceplate system. Cross-sectional reliability observational study. School setting. Adolescents (N=36) completed all measures during the first session. To examine the reliability of the LOS test, a subset of 15 participants repeated the LOS test after 1 week. Not applicable. Outcome measurements included the LOS test, Balance Error Scoring System, Instrumented Balance Error Scoring System, and Modified Clinical Test for Sensory Interaction on Balance. A significant relation was observed among LOS composite scores (r=.36-.87, P<.05). However, no relation was observed between LOS and static balance outcome measurements. The reliability of the LOS composite scores ranged from moderate to good (intraclass correlation coefficient model 2,1=.73-.96). The results suggest that the LOS composite scores provide unique information about dynamic postural stability, and the LOS test completed at 100% of the theoretical limit appeared to be a reliable test of dynamic postural stability in adolescents. Clinicians should use dynamic balance measurement as part of their balance assessment and should not use static balance testing (eg, Balance Error Scoring System) to make inferences about dynamic balance, especially when balance assessment is used to determine rehabilitation outcomes, or when making return to play decisions after injury. Copyright © 2015 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Schweig, Jonathan
2013-01-01
Measuring school and classroom environments has become central in a nation-wide effort to develop comprehensive programs that measure teacher quality and teacher effectiveness. Formulating successful programs necessitates accurate and reliable methods for measuring these environmental variables. This paper uses a generalizability theory framework…
ERIC Educational Resources Information Center
Meyer, J. Patrick; Liu, Xiang; Mashburn, Andrew J.
2014-01-01
Researchers often use generalizability theory to estimate relative error variance and reliability in teaching observation measures. They also use it to plan future studies and design the best possible measurement procedures. However, designing the best possible measurement procedure comes at a cost, and researchers must stay within their budget…
A proposed method to investigate reliability throughout a questionnaire
2011-01-01
Background: Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. Methods: A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of intraclass correlation coefficient (ICC). The method was also applied to a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. Results: The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure, used to assess whether respondents provide only a random answer or one based on a substantial cognitive effort. Scales from different factor structures of the Jalowiec Coping Scale had different effects on this awareness measure. Conclusions: Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales. PMID:21974842
Tung, Li-Chen; Yu, Wan-Hui; Lin, Gong-Hong; Yu, Tzu-Ying; Wu, Chien-Te; Tsai, Chia-Yin; Chou, Willy; Chen, Mei-Hsiang; Hsieh, Ching-Lin
2016-09-01
To develop a Tablet-based Symbol Digit Modalities Test (T-SDMT) and to examine the test-retest reliability and concurrent validity of the T-SDMT in patients with stroke. The study had two phases. In the first phase, six experts, nine college students and five outpatients participated in the development and testing of the T-SDMT. In the second phase, 52 outpatients were evaluated twice (2 weeks apart) with the T-SDMT and SDMT to examine the test-retest reliability and concurrent validity of the T-SDMT. The T-SDMT was developed via expert input and college student/patient feedback. Regarding test-retest reliability, the practise effects of the T-SDMT and SDMT were both trivial (d=0.12) but significant (p≦0.015). The improvement in the T-SDMT (4.7%) was smaller than that in the SDMT (5.6%). The minimal detectable changes (MDC%) of the T-SDMT and SDMT were 6.7 (22.8%) and 10.3 (32.8%), respectively. The T-SDMT and SDMT were highly correlated with each other at the two time points (Pearson's r=0.90-0.91). The T-SDMT demonstrated good concurrent validity with the SDMT. Because the T-SDMT had a smaller practise effect and less random measurement error (superior test-retest reliability), it is recommended over the SDMT for assessing information processing speed in patients with stroke. Implications for Rehabilitation The Symbol Digit Modalities Test (SDMT), a common measure of information processing speed, showed a substantial practise effect and considerable random measurement error in patients with stroke. The Tablet-based SDMT (T-SDMT) has been developed to reduce the practise effect and random measurement error of the SDMT in patients with stroke. The T-SDMT had smaller practise effect and random measurement error than the SDMT, which can provide more reliable assessments of information processing speed.
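The minimal detectable change and its percentage form reported above are commonly derived from the standard error of measurement. The sketch below assumes the conventional formulas SEM = SD × sqrt(1 − reliability), MDC = 1.96 × sqrt(2) × SEM, and MDC% = 100 × MDC / mean score; the abstract does not state which formulas the authors used, and the SD, reliability, and mean values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from a between-subject SD and a reliability coefficient (e.g. an ICC)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 gives about 95%)."""
    # sqrt(2) accounts for measurement error on both test occasions.
    return z * math.sqrt(2.0) * sem


def mdc_percent(mdc, mean_score):
    """MDC expressed as a percentage of the mean score."""
    return 100.0 * mdc / mean_score


# Hypothetical inputs, not values taken from the study.
sem = standard_error_of_measurement(sd=8.0, reliability=0.91)
mdc = minimal_detectable_change(sem)
print(f"SEM={sem:.2f}, MDC={mdc:.2f}, MDC%={mdc_percent(mdc, 29.0):.1f}%")
```

Expressing the MDC as a percentage makes instruments with different score ranges comparable, which is how the two test versions are contrasted in the abstract.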
Cobb, Stephen C; James, C Roger; Hjertstedt, Matthew; Kruk, James
2011-01-01
Although abnormal foot posture long has been associated with lower extremity injury risk, the evidence is equivocal. Poor intertester reliability of traditional foot measures might contribute to the inconsistency. To investigate the validity and reliability of a digital photographic measurement method (DPMM) technology, the reliability of DPMM-quantified foot measures, and the concurrent validity of the DPMM with clinical-measurement methods (CMMs) and to report descriptive data for DPMM measures with moderate to high intratester and intertester reliability. Descriptive laboratory study. Biomechanics research laboratory. A total of 159 people participated in 3 groups. Twenty-eight people (11 men, 17 women; age = 25 ± 5 years, height = 1.71 ± 0.10 m, mass = 77.6 ± 17.3 kg) were recruited for investigation of intratester and intertester reliability of the DPMM technology; 20 (10 men, 10 women; age = 24 ± 2 years, height = 1.71 ± 0.09 m, mass = 76 ± 16 kg) for investigation of DPMM and CMM reliability and concurrent validity; and 111 (42 men, 69 women; age = 22.8 ± 4.7 years, height = 168.5 ± 10.4 cm, mass = 69.8 ± 13.3 kg) for development of a descriptive data set of the DPMM foot measurements with moderate to high intratester and intertester reliabilities. The dimensions of 10 model rectangles and the 28 participants' feet were measured, and DPMM foot posture was measured in the 111 participants. Two clinicians assessed the DPMM and CMM foot measures of the 20 participants. Validity and reliability were evaluated using mean absolute and percentage errors and intraclass correlation coefficients. Descriptive data were computed from the DPMM foot posture measures. The DPMM technology intratester and intertester reliability intraclass correlation coefficients were 1.0 for each tester and variable. Mean absolute errors were equal to or less than 0.2 mm for the bottom and right-side variables and 0.1° for the calculated angle variable. 
Mean percentage errors between the DPMM and criterion reference values were equal to or less than 0.4%. Intratester and intertester reliabilities of DPMM-computed structural measures of arch and navicular indices were moderate to high (>0.78), and concurrent validity was moderate to strong. The DPMM is a valid and reliable clinical and research tool for quantifying foot structure. The DPMM and the descriptive data might be used to define groups in future studies in which the relationship between foot posture and function or injury risk is investigated.
Lamadrid-Figueroa, Héctor; Téllez-Rojo, Martha M; Angeles, Gustavo; Hernández-Ávila, Mauricio; Hu, Howard
2011-01-01
In-vivo measurement of bone lead by means of K-X-ray fluorescence (KXRF) is the preferred biological marker of chronic exposure to lead. Unfortunately, considerable measurement error associated with KXRF estimations can introduce bias in estimates of the effect of bone lead when this variable is included as the exposure in a regression model. Estimates of uncertainty reported by the KXRF instrument reflect the variance of the measurement error and, although they can be used to correct the measurement error bias, they are seldom used in epidemiological statistical analyses. Errors-in-variables regression (EIV) allows for correction of bias caused by measurement error in predictor variables, based on the knowledge of the reliability of such variables. The authors propose a way to obtain reliability coefficients for bone lead measurements from uncertainty data reported by the KXRF instrument and compare, by the use of Monte Carlo simulations, results obtained using EIV regression models vs. those obtained by the standard procedures. Results of the simulations show that Ordinary Least Squares (OLS) regression models provide severely biased estimates of effect, and that EIV provides nearly unbiased estimates. Although EIV effect estimates are more imprecise, their mean squared error is much smaller than that of OLS estimates. In conclusion, EIV is a better alternative than OLS to estimate the effect of bone lead when measured by KXRF. Copyright © 2010 Elsevier Inc. All rights reserved.
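The bias mechanism described here can be reproduced in a small simulation. The sketch below uses the simplest single-predictor case, in which the OLS slope is attenuated by the reliability of the predictor and dividing by a known reliability undoes the attenuation. This is a simplified stand-in for full errors-in-variables regression, not the authors' procedure; all parameter values and names are hypothetical.

```python
import numpy as np


def simulate_slopes(n=20000, beta=2.0, reliability=0.6, seed=0):
    """Return (OLS slope, reliability-corrected slope) for one simulated dataset."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, 1.0, n)  # error-free exposure, variance 1
    # Error SD chosen so that var(true) / var(observed) equals `reliability`.
    error_sd = np.sqrt((1.0 - reliability) / reliability)
    x_obs = x_true + rng.normal(0.0, error_sd, n)  # error-prone measurement
    y = beta * x_true + rng.normal(0.0, 1.0, n)  # outcome depends on true exposure

    cov = np.cov(x_obs, y)
    ols_slope = cov[0, 1] / cov[0, 0]  # attenuated toward zero
    corrected_slope = ols_slope / reliability  # single-predictor correction
    return float(ols_slope), float(corrected_slope)


ols, corrected = simulate_slopes()
print(f"true slope 2.0, OLS {ols:.2f}, corrected {corrected:.2f}")
```

Dividing by the reliability also inflates the sampling variability of the estimate, consistent with the abstract's remark that EIV estimates are less precise but have smaller mean squared error.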
Measuring human remains in the field: Grid technique, total station, or MicroScribe?
Sládek, Vladimír; Galeta, Patrik; Sosna, Daniel
2012-09-10
Although three-dimensional (3D) coordinates for human intra-skeletal landmarks are among the most important data that anthropologists have to record in the field, little is known about the reliability of various measuring techniques. We compared the reliability of three techniques used for 3D measurement of human remains in the field: grid technique (GT), total station (TS), and MicroScribe (MS). We measured 365 field osteometric points on 12 skeletal sequences excavated at the Late Medieval/Early Modern churchyard in Všeruby, Czech Republic. We compared intra-observer, inter-observer, and inter-technique variation using mean difference (MD), mean absolute difference (MAD), standard deviation of difference (SDD), and limits of agreement (LA). All three measuring techniques can be used when accepted error ranges can be measured in centimeters. When a range of accepted error measurable in millimeters is needed, MS offers the best solution. TS can achieve the same reliability as does MS, but only when the laser beam is accurately pointed into the center of the prism. When the prism is not accurately oriented, TS produces unreliable data. TS is more sensitive to initialization than is MS. GT measures the human skeleton with acceptable reliability for general purposes but insufficiently when highly accurate skeletal data are needed. We observed high inter-technique variation, indicating that just one technique should be used when spatial data from one individual are recorded. Subadults are measured with slightly lower error than are adults. The effect of maximum excavated skeletal length has little practical significance in field recording. When MS is not available, we offer practical suggestions that can help to increase reliability when measuring the human skeleton in the field. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
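The four agreement statistics named in this abstract (MD, MAD, SDD, LA) can be sketched for paired measurements as follows. The abstract does not give the exact formulas used, so the conventional Bland-Altman form (sample standard deviation, limits at MD ± 1.96 × SDD) is assumed here, and the function name and data are hypothetical.

```python
import statistics


def agreement_stats(first, second):
    """Return MD, MAD, SDD and 95% limits of agreement for paired values."""
    diffs = [a - b for a, b in zip(first, second)]
    md = statistics.fmean(diffs)                    # mean difference (bias)
    mad = statistics.fmean(abs(d) for d in diffs)   # mean absolute difference
    sdd = statistics.stdev(diffs)                   # SD of the differences
    half_width = 1.96 * sdd                         # assumed 95% multiplier
    return {"MD": md, "MAD": mad, "SDD": sdd,
            "LA": (md - half_width, md + half_width)}


# Hypothetical repeated coordinates (mm) recorded by two observers.
observer_a = [10.0, 12.0, 11.0, 13.0]
observer_b = [9.0, 12.0, 12.0, 11.0]
stats = agreement_stats(observer_a, observer_b)
```

MD captures systematic offset between observers or techniques, whereas MAD and SDD describe the size and spread of disagreement regardless of sign.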
Test-retest intra-rater reliability of grip force in patients with stroke.
Hammer, Ann; Lindmark, Birgitta
2003-07-01
Coefficients of repeatability and reproducibility can be guides in differentiating between real changes and measurement error. The aim was to evaluate test-retest intra-rater reliability of a clinical procedure measuring grip force with Grippit in stroke patients, and to assess the relationship between grip force of the two hands and between sustained and peak grip force. Eighteen patients were tested using the Grippit on two occasions one hour apart. Each occasion comprised three consecutive trials per hand. The paretic hand needs to score a 50 N change within and between occasions to exceed the measurement error in 95% of the observations, irrespective of calculation method. Expressed by CV(within) the measurement error was 10%. There was no learning or fatigue effect during measuring. There was a wide variation between subjects but the mean ratio between sides was 0.66. The mean ratio between sustained and peak grip force was 0.80-0.84. The measurement errors were acceptable and the instrument can be recommended for use in stroke patients at a department of rehabilitation medicine.
Psychophysical measurements in children: challenges, pitfalls, and considerations.
Witton, Caroline; Talcott, Joel B; Henning, G Bruce
2017-01-01
Measuring sensory sensitivity is important in studying development and developmental disorders. However, with children, there is a need to balance reliable but lengthy sensory tasks with the child's ability to maintain motivation and vigilance. We used simulations to explore the problems associated with shortening adaptive psychophysical procedures, and suggest how these problems might be addressed. We quantify how adaptive procedures with too few reversals can over-estimate thresholds, introduce substantial measurement error, and make estimates of individual thresholds less reliable. The associated measurement error also obscures group differences. Adaptive procedures with children should therefore use as many reversals as possible, to reduce the effects of both Type 1 and Type 2 errors. Differences in response consistency, resulting from lapses in attention, further increase the over-estimation of threshold. Comparisons between data from individuals who may differ in lapse rate are therefore problematic, but measures to estimate and account for lapse rates in analyses may mitigate this problem.
Plain film measurement error in acute displaced midshaft clavicle fractures
Archer, Lori Anne; Hunt, Stephen; Squire, Daniel; Moores, Carl; Stone, Craig; O’Dea, Frank; Furey, Andrew
2016-01-01
Background: Clavicle fractures are common and optimal treatment remains controversial. Recent literature suggests operative fixation of acute displaced mid-shaft clavicle fractures (DMCFs) shortened more than 2 cm improves outcomes. We aimed to identify correlation between plain film and computed tomography (CT) measurement of displacement and the inter- and intraobserver reliability of repeated radiographic measurements. Methods: We obtained radiographs and CT scans of patients with acute DMCFs. Three orthopedic staff and 3 residents measured radiographic displacement at time zero and 2 weeks later. The CT measurements identified absolute shortening in 3 dimensions (by subtracting the length of the fractured from the intact clavicle). We then compared shortening measured on radiographs and shortening measured in 3 dimensions on CT. Interobserver and intraobserver reliability were calculated. Results: We reviewed the fractures of 22 patients. Bland–Altman repeatability coefficient calculations indicated that radiograph and CT measurements of shortening could not be correlated owing to an unacceptable amount of measurement error (6 cm). Interobserver reliability for plain radiograph measurements was excellent (Cronbach α = 0.90). Likewise, intraobserver reliabilities for plain radiograph measurements as calculated with paired t tests indicated excellent correlation (p > 0.05 in all but 1 observer [p = 0.04]). Conclusion: To establish shortening as an indication for DMCF fixation, reliable measurement tools are required. The low correlation between plain film and CT measurements we observed suggests further research is necessary to establish what imaging modality reliably predicts shortening. Our results indicate weak correlation between radiograph and CT measurement of acute DMCF shortening. PMID:27438054
Reliability of drivers in urban intersections.
Gstalter, Herbert; Fastenmeier, Wolfgang
2010-01-01
The concept of human reliability has been widely used in industrial settings by human factors experts to optimise the person-task fit. Reliability is estimated by the probability that a task will successfully be completed by personnel in a given stage of system operation. Human Reliability Analysis (HRA) is a technique used to calculate human error probabilities as the ratio of errors committed to the number of opportunities for that error. To transfer this notion to the measurement of car driver reliability the following components are necessary: a taxonomy of driving tasks, a definition of correct behaviour in each of these tasks, a list of errors as deviations from the correct actions and an adequate observation method to register errors and opportunities for these errors. Use of the SAFE-task analysis procedure recently made it possible to derive driver errors directly from the normative analysis of behavioural requirements. Driver reliability estimates could be used to compare groups of tasks (e.g. different types of intersections with their respective regulations) as well as groups of drivers' or individual drivers' aptitudes. This approach was tested in a field study with 62 drivers of different age groups. The subjects drove an instrumented car and had to complete an urban test route, the main features of which were 18 intersections representing six different driving tasks. The subjects were accompanied by two trained observers who recorded driver errors using standardized observation sheets. Results indicate that error indices often vary between both the age group of drivers and the type of driving task. The highest error indices occurred in the non-signalised intersection tasks and the roundabout, which exactly equals the corresponding ratings of task complexity from the SAFE analysis. A comparison of age groups clearly shows the disadvantage of older drivers, whose error indices in nearly all tasks are significantly higher than those of the other groups. 
The vast majority of these errors could be explained by high task load in the intersections, as they represent difficult tasks. The discussion shows how reliability estimates can be used in a constructive way to propose changes in car design, intersection layout and regulation as well as driver training.
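The ratio definition given in this abstract (errors committed divided by opportunities for that error) can be written down directly. The sketch below is a minimal illustration; the function names and the counts are hypothetical and are not taken from the field study.

```python
def human_error_probability(errors, opportunities):
    """HEP = errors committed / opportunities for that error."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return errors / opportunities


def task_reliability(errors, opportunities):
    """Probability that the task is completed without the error."""
    return 1.0 - human_error_probability(errors, opportunities)


# Hypothetical (errors, opportunities) counts for two intersection types.
observed = {"roundabout": (31, 124), "signalised": (9, 180)}
hep = {task: human_error_probability(e, n)
       for task, (e, n) in observed.items()}
```

Computing the index per task type or per driver group, as here, is what allows the kind of comparison across intersections and age groups that the abstract reports.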
Mejia, Amanda F; Nebel, Mary Beth; Barber, Anita D; Choe, Ann S; Pekar, James J; Caffo, Brian S; Lindquist, Martin A
2018-05-15
Reliability of subject-level resting-state functional connectivity (FC) is determined in part by the statistical techniques employed in its estimation. Methods that pool information across subjects to inform estimation of subject-level effects (e.g., Bayesian approaches) have been shown to enhance reliability of subject-level FC. However, fully Bayesian approaches are computationally demanding, while empirical Bayesian approaches typically rely on using repeated measures to estimate the variance components in the model. Here, we avoid the need for repeated measures by proposing a novel measurement error model for FC describing the different sources of variance and error, which we use to perform empirical Bayes shrinkage of subject-level FC towards the group average. In addition, since the traditional intra-class correlation coefficient (ICC) is inappropriate for biased estimates, we propose a new reliability measure denoted the mean squared error intra-class correlation coefficient (ICC MSE ) to properly assess the reliability of the resulting (biased) estimates. We apply the proposed techniques to test-retest resting-state fMRI data on 461 subjects from the Human Connectome Project to estimate connectivity between 100 regions identified through independent components analysis (ICA). We consider both correlation and partial correlation as the measure of FC and assess the benefit of shrinkage for each measure, as well as the effects of scan duration. We find that shrinkage estimates of subject-level FC exhibit substantially greater reliability than traditional estimates across various scan durations, even for the most reliable connections and regardless of connectivity measure. Additionally, we find partial correlation reliability to be highly sensitive to the choice of penalty term, and to be generally worse than that of full correlations except for certain connections and a narrow range of penalty values. 
This suggests that the penalty needs to be chosen carefully when using partial correlations. Copyright © 2018. Published by Elsevier Inc.
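The core idea of shrinking a subject-level estimate toward the group average can be sketched as a variance-weighted average. This is only a generic empirical Bayes form under the assumption that the within-subject and between-subject variances are already known; the authors' measurement error model for estimating those variances without repeated measures is not reproduced, and all names and numbers below are hypothetical.

```python
def shrink_toward_group(subject_estimate, group_mean, var_within, var_between):
    """Weighted average of a subject estimate and the group mean.

    The weight on the group mean grows as within-subject (noise) variance
    increases relative to between-subject (signal) variance.
    """
    weight = var_within / (var_within + var_between)
    return weight * group_mean + (1.0 - weight) * subject_estimate


# Hypothetical connectivity values for a single connection.
shrunk = shrink_toward_group(subject_estimate=0.8, group_mean=0.4,
                             var_within=0.03, var_between=0.01)
```

Because the shrunken estimate is deliberately biased toward the group mean, a reliability measure that accounts for bias is needed, which is the motivation the abstract gives for its MSE-based ICC.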
Measurement of vertebral rotation: Perdriolle versus Raimondi.
Weiss, H R
1995-01-01
The measurement of vertebral rotation according to Perdriolle is widely used in the French-speaking and Anglo-American countries. Even in this measurement technique there may be a relatively high estimation error because of the not very accurate grading in steps of 5 degrees. The measurement according to Raimondi seems to be easier to use and is more accurate, with 2-degree steps. The purpose of our study was to determine the technical error of both measuring methods. The apex vertebrae of 40 curves on 20 anteroposterior (AP) radiographs were measured by using the Perdriolle torsion meter and the Regolo Raimondi. Interrater and intrarater reliability were computed. The thoracic Cobb angle was 43 degrees, the lumbar Cobb angle 36 degrees. The average rotation according to Perdriolle was 19.1 degrees thoracic (SD 11.14), 12.7 degrees lumbar (11.21). Measurement of vertebral rotation according to Raimondi showed an average rotation of 20.25 degrees in the thoracic region (11.40) and 13.4 degrees lumbar (10.92). The intrarater reliability was r = 0.991 (Perdriolle) and r = 0.997 (Raimondi). The average intrarater error was 1.025 degrees in the Perdriolle measurement and 0.4 degrees in the Raimondi measurement. Interrater error was on average 3.112 degrees for the Perdriolle measurement and 3.630 degrees for the Raimondi measurement. This shows that both methods are useful tools for the follow-up of vertebral rotation as projected on standard X-rays for the experienced clinician. The Raimondi ruler is easier to use and is slightly more reliable.
Reducing random measurement error in assessing postural load on the back in epidemiologic surveys.
Burdorf, A
1995-02-01
The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations have been presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.
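The trade-off between the number of repeated measurements per worker and reliability can be illustrated with a standard variance-components relation. The abstract does not reproduce its equations, so the sketch below is an assumption: it treats the reliability of a worker's mean of k measurements as between-worker variance divided by between-worker variance plus within-worker variance over k. Function names and values are hypothetical.

```python
def reliability_of_mean(var_between, var_within, k):
    """Reliability of the mean of k repeated measurements per worker."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return var_between / (var_between + var_within / k)


def repeats_needed(var_between, var_within, target):
    """Smallest k whose mean reaches the target reliability."""
    if var_between <= 0 or not 0 < target < 1:
        raise ValueError("need var_between > 0 and 0 < target < 1")
    k = 1
    while reliability_of_mean(var_between, var_within, k) < target:
        k += 1
    return k
```

For example, with equal between-worker and within-worker variance a single measurement has a reliability of 0.5, and `repeats_needed(4.0, 4.0, 0.8)` returns 4.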
Uncertainty Analysis of Seebeck Coefficient and Electrical Resistivity Characterization
NASA Technical Reports Server (NTRS)
Mackey, Jon; Sehirlioglu, Alp; Dynys, Fred
2014-01-01
In order to provide a complete description of a materials thermoelectric power factor, in addition to the measured nominal value, an uncertainty interval is required. The uncertainty may contain sources of measurement error including systematic bias error and precision error of a statistical nature. The work focuses specifically on the popular ZEM-3 (Ulvac Technologies) measurement system, but the methods apply to any measurement system. The analysis accounts for sources of systematic error including sample preparation tolerance, measurement probe placement, thermocouple cold-finger effect, and measurement parameters; in addition to including uncertainty of a statistical nature. Complete uncertainty analysis of a measurement system allows for more reliable comparison of measurement data between laboratories.
Chen, Yi-Miau; Huang, Yi-Jing; Huang, Chien-Yu; Lin, Gong-Hong; Liaw, Lih-Jiun; Lee, Shih-Chieh; Hsieh, Ching-Lin
2017-10-01
The 3-point Berg Balance Scale (BBS-3P) and 3-point Postural Assessment Scale for Stroke Patients (PASS-3P) were simplified from the BBS and PASS to overcome the complex scoring systems. The BBS-3P and PASS-3P were more feasible in busy clinical practice and showed similarly sound validity and responsiveness to the original measures. However, the reliability of the BBS-3P and PASS-3P is unknown, limiting their utility and the interpretability of scores. We aimed to examine the test-retest reliability and minimal detectable change (MDC) of the BBS-3P and PASS-3P in patients with stroke. Cross-sectional study. The rehabilitation departments of a medical center and a community hospital. A total of 51 chronic stroke patients (64.7% male). Both balance measures were administered twice 7 days apart. The test-retest reliability of both the BBS-3P and PASS-3P was examined by intraclass correlation coefficients (ICC). The MDC and its percentage over the total score (MDC%) of each measure were calculated for examining the random measurement errors. The ICC values of the BBS-3P and PASS-3P were 0.99 and 0.97, respectively. The MDC% (MDC) of the BBS-3P and PASS-3P were 9.1% (5.1 points) and 8.4% (3.0 points), respectively, indicating that both measures had small and acceptable random measurement errors. Our results showed that both the BBS-3P and the PASS-3P had good test-retest reliability, with small and acceptable random measurement error. These two simplified 3-level balance measures can provide reliable results over time. Our findings support the repeated administration of the BBS-3P and PASS-3P to monitor the balance of patients with stroke. The MDC values can help clinicians and researchers interpret the change scores more precisely.
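The chain from ICC to MDC and MDC% is commonly computed as sketched below. The abstract does not state its formulas, so the conventional ones are assumed (SEM = SD × sqrt(1 − ICC); MDC at 95% confidence = 1.96 × sqrt(2) × SEM); the function names and the SD and ICC inputs are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest design."""
    return 1.96 * math.sqrt(2.0) * sem


def mdc_percent(mdc, total_score):
    """MDC expressed as a percentage of the scale's total score."""
    return 100.0 * mdc / total_score


# Hypothetical inputs: between-subject SD of 10 points, ICC of 0.96.
sem = standard_error_of_measurement(sd=10.0, icc=0.96)
mdc = mdc95(sem)
```

As a consistency check on the reported figures, and assuming a 56-point maximum for the BBS-3P, `mdc_percent(5.1, 56)` gives about 9.1%, in line with the MDC% quoted in the abstract.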
Research Measures for Dyscalculia: A Validity and Reliability Study.
ERIC Educational Resources Information Center
Geiman, R. M.
1986-01-01
This study sought to evaluate a measure of dyscalculia to determine its validity and reliability. It also tested use of the instrument with seventh graders and ascertained whether errors attributed to dyscalculia were also present in an average sample of seventh graders. Results varied. (MNS)
Cobb, Stephen C.; James, C. Roger; Hjertstedt, Matthew; Kruk, James
2011-01-01
Abstract Context: Although abnormal foot posture long has been associated with lower extremity injury risk, the evidence is equivocal. Poor intertester reliability of traditional foot measures might contribute to the inconsistency. Objectives: To investigate the validity and reliability of a digital photographic measurement method (DPMM) technology, the reliability of DPMM-quantified foot measures, and the concurrent validity of the DPMM with clinical-measurement methods (CMMs) and to report descriptive data for DPMM measures with moderate to high intratester and intertester reliability. Design: Descriptive laboratory study. Setting: Biomechanics research laboratory. Patients or Other Participants: A total of 159 people participated in 3 groups. Twenty-eight people (11 men, 17 women; age = 25 ± 5 years, height = 1.71 ± 0.10 m, mass = 77.6 ± 17.3 kg) were recruited for investigation of intratester and intertester reliability of the DPMM technology; 20 (10 men, 10 women; age = 24 ± 2 years, height = 1.71 ± 0.09 m, mass = 76 ± 16 kg) for investigation of DPMM and CMM reliability and concurrent validity; and 111 (42 men, 69 women; age = 22.8 ± 4.7 years, height = 168.5 ± 10.4 cm, mass = 69.8 ± 13.3 kg) for development of a descriptive data set of the DPMM foot measurements with moderate to high intratester and intertester reliabilities. Intervention(s): The dimensions of 10 model rectangles and the 28 participants' feet were measured, and DPMM foot posture was measured in the 111 participants. Two clinicians assessed the DPMM and CMM foot measures of the 20 participants. Main Outcome Measure(s): Validity and reliability were evaluated using mean absolute and percentage errors and intraclass correlation coefficients. Descriptive data were computed from the DPMM foot posture measures. Results: The DPMM technology intratester and intertester reliability intraclass correlation coefficients were 1.0 for each tester and variable. 
Mean absolute errors were equal to or less than 0.2 mm for the bottom and right-side variables and 0.1° for the calculated angle variable. Mean percentage errors between the DPMM and criterion reference values were equal to or less than 0.4%. Intratester and intertester reliabilities of DPMM-computed structural measures of arch and navicular indices were moderate to high (>0.78), and concurrent validity was moderate to strong. Conclusions: The DPMM is a valid and reliable clinical and research tool for quantifying foot structure. The DPMM and the descriptive data might be used to define groups in future studies in which the relationship between foot posture and function or injury risk is investigated. PMID:21214347
Intrarater Reliability and Other Psychometrics of the Health Promoting Activities Scale (HPAS).
Muskett, Rachel; Bourke-Taylor, Helen; Hewitt, Alana
The Health Promoting Activities Scale (HPAS) measures the self-rated frequency with which adults participate in activities that promote health. We evaluated the internal consistency, construct validity, and intrarater reliability of the HPAS with a cohort of mothers (N = 56) of school-age children. We used an online survey that included the HPAS and measures of mental and physical health. Statistical analysis included intraclass correlation coefficients (ICCs), measurement error, error range, limits of agreement, and minimum detectable change (MDC). The HPAS showed good internal consistency (Cronbach's α = .73). Construct validity was supported by a significant difference in HPAS scores among participants grouped by physical activity level; no other differences were significant. Results included a high aggregate ICC of .90 and an MDC of 5 points. Our evaluation of the HPAS revealed good reliability and stability, suggesting suitability for ongoing evaluation as an outcome measure. Copyright © 2017 by the American Occupational Therapy Association, Inc.
Hoppe, Matthias W; Baumgart, Christian; Polglaze, Ted; Freiwald, Jürgen
2018-01-01
This study aimed to investigate the validity and reliability of global (GPS) and local (LPS) positioning systems for measuring distances covered and sprint mechanical properties in team sports. Here, we evaluated two recently released 18 Hz GPS and 20 Hz LPS technologies together with one established 10 Hz GPS technology. Six male athletes (age: 27±2 years; VO2max: 48.8±4.7 ml/min/kg) performed outdoors on 10 trials of a team sport-specific circuit that was equipped with double-light timing gates. The circuit included various walking, jogging, and sprinting sections that were performed either in straight-lines or with changes of direction. During the circuit, athletes wore two devices of each positioning system. From the reported and filtered velocity data, the distances covered and sprint mechanical properties (i.e., the theoretical maximal horizontal velocity, force, and power output) were computed. The sprint mechanical properties were modeled via an inverse dynamic approach applied to the center of mass. The validity was determined by comparing the measured and criterion data via the typical error of estimate (TEE), whereas the reliability was examined by comparing the two devices of each technology (i.e., the between-device reliability) via the coefficient of variation (CV). Outliers due to measurement errors were statistically identified and excluded from validity and reliability analyses. The 18 Hz GPS showed better validity and reliability for determining the distances covered (TEE: 1.6-8.0%; CV: 1.1-5.1%) and sprint mechanical properties (TEE: 4.5-14.3%; CV: 3.1-7.5%) than the 10 Hz GPS (TEE: 3.0-12.9%; CV: 2.5-13.0% and TEE: 4.1-23.1%; CV: 3.3-20.0%). However, the 20 Hz LPS demonstrated superior validity and reliability overall (TEE: 1.0-6.0%; CV: 0.7-5.0% and TEE: 2.1-9.2%; CV: 1.6-7.3%). For the 10 Hz GPS, 18 Hz GPS, and 20 Hz LPS, the relative loss of data sets due to measurement errors was 10.0%, 20.0%, and 15.8%, respectively. 
This study shows that 18 Hz GPS has enhanced validity and reliability for determining movement patterns in team sports compared to 10 Hz GPS, whereas 20 Hz LPS had superior validity and reliability overall. However, compared to 10 Hz GPS, 18 Hz GPS and 20 Hz LPS technologies had more outliers due to measurement errors, which limits their practical applications at this time.
47 CFR 101.75 - Involuntary relocation procedures.
Code of Federal Regulations, 2010 CFR
2010-10-01
... engineering, equipment, site and FCC fees, as well as any legitimate and prudent transaction expenses incurred... reliability of their system. For digital data systems, reliability is measured by the percent of time the bit error rate (BER) exceeds a desired value, and for analog or digital voice transmissions, it is measured...
Interpreting Variance Components as Evidence for Reliability and Validity.
ERIC Educational Resources Information Center
Kane, Michael T.
The reliability and validity of measurement is analyzed by a sampling model based on generalizability theory. A model for the relationship between a measurement procedure and an attribute is developed from an analysis of how measurements are used and interpreted in science. The model provides a basis for analyzing the concept of an error of…
Assessing the Reliability of Curriculum-Based Measurement: An Application of Latent Growth Modeling
ERIC Educational Resources Information Center
Yeo, Seungsoo; Kim, Dong-Il; Branum-Martin, Lee; Wayman, Miya Miura; Espin, Christine A.
2012-01-01
The purpose of this study was to demonstrate the use of Latent Growth Modeling (LGM) as a method for estimating reliability of Curriculum-Based Measurement (CBM) progress-monitoring data. The LGM approach permits the error associated with each measure to differ at each time point, thus providing an alternative method for examining of the…
The Validity of Reliability Assessments.
ERIC Educational Resources Information Center
Basch, Charles E.; Gold, Robert S.
1985-01-01
Reliability guides research design and is used as a standard for judging the credibility of findings and inferences. Using data gathered in a school health education curriculum evaluation as an example, possible errors in hypothesis testing are examined. Appropriateness of internal consistency as a measure of reliability is discussed and…
Simões, Maria do Socorro Mp; Garcia, Isabel Ff; Costa, Lucíola da Cm; Lunardi, Adriana C
2018-05-01
The Life-Space Assessment (LSA) assesses mobility based on the spaces that older adults go to, how often they go, and how independently they move. Despite its increased use, LSA measurement properties remain unclear. The aim of the present study was to analyze the content validity, reliability, construct validity and interpretability of the LSA for Brazilian community-dwelling older adults. In this clinimetric study we analyzed the measurement properties (content validity, reliability, construct validity and interpretability) of the LSA administered to 80 Brazilian community-dwelling older adults. Reliability was analyzed by Cronbach's alpha (internal consistency), intraclass correlation coefficients and 95% confidence interval (reproducibility), and standard error of measurement (measurement error). Construct validity was analyzed by Pearson's correlations between the LSA and accelerometry (time in inactivity and moderate-to-vigorous activities), and interpretability was analyzed by determination of the minimal detectable change, and floor and ceiling effects. The LSA met the criteria for content validity. The Cronbach's alpha was 0.92, intraclass correlation coefficient was 0.97 (95% confidence interval 0.95-0.98) and standard error of measurement was 4.12. The LSA showed convergence with accelerometry (negative correlation with time in inactivity and positive correlation with time in moderate to vigorous activities), the minimal detectable change was 0.36 and we observed no floor or ceiling effects. The LSA showed adequate reliability, validity and interpretability for life-space mobility assessment of Brazilian community-dwelling older adults. Geriatr Gerontol Int 2018; 18: 783-789. © 2018 Japan Geriatrics Society.
Reliability of Semi-Automated Segmentations in Glioblastoma.
Huber, T; Alber, G; Bette, S; Boeckh-Behrens, T; Gempt, J; Ringel, F; Alberts, E; Zimmer, C; Bauer, J S
2017-06-01
In glioblastoma, quantitative volumetric measurements of contrast-enhancing or fluid-attenuated inversion recovery (FLAIR) hyperintense tumor compartments are needed for an objective assessment of therapy response. The aim of this study was to evaluate the reliability of a semi-automated, region-growing segmentation tool for determining tumor volume in patients with glioblastoma among different users of the software. A total of 320 segmentations of tumor-associated FLAIR changes and contrast-enhancing tumor tissue were performed by different raters (neuroradiologists, medical students, and volunteers). All patients underwent high-resolution magnetic resonance imaging including a 3D-FLAIR and a 3D-MPRage sequence. Segmentations were done using a semi-automated, region-growing segmentation tool. Intra- and inter-rater reliability were addressed by intra-class correlation (ICC). Root-mean-square error (RMSE) was used to determine the precision error. Dice score was calculated to measure the overlap between segmentations. Semi-automated segmentation showed a high ICC (> 0.985) for all groups, indicating excellent intra- and inter-rater reliability. Significantly smaller precision errors and higher Dice scores were observed for FLAIR segmentations compared with segmentations of contrast enhancement. Single rater segmentations showed the lowest RMSE for FLAIR of 3.3 % (MPRage: 8.2 %). Both single raters and neuroradiologists had the lowest precision error for longitudinal evaluation of FLAIR changes. Semi-automated volumetry of glioblastoma was reliably performed by all groups of raters, even without neuroradiologic expertise. Interestingly, segmentations of tumor-associated FLAIR changes were more reliable than segmentations of contrast enhancement. In longitudinal evaluations, an experienced rater can reliably and quantitatively detect progressive FLAIR changes of less than 15 %, which could help to detect progressive disease earlier.
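The two overlap and precision measures named in this abstract can be sketched in a few lines. Segmentations are represented here as sets of voxel indices, which is an assumption about the data layout; the study reports RMSE as a percentage, but its normalization is not stated, so the sketch returns RMSE in the raw units of the input. Names and data are hypothetical.

```python
import math


def dice_score(mask_a, mask_b):
    """Dice overlap of two segmentations given as sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # convention chosen here: two empty masks agree fully
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def rmse(values_a, values_b):
    """Root-mean-square error between paired volume measurements."""
    squared = [(a - b) ** 2 for a, b in zip(values_a, values_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical voxel index sets produced by two raters.
rater_1 = {1, 2, 3, 4}
rater_2 = {3, 4, 5, 6}
overlap = dice_score(rater_1, rater_2)
```

Dice is sensitive to where the segmented voxels lie, whereas a volume-based RMSE only compares sizes, so two segmentations can agree closely in volume while overlapping poorly.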
Laar, Matilda E; Marquis, Grace S; Lartey, Anna; Gray-Donald, Katherine
2018-02-17
Length measurements are important in growth monitoring and promotion (GMP) for the surveillance of a child's weight-for-length and length-for-age. These two indices provide an indication of a child's risk of becoming wasted or stunted, and are more informative about a child's growth than the widely used weight-for-age index (underweight). Although the introduction of length measurements in GMP is recommended by the World Health Organization, concerns about the reliability of length measurements collected in rural outreach settings have been expressed by stakeholders. Our aim was to describe the reliability and challenges associated with community health personnel measuring length for rural outreach GMP activities. Two reliability studies (A and B), using 10 children less than 24 months each, were conducted in the GMP services of a rural district in Ghana. Fifteen nurses and 15 health volunteers (HV) with no prior experience in length measurements were trained. Intra- and inter-observer technical error of measurement (TEM), average bias from expert anthropometrist, and coefficient of reliability (R) of length measurements were assessed and compared across sessions. Observations and interviews were used to understand the ability and experiences of health personnel with measuring length at outreach GMP. Inter-observer TEM was larger than intra-observer TEM for both nurses and HV at both sessions and was unacceptably high (compared to error standards) in both groups at both time points. Average biases from the expert's measurements were within acceptable limits; however, both groups tended to underestimate length measurements. The R for lengths collected by nurses (92.3%) was higher at session B compared to that of HV (87.5%). Length measurements taken by nurses and HV, and those taken by an experienced anthropometrist at GMP sessions were of moderate agreement (kappa = 0.53, p < 0.0001). 
The reliability of length measurements improved after two refresher trainings for nurses but not for HV. In addition, length measurements taken during GMP sessions may be susceptible to errors due to overburdened health personnel and crowded GMP clinics. There is need for both pre- and in-service training of nurses and HV on length measurements and procedures to improve reliability of length measurements.
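The two statistics named in this abstract, the technical error of measurement (TEM) and the coefficient of reliability (R), can be sketched as follows. This assumes the commonly used anthropometric definitions for two repeated measurements per child, TEM = sqrt(sum(d^2) / 2N) and R = 1 - TEM^2 / SD^2; the abstract does not state which formulas the authors used, and the length values below are invented.

```python
import math
from statistics import variance

def tem_two_measurements(first, second):
    """TEM for two repeated measurements per subject: sqrt(sum(d^2) / (2 * N))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally sized, non-empty series")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))

def coefficient_of_reliability(tem, all_measurements):
    """R = 1 - TEM^2 / SD^2, with SD^2 the sample variance of all measurements."""
    return 1.0 - tem ** 2 / variance(all_measurements)

# Hypothetical lengths (cm) of three children, each measured twice.
first_pass = [70.0, 65.0, 80.0]
second_pass = [70.4, 64.6, 80.0]
tem = tem_two_measurements(first_pass, second_pass)
r = coefficient_of_reliability(tem, first_pass + second_pass)
print(f"TEM = {tem:.3f} cm, R = {r:.4f}")
```

R approaches 1 when the measurement error is small relative to the spread between children, which is why it is often reported as a percentage.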
Gwynne, Craig R; Curran, Sarah A
2014-12-01
Clinical assessment of lower limb kinematics during dynamic tasks may identify individuals who demonstrate abnormal movement patterns that may lead to etiology or exacerbation of knee conditions such as patellofemoral joint (PFJt) pain. The purpose of this study was to determine the reliability, validity and associated measurement error of a clinically appropriate two-dimensional (2-D) procedure for quantifying frontal plane knee alignment during single limb squats. Nine female and nine male recreationally active subjects with no history of PFJt pain had frontal plane limb alignment assessed using three-dimensional (3-D) motion analysis and digital video cameras (2-D analysis) while performing single limb squats. The association between 2-D and 3-D measures was quantified using Pearson's product-moment correlation coefficients. Intraclass correlation coefficients (ICCs) were determined for within- and between-session reliability of 2-D data and standard error of measurement (SEM) was used to establish measurement error. Frontal plane limb alignment assessed with 2-D analysis demonstrated good correlation compared with 3-D methods (r = 0.64 to 0.78, p < 0.001). Within-session (0.86) and between-session ICCs (0.74) demonstrated good reliability for 2-D measures and SEM scores ranged from 2° to 4°. 2-D measures have good consistency and may provide a valid measure of lower limb alignment when compared to existing 3-D methods. Assessment of lower limb kinematics using 2-D methods may be an accurate and clinically useful alternative to 3-D motion analysis when identifying individuals who demonstrate abnormal movement patterns associated with PFJt pain.
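A standard error of measurement is often derived from a reliability coefficient as SEM = SD x sqrt(1 - ICC). The sketch below assumes that formula; the abstract does not say how its SEM was computed, and the 6° standard deviation is a made-up value used only to show the arithmetic alongside the reported between-session ICC of 0.74.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC); sd is the between-subject standard deviation."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc is expected to lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)

# Hypothetical SD of 6 degrees combined with the reported between-session ICC of 0.74.
print(round(standard_error_of_measurement(6.0, 0.74), 2))
```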
Reliability Estimation for Aggregated Data: Applications for Organizational Research.
ERIC Educational Resources Information Center
Hart, Roland J.; Bradshaw, Stephen C.
This report provides the statistical tools necessary to measure the extent of error that exists in organizational record data and group survey data. It is felt that traditional methods of measuring error are inappropriate or incomplete when applied to organizational groups, especially in studies of organizational change when the same variables are…
Measures of Linguistic Accuracy in Second Language Writing Research.
ERIC Educational Resources Information Center
Polio, Charlene G.
1997-01-01
Investigates the reliability of measures of linguistic accuracy in second language writing. The study uses a holistic scale, error-free T-units, and an error classification system on the essays of English-as-a-Second-Language students and discusses why disagreements arise within a rater and between raters. (24 references) (Author/CK)
Tracking Progress in Improving Diagnosis: A Framework for Defining Undesirable Diagnostic Events.
Olson, Andrew P J; Graber, Mark L; Singh, Hardeep
2018-01-29
Diagnostic error is a prevalent, harmful, and costly phenomenon. Multiple national health care and governmental organizations have recently identified the need to improve diagnostic safety as a high priority. A major barrier, however, is the lack of standardized, reliable methods for measuring diagnostic safety. Given the absence of reliable and valid measures for diagnostic errors, we need methods to help establish some type of baseline diagnostic performance across health systems, as well as to enable researchers and health systems to determine the impact of interventions for improving the diagnostic process. Multiple approaches have been suggested but none widely adopted. We propose a new framework for identifying "undesirable diagnostic events" (UDEs) that health systems, professional organizations, and researchers could further define and develop to enable standardized measurement and reporting related to diagnostic safety. We propose an outline for UDEs that identifies both conditions prone to diagnostic error and the contexts of care in which these errors are likely to occur. Refinement and adoption of this framework across health systems can facilitate standardized measurement and reporting of diagnostic safety.
Huang, Sheau-Ling; Hsieh, Ching-Lin; Wu, Ruey-Meei
2017-01-01
Background: The Beck Depression Inventory II (BDI-II) and the Taiwan Geriatric Depression Scale (TGDS) are self-report scales used for assessing depression in patients with Parkinson’s disease (PD) and geriatric people. The minimal detectable change (MDC) represents the least amount of change that indicates real difference (i.e., beyond random measurement error) for a single subject. Our aim was to investigate the test-retest reliability and MDC of the BDI-II and the TGDS in people with PD. Methods: Seventy patients were recruited from special clinics for movement disorders at a medical center. The patients’ mean age was 67.7 years, and 63.0% of the patients were male. All patients were assessed with the BDI-II and the TGDS twice, 2 weeks apart. We used the intraclass correlation coefficient (ICC) to determine the reliability between test and retest. We calculated the MDC based on standard error of measurement. The MDC% was calculated (i.e., by dividing the MDC by the possible maximal score of the measure). Results: The test-retest reliabilities of the BDI-II/TGDS were high (ICC = 0.86/0.89). The MDCs (MDC%s) of the BDI-II and TGDS were 8.7 (13.8%) and 5.4 points (18.0%), respectively. Both measures had acceptable to nearly excellent random measurement errors. Conclusions: The test-retest reliabilities of the BDI-II and the TGDS are high. The MDCs of both measures are acceptable to nearly excellent in people with PD. These findings imply that the BDI-II and the TGDS are suitable for use in a research context and in clinical settings to detect real change in a single subject. PMID:28945776
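The MDC and MDC% arithmetic described here can be sketched as below. The 95% form MDC = 1.96 x sqrt(2) x SEM is an assumption, since the abstract only says the MDC was based on the SEM. The maximum scores of 63 (BDI-II) and 30 (TGDS) are also assumed; they reproduce the reported percentages.

```python
import math

def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the commonly used 95% level."""
    return z * math.sqrt(2.0) * sem

def mdc_percent(mdc, max_score):
    """MDC expressed as a percentage of the highest possible scale score."""
    return 100.0 * mdc / max_score

# Reported MDCs with the assumed maximum scale scores.
print(round(mdc_percent(8.7, 63), 1))  # BDI-II
print(round(mdc_percent(5.4, 30), 1))  # TGDS
```

Passing `z=1.645` instead would give a 90% level, the MDC90 variant used in some other reliability studies.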
Alderighi, Marzia; Ferrari, Raffaello; Maghini, Irene; Del Felice, Alessandra; Masiero, Stefano
2016-11-21
Radiographic examination is the gold standard to evaluate spine curves, but ionising radiation limits routine use. Non-invasive methods, such as a skin-surface goniometer (IncliMed®), should be used instead. The aim was to evaluate intra- and interrater reliability in assessing sagittal curves and mobility of the spine with IncliMed®. This was a reliability study on competitive football players. Thoracic kyphosis, lumbar lordosis and mobility of the spine were assessed by IncliMed®. Measurements were repeated twice by each examiner during the same session with between-rater blinding. Intrarater and interrater reliability were measured by Intraclass Correlation Coefficient (ICC), 95% Confidence Interval (CI 95%) and Standard Error of Measurement (SEM). Thirty-four healthy female football players (19.17 ± 4.52 years) were enrolled. Statistical results showed high intrarater (0.805-0.923) and interrater (0.701-0.886) reliability (ICC > 0.8). The obtained intra- and interrater SEM were low, with overall absolute intrarater values between 1.39° and 2.76° and overall interrater values between 1.71° and 4.25°. IncliMed® provides high intra- and interrater reliability in healthy subjects, with limited Standard Error of Measurement. These results encourage its use in clinical practice and scientific research.
Fitzgerald, John S; Johnson, LuAnn; Tomkinson, Grant; Stein, Jesse; Roemmich, James N
2018-05-01
Mechanography during the vertical jump may enhance screening and determining mechanistic causes underlying physical performance changes. Utility of jump mechanography for evaluation is limited by scant test-retest reliability data on force-time variables. This study examined the test-retest reliability of eight jump execution variables assessed from mechanography. Thirty-two women (mean±SD: age 20.8 ± 1.3 yr) and 16 men (age 22.1 ± 1.9 yr) attended a familiarization session and two testing sessions, all one week apart. Participants performed two variations of the squat jump with squat depth self-selected and controlled using a goniometer to 80° knee flexion. Test-retest reliability was quantified as the systematic error (using effect size between jumps), random error (using coefficients of variation), and test-retest correlations (using intra-class correlation coefficients). Overall, jump execution variables demonstrated acceptable reliability, evidenced by small systematic errors (mean±95%CI: 0.2 ± 0.07), moderate random errors (mean±95%CI: 17.8 ± 3.7%), and very strong test-retest correlations (range: 0.73-0.97). Differences in random errors between controlled and self-selected protocols were negligible (mean±95%CI: 1.3 ± 2.3%). Jump execution variables demonstrated acceptable reliability, with no meaningful differences between the controlled and self-selected jump protocols. To simplify testing, a self-selected jump protocol can be used to assess force-time variables with negligible impact on measurement error.
Sample Size for Estimation of G and Phi Coefficients in Generalizability Theory
ERIC Educational Resources Information Center
Atilgan, Hakan
2013-01-01
Problem Statement: Reliability, which refers to the degree to which measurement results are free from measurement errors, as well as its estimation, is an important issue in psychometrics. Several methods for estimating reliability have been suggested by various theories in the field of psychometrics. One of these theories is the generalizability…
Keller, Lisa A; Clauser, Brian E; Swanson, David B
2010-12-01
In recent years, demand for performance assessments has continued to grow. However, performance assessments are notorious for lower reliability, and in particular, low reliability resulting from task specificity. Since reliability analyses typically treat the performance tasks as randomly sampled from an infinite universe of tasks, these estimates of reliability may not be accurate. For tests built according to a table of specifications, tasks are randomly sampled from different strata (content domains, skill areas, etc.). If these strata remain fixed in the test construction process, ignoring this stratification in the reliability analysis results in an underestimate of "parallel forms" reliability, and an overestimate of the person-by-task component. This research explores the effect of representing and misrepresenting the stratification appropriately in estimation of reliability and the standard error of measurement. Both multivariate and univariate generalizability studies are reported. Results indicate that the proper specification of the analytic design is essential in yielding the proper information both about the generalizability of the assessment and the standard error of measurement. Further, illustrative D studies present the effect under a variety of situations and test designs. Additional benefits of multivariate generalizability theory in test design and evaluation are also discussed.
Ramírez-Vélez, Robinson; Rodrigues-Bezerra, Diogo; Correa-Bautista, Jorge Enrique; Izquierdo, Mikel; Lobelo, Felipe
2015-01-01
Substantial evidence indicates that youth physical fitness levels are an important marker of lifestyle and cardio-metabolic health profiles and predict future risk of chronic diseases. The reliability of physical fitness tests has not been explored in the Latin American youth population. This study’s aim was to examine the reliability of health-related physical fitness tests that were used in the Colombian health promotion “Fuprecol study”. Participants were 229 Colombian youth (boys n = 124 and girls n = 105) aged 9 to 17.9 years old. Five components of health-related physical fitness were measured: 1) morphological component: height, weight, body mass index (BMI), waist circumference, triceps skinfold, subscapular skinfold, and body fat (%) via impedance; 2) musculoskeletal component: handgrip and standing long jump test; 3) motor component: speed/agility test (4x10 m shuttle run); 4) flexibility component (hamstring and lumbar extensibility, sit-and-reach test); 5) cardiorespiratory component: 20-meter shuttle-run test (SRT) to estimate maximal oxygen consumption. The tests were performed two times, 1 week apart on the same day of the week, except for the SRT which was performed only once. Intra-observer technical errors of measurement (TEMs) and inter-rater reliability were assessed in the morphological component. Reliability for the musculoskeletal, motor and cardiorespiratory fitness components was examined using Bland–Altman tests. For the morphological component, TEMs were small and reliability was greater than 95% in all cases. For the musculoskeletal, motor, flexibility and cardiorespiratory components, we found adequate reliability patterns in terms of systematic errors (bias) and random error (95% limits of agreement). When the fitness assessments were performed twice, the systematic error was nearly 0 for all tests, except for the sit and reach (mean difference: -1.03% [95% CI = -4.35% to -2.28%]).
The results from this study indicate that the “Fuprecol study” health-related physical fitness battery, administered by physical education teachers, was reliable for measuring health-related components of fitness in children and adolescents aged 9–17.9 years old in a school setting in Colombia. PMID:26474474
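The Bland–Altman quantities referred to in this abstract, systematic error (bias) and 95% limits of agreement, can be computed as in the sketch below. It assumes the usual definition, bias ± 1.96 x SD of the paired differences, and the test and retest values are invented rather than taken from the study.

```python
from statistics import mean, stdev

def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired test-retest scores."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)              # systematic error
    spread = 1.96 * stdev(diffs)    # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread

# Made-up scores for four participants, tested twice one week apart.
week_1 = [10.0, 12.0, 14.0, 16.0]
week_2 = [11.0, 12.0, 15.0, 18.0]
bias, lower, upper = bland_altman(week_1, week_2)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```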
Brackley, Victoria; Ball, Kevin; Tor, Elaine
2018-05-12
The effectiveness of the swimming turn is highly influential to overall performance in competitive swimming. The push-off or wall contact, within the turn phase, is directly involved in determining the speed at which the swimmer leaves the wall. Therefore, it is paramount to develop reliable methods to measure the wall-contact-time during the turn phase for training and research purposes. The aim of this study was to determine the concurrent validity and reliability of the Pool Pad App to measure wall-contact-time during the freestyle and backstroke tumble turn. The wall-contact-times of nine elite and sub-elite participants were recorded during their regular training sessions. Concurrent validity statistics included the standardised typical error estimate, linear analysis and effect sizes while the intraclass correlation coefficient (ICC) was used for the reliability statistics. The standardised typical error estimate resulted in a moderate Cohen's d effect size with an R² value of 0.80 and the ICC between the Pool Pad and 2D video footage was 0.89. Despite these measurement differences, the results from this concurrent validity and reliability analysis demonstrated that the Pool Pad is suitable for measuring wall-contact-time during the freestyle and backstroke tumble turn within a training environment.
A survey of quality measures for gray-scale image compression
NASA Technical Reports Server (NTRS)
Eskicioglu, Ahmet M.; Fisher, Paul S.
1993-01-01
Although a variety of techniques are available today for gray-scale image compression, a complete evaluation of these techniques cannot be made as there is no single reliable objective criterion for measuring the error in compressed images. The traditional subjective criteria are burdensome, and usually inaccurate or inconsistent. On the other hand, being the most common objective criterion, the mean square error (MSE) does not have a good correlation with the viewer's response. It is now understood that in order to have a reliable quality measure, a representative model of the complex human visual system is required. In this paper, we survey and give a classification of the criteria for the evaluation of monochrome image quality.
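The mean square error criterion discussed in this abstract can be written in a few lines. This is a minimal sketch with a hypothetical function name and made-up pixel values; it is not taken from the survey itself.

```python
import numpy as np

def mean_squared_error(original, compressed):
    """MSE between two gray-scale images of identical shape."""
    # Cast to float first: subtracting uint8 arrays directly would wrap around.
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(compressed, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((a - b) ** 2))

# Made-up 2x2 8-bit images standing in for an original and its compressed copy.
original = np.array([[0, 255], [10, 10]], dtype=np.uint8)
compressed = np.array([[0, 250], [12, 10]], dtype=np.uint8)
print(mean_squared_error(original, compressed))
```

Because the MSE weights every pixel difference equally regardless of where it occurs, two images with the same MSE can look very different to a viewer, which is the limitation the abstract points to.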
ERIC Educational Resources Information Center
Quarm, Daisy
1981-01-01
Findings for couples (N=119) show that the low between-spouse correlations for wife's work, money, and spare time are due in part to random measurement error. Suggests that increasing reliability of measures by creating multi-item indices can also increase correlations. Car purchase, vacation, and child discipline were not accounted for by random measurement…
San José, Verónica; Bellot-Arcís, Carlos; Tarazona, Beatriz; Zamora, Natalia; O Lagravère, Manuel
2017-01-01
Background: To compare the reliability and accuracy of direct and indirect dental measurements derived from two types of 3D virtual models: generated by intraoral laser scanning (ILS) and segmented cone beam computed tomography (CBCT), comparing these with a 2D digital model. Material and Methods: One hundred patients were selected. All patients’ records included initial plaster models, an intraoral scan and a CBCT. Patients' dental arches were scanned with the iTero® intraoral scanner while the CBCTs were segmented to create three-dimensional models. To obtain 2D digital models, plaster models were scanned using a conventional 2D scanner. When digital models had been obtained using these three methods, direct dental measurements were measured and indirect measurements were calculated. Differences between methods were assessed by means of paired t-tests and regression models. Intra- and inter-observer errors were analyzed using Dahlberg's d and coefficients of variation. Results: Intraobserver and interobserver error for the ILS model was less than 0.44 mm while for segmented CBCT models, the error was less than 0.97 mm. ILS models provided statistically and clinically acceptable accuracy for all dental measurements, while CBCT models showed a tendency to underestimate measurements in the lower arch, although within the limits of clinical acceptability. Conclusions: ILS and CBCT segmented models are both reliable and accurate for dental measurements. Integration of ILS with CBCT scans would provide dental and skeletal information together. Key words: CBCT, intraoral laser scanner, 2D digital models, 3D models, dental measurements, reliability. PMID:29410764
Jackson, Steven M; Cheng, M Samuel; Smith, A Russell; Kolber, Morey J
2017-02-01
Hand-held dynamometry (HHD) is a more objective way to quantify muscle force production (MP) compared to traditional manual muscle testing. HHD reliability can be negatively impacted by the strength of both the tester and the subject, particularly in the lower extremities due to larger muscle groups. The primary aim of this investigation was to assess intrarater reliability of HHD with use of a portable stabilization device for lower extremity MP in an athletic population. Isometric strength of the bilateral lower extremities, including hip abductors, external rotators, adductors, knee extensors, and ankle plantar flexors, was measured in a sample of healthy recreational runners (8 males, 7 females, n = 30 limbs) training for a marathon. These measurements were assessed using an intrasession intrarater reliability design. Intraclass correlation coefficients (ICC) were calculated using the 3,1 model based on the single rater design. The standard error of measurement (SEM) for each muscle group was also calculated. ICCs were excellent, ranging from ICC (3,1) = 0.93-0.98, with standard errors of measurement ranging from 0.58 to 17.2 N. This study establishes the use of HHD with a portable stabilization device as demonstrating good reliability within testers for measuring lower extremity muscle performance in an active healthy population.
Kumar, Praveen; Cruziah, Reynold; Bradley, Michael; Gray, Selena; Swinkels, Annette
2016-06-01
Glenohumeral subluxation (GHS) is reported in up to 81% of patients with stroke. Ultrasonographic measurements of GHS by measuring the acromion-greater tuberosity (AGT) distance have been found to be reliable for experienced raters. The primary aim was to assess the intra-rater reliability of measurements of AGT distance in people with stroke following a short course of rater training. A secondary aim was to compare the inter-rater reliability of these measurements between novice and experienced raters. Patients with stroke (n = 16; 5 men, 11 women; 74 ± 10 years) with 1-sided weakness who gave informed consent were recruited. Ultrasonographic measurements were recorded at the bedside by two physiotherapists with patients seated upright in a hospital chair. Reliability was assessed by intra-class correlation coefficients (ICCs) and the standard error of measurements (SEM). Minimum detectable change (MDC90) scores were used to estimate the magnitude of change that is likely to exceed measurement error. Mean ± SD AGT distances on the affected and unaffected sides for rater 1 were 2.2 ± 0.7 and 1.7 ± 0.4 cm, respectively. Corresponding values for rater 2 were 2.5 ± 0.6 and 2.0 ± 0.4 cm. Intra-class correlation coefficient values for the affected and unaffected shoulders for rater 1 were 0.96 and 0.91, respectively. Corresponding values for rater 2 were 0.95 and 0.90. SEM and MDC90 for both affected and unaffected shoulders were ≤ 0.2 cm. Inter-rater reliability coefficients were 0.86 (affected) and 0.76 (unaffected) shoulders. Ultrasonographic measurement of AGT distance demonstrates excellent intra-rater reliability for a novice rater. Inter-rater reliability of ultrasonographic measurement of AGT also demonstrates good reliability between novice and experienced raters.
Raymond, Mark R; Clauser, Brian E; Furman, Gail E
2010-10-01
The use of standardized patients to assess communication skills is now an essential part of assessing a physician's readiness for practice. To improve the reliability of communication scores, it has become increasingly common in recent years to use statistical models to adjust ratings provided by standardized patients. This study employed ordinary least squares regression to adjust ratings, and then used generalizability theory to evaluate the impact of these adjustments on score reliability and the overall standard error of measurement. In addition, conditional standard errors of measurement were computed for both observed and adjusted scores to determine whether the improvements in measurement precision were uniform across the score distribution. Results indicated that measurement was generally less precise for communication ratings toward the lower end of the score distribution; and the improvement in measurement precision afforded by statistical modeling varied slightly across the score distribution such that the most improvement occurred in the upper-middle range of the score scale. Possible reasons for these patterns in measurement precision are discussed, as are the limitations of the statistical models used for adjusting performance ratings.
Error measuring system of rotary Inductosyn
NASA Astrophysics Data System (ADS)
Liu, Chengjun; Zou, Jibin; Fu, Xinghe
2008-10-01
The inductosyn is a kind of high-precision angle-position sensor. It has important applications in servo tables, precision machine tools and other products. The precision of an inductosyn is characterized by its error, so error measurement is an important problem in both the production and the application of the inductosyn. At present, the error of an inductosyn is obtained mainly by manual measurement. This approach has disadvantages that cannot be ignored, such as the high labour intensity for the operator, the ease with which mistakes occur, and poor repeatability. In order to solve these problems, a new automatic measurement method based on a high-precision optical dividing head is put forward in this paper. The error signal can be obtained by precisely processing the output signals of the inductosyn and the optical dividing head. While the inductosyn rotates continuously, its zero position error can be measured dynamically, and zero error curves can be output automatically. This method overcomes the measuring and calculating errors caused by human factors, and it makes the measuring process faster, more accurate and more reliable. Experiments show that the accuracy of the error measuring system is 1.1 arc-seconds (peak-to-peak value).
Validity of radiographic assessment of the knee joint space using automatic image analysis.
Komatsu, Daigo; Hasegawa, Yukiharu; Kojima, Toshihisa; Seki, Taisuke; Ikeuchi, Kazuma; Takegami, Yasuhiko; Amano, Takafumi; Higuchi, Yoshitoshi; Kasai, Takehiro; Ishiguro, Naoki
2016-09-01
The present study investigated whether there were differences between automatic and manual measurements of the minimum joint space width (mJSW) on knee radiographs. Knee radiographs of 324 participants in a systematic health screening were analyzed using the following three methods: manual measurement of film-based radiographs (Manual), manual measurement of digitized radiographs (Digital), and automatic measurement of digitized radiographs (Auto). The mean mJSWs on the medial and lateral sides of the knees were determined using each method, and measurement reliability was evaluated using intra-class correlation coefficients. Measurement errors were compared between normal knees and knees with radiographic osteoarthritis. All three methods demonstrated good reliability, although the reliability was slightly lower with the Manual method than with the other methods. On the medial and lateral sides of the knees, the mJSWs were the largest in the Manual method and the smallest in the Auto method. The measurement errors of each method were significantly larger for normal knees than for radiographic osteoarthritis knees. The mJSW measurements are more accurate and reliable with the Auto method than with the Manual or Digital method, especially for normal knees. Therefore, the Auto method is ideal for the assessment of the knee joint space.
Y-balance test: a reliability study involving multiple raters.
Shaffer, Scott W; Teyhen, Deydre S; Lorenson, Chelsea L; Warren, Rick L; Koreerat, Christina M; Straseske, Crystal A; Childs, John D
2013-11-01
The Y-balance test (YBT) is one of the few field expedient tests that have shown predictive validity for injury risk in an athletic population. However, analysis of the YBT in a heterogeneous population of active adults (e.g., military, specific occupations) involving multiple raters with limited experience in a mass screening setting is lacking. The primary purpose of this study was to determine interrater test-retest reliability of the YBT in a military setting using multiple raters. Sixty-four service members (53 males, 11 females) actively conducting military training volunteered to participate. Interrater test-retest reliability of the maximal reach had intraclass correlation coefficients (2,1) of 0.80 to 0.85 with a standard error of measurement ranging from 3.1 to 4.2 cm for the 3 reach directions (anterior, posteromedial, and posterolateral). Interrater test-retest reliability of the average reach of 3 trials had an intraclass correlation coefficient (2,3) range of 0.85 to 0.93 with an associated standard error of measurement ranging from 2.0 to 3.5 cm. The YBT showed good interrater test-retest reliability with an acceptable level of measurement error among multiple raters screening active duty service members. In addition, 31.3% (n = 20 of 64) of participants exhibited an anterior reach asymmetry of >4 cm, suggesting impaired balance symmetry and potentially increased risk for injury.
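The ICC(2,1) and ICC(2,k) forms cited in this abstract come from a two-way random-effects analysis of variance. The sketch below computes both from a subjects-by-raters table using the standard mean-square formulas, and checks itself against the six-subject, four-rater example in Shrout and Fleiss (1979). The function name is an assumption, and none of the numbers come from the YBT study.

```python
import numpy as np

def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)) for an n-subjects x k-raters score table."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average

# Ratings of six subjects by four raters (Shrout and Fleiss, 1979).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc_two_way_random(ratings)
print(round(single, 2), round(average, 2))
```

The averaged form is always at least as large as the single-measure form for the same data, which matches the pattern in the abstract, where the 3-trial average has higher coefficients than the maximal reach.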
Dhital, Anup; Bancroft, Jared B; Lachapelle, Gérard
2013-11-07
In natural and urban canyon environments, Global Navigation Satellite System (GNSS) signals suffer from various challenges such as signal multipath, limited or lack of signal availability and poor geometry. Inertial sensors are often employed to improve the solution continuity under poor GNSS signal quality and availability conditions. Various fault detection schemes have been proposed in the literature to detect and remove biased GNSS measurements to obtain a more reliable navigation solution. However, many of these methods are found to be sub-optimal and often lead to unavailability of reliability measures, mostly because of the improper characterization of the measurement errors. A robust filtering architecture is thus proposed which assumes a heavy-tailed distribution for the measurement errors. Moreover, the proposed filter is capable of adapting to the changing GNSS signal conditions such as when moving from open sky conditions to deep canyons. Results obtained by processing data collected in various GNSS challenged environments show that the proposed scheme provides a robust navigation solution without having to excessively reject usable measurements. The tests reported herein show improvements of nearly 15% and 80% for position accuracy and reliability, respectively, when applying the above approach.
Wind Power Error Estimation in Resource Assessments
Rodríguez, Osvaldo; del Río, Jesús A.; Jaramillo, Oscar A.; Martínez, Manuel
2015-01-01
Estimating the power output is one of the elements that determine the techno-economic feasibility of a renewable project. At present, there is a need to develop reliable methods that achieve this goal, thereby contributing to wind power penetration. In this study, we propose a method for wind power error estimation based on the wind speed measurement error, probability density function, and wind turbine power curves. This method uses the actual wind speed data without prior statistical treatment based on 28 wind turbine power curves, which were fitted by Lagrange's method, to calculate the estimated wind power output and the corresponding error propagation. We found that wind speed percentage errors of 10% were propagated into the power output estimates, thereby yielding an error of 5%. The proposed error propagation complements the traditional power resource assessments. The wind power estimation error also allows us to estimate intervals for the power production leveled cost or the investment time return. The implementation of this method increases the reliability of techno-economic resource assessment studies. PMID:26000444
Hu, Zhi-Jun; He, Jian; Zhao, Feng-Dong; Fang, Xiang-Qian; Zhou, Li-Na; Fan, Shun-Wu
2011-06-01
A reliability study was conducted to estimate the intra- and intermeasurement errors in the measurements of functional cross-sectional area (FCSA), density, and T2 signal intensity of paraspinal muscles using computed tomography (CT) scan and magnetic resonance imaging (MRI). CT scan and MRI have been used widely to measure the cross-sectional area and degeneration of the back muscles in spine and muscle research, but there is still no systematic study analyzing the reliability of these measurements. This study measured the FCSA and fatty infiltration (density on CT scan and T2 signal intensity on MRI) of the paraspinal muscles at L3-L4, L4-L5, and L5-S1 in 29 patients with chronic low back pain. Two experienced musculoskeletal radiologists and one superior spine surgeon traced the region of interest twice within 3 weeks for measurement of the intra- and interobserver reliability. The intraclass correlation coefficients (ICCs) of the intraobserver reliability ranged from fair to excellent for FCSA, and good to excellent for fatty infiltration. The ICCs of the interobserver reliability ranged from fair to excellent for FCSA, and good to excellent for fatty infiltration. There were no significant differences between CT scan and MRI in reliability results, except in the relative standard error of fatty infiltration measurement. The ICCs of the FCSA measurement between CT scan and MRI ranged from poor to good. The reliabilities of the CT scan and MRI for measuring the FCSA and fatty infiltration of the atrophied lumbar paraspinal muscles were acceptable. Using a uniform one-image method was reliable for a study evaluating a single paraspinal muscle. The authors recommend MRI rather than CT scan for paraspinal muscle measurements of FCSA and fatty infiltration.
Baumgart, Christian; Polglaze, Ted; Freiwald, Jürgen
2018-01-01
This study aimed to investigate the validity and reliability of global (GPS) and local (LPS) positioning systems for measuring distances covered and sprint mechanical properties in team sports. Here, we evaluated two recently released 18 Hz GPS and 20 Hz LPS technologies together with one established 10 Hz GPS technology. Six male athletes (age: 27±2 years; VO2max: 48.8±4.7 ml/min/kg) performed outdoors on 10 trials of a team sport-specific circuit that was equipped with double-light timing gates. The circuit included various walking, jogging, and sprinting sections that were performed either in straight-lines or with changes of direction. During the circuit, athletes wore two devices of each positioning system. From the reported and filtered velocity data, the distances covered and sprint mechanical properties (i.e., the theoretical maximal horizontal velocity, force, and power output) were computed. The sprint mechanical properties were modeled via an inverse dynamic approach applied to the center of mass. The validity was determined by comparing the measured and criterion data via the typical error of estimate (TEE), whereas the reliability was examined by comparing the two devices of each technology (i.e., the between-device reliability) via the coefficient of variation (CV). Outliers due to measurement errors were statistically identified and excluded from validity and reliability analyses. The 18 Hz GPS showed better validity and reliability for determining the distances covered (TEE: 1.6–8.0%; CV: 1.1–5.1%) and sprint mechanical properties (TEE: 4.5–14.3%; CV: 3.1–7.5%) than the 10 Hz GPS (TEE: 3.0–12.9%; CV: 2.5–13.0% and TEE: 4.1–23.1%; CV: 3.3–20.0%). However, the 20 Hz LPS demonstrated superior validity and reliability overall (TEE: 1.0–6.0%; CV: 0.7–5.0% and TEE: 2.1–9.2%; CV: 1.6–7.3%). For the 10 Hz GPS, 18 Hz GPS, and 20 Hz LPS, the relative loss of data sets due to measurement errors was 10.0%, 20.0%, and 15.8%, respectively. 
This study shows that 18 Hz GPS has enhanced validity and reliability for determining movement patterns in team sports compared to 10 Hz GPS, whereas 20 Hz LPS had superior validity and reliability overall. However, compared to 10 Hz GPS, 18 Hz GPS and 20 Hz LPS technologies had more outliers due to measurement errors, which limits their practical applications at this time. PMID:29420620
A Test Reliability Analysis of an Abbreviated Version of the Pupil Control Ideology Form.
ERIC Educational Resources Information Center
Gaffney, Patrick V.
A reliability analysis was conducted of an abbreviated, 10-item version of the Pupil Control Ideology Form (PCI), using the Cronbach's alpha technique (L. J. Cronbach, 1951) and the computation of the standard error of measurement. The PCI measures a teacher's orientation toward pupil control. Subjects were 168 preservice teachers from one private…
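As an illustration of the Cronbach's alpha computation mentioned in this record, the following is a minimal sketch using the standard textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are hypothetical and are not taken from the PCI study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Illustrative helper (hypothetical name), using sample variances throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: 3 respondents answering 2 items (made-up numbers).
toy_scores = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy_scores), 3))
```

For a 10-item form such as the abbreviated PCI, each inner list would simply hold 10 item scores.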
ERIC Educational Resources Information Center
Stevens, Christopher John; Dascombe, Ben James
2015-01-01
Sports performance testing is one of the most common and important measures used in sport science. Performance testing protocols must have high reliability to ensure any changes are not due to measurement error or inter-individual differences. High validity is also important to ensure test performance reflects true performance. Time-trial…
MEASUREMENT: ACCOUNTING FOR RELIABILITY IN PERFORMANCE ESTIMATES.
Waterman, Brian; Sutter, Robert; Burroughs, Thomas; Dunagan, W Claiborne
2014-01-01
When evaluating physician performance measures, physician leaders are faced with the quandary of determining whether departures from expected physician performance measurements represent a true signal or random error. This uncertainty impedes the physician leader's ability and confidence to take appropriate performance improvement actions based on physician performance measurements. Incorporating reliability adjustment into physician performance measurement is a valuable way of reducing the impact of random error in the measurements, such as those caused by small sample sizes. Consequently, the physician executive has more confidence that the results represent true performance and is positioned to make better physician performance improvement decisions. Applying reliability adjustment to physician-level performance data is relatively new. As others have noted previously, it's important to keep in mind that reliability adjustment adds significant complexity to the production, interpretation and utilization of results. Furthermore, the methods explored in this case study only scratch the surface of the range of available Bayesian methods that can be used for reliability adjustment; further study is needed to test and compare these methods in practice and to examine important extensions for handling specialty-specific concerns (e.g., average case volumes, which have been shown to be important in cardiac surgery outcomes). Moreover, it's important to note that the provider group average as a basis for shrinkage is one of several possible choices that could be employed in practice and deserves further exploration in future research. With these caveats, our results demonstrate that incorporating reliability adjustment into physician performance measurements is feasible and can notably reduce the incidence of "real" signals relative to what one would expect to see using more traditional approaches. 
A physician leader who is interested in catalyzing performance improvement through focused, effective physician performance improvement is well advised to consider the value of incorporating reliability adjustments into their performance measurement system.
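The abstract above does not give its formulas, so the following is only a sketch of one common form of reliability adjustment: an empirical-Bayes style shrinkage of each physician's observed rate toward the provider group average. The reliability weight used here (between-physician variance divided by between-physician variance plus within-physician variance over case count) and all names are assumptions for illustration, not the authors' method.

```python
def reliability_weight(n_cases, var_between, var_within):
    """Share of observed variation attributable to true differences (0 to 1).

    Assumed form: signal variance / (signal variance + noise variance / n).
    """
    return var_between / (var_between + var_within / n_cases)


def reliability_adjusted(observed, n_cases, group_mean, var_between, var_within):
    """Shrink an observed score toward the group mean (hypothetical helper)."""
    r = reliability_weight(n_cases, var_between, var_within)
    return r * observed + (1 - r) * group_mean


# A physician with few cases is pulled strongly toward the group average ...
print(reliability_adjusted(0.9, 10, 0.7, 1.0, 10.0))
# ... while one with many cases keeps most of the observed score.
print(reliability_adjusted(0.9, 1000, 0.7, 1.0, 10.0))
```

Under these assumptions, small sample sizes lead to low weights, which is how the adjustment damps random error in the measurements.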
Reliability of Measurements Performed by Community-Drawn Anthropometrists from Rural Ethiopia
Ayele, Berhan; Aemere, Abaineh; Gebre, Teshome; Tadesse, Zerihun; Stoller, Nicole E.; See, Craig W.; Yu, Sun N.; Gaynor, Bruce D.; McCulloch, Charles E.; Porco, Travis C.; Emerson, Paul M.; Lietman, Thomas M.; Keenan, Jeremy D.
2012-01-01
Background: Undernutrition is an important risk factor for childhood mortality, and remains a major problem facing many developing countries. Millennium Development Goal 1 calls for a reduction in underweight children, implemented through a variety of interventions. To adequately judge the impact of these interventions, it is important to know the reproducibility of the main indicators for undernutrition. In this study, we trained individuals from rural communities in Ethiopia in anthropometry techniques and measured intra- and inter-observer reliability. Methods and Findings: We trained 6 individuals without prior anthropometry experience to perform weight, height, and middle upper arm circumference (MUAC) measurements. Two anthropometry teams were dispatched to 18 communities in rural Ethiopia and measurements performed on all consenting pre-school children. Anthropometry teams performed a second independent measurement on a convenience sample of children in order to assess intra-anthropometrist reliability. Both teams measured the same children in 2 villages to assess inter-anthropometrist reliability. We calculated several metrics of measurement reproducibility, including the technical error of measurement (TEM) and relative TEM. In total, anthropometry teams performed measurements on 606 pre-school children, 84 of whom had repeat measurements performed by the same team, and 89 of whom had measurements performed by both teams. Intra-anthropometrist TEM (and relative TEM) were 0.35 cm (0.35%) for height, 0.05 kg (0.39%) for weight, and 0.18 cm (1.27%) for MUAC. Corresponding values for inter-anthropometrist reliability were 0.67 cm (0.75%) for height, 0.09 kg (0.79%) for weight, and 0.22 cm (1.53%) for MUAC. Inter-anthropometrist measurement error was greater for smaller children than for larger children.
Conclusion: Measurements of height and weight were more reproducible than measurements of MUAC, and measurements of larger children were more reliable than those for smaller children. Community-drawn anthropometrists can provide reliable measurements that could be used to assess the impact of interventions for childhood undernutrition. PMID:22291939
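The TEM and relative TEM reported in this record can be illustrated with the conventional two-measurement formula, TEM = sqrt(sum of squared differences / 2n), with relative TEM expressed as a percentage of the grand mean. The sketch below assumes that convention; the abstract does not state which variant the authors used, and the function names and heights are hypothetical.

```python
import math


def technical_error_of_measurement(first, second):
    """TEM for paired repeat measurements of the same subjects."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally sized, non-empty lists")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def relative_tem(first, second):
    """TEM as a percentage of the mean of all measurements."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * technical_error_of_measurement(first, second) / grand_mean


# Made-up heights in cm, measured twice on the same two children.
heights_first = [100.0, 102.0]
heights_second = [101.0, 101.0]
print(technical_error_of_measurement(heights_first, heights_second))
print(relative_tem(heights_first, heights_second))
```

The same functions apply to intra-anthropometrist data (one team measuring twice) and inter-anthropometrist data (two teams measuring once each), since both are paired designs.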
Brogårdh, Christina; Flansbjer, Ulla-Britt; Carlsson, Håkan; Lexell, Jan
2015-10-01
Muscle weakness in the upper limb is common in persons with late effects of polio. To be able to measure muscle strength and follow changes over time, reliable measurements are needed. To evaluate the intra-rater reliability of isometric and isokinetic arm and hand muscle strength measurements in persons with late effects of polio. A test-retest design. A university hospital outpatient clinic. Twenty-eight persons (mean age 68 years, SD 11 years) with late effects of polio in their upper limbs. Isometric shoulder abduction, isokinetic concentric elbow flexion and extension, isometric elbow flexion, and isometric grip strength were measured twice, 14 days apart. Reliability was evaluated with the intra-class correlation coefficient, the mean difference between the test sessions (d̄), together with the 95% confidence intervals for d̄, the standard error of measurement (SEM and SEM%), the smallest real difference (SRD and SRD%), and Bland-Altman graphs. A fixed dynamometer (Biodex) was used to measure arm strength and an electronic dynamometer (GRIP-it) was used to measure grip strength. Intra-rater reliability was high, with intra-class correlation coefficients between 0.87 and 0.98. The SEM%, representing the smallest change for a group of persons, ranged from 7%-24% for all strength measurements, and the SRD%, representing the smallest change for an individual person, ranged from 20%-67%. Muscle strength in the upper limbs can be reliably measured in persons with late effects of polio. However, the measurement errors indicate that the method is more suitable to detect changes in muscle strength for a group of persons than for an individual person. Copyright © 2015 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
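The mean difference between test sessions and the Bland-Altman analysis named in this record can be sketched as follows. The example computes the mean difference and the conventional 95% limits of agreement (mean difference plus or minus 1.96 standard deviations of the differences); it does not compute the confidence interval of the mean difference, which would need a t quantile. The function name and the strength values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (mean difference, lower limit, upper limit) for paired sessions."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally sized lists with at least 2 pairs")
    diffs = [second - first for first, second in zip(test, retest)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    # Conventional 95% limits of agreement, assuming roughly normal differences.
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Made-up strength values (Nm) from two sessions 14 days apart.
session_1 = [10.0, 20.0, 30.0]
session_2 = [11.0, 20.0, 32.0]
print(bland_altman(session_1, session_2))
```

A mean difference near zero suggests no systematic change between sessions, while wide limits indicate large individual-level measurement error.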
Ekstrand, Elisabeth; Lexell, Jan; Brogårdh, Christina
2015-09-01
To evaluate the test-retest reliability of isometric and isokinetic muscle strength measurements in the upper extremity after stroke. A test-retest design. Forty-five persons with mild to moderate paresis in the upper extremity > 6 months post-stroke. Isometric arm strength (shoulder abduction, elbow flexion), isokinetic arm strength (elbow extension/flexion) and isometric grip strength were measured with electronic dynamometers. Reliability was evaluated with intra-class correlation coefficients (ICC), changes in the mean, standard error of measurements (SEM) and smallest real differences (SRD). Reliability was high (ICCs: 0.92-0.97). The absolute and relative (%) SEM ranged from 2.7 Nm (5.6%) to 3.0 Nm (9.4%) for isometric arm strength, 2.6 Nm (7.4%) to 2.9 Nm (12.6%) for isokinetic arm strength, and 22.3 N (7.6%) to 26.4 N (9.2%) for grip strength. The absolute and relative (%) SRD ranged from 7.5 Nm (15.5%) to 8.4 Nm (26.1%) for isometric arm strength, 7.1 Nm (20.6%) to 8.0 Nm (34.8%) for isokinetic arm strength, and 61.8 N (21.0%) to 73.3 N (25.6%) for grip strength. Muscle strength in the upper extremity can be reliably measured in persons with chronic stroke. Isometric measurements yield smaller measurement errors than isokinetic measurements and might be preferred, but the choice depends on the research question.
Use of the smartphone for end vertebra selection in scoliosis.
Pepe, Murad; Kocadal, Onur; Iyigun, Abdullah; Gunes, Zafer; Aksahin, Ertugrul; Aktekin, Cem Nuri
2017-03-01
The aim of our study was to develop a smartphone-aided end vertebra selection method and to investigate its effectiveness in Cobb angle measurement. Twenty-nine adolescent idiopathic scoliosis patients' pre-operative posteroanterior scoliosis radiographs were used for end vertebra selection and Cobb angle measurement by the standard method and the smartphone-aided method. Measurements were performed by 7 examiners. The intraclass correlation coefficient was used to analyze selection and measurement reliability. Summary statistics of variance calculations were used to provide 95% prediction limits for the error in Cobb angle measurements. A paired 2-tailed t test was used to analyze end vertebra selection differences. Mean absolute Cobb angle difference was 3.6° for the manual method and 1.9° for the smartphone-aided method. Both intraobserver and interobserver reliability were found to be excellent for Cobb angle measurement with the manual and smartphone-aided methods. Both intraobserver and interobserver reliability were also found to be excellent for end vertebra selection with the manual and smartphone-aided methods. However, reliability values were lower for the manual method than for the smartphone-aided method. Two observers selected significantly different end vertebrae in their repeated selections with the manual method. The smartphone-aided method for end vertebra selection and Cobb angle measurement showed excellent reliability. We can expect a reduction in measurement error rates with the widespread use of this method in clinical practice. Level III, Diagnostic study. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.
Reliability and concurrent validity of Futrex and bioelectrical impedance.
Vehrs, P; Morrow, J R; Butte, N
1998-11-01
Thirty Caucasian males (aged 19-32 yr) participated in this study designed to investigate the reliability of multiple bioelectrical impedance analysis (BIA) and near-infrared spectroscopy (Futrex, FTX) measurements and the validity of BIA and FTX estimations of hydrostatically (UW) determined percent body fat (%BF). Two BIA and two FTX instruments were used to make 6 measurements each of resistance (R) and optical density (OD), respectively, over a 30 min period on two consecutive days. Repeated measures ANOVA indicated that FTX and BIA, using manufacturer's equations, significantly (p<0.01) underpredicted UW by 2.4 and 3.8%BF, respectively. Standard error of estimate (SEE) and total error (TE) terms provided by regression analysis for FTX (4.6 and 5.31%BF, respectively) and BIA (5.65 and 6.95%BF, respectively) were high. Dependent t-tests revealed no significant differences in either FTX or BIA predictions of %BF using two machines. Intraclass reliabilities for BIA and FTX estimates of UW %BF across trials, days, and machines all exceeded 0.97. A significant random error term associated with FTX and a significant subject-by-day interaction associated with BIA were revealed using the generalizability model. Although FTX and BIA estimates of UW %BF were reliable, due to the significant underestimation of UW %BF and high SEE and TE, neither FTX nor BIA was considered a valid estimate of hydrostatically determined %BF.
Validity and Reliability of a New Device (WIMU®) for Measuring Hamstring Muscle Extensibility.
Muyor, José M
2017-09-01
The aims of the current study were 1) to evaluate the validity of the WIMU® system for measuring hamstring muscle extensibility in the passive straight leg raise (PSLR) test using an inclinometer for the criterion and 2) to determine the test-retest reliability of the WIMU® system to measure hamstring muscle extensibility during the PSLR test. 55 subjects were evaluated on 2 separate occasions. Data from a Unilever inclinometer and WIMU® system were collected simultaneously. Intraclass correlation coefficients (ICCs) for the validity were very high (0.983-1); a very low systematic bias (-0.21° to -0.42°), random error (0.05°-0.04°) and standard error of the estimate (0.43°-0.34°) were observed (left-right leg, respectively) between the 2 devices (inclinometer and the WIMU® system). The R² between the devices was 0.999 (p<0.001) in both the left and right legs. The test-retest reliability of the WIMU® system was excellent, with ICCs ranging from 0.972-0.995, low coefficients of variation (0.01%), and a low standard error of the estimate (0.19-0.31°). The WIMU® system showed strong concurrent validity and excellent test-retest reliability for the evaluation of hamstring muscle extensibility in the PSLR test. © Georg Thieme Verlag KG Stuttgart · New York.
Ort, Rebecca; Metzler, Philipp; Kruse, Astrid L.; Matthews, Felix; Zemann, Wolfgang; Grätz, Klaus W.; Luebbers, Heinz-Theo
2012-01-01
Ample data exist about the high precision of three-dimensional (3D) scanning devices and their data acquisition of the facial surface. However, a question remains regarding which facial landmarks are reliable if identified in 3D images taken under clinical circumstances. Sources of error to be addressed could be technical, user-dependent, or patient- or anatomy-related. Based on clinical 3D photos taken with the 3dMDface system, the intraobserver repeatability of 27 facial landmarks in six cleft lip (CL) infants and one non-CL infant was evaluated based on a total of over 1,100 measurements. Data acquisition was sometimes challenging but successful in all patients. The mean error was 0.86 mm, with a range of 0.39 mm (exocanthion) to 2.21 mm (soft gonion). Typically, landmarks provided a small mean error but still showed quite a high variance in measurements, for example, exocanthion from 0.04 mm to 0.93 mm. Conversely, relatively imprecise landmarks still provide accurate data regarding specific spatial planes. One must be aware that the degree of precision depends on the landmarks and spatial planes in question. In clinical investigations, the degree of reliability for the landmarks evaluated should be taken into account. Additional reliability can be achieved through repeated measurements. PMID:22919476
Furlan, Leonardo; Sterr, Annette
2018-01-01
Motor learning studies face the challenge of differentiating between real changes in performance and random measurement error. While the traditional p-value-based analyses of difference (e.g., t-tests, ANOVAs) provide information on the statistical significance of a reported change in performance scores, they do not inform as to the likely cause or origin of that change, that is, the contribution of both real modifications in performance and random measurement error to the reported change. One way of differentiating between real change and random measurement error is through the utilization of the statistics of standard error of measurement (SEM) and minimal detectable change (MDC). SEM is estimated from the standard deviation of a sample of scores at baseline and a test-retest reliability index of the measurement instrument or test employed. MDC, in turn, is estimated from SEM and a degree of confidence, usually 95%. The MDC value might be regarded as the minimum amount of change that needs to be observed for it to be considered a real change, or a change to which the contribution of real modifications in performance is likely to be greater than that of random measurement error. A computer-based motor task was designed to illustrate the applicability of SEM and MDC to motor learning research. Two studies were conducted with healthy participants. Study 1 assessed the test-retest reliability of the task and Study 2 consisted of a typical motor learning study, where participants practiced the task for five consecutive days. In Study 2, the data were analyzed with a traditional p-value-based analysis of difference (ANOVA) and also with SEM and MDC.
The findings showed good test-retest reliability for the task and that the p-value-based analysis alone identified statistically significant improvements in performance over time even when the observed changes could in fact have been smaller than the MDC and thereby caused mostly by random measurement error, as opposed to by learning. We suggest therefore that motor learning studies could complement their p-value-based analyses of difference with statistics such as SEM and MDC in order to inform as to the likely cause or origin of any reported changes in performance.
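The SEM and MDC described in this record can be sketched with the commonly used formulas SEM = SD × sqrt(1 − r) and MDC95 = 1.96 × sqrt(2) × SEM, where SD is the baseline standard deviation and r the test-retest reliability index. The sqrt(2) factor (two measurement occasions) and the function names are assumptions based on the usual convention; the abstract itself only states which quantities each statistic is estimated from.

```python
import math


def standard_error_of_measurement(baseline_sd, reliability):
    """SEM from the baseline SD and a test-retest reliability index (0 to 1)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return baseline_sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2.0) * sem


def exceeds_mdc(observed_change, baseline_sd, reliability, z=1.96):
    """True if a change is larger than what measurement error alone explains."""
    sem = standard_error_of_measurement(baseline_sd, reliability)
    return abs(observed_change) > minimal_detectable_change(sem, z)


# Made-up numbers: baseline SD of 10 score units, reliability of 0.84.
sem = standard_error_of_measurement(10.0, 0.84)
print(sem, minimal_detectable_change(sem))
print(exceeds_mdc(12.0, 10.0, 0.84), exceeds_mdc(5.0, 10.0, 0.84))
```

With these made-up values, a statistically significant 5-point improvement would still fall below the MDC, which is the situation the abstract warns about.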
NASA Astrophysics Data System (ADS)
Zheng, Yuejiu; Ouyang, Minggao; Han, Xuebing; Lu, Languang; Li, Jianqiu
2018-02-01
State of charge (SOC) estimation is generally acknowledged as one of the most important functions in battery management systems for lithium-ion batteries in new energy vehicles. Though every effort is made for various online SOC estimation methods to reliably increase the estimation accuracy as much as possible within the limited on-chip resources, little literature discusses the error sources for those SOC estimation methods. This paper first reviews the commonly studied SOC estimation methods from a conventional classification. A novel perspective focusing on the error analysis of the SOC estimation methods is proposed. SOC estimation methods are analyzed from the views of the measured values, models, algorithms and state parameters. Subsequently, the error flow charts are proposed to analyze the error sources from the signal measurement to the models and algorithms for the widely used online SOC estimation methods in new energy vehicles. Finally, with the consideration of the working conditions, choosing more reliable and applicable SOC estimation methods is discussed, and the future development of the promising online SOC estimation methods is suggested.
ERIC Educational Resources Information Center
Wang, Tianyou; And Others
M. J. Kolen, B. A. Hanson, and R. L. Brennan (1992) presented a procedure for assessing the conditional standard error of measurement (CSEM) of scale scores using a strong true-score model. They also investigated the ways of using nonlinear transformation from number-correct raw score to scale score to equalize the conditional standard error along…
Arifin, Nooranida; Abu Osman, Noor Azuan; Wan Abas, Wan Abu Bakar
2014-04-01
The measurement of postural balance often involves measurement error, which affects the analysis and interpretation of the outcomes. In most existing clinical rehabilitation research, the ability to produce reliable measures is a prerequisite for an accurate assessment of an intervention after a period of time. Although clinical balance assessment has been performed in previous studies, none has determined the intrarater test-retest reliability of static and dynamic stability indexes during dominant single stance. In this study, one rater examined 20 healthy university students (female=12, male=8) in two sessions separated by a 7-day interval. Three stability indexes--the overall stability index (OSI), anterior/posterior stability index (APSI), and medial/lateral stability index (MLSI) in static and dynamic conditions--were measured during single dominant stance. Intraclass correlation coefficient (ICC), standard error of measurement (SEM) and 95% confidence interval (95% CI) were calculated. Test-retest ICCs for OSI, APSI, and MLSI were 0.85, 0.78, and 0.84 during the static condition and 0.77, 0.77, and 0.65 during the dynamic condition, respectively. We concluded that postural stability assessment using the Biodex stability system demonstrates good-to-excellent test-retest reliability over a 1-week time interval.
NASA Astrophysics Data System (ADS)
Nair, S. P.; Righetti, R.
2015-05-01
Recent elastography techniques focus on imaging information on properties of materials which can be modeled as viscoelastic or poroelastic. These techniques often require the fitting of temporal strain data, acquired from either a creep or stress-relaxation experiment, to a mathematical model using least squares error (LSE) parameter estimation. It is known that the strain versus time relationships for tissues undergoing creep compression are non-linear. In non-linear cases, devising a measure of estimate reliability can be challenging. In this article, we have developed and tested a method to provide non-linear LSE parameter estimate reliability, which we call Resimulation of Noise (RoN). RoN provides a measure of reliability by estimating the spread of parameter estimates from a single experiment realization. We have tested RoN specifically for the case of axial strain time constant parameter estimation in poroelastic media. Our tests show that the RoN estimated precision has a linear relationship to the actual precision of the LSE estimator. We have also compared results from the RoN derived measure of reliability against a commonly used reliability measure: the correlation coefficient (CorrCoeff). Our results show that CorrCoeff is a poor measure of estimate reliability for non-linear LSE parameter estimation. While the RoN is specifically tested only for axial strain time constant imaging, a general algorithm is provided for use in all LSE parameter estimation.
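The abstract does not spell out the RoN algorithm, so the following is only one plausible reading of "estimating the spread of parameter estimates from a single experiment realization": fit the model once, estimate the noise level from the residuals, resimulate noisy data from the fitted curve, refit each synthetic data set, and report the standard deviation of the refitted parameter. The strain model 1 − exp(−t/τ), the grid-search fit, and all names are assumptions for illustration, not the authors' implementation.

```python
import math
import random
from statistics import stdev


def strain_model(t, tau):
    """Hypothetical normalized creep curve with time constant tau."""
    return 1.0 - math.exp(-t / tau)


def fit_tau(times, strains, candidates):
    """Least squares fit of tau by exhaustive search over candidate values."""
    def squared_error(tau):
        return sum((s - strain_model(t, tau)) ** 2 for t, s in zip(times, strains))
    return min(candidates, key=squared_error)


def resimulation_of_noise(times, strains, candidates, n_resim=50, seed=0):
    """Return (tau estimate, spread of tau over resimulated data sets)."""
    rng = random.Random(seed)
    tau_hat = fit_tau(times, strains, candidates)
    fitted = [strain_model(t, tau_hat) for t in times]
    # Noise level from residuals; one fitted parameter leaves n - 1 degrees of freedom.
    residual_ss = sum((s - f) ** 2 for s, f in zip(strains, fitted))
    noise_sd = math.sqrt(residual_ss / (len(times) - 1))
    refits = []
    for _ in range(n_resim):
        synthetic = [f + rng.gauss(0.0, noise_sd) for f in fitted]
        refits.append(fit_tau(times, synthetic, candidates))
    return tau_hat, stdev(refits)


# Made-up experiment: true tau of 2.0 with additive Gaussian noise.
data_rng = random.Random(1)
times = [0.2 * i for i in range(1, 51)]
observed = [strain_model(t, 2.0) + data_rng.gauss(0.0, 0.02) for t in times]
candidates = [0.1 + 0.05 * i for i in range(200)]
print(resimulation_of_noise(times, observed, candidates))
```

A grid search keeps the sketch dependency-free; a real application would more likely use a dedicated non-linear least squares routine, and the spread is quantized by the grid step here.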
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carlson, J.J.; Bouchard, A.M.; Osbourn, G.C.
Future generation automated human biometric identification and verification will require multiple features/sensors together with internal and external information sources to achieve high performance, accuracy, and reliability in uncontrolled environments. The primary objective of the proposed research is to develop a theoretical and practical basis for identifying and verifying people using standoff biometric features that can be obtained with minimal inconvenience during the verification process. The basic problem involves selecting sensors and discovering features that provide sufficient information to reliably verify a person's identity under the uncertainties caused by measurement errors and tactics of uncooperative subjects. A system was developed for discovering hand, face, ear, and voice features and fusing them to verify the identity of people. The system obtains its robustness and reliability by fusing many coarse and easily measured features into a near minimal probability of error decision algorithm.
On the Use, the Misuse, and the Very Limited Usefulness of Cronbach's Alpha
ERIC Educational Resources Information Center
Sijtsma, Klaas
2009-01-01
This discussion paper argues that both the use of Cronbach's alpha as a reliability estimate and as a measure of internal consistency suffer from major problems. First, alpha always has a value, which cannot be equal to the test score's reliability given the inter-item covariance matrix and the usual assumptions about measurement error. Second, in…
Validity and test-retest reliability of the six-spot step test in persons after stroke.
Arvidsson Lindvall, Mialinn; Anderzén-Carlsson, Agneta; Appelros, Peter; Forsberg, Anette
2018-06-06
After stroke, asymmetric weight distribution is common, with decreased balance control in standing and walking. The six-spot step test (SSST) includes a 5-m walk during which one leg shoves wooden blocks out of circles marked on the floor, thus assessing the ability to take load on each leg. The aim of the present study was to investigate the convergent and discriminant validity and test-retest reliability of the SSST in persons with stroke. Eighty-one participants were included. A cross-sectional study was performed, in which the SSST was conducted twice, 3-7 days apart. Validity was investigated using measures of dynamic balance and walking. Reliability was assessed using intraclass correlation coefficient, standard error of the measurement (SEM), and smallest real difference (SRD). The convergent validity was strong to moderate, and the test-retest reliability was good. The SEM% was 14.7%, and the SRD% was 40.8% based on the mean of four walks shoving twice with the paretic and twice with the non-paretic leg. Random measurement error was high, which limits the use of the SSST for follow-up evaluations, but the SSST can be a complementary measure of gait and balance.
Test Reliability at the Individual Level
Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly
2016-01-01
Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
Weafer, Jessica; Baggott, Matthew J; de Wit, Harriet
2013-12-01
Behavioral measures of impulsivity are widely used in substance abuse research, yet relatively little attention has been devoted to establishing their psychometric properties, especially their reliability over repeated administration. The current study examined the test-retest reliability of a battery of standardized behavioral impulsivity tasks, including measures of impulsive choice (i.e., delay discounting, probability discounting, and the Balloon Analogue Risk Task), impulsive action (i.e., the stop signal task, the go/no-go task, and commission errors on the continuous performance task), and inattention (i.e., attention lapses on a simple reaction time task and omission errors on the continuous performance task). Healthy adults (n = 128) performed the battery on two separate occasions. Reliability estimates for the individual tasks ranged from moderate to high, with Pearson correlations within the specific impulsivity domains as follows: impulsive choice (r range: .76-.89, ps < .001); impulsive action (r range: .65-.73, ps < .001); and inattention (r range: .38-.42, ps < .001). Additionally, the influence of day-to-day fluctuations in mood, as measured by the Profile of Mood States, was assessed in relation to variability in performance on each of the behavioral tasks. Change in performance on the delay discounting task was significantly associated with change in positive mood and arousal. No other behavioral measures were significantly associated with mood. In sum, the current analysis demonstrates that behavioral measures of impulsivity are reliable measures and thus can be confidently used to assess various facets of impulsivity as intermediate phenotypes for drug abuse.
Weafer, Jessica; Baggott, Matthew J.; de Wit, Harriet
2014-01-01
Behavioral measures of impulsivity are widely used in substance abuse research, yet relatively little attention has been devoted to establishing their psychometric properties, especially their reliability over repeated administration. The current study examined the test-retest reliability of a battery of standardized behavioral impulsivity tasks, including measures of impulsive choice (delay discounting, probability discounting, and the Balloon Analogue Risk Task), impulsive action (the stop signal task, the go/no-go task, and commission errors on the continuous performance task), and inattention (attention lapses on a simple reaction time task and omission errors on the continuous performance task). Healthy adults (n=128) performed the battery on two separate occasions. Reliability estimates for the individual tasks ranged from moderate to high, with Pearson correlations within the specific impulsivity domains as follows: impulsive choice (r = .76 - .89, ps < .001); impulsive action (r = .65 - .73, ps < .001); and inattention (r = .38-.42, ps < .001). Additionally, the influence of day-to-day fluctuations in mood as measured by the Profile of Mood States was assessed in relation to variability in performance on each of the behavioral tasks. Change in performance on the delay discounting task was significantly associated with change in positive mood and arousal. No other behavioral measures were significantly associated with mood. In sum, the current analysis demonstrates that behavioral measures of impulsivity are reliable measures and thus can be confidently used to assess various facets of impulsivity as intermediate phenotypes for drug abuse. PMID:24099351
Hashmi, Farina; Wright, Ciaran; Nester, Christopher; Lam, Sharon
2015-01-01
Hyperkeratosis of foot skin is a common skin problem affecting people of different ages. The clinical presentation of this condition can range from dry flaky skin, which can lead to fissures, to hard callused skin which is often painful and debilitating. The purpose of this study was to test the reliability of certain non-invasive skin measurement devices on foot skin in normal and hyperkeratotic states, with a view to confirming their use as quantitative outcome measures in future clinical trials. Twelve healthy adult participants with a range of foot skin conditions (xerotic skin, heel fissures and plantar calluses) were recruited to the study. Measurements of normal and hyperkeratotic skin sites were taken using the following devices: Corneometer® CM 825, Cutometer® 580 MPA, Reviscometer® RVM 600, Visioline® VL 650 Quantiride® and Visioscan® VC 98, by two investigators on two consecutive days. The intra and inter rater reliability and standard error of measurement for each device was calculated. The data revealed the majority of the devices to be reliable measurement tools for normal and hyperkeratotic foot skin (ICC values > 0.6). The surface evaluation parameters for skin: SEsc and SEsm have greater reliability compared to the SEr measure. The Cutometer® is sensitive to soft tissue movement within the probe, therefore measurement of plantar soft tissue areas should be approached with caution. Reviscometer® measures on callused skin demonstrated an unusually high degree of error. These results confirm the intra and inter rater reliability of the Corneometer®, Cutometer®, Visioline® and Visioscan® in quantifying specific foot skin biophysical properties.
Reliability of System Identification Techniques to Assess Standing Balance in Healthy Elderly
Maier, Andrea B.; Aarts, Ronald G. K. M.; van Gerven, Joop M. A.; Arendzen, J. Hans; Schouten, Alfred C.; Meskers, Carel G. M.; van der Kooij, Herman
2016-01-01
Objectives: System identification techniques have the potential to assess the contribution of the underlying systems involved in standing balance by applying well-known disturbances. We investigated the reliability of standing balance parameters obtained with multivariate closed loop system identification techniques. Methods: In twelve healthy elderly participants, balance tests were performed twice a day during three days. Body sway was measured during two minutes of standing with eyes closed and the Balance test Room (BalRoom) was used to apply four disturbances simultaneously: two sensory disturbances, to the proprioceptive and the visual system, and two mechanical disturbances applied at the leg and trunk segment. Using system identification techniques, sensitivity functions of the sensory disturbances and the neuromuscular controller were estimated. Based on the generalizability theory (G theory), systematic errors and sources of variability were assessed using linear mixed models and reliability was assessed by computing indexes of dependability (ID), standard error of measurement (SEM) and minimal detectable change (MDC). Results: A systematic error was found between the first and second trial in the sensitivity functions. No systematic error was found in the neuromuscular controller and body sway. The reliability of 15 of 25 parameters and body sway was moderate to excellent when the results of two trials on three days were averaged. To reach an excellent reliability on one day in 7 out of 25 parameters, it was predicted that at least seven trials must be averaged. Conclusion: This study shows that system identification techniques are a promising method to assess the underlying systems involved in standing balance in the elderly. However, most of the parameters do not appear to be reliable unless a large number of trials are collected across multiple days. To reach an excellent reliability in one third of the parameters, a training session for participants is needed and at least seven trials of two minutes must be performed on one day. PMID:26953694
Use of units of measurement error in anthropometric comparisons.
Lucas, Teghan; Henneberg, Maciej
2017-09-01
Anthropometrists attempt to minimise measurement errors, however, errors cannot be eliminated entirely. Currently, measurement errors are simply reported. Measurement errors should be included into analyses of anthropometric data. This study proposes a method which incorporates measurement errors into reported values, replacing metric units with 'units of technical error of measurement (TEM)' by applying these to forensics, industrial anthropometry and biological variation. The USA armed forces anthropometric survey (ANSUR) contains 132 anthropometric dimensions of 3982 individuals. Concepts of duplication and Euclidean distance calculations were applied to the forensic-style identification of individuals in this survey. The National Size and Shape Survey of Australia contains 65 anthropometric measurements of 1265 women. This sample was used to show how a woman's body measurements expressed in TEM could be 'matched' to standard clothing sizes. Euclidean distances show that two sets of repeated anthropometric measurements of the same person cannot be matched (> 0) on measurements expressed in millimetres but can in units of TEM (= 0). Only 81 women can fit into any standard clothing size when matched using centimetres, with units of TEM, 1944 women fit. The proposed method can be applied to all fields that use anthropometry. Units of TEM are considered a more reliable unit of measurement for comparisons.
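As an illustration of the quantity this abstract builds on, the sketch below computes the technical error of measurement for duplicate measurements using the widely cited formula TEM = √(Σd² / 2n). The abstract does not spell out how values are converted into units of TEM, so the helper `difference_in_tem_units` is a hypothetical illustration and not the authors' procedure.

```python
import math

def technical_error_of_measurement(pairs):
    """TEM for duplicate measurements: sqrt(sum of squared differences / (2 * n))."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair of repeated measurements")
    squared_diffs = sum((a - b) ** 2 for a, b in pairs)
    return math.sqrt(squared_diffs / (2 * n))

def difference_in_tem_units(first, second, tem):
    """Hypothetical helper: size of a difference expressed as a multiple of TEM."""
    return abs(first - second) / tem

# Two subjects, each measured twice (values in millimetres, made up for illustration).
repeats = [(10.0, 12.0), (20.0, 20.0)]
tem = technical_error_of_measurement(repeats)
```

A difference smaller than one TEM would then be indistinguishable from measurement error, which is the intuition behind comparing repeated measurements in TEM units.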
Poor Reliability of Wrist Blood Pressure Self-Measurement at Home: A Population-Based Study.
Casiglia, Edoardo; Tikhonoff, Valérie; Albertini, Federica; Palatini, Paolo
2016-10-01
The reliability of blood pressure measurement with wrist devices, which has not previously been assessed under real-life circumstances in general population, is dependent on correct positioning of the wrist device at heart level. We determined whether an error was present when blood pressure was self-measured at the wrist in 721 unselected subjects from the general population. After training, blood pressure was measured in the office and self-measured at home with an upper-arm device (the UA-767 Plus) and a wrist device (the UB-542, not provided with a position sensor). The upper-arm-wrist blood pressure difference detected in the office was used as the reference measurement. The discrepancy between office and home differences was the home measurement error. In the office, systolic blood pressure was 2.5% lower at wrist than at arm (P=0.002), whereas at home, systolic and diastolic blood pressures were higher at wrist than at arm (+5.6% and +5.4%, respectively; P<0.0001 for both); 621 subjects had home measurement error of at least ±5 mm Hg and 455 of at least ±10 mm Hg (bad measurers). In multivariable linear regression, a lower cognitive pattern independently determined both the systolic and the diastolic home measurement error and a longer forearm the systolic error only. This was confirmed by logistic regression having bad measurers as dependent variable. The use of wrist devices for home self-measurement, therefore, leads to frequent detection of falsely elevated blood pressure values likely because of a poor memory and rendition of the instructions, leading to the wrong position of the wrist. © 2016 American Heart Association, Inc.
Elliot, Catherine A; Hamlin, Michael J; Lizamore, Catherine A
2017-07-28
The purpose of this study was to investigate the validity and reliability of the Hexoskin® vest for measuring respiration and heart rate (HR) in elite cyclists during a progressive test to exhaustion. Ten male elite cyclists (age 28.8 ± 12.5 yr, height 179.3 ± 6.0 cm, weight 73.2 ± 9.1 kg, V̇O2max 60.7 ± 7.8 ml·kg⁻¹·min⁻¹; mean ± SD) conducted a maximal aerobic cycle ergometer test using a ramped protocol (starting at 100 W with 25 W increments each min to failure) during two separate occasions over a 3-4 day period. Compared to the criterion measure (Metamax 3B) the Hexoskin® vest showed mainly small typical errors (1.3-6.2%) for HR and breathing frequency (f), but larger typical errors (9.5-19.6%) for minute ventilation (V̇E) during the progressive test to exhaustion. The typical error indicating the reliability of the Hexoskin® vest at moderate intensity exercise between tests was small for HR (2.6-2.9%) and f (2.5-3.2%) but slightly larger for V̇E (5.3-7.9%). We conclude that the Hexoskin® vest is sufficiently valid and reliable for measurements of HR and f in elite athletes during high intensity cycling but the calculated V̇E value the Hexoskin® vest produces during such exercise should be used with caution due to the lower validity and reliability of this variable.
Pazira, Parvin; Rostami Haji-Abadi, Mahdi; Zolaktaf, Vahid; Sabahi, Mohammadfarzan; Pazira, Toomaj
2016-06-08
In relation to statistical analysis, studies to determine the validity, reliability, objectivity and precision of new measuring devices are usually incomplete, due in part to using only correlation coefficient and ignoring the data dispersion. The aim of this study was to demonstrate the best way to determine the validity, reliability, objectivity and accuracy of an electro-inclinometer or other measuring devices. Another purpose of this study is to answer the question of whether reliability and objectivity represent accuracy of measuring devices. The validity of an electro-inclinometer was examined by mechanical and geometric methods. The objectivity and reliability of the device was assessed by calculating Cronbach's alpha for repeated measurements by three raters and by measurements on the same person by mechanical goniometer and the electro-inclinometer. Measurements were performed on "hip flexion with the extended knee" and "shoulder abduction with the extended elbow." The raters measured every angle three times within an interval of two hours. The three-way ANOVA was used to determine accuracy. The results of mechanical and geometric analysis showed that validity of the electro-inclinometer was 1.00 and level of error was less than one degree. Objectivity and reliability of electro-inclinometer was 0.999, while objectivity of mechanical goniometer was in the range of 0.802 to 0.966 and the reliability was 0.760 to 0.961. For hip flexion, the difference between raters in joints angle measurement by electro-inclinometer and mechanical goniometer was 1.74 and 16.33 degree (P<0.05), respectively. The differences for shoulder abduction measurement by electro-inclinometer and goniometer were 0.35 and 4.40 degree (P<0.05). Although both the objectivity and reliability are acceptable, the results showed that measurement error was very high in the mechanical goniometer. 
Therefore, it can be concluded that objectivity and reliability alone cannot determine the accuracy of a device and it is preferable to use other statistical methods to compare and evaluate the accuracy of these two devices.
Error quantification of osteometric data in forensic anthropology.
Langley, Natalie R; Meadows Jantz, Lee; McNulty, Shauna; Maijanen, Heli; Ousley, Stephen D; Jantz, Richard L
2018-06-01
This study evaluates the reliability of osteometric data commonly used in forensic case analyses, with specific reference to the measurements in Data Collection Procedures 2.0 (DCP 2.0). Four observers took a set of 99 measurements four times on a sample of 50 skeletons (each measurement was taken 200 times by each observer). Two-way mixed ANOVAs and repeated measures ANOVAs with pairwise comparisons were used to examine interobserver (between-subjects) and intraobserver (within-subjects) variability. Relative technical error of measurement (TEM) was calculated for measurements with significant ANOVA results to examine the error among a single observer repeating a measurement multiple times (e.g. repeatability or intraobserver error), as well as the variability between multiple observers (interobserver error). Two general trends emerged from these analyses: (1) maximum lengths and breadths have the lowest error across the board (TEM<0.5), and (2) maximum and minimum diameters at midshaft are more reliable than their positionally-dependent counterparts (i.e. sagittal, vertical, transverse, dorso-volar). Therefore, maxima and minima are specified for all midshaft measurements in DCP 2.0. Twenty-two measurements were flagged for excessive variability (either interobserver, intraobserver, or both); 15 of these measurements were part of the standard set of measurements in Data Collection Procedures for Forensic Skeletal Material, 3rd edition. Each measurement was examined carefully to determine the likely source of the error (e.g. data input, instrumentation, observer's method, or measurement definition). For several measurements (e.g. anterior sacral breadth, distal epiphyseal breadth of the tibia) only one observer differed significantly from the remaining observers, indicating a likely problem with the measurement definition as interpreted by that observer; these definitions were clarified in DCP 2.0 to eliminate this confusion. 
Other measurements were taken from landmarks that are difficult to locate consistently (e.g. pubis length, ischium length); these measurements were omitted from DCP 2.0. This manual is available for free download online (https://fac.utk.edu/wp-content/uploads/2016/03/DCP20_webversion.pdf), along with an accompanying instructional video (https://www.youtube.com/watch?v=BtkLFl3vim4). Copyright © 2018 Elsevier B.V. All rights reserved.
Hill, B.R.; DeCarlo, E.H.; Fuller, C.C.; Wong, M.F.
1998-01-01
Reliable estimates of sediment-budget errors are important for interpreting sediment-budget results. Sediment-budget errors are commonly considered equal to sediment-budget imbalances, which may underestimate actual sediment-budget errors if they include compensating positive and negative errors. We modified the sediment 'fingerprinting' approach to qualitatively evaluate compensating errors in an annual (1991) fine (<63 μm) sediment budget for the North Halawa Valley, a mountainous, forested drainage basin on the island of Oahu, Hawaii, during construction of a major highway. We measured concentrations of aeolian quartz and 137Cs in sediment sources and fluvial sediments, and combined concentrations of these aerosols with the sediment budget to construct aerosol budgets. Aerosol concentrations were independent of the sediment budget, hence aerosol budgets were less likely than sediment budgets to include compensating errors. Differences between sediment-budget and aerosol-budget imbalances therefore provide a measure of compensating errors in the sediment budget. The sediment-budget imbalance equalled 25% of the fluvial fine-sediment load. Aerosol-budget imbalances were equal to 19% of the fluvial 137Cs load and 34% of the fluvial quartz load. The reasonably close agreement between sediment- and aerosol-budget imbalances indicates that compensating errors in the sediment budget were not large and that the sediment-budget imbalance is a reliable measure of sediment-budget error. We attribute at least one-third of the 1991 fluvial fine-sediment load to highway construction. Continued monitoring indicated that highway construction produced 90% of the fluvial fine-sediment load during 1992. Erosion of channel margins and attrition of coarse particles provided most of the fine sediment produced by natural processes. Hillslope processes contributed relatively minor amounts of sediment.
Schwertner, Debora Soccal; Oliveira, Raul; Mazo, Giovana Zarpellon; Gioda, Fabiane Rosa; Kelber, Christian Roberto; Swarowsky, Alessandra
2016-05-04
Several posture evaluation devices have been used to detect deviations of the vertebral column. However it has been observed that the instruments present measurement errors related to the equipment, environment or measurement protocol. This study aimed to build, validate, analyze the reliability and describe a measurement protocol for the use of the Posture Evaluation Rotating Platform System (SPGAP, Brazilian abbreviation). The posture evaluation system comprises a Posture Evaluation Rotating Platform, video camera, calibration support and measurement software. Two pilot studies were carried out with 102 elderly individuals (average age 69 years old, SD = ±7.3) to establish a protocol for SPGAP, controlling the measurement errors related to the environment, equipment and the person under evaluation. Content validation was completed with input from judges with expertise in posture measurement. The variation coefficient method was used to validate the measurement by the instrument of an object with known dimensions. Finally, reliability was established using repeated measurements of the known object. Expert content judges gave the system excellent ratings for content validity (mean 9.4 out of 10; SD 1.13). The measurement of an object with known dimensions indicated excellent validity (all measurement errors <1 %) and test-retest reliability. A total of 26 images were needed to stabilize the system. Participants in the pilot studies indicated that they felt comfortable throughout the assessment. The use of only one image can offer measurements that underestimate or overestimate the reality. To verify the images of objects with known dimensions the values for the width and height were, respectively, CV 0.88 (width) and 2.33 (height), SD 0.22 (width) and 0.35 (height), minimum and maximum values 24.83-25.2 (width) and 14.56 - 15.75 (height). In the analysis of different images (similar) of an individual, greater discrepancies were observed in the values found. 
The cervical index, for example, presented minimum and maximum values of 15.38 and 37.5, a coefficient of variation of 0.29 and a standard deviation of 6.78. The SPGAP was shown to be a valid and reliable instrument for the quantitative analysis of body posture with applicability and clinical use, since it managed to reduce several measurement errors, amongst which parallax distortion.
Shaw, Andrew J; Ingham, Stephen A; Fudge, Barry W; Folland, Jonathan P
2013-12-01
This study assessed the between-test reliability of oxygen cost (OC) and energy cost (EC) in distance runners, and contrasted it with the smallest worthwhile change (SWC) of these measures. OC and EC displayed similar levels of within-subject variation (typical error < 3.85%). However, the typical error (2.75% vs 2.74%) was greater than the SWC (1.38% vs 1.71%) for both OC and EC, respectively, indicating insufficient sensitivity to confidently detect small, but meaningful, changes in OC and EC.
Collado-Mateo, Daniel; Adsuar, Jose C; Olivares, Pedro R; Cano-Plasencia, Ricardo; Gusi, Narcis
2015-01-01
The analysis of brain activity during balance is an important topic in different fields of science. Given that all measurements involve an error that is caused by different agents, like the instrument, the researcher, or the natural human variability, a test-retest reliability evaluation of the electroencephalographic assessment is a needed starting point. However, there is a lack of information about the reliability of electroencephalographic measurements, especially in a new wireless device with dry electrodes. The current study aims to analyze the reliability of electroencephalographic measurements from a wireless device using dry electrodes during two different balance tests. Seventeen healthy male volunteers performed two different static balance tasks on a Biodex Balance Platform: (a) with two feet on the platform and (b) with one foot on the platform. Electroencephalographic data was recorded using Enobio (Neuroelectrics). The mean power spectrum of the alpha band of the central and frontal channels was calculated. Relative and absolute indices of reliability were also calculated. In general terms, the intraclass correlation coefficient (ICC) values of all the assessed channels can be classified as excellent (>0.90). The percentage standard error of measurement oscillated from 0.54% to 1.02% and the percentage smallest real difference ranged from 1.50% to 2.82%. Electroencephalographic assessment through an Enobio device during balance tasks has an excellent reliability. However, its utility was not demonstrated because responsiveness was not assessed.
Revised techniques for estimating peak discharges from channel width in Montana
Parrett, Charles; Hull, J.A.; Omang, R.J.
1987-01-01
This study was conducted to develop new estimating equations based on channel width and the updated flood frequency curves of previous investigations. Simple regression equations for estimating peak discharges with recurrence intervals of 2, 5, 10 , 25, 50, and 100 years were developed for seven regions in Montana. The standard errors of estimates for the equations that use active channel width as the independent variables ranged from 30% to 87%. The standard errors of estimate for the equations that use bankfull width as the independent variable ranged from 34% to 92%. The smallest standard errors generally occurred in the prediction equations for the 2-yr flood, 5-yr flood, and 10-yr flood, and the largest standard errors occurred in the prediction equations for the 100-yr flood. The equations that use active channel width and the equations that use bankfull width were determined to be about equally reliable in five regions. In the West Region, the equations that use bankfull width were slightly more reliable than those based on active channel width, whereas in the East-Central Region the equations that use active channel width were slightly more reliable than those based on bankfull width. Compared with similar equations previously developed, the standard errors of estimate for the new equations are substantially smaller in three regions and substantially larger in two regions. Limitations on the use of the estimating equations include: (1) The equations are based on stable conditions of channel geometry and prevailing water and sediment discharge; (2) The measurement of channel width requires a site visit, preferably by a person with experience in the method, and involves appreciable measurement errors; (3) Reliability of results from the equations for channel widths beyond the range of definition is unknown. 
In spite of the limitations, the estimating equations derived in this study are considered to be as reliable as estimating equations based on basin and climatic variables. Because the two types of estimating equations are independent, results from each can be weighted inversely proportional to their variances, and averaged. The weighted average estimate has a variance less than either individual estimate. (Author's abstract)
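The inverse-variance weighting described at the end of this abstract can be sketched as follows. This is a minimal illustration that assumes the two estimates are independent, as the abstract states; the function name and the discharge values are made up for the example and do not come from the report.

```python
def inverse_variance_average(estimates, variances):
    """Combine independent estimates, weighting each by 1 / variance."""
    if not estimates or len(estimates) != len(variances):
        raise ValueError("estimates and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total_weight
    combined_variance = 1.0 / total_weight  # never larger than the smallest input variance
    return combined, combined_variance

# Hypothetical peak-discharge estimates from a channel-width equation and a basin-variable equation.
estimate, variance = inverse_variance_average([100.0, 120.0], [400.0, 900.0])
```

Because the combined variance is the reciprocal of the summed weights, it is smaller than either input variance, which matches the claim in the abstract.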
2011-01-01
Background: A clinical study was conducted to determine the intra and inter-rater reliability of digital scanning and the neutral suspension casting technique to measure six foot parameters. The neutral suspension casting technique is a commonly utilised method for obtaining a negative impression of the foot prior to orthotic fabrication. Digital scanning offers an alternative to the traditional plaster of Paris techniques. Methods: Twenty-one healthy participants volunteered to take part in the study. Six casts and six digital scans were obtained from each participant by two raters of differing clinical experience. The foot parameters chosen for investigation were cast length (mm), forefoot width (mm), rearfoot width (mm), medial arch height (mm), lateral arch height (mm) and forefoot to rearfoot alignment (degrees). Intraclass correlation coefficients (ICC) with 95% confidence intervals (CI) were calculated to determine the intra and inter-rater reliability. Measurement error was assessed through the calculation of the standard error of the measurement (SEM) and smallest real difference (SRD). Results: ICC values for all foot parameters using digital scanning ranged between 0.81 and 0.99 for both intra and inter-rater reliability. For the neutral suspension casting technique, inter-rater reliability values ranged from 0.57 to 0.99 and intra-rater reliability values ranged from 0.36 to 0.99 for rater 1 and from 0.49 to 0.99 for rater 2. Conclusions: The findings of this study indicate that digital scanning is a reliable technique, irrespective of clinical experience, with reduced measurement variability in all foot parameters investigated when compared to neutral suspension casting. PMID:21375757
Hinton-Bayre, Anton D
2011-02-01
There is an ongoing debate over the preferred method(s) for determining the reliable change (RC) in individual scores over time. In the present paper, specificity comparisons of several classic and contemporary RC models were made using a real data set. This included a more detailed review of a new RC model recently proposed in this journal, that used the within-subjects standard deviation (WSD) as the error term. It was suggested that the RC(WSD) was more sensitive to change and theoretically superior. The current paper demonstrated that even in the presence of mean practice effects, false-positive rates were comparable across models when reliability was good and initial and retest variances were equivalent. However, when variances differed, discrepancies in classification across models became evident. Notably, the RC using the WSD provided unacceptably high false-positive rates in this setting. It was considered that the WSD was never intended for measuring change in this manner. The WSD actually combines systematic and error variance. The systematic variance comes from measurable between-treatment differences, commonly referred to as practice effect. It was further demonstrated that removal of the systematic variance and appropriate modification of the residual error term for the purpose of testing individual change yielded an error term already published and criticized in the literature. A consensus on the RC approach is needed. To that end, further comparison of models under varied conditions is encouraged.
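For orientation, one of the classic reliable change models can be written in a few lines. The sketch below assumes the form usually attributed to Jacobson and Truax, in which the retest difference is divided by √2 × SEM with SEM = SD × √(1 − r); it makes no adjustment for practice effects and is not the WSD-based model discussed in this abstract. The function name and example scores are illustrative.

```python
import math

def reliable_change_index(score_1, score_2, sd_baseline, reliability):
    """Retest difference divided by the standard error of the difference."""
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    se_diff = math.sqrt(2.0) * sem
    return (score_2 - score_1) / se_diff

# Made-up example: a 10-point gain on a scale with SD 10 and test-retest reliability .80.
rc = reliable_change_index(50.0, 60.0, sd_baseline=10.0, reliability=0.8)
reliably_changed = abs(rc) > 1.96
```

In this made-up case the gain does not exceed the 1.96 cutoff, so it would not be classified as reliable change under this model.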
Investigating Reliabilities of Intraindividual Variability Indicators
ERIC Educational Resources Information Center
Wang, Lijuan; Grimm, Kevin J.
2012-01-01
Reliabilities of the two most widely used intraindividual variability indicators, "ISD[superscript 2]" and "ISD", are derived analytically. Both are functions of the sizes of the first and second moments of true intraindividual variability, the size of the measurement error variance, and the number of assessments within a burst. For comparison,…
Fortin, Carole; Feldman, Debbie Ehrmann; Cheriet, Farida; Gravel, Denis; Gauthier, Frédérique; Labelle, Hubert
2012-03-01
To determine overall, test-retest and inter-rater reliability of posture indices among persons with idiopathic scoliosis. A reliability study using two raters and two test sessions. Tertiary care paediatric centre. Seventy participants aged between 10 and 20 years with different types of idiopathic scoliosis (Cobb angle 15 to 60°) were recruited from the scoliosis clinic. Based on the XY co-ordinates of natural reference points (e.g., eyes) as well as markers placed on several anatomical landmarks, 32 angular and linear posture indices taken from digital photographs in the standing position were calculated from a specially developed software program. Generalisability theory served to estimate the reliability and standard error of measurement (SEM) for the overall, test-retest and inter-rater designs. Bland and Altman's method was also used to document agreement between sessions and raters. In the random design, dependability coefficients demonstrated a moderate level of reliability for six posture indices (ϕ=0.51 to 0.72) and a good level of reliability for 26 posture indices out of 32 (ϕ≥0.79). Error attributable to marker placement was negligible for most indices. Limits of agreement and SEM values were larger for shoulder protraction, trunk list, Q angle, cervical lordosis and scoliosis angles. The most reproducible indices were waist angles and knee valgus and varus. Posture can be assessed in a global fashion from photographs in persons with idiopathic scoliosis. Despite the good reliability of marker placement, other studies are needed to minimise measurement errors in order to provide a suitable tool for monitoring change in posture over time. Copyright © 2011 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
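The Bland and Altman agreement analysis mentioned in this abstract is commonly summarised by the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the differences. A minimal sketch under that assumption follows; the function name and the paired values are hypothetical and do not come from the study.

```python
import statistics

def limits_of_agreement(first, second, z=1.96):
    """Return (lower limit, bias, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    spread = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias - z * spread, bias, bias + z * spread

# Made-up posture index values (degrees) from two sessions.
lower, bias, upper = limits_of_agreement([10.0, 12.0, 14.0, 16.0],
                                         [9.0, 13.0, 13.0, 17.0])
```

Wider limits correspond to poorer agreement between sessions or raters, which is how the abstract contrasts indices such as trunk list with waist angles.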
Is Coefficient Alpha Robust to Non-Normal Data?
Sheng, Yanyan; Sheng, Zhaohui
2011-01-01
Coefficient alpha has been a widely used measure by which internal consistency reliability is assessed. In addition to essential tau-equivalence and uncorrelated errors, normality has been noted as another important assumption for alpha. Earlier work on evaluating this assumption considered either exclusively non-normal error score distributions, or limited conditions. In view of this and the availability of advanced methods for generating univariate non-normal data, Monte Carlo simulations were conducted to show that non-normal distributions for true or error scores do create problems for using alpha to estimate the internal consistency reliability. The sample coefficient alpha is affected by leptokurtic true score distributions, or skewed and/or kurtotic error score distributions. Increased sample sizes, not test lengths, help improve the accuracy, bias, or precision of using it with non-normal data. PMID:22363306
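As a reminder of what the simulations above evaluate, the sample coefficient alpha can be sketched as follows. This is a minimal illustration of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / total-score variance), and not the authors' simulation code; the function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Sample coefficient alpha.

    scores: list of respondents, each a list of k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed (total) score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents, 3 items.
data = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4], [5, 5, 5]]
alpha = cronbach_alpha(data)
```

Because the estimate is built entirely from sample variances, heavy-tailed or skewed score distributions can plausibly distort it, which is the sensitivity the abstract reports.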
Retention-error patterns in complex alphanumeric serial-recall tasks.
Mathy, Fabien; Varré, Jean-Stéphane
2013-01-01
We propose a new method based on an algorithm usually dedicated to DNA sequence alignment in order to both reliably score short-term memory performance on immediate serial-recall tasks and analyse retention-error patterns. There can be considerable confusion on how performance on immediate serial list recall tasks is scored, especially when the to-be-remembered items are sampled with replacement. We discuss the utility of sequence-alignment algorithms to compare the stimuli to the participants' responses. The idea is that deletion, substitution, translocation, and insertion errors, which are typical in DNA, are also typical putative errors in short-term memory (respectively omission, confusion, permutation, and intrusion errors). We analyse four data sets in which alphanumeric lists included a few (or many) repetitions. After examining the method on two simple data sets, we show that sequence alignment offers 1) a compelling method for measuring capacity in terms of chunks when many regularities are introduced in the material (third data set) and 2) a reliable estimator of individual differences in short-term memory capacity. This study illustrates the difficulty of arriving at a good measure of short-term memory performance, and also attempts to characterise the primary factors underpinning remembering and forgetting.
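The alignment idea can be sketched with a plain edit-distance alignment. This is only an assumption-laden illustration, not the authors' algorithm: it uses unit costs, does not model permutation (translocation) errors, and the function name `align_recall` is hypothetical.

```python
def align_recall(stimulus, response):
    """Align a recalled sequence to the stimulus and count error types."""
    n, m = len(stimulus), len(response)
    # dist[i][j] = edit distance between stimulus[:i] and response[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if stimulus[i - 1] == response[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # item omitted from the response
                dist[i][j - 1] + 1,         # extra item intruded
                dist[i - 1][j - 1] + cost,  # match or confusion
            )
    # Trace back through the table to label each aligned position.
    counts = {"correct": 0, "omission": 0, "confusion": 0, "intrusion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if stimulus[i - 1] == response[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                counts["correct" if cost == 0 else "confusion"] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["omission"] += 1
            i -= 1
        else:
            counts["intrusion"] += 1
            j -= 1
    return counts


result = align_recall("ABCD", "ABD")
```

Scoring by alignment rather than by strict serial position means a single omitted item does not make every later item count as wrong.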
Inter-arch digital model vs. manual cast measurements: Accuracy and reliability.
Kiviahde, Heikki; Bukovac, Lea; Jussila, Päivi; Pesonen, Paula; Sipilä, Kirsi; Raustia, Aune; Pirttiniemi, Pertti
2017-06-28
The purpose of this study was to evaluate the accuracy and reliability of inter-arch measurements using digital dental models and conventional dental casts. Thirty sets of dental casts with permanent dentition were examined. Manual measurements were done with a digital caliper directly on the dental casts, and digital measurements were made on 3D models by two independent examiners. Intra-class correlation coefficients (ICC), a paired sample t-test or Wilcoxon signed-rank test, and Bland-Altman plots were used to evaluate intra- and inter-examiner error and to determine the accuracy and reliability of the measurements. The ICC values were generally good for manual and excellent for digital measurements. The Bland-Altman plots of all the measurements showed good agreement between the manual and digital methods and excellent inter-examiner agreement using the digital method. Inter-arch occlusal measurements on digital models are accurate and reliable and are superior to manual measurements.
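The Bland-Altman agreement statistics mentioned above (mean difference and 95% limits of agreement) can be sketched as follows. The abstract does not report the underlying measurements, so the values and the function name `bland_altman` are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement for paired data."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # systematic difference between the two methods
    sd = stdev(diffs)    # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired measurements in mm (manual caliper vs. digital model).
manual = [10.2, 8.1, 12.5, 9.9]
digital = [10.0, 8.3, 12.4, 9.6]
bias, lower, upper = bland_altman(manual, digital)
```

Narrow limits centred near zero would correspond to the "good agreement" the study describes; whether a given width is acceptable is a clinical judgement, not a statistical one.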
Meijer, Erik; Rohwedder, Susann; Wansbeek, Tom
2012-01-01
Survey data on earnings tend to contain measurement error. Administrative data are superior in principle, but they are worthless in case of a mismatch. We develop methods for prediction in mixture factor analysis models that combine both data sources to arrive at a single earnings figure. We apply the methods to a Swedish data set. Our results show that register earnings data perform poorly if there is a (small) probability of a mismatch. Survey earnings data are more reliable, despite their measurement error. Predictors that combine both and take conditional class probabilities into account outperform all other predictors.
Automatic training and reliability estimation for 3D ASM applied to cardiac MRI segmentation
NASA Astrophysics Data System (ADS)
Tobon-Gomez, Catalina; Sukno, Federico M.; Butakoff, Constantine; Huguet, Marina; Frangi, Alejandro F.
2012-07-01
Training active shape models requires collecting manual ground-truth meshes in a large image database. While shape information can be reused across multiple imaging modalities, intensity information needs to be imaging modality and protocol specific. In this context, this study has two main purposes: (1) to test the potential of using intensity models learned from MRI simulated datasets and (2) to test the potential of including a measure of reliability during the matching process to increase robustness. We used a population of 400 virtual subjects (XCAT phantom), and two clinical populations of 40 and 45 subjects. Virtual subjects were used to generate simulated datasets (MRISIM simulator). Intensity models were trained both on simulated and real datasets. The trained models were used to segment the left ventricle (LV) and right ventricle (RV) from real datasets. Segmentations were also obtained with and without reliability information. Performance was evaluated with point-to-surface and volume errors. Simulated intensity models obtained average accuracy comparable to inter-observer variability for LV segmentation. The inclusion of reliability information reduced volume errors in hypertrophic patients (EF errors from 17 ± 57% to 10 ± 18%; LV MASS errors from -27 ± 22 g to -14 ± 25 g), and in heart failure patients (EF errors from -8 ± 42% to -5 ± 14%). The RV model of the simulated images needs further improvement to better resemble image intensities around the myocardial edges. Both for real and simulated models, reliability information increased segmentation robustness without penalizing accuracy.
The Trojan Lifetime Champions Health Survey: Development, Validity, and Reliability
Sorenson, Shawn C.; Romano, Russell; Scholefield, Robin M.; Schroeder, E. Todd; Azen, Stanley P.; Salem, George J.
2015-01-01
Context: Self-report questionnaires are an important method of evaluating lifespan health, exercise, and health-related quality of life (HRQL) outcomes among elite, competitive athletes. Few instruments, however, have undergone formal characterization of their psychometric properties within this population. Objective: To evaluate the validity and reliability of a novel health and exercise questionnaire, the Trojan Lifetime Champions (TLC) Health Survey. Design: Descriptive laboratory study. Setting: A large National Collegiate Athletic Association Division I university. Patients or Other Participants: A total of 63 university alumni (age range, 24 to 84 years), including former varsity collegiate athletes and a control group of nonathletes. Intervention(s): Participants completed the TLC Health Survey twice at a mean interval of 23 days with randomization to the paper or electronic version of the instrument. Main Outcome Measure(s): Content validity, feasibility of administration, test-retest reliability, parallel-form reliability between paper and electronic forms, and estimates of systematic and typical error versus differences of clinical interest were assessed across a broad range of health, exercise, and HRQL measures. Results: Correlation coefficients, including intraclass correlation coefficients (ICCs) for continuous variables and κ agreement statistics for ordinal variables, for test-retest reliability averaged 0.86, 0.90, 0.80, and 0.74 for HRQL, lifetime health, recent health, and exercise variables, respectively. Correlation coefficients, again ICCs and κ, for parallel-form reliability (ie, equivalence) between paper and electronic versions averaged 0.90, 0.85, 0.85, and 0.81 for HRQL, lifetime health, recent health, and exercise variables, respectively. Typical measurement error was less than the a priori thresholds of clinical interest, and we found minimal evidence of systematic test-retest error. We found strong evidence of content validity, convergent construct validity with the Short-Form 12 Version 2 HRQL instrument, and feasibility of administration in an elite, competitive athletic population. Conclusions: These data suggest that the TLC Health Survey is a valid and reliable instrument for assessing lifetime and recent health, exercise, and HRQL among elite, competitive athletes. Generalizability of the instrument may be enhanced by additional, larger-scale studies in diverse populations. PMID:25611315
de Mesquita, Gabriel Nunes; de Oliveira, Marcela Nicácio Medeiros; Matoso, Amanda Ellen Rodrigues; Filho, Alberto Galvão de Moura; de Oliveira, Rodrigo Ribeiro
2018-04-24
Study Design: Clinical measurement study. Background: Achilles tendon disorders are very common among athletes, and it is important to objectively measure symptoms and functional limitations related to Achilles tendinopathy using outcome measures that have been validated in the language of the target population. Objectives: To perform a cross-cultural adaptation and to evaluate the measurement properties of the Brazilian version of the Victorian Institute of Sport Assessment-Achilles (VISA-A) questionnaire. Methods: We adapted the VISA-A questionnaire to Brazilian Portuguese (VISA-A-Br). The questionnaire was applied on 2 occasions with an interval of 5 to 14 days. We evaluated the following measurement properties: internal consistency, test-retest reliability, measurement error, construct validity, and ceiling and floor effects. Results: The VISA-A-Br showed good internal consistency (Cronbach's alpha = 0.79; after excluding 1 item at a time, Cronbach's α = 0.73 to 0.84), good test-retest reliability (ICC(2,1) agreement = 0.84, 95% confidence interval = 0.71-0.91), an acceptable measurement error (standard error of measurement = 3.25 points and smallest detectable change = 9.02 points), good construct validity (Spearman's coefficient with LEFS = 0.73 and with FAOS in its 5 subscales: Pain = 0.66, other Symptoms = 0.48, Function in daily living (ADL) = 0.59, Function in sport and recreation = 0.67, and foot and ankle-related Quality of Life = 0.7), and no ceiling and floor effects. Conclusion: The VISA-A-Br is equivalent to the original version; it has been validated and confirmed as reliable to measure pain and function among the Brazilian population with Achilles tendinopathy, and it can be used in clinical and scientific settings. J Orthop Sports Phys Ther, Epub 24 Apr 2018. doi:10.2519/jospt.2018.7897.
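The reported measurement-error figures are consistent with the commonly used relation SDC = 1.96 × √2 × SEM. The sketch below assumes that relation (the abstract does not state which formula was applied); the function name `smallest_detectable_change` is hypothetical.

```python
import math


def smallest_detectable_change(sem, z=1.96):
    """Smallest detectable change at the individual level.

    The sqrt(2) factor reflects that a change score combines the error
    of two measurements; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(2) * sem


# SEM reported in the abstract above: 3.25 points.
sdc = smallest_detectable_change(3.25)
```

With the rounded SEM of 3.25 this gives about 9.01 points, close to the reported 9.02; the small gap presumably reflects rounding of the SEM before publication.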
Lee, Posen; Lu, Wen-Shian; Liu, Chin-Hsuan; Lin, Hung-Yu; Hsieh, Ching-Lin
2017-12-08
The d2 Test of Attention (D2) is a commonly used measure of selective attention for patients with schizophrenia. However, its test-retest reliability and minimal detectable change (MDC) are unknown in patients with schizophrenia, limiting its utility in both clinical and research settings. The aim of the present study was to examine the test-retest reliability and MDC of the D2 in patients with schizophrenia. A rater administered the D2 to 108 patients with schizophrenia twice at a 1-month interval. Test-retest reliability was determined through the calculation of the intra-class correlation coefficient (ICC). We also carried out Bland-Altman analysis, which included a scatter plot of the differences between test and retest against their mean. Systematic biases were evaluated by use of a paired t-test. The ICCs for the D2 ranged from 0.78 to 0.94. The MDCs (MDC%) of the seven subscores were 102.3 (29.7), 19.4 (85.0), 7.2 (94.6), 21.0 (69.0), 104.0 (33.1), 105.0 (35.8), and 7.8 (47.8), which represented limited-to-acceptable random measurement error. Trends in the Bland-Altman plots of the omissions (E1), commissions (E2), and errors (E) were noted, indicating heteroscedasticity in the data. According to the results, the D2 had good test-retest reliability, especially in the scores of TN, TN-E, and CP. For further research, it would be important to find a way to improve the administration procedure to reduce random measurement error for the E1, E2, E, and FR subscores. © The Author(s) 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koepferl, Christine M.; Robitaille, Thomas P.; Dale, James E., E-mail: koepferl@usm.lmu.de
We use a large data set of realistic synthetic observations (produced in Paper I of this series) to assess how observational techniques affect the measurement of physical properties of star-forming regions. In this part of the series (Paper II), we explore the reliability of the measured total gas mass, dust surface density and dust temperature maps derived from modified blackbody fitting of synthetic Herschel observations. We find from our pixel-by-pixel analysis of the measured dust surface density and dust temperature a worrisome error spread especially close to star formation sites and low-density regions, where for those “contaminated” pixels the surface densities can be under/overestimated by up to three orders of magnitude. In light of this, we recommend treating the pixel-based results from this technique with caution in regions with active star formation. In regions of high background typical in the inner Galactic plane, we are not able to recover reliable surface density maps of individual synthetic regions, since low-mass regions are lost in the far-infrared background. When measuring the total gas mass of regions in moderate background, we find that modified blackbody fitting works well (absolute error: +9%; −13%) up to 10 kpc distance (errors increase with distance). Commonly, the initial images are convolved to the largest common beam-size, which smears contaminated pixels over large areas. The resulting information loss makes this commonly used technique less verifiable as now χ² values cannot be used as a quality indicator of a fitted pixel. Our control measurements of the total gas mass (without the step of convolution to the largest common beam size) produce similar results (absolute error: +20%; −7%) while having much lower median errors especially for the high-mass stellar feedback phase. In upcoming papers (Paper III; Paper IV) of this series we test the reliability of measured star formation rate with direct and indirect techniques.
A Psychometric Review of Norm-Referenced Tests Used to Assess Phonological Error Patterns
ERIC Educational Resources Information Center
Kirk, Celia; Vigeland, Laura
2014-01-01
Purpose: The authors provide a review of the psychometric properties of 6 norm-referenced tests designed to measure children's phonological error patterns. Three aspects of the tests' psychometric adequacy were evaluated: the normative sample, reliability, and validity. Method: The specific criteria used for determining the psychometric…
Effects of uncertainty and variability on population declines and IUCN Red List classifications.
Rueda-Cediel, Pamela; Anderson, Kurt E; Regan, Tracey J; Regan, Helen M
2018-01-22
The International Union for Conservation of Nature (IUCN) Red List Categories and Criteria is a quantitative framework for classifying species according to extinction risk. Population models may be used to estimate extinction risk or population declines. Uncertainty and variability arise in threat classifications through measurement and process error in empirical data and uncertainty in the models used to estimate extinction risk and population declines. Furthermore, species traits are known to affect extinction risk. We investigated the effects of measurement and process error, model type, population growth rate, and age at first reproduction on the reliability of risk classifications based on projected population declines on IUCN Red List classifications. We used an age-structured population model to simulate true population trajectories with different growth rates, reproductive ages and levels of variation, and subjected them to measurement error. We evaluated the ability of scalar and matrix models parameterized with these simulated time series to accurately capture the IUCN Red List classification generated with true population declines. Under all levels of measurement error tested and low process error, classifications were reasonably accurate; scalar and matrix models yielded roughly the same rate of misclassifications, but the distribution of errors differed; matrix models led to greater overestimation of extinction risk than underestimations; process error tended to contribute to misclassifications to a greater extent than measurement error; and more misclassifications occurred for fast, rather than slow, life histories. These results indicate that classifications of highly threatened taxa (i.e., taxa with low growth rates) under criterion A are more likely to be reliable than for less threatened taxa when assessed with population models. 
Greater scrutiny needs to be placed on data used to parameterize population models for species with high growth rates, particularly when available evidence indicates a potential transition to higher risk categories. © 2018 Society for Conservation Biology.
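How measurement error can shift a decline-based classification can be illustrated with a small sketch. It assumes the criterion A2 to A4 decline thresholds (at least 80%, 50%, and 30% for Critically Endangered, Endangered, and Vulnerable); the population counts, the 15% error, and the function names are hypothetical and are not taken from the study.

```python
def percent_decline(n_start, n_end):
    """Percentage reduction in population size over the assessment window."""
    return 100.0 * (n_start - n_end) / n_start


def classify_criterion_a(decline_pct):
    """Map a percentage decline to a threat category (assumed A2-A4 thresholds)."""
    if decline_pct >= 80:
        return "Critically Endangered"
    if decline_pct >= 50:
        return "Endangered"
    if decline_pct >= 30:
        return "Vulnerable"
    return "Below Vulnerable threshold"


# Hypothetical true trajectory: 1000 -> 450 individuals (55% decline).
true_class = classify_criterion_a(percent_decline(1000, 450))

# Same trajectory with the final count overestimated by 15%.
observed_end = 450 * 1.15
observed_class = classify_criterion_a(percent_decline(1000, observed_end))
```

In this example a 15% overcount of the final census moves the species from Endangered to Vulnerable, which is the kind of misclassification the simulations above quantify.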
Varughese, J K; Wentzel-Larsen, T; Vassbotn, F; Moen, G; Lund-Johansen, M
2010-04-01
In this volumetric study of the vestibular schwannoma, we evaluated the accuracy and reliability of several approximation methods that are in use, and determined the minimum volume difference that needs to be measured for it to be attributable to an actual difference rather than a retest error. We also found empirical proportionality coefficients for the different methods. DESIGN/SETTING AND PARTICIPANTS: Methodological study with investigation of three different VS measurement methods compared to a reference method that was based on serial slice volume estimates. These volume estimates were based on: (i) one single diameter, (ii) three orthogonal diameters or (iii) the maximal slice area. Altogether 252 T1-weighted MRI images with gadolinium contrast, from 139 VS patients, were examined. The retest errors, in terms of relative percentages, were determined by undertaking repeated measurements on 63 scans for each method. Intraclass correlation coefficients were used to assess the agreement between each of the approximation methods and the reference method. The tendency for approximation methods to systematically overestimate/underestimate different-sized tumours was also assessed, with the help of Bland-Altman plots. The most commonly used approximation method, the maximum diameter, was the least reliable measurement method and has inherent weaknesses that need to be considered. This includes greater retest errors than area-based measurements (25% and 15%, respectively), and that it was the only approximation method that could not easily be converted into volumetric units. Area-based measurements can furthermore be more reliable for smaller volume differences than diameter-based measurements. All our findings suggest that the maximum diameter should not be used as an approximation method. We propose the use of measurement modalities that take into account growth in multiple dimensions instead.
Sairanen, V; Kuusela, L; Sipilä, O; Savolainen, S; Vanhatalo, S
2017-02-15
Diffusion Tensor Imaging (DTI) is commonly challenged by subject motion during data acquisition, which often leads to corrupted image data. The current procedure in DTI analysis is to correct or completely reject such data before tensor estimation; however, assessing the reliability and accuracy of the estimated tensor in such situations has evaded previous studies. This work aims to define the loss of data accuracy with increasing image rejections, and to define a robust method for assessing reliability of the result at voxel level. We carried out simulations of every possible sub-scheme (N=1,073,567,387) of the Jones30 gradient scheme, followed by confirming the idea with MRI data from four newborn and three adult subjects. We assessed the relative error of the most commonly used tensor estimates for DTI and tractography studies, fractional anisotropy (FA) and the major orientation vector (V1), respectively. The error was estimated using two measures, the widely used electric potential (EP) criteria as well as the rotationally variant condition number (CN). Our results show that CN and EP are comparable in situations with very few rejections, but CN becomes clearly more sensitive to depicting errors when more gradient vectors and images are rejected. The error in FA and V1 was also found to depend on the actual FA level in the given voxel; low actual FA levels were related to high relative errors in the FA and V1 estimates. Finally, the results were confirmed with clinical MRI data. This showed that the errors after rejections are, indeed, inhomogeneous across brain regions. The FA and V1 errors become progressively larger when moving from the thick white matter bundles towards more superficial subcortical structures. Our findings suggest that i) CN is a useful estimator of data reliability at voxel level, and ii) DTI preprocessing with data rejections leads to major challenges when assessing brain tissue with lower FA levels, such as the entire newborn brain, as well as the adult superficial, subcortical areas commonly traced in precise connectivity analyses between cortical regions. Copyright © 2016 Elsevier Inc. All rights reserved.
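The reported number of sub-schemes can be reproduced by counting every subset of the 30 gradient directions that keeps at least 6 of them, 6 being the number of unique elements of the diffusion tensor. That interpretation is an assumption, since the abstract does not say how N was obtained; the function name `count_subschemes` is hypothetical.

```python
from math import comb


def count_subschemes(n_directions=30, min_directions=6):
    """Count subsets of a gradient scheme that retain at least min_directions."""
    return sum(
        comb(n_directions, k)
        for k in range(min_directions, n_directions + 1)
    )


n_subschemes = count_subschemes()
```

Under this assumption the count equals 2^30 minus the 174,437 subsets with fewer than 6 directions, which matches the N=1,073,567,387 given above.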
2016-10-01
Reports an error in "Unreliability as a threat to understanding psychopathology: The cautionary tale of attentional bias" by Thomas L. Rodebaugh, Rachel B. Scullin, Julia K. Langer, David J. Dixon, Jonathan D. Huppert, Amit Bernstein, Ariel Zvielli and Eric J. Lenze (Journal of Abnormal Psychology, 2016[Aug], Vol 125[6], 840-851). There was an error in the Author Note concerning the support of the MacBrain Face Stimulus Set. The correct statement is provided. (The following abstract of the original article appeared in record 2016-30117-001.) The use of unreliable measures constitutes a threat to our understanding of psychopathology, because advancement of science using both behavioral and biologically oriented measures can only be certain if such measurements are reliable. Two pillars of the National Institute of Mental Health's portfolio, the Research Domain Criteria (RDoC) initiative for psychopathology and the target engagement initiative in clinical trials, cannot succeed without measures that possess the high reliability necessary for tests involving mediation and selection based on individual differences. We focus on the historical lack of reliability of attentional bias measures as an illustration of how reliability can pose a threat to our understanding. Our own data replicate previous findings of poor reliability for traditionally used scores, which suggests a serious problem with the ability to test theories regarding attentional bias. This lack of reliability may also suggest problems with the assumption (in both theory and the formula for the scores) that attentional bias is consistent and stable across time. In contrast, measures accounting for attention as a dynamic process in time show good reliability in our data. The field is sorely in need of research reporting findings and reliability for attentional bias scores using multiple methods, including those focusing on dynamic processes over time. We urge researchers to test and report reliability of all measures, considering findings of low reliability not just as a nuisance but as an opportunity to modify and improve upon the underlying theory. Full assessment of reliability of measures will maximize the possibility that RDoC (and psychological science more generally) will succeed. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Palta, Mari; Chen, Han-Yang; Kaplan, Robert M; Feeny, David; Cherepanov, Dasha; Fryback, Dennis G
2011-01-01
Standard errors of measurement (SEMs) of health-related quality of life (HRQoL) indexes are not well characterized. SEM is needed to estimate responsiveness statistics, and is a component of reliability. To estimate the SEM of 5 HRQoL indexes. The National Health Measurement Study (NHMS) was a population-based survey. The Clinical Outcomes and Measurement of Health Study (COMHS) provided repeated measures. A total of 3844 randomly selected adults from the noninstitutionalized population aged 35 to 89 y in the contiguous United States and 265 cataract patients. The SF6-36v2™, QWB-SA, EQ-5D, HUI2, and HUI3 were included. An item-response theory approach captured joint variation in indexes into a composite construct of health (theta). The authors estimated 1) the test-retest standard deviation (SEM-TR) from COMHS, 2) the structural standard deviation (SEM-S) around theta from NHMS, and 3) reliability coefficients. SEM-TR was 0.068 (SF-6D), 0.087 (QWB-SA), 0.093 (EQ-5D), 0.100 (HUI2), and 0.134 (HUI3), whereas SEM-S was 0.071, 0.094, 0.084, 0.074, and 0.117, respectively. These yield reliability coefficients 0.66 (COMHS) and 0.71 (NHMS) for SF-6D, 0.59 and 0.64 for QWB-SA, 0.61 and 0.70 for EQ-5D, 0.64 and 0.80 for HUI2, and 0.75 and 0.77 for HUI3, respectively. The SEM varied across levels of health, especially for HUI2, HUI3, and EQ-5D, and was influenced by ceiling effects. Limitations. Repeated measures were 5 mo apart, and estimated theta contained measurement error. The 2 types of SEM are similar and substantial for all the indexes and vary across health.
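The link between an SEM and a reliability coefficient can be sketched with the classical test theory relation reliability = 1 − SEM² / SD², where SD is the standard deviation of observed scores. The abstract does not state that exactly this relation was used, so the sketch is an assumption; the function names are hypothetical.

```python
import math


def reliability_from_sem(sem, sd):
    """Classical test theory: share of observed variance that is not error."""
    return 1.0 - (sem / sd) ** 2


def implied_sd(sem, reliability):
    """Observed-score SD implied by a given SEM and reliability coefficient."""
    return sem / math.sqrt(1.0 - reliability)


# SF-6D values reported above: SEM-TR = 0.068, reliability = 0.66.
sd_sf6d = implied_sd(0.068, 0.66)
check = reliability_from_sem(0.068, sd_sf6d)
```

Under this relation, the SF-6D figures imply an observed-score SD of roughly 0.12, which shows why an SEM of about 0.07 counts as substantial on an index of this kind.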
Romero-Delmastro, Alejandro; Kadioglu, Onur; Currier, G Frans; Cook, Tanner
2014-08-01
Cone-beam computed tomography images have been previously used for evaluation of alveolar bone levels around teeth before, during, and after orthodontic treatment. Protocols described in the literature have been vague, have used unstable landmarks, or have required several software programs, file conversions, or hand tracings, among other factors that could compromise the precision of the measurements. The purposes of this article are to describe a totally digital tooth-based superimposition method for the quantitative assessment of alveolar bone levels and to evaluate its reliability. Ultra cone-beam computed tomography images (0.1-mm reconstruction) from 10 subjects were obtained from the data pool of the University of Oklahoma; 80 premolars were measured twice by the same examiner and a third time by a second examiner to determine alveolar bone heights and thicknesses before and more than 6 months after orthodontic treatment using OsiriX (version 3.5.1; Pixeo, Geneva, Switzerland). Intraexaminer and interexaminer reliabilities were evaluated, and Dahlberg's formula was used to calculate the error of the measurements. Cross-sectional and longitudinal evaluations of alveolar bone levels were possible using a digital tooth-based superimposition method. The mean differences for buccal alveolar crest heights and thicknesses were below 0.10 mm for the same examiner and below 0.17 mm for all examiners. The ranges of errors for any measurement were between 0.02 and 0.23 mm for intraexaminer errors, and between 0.06 and 0.29 mm for interexaminer errors. This protocol can be used for cross-sectional or longitudinal assessment of alveolar bone levels with low interexaminer and intraexaminer errors, and it eliminates the use of less reliable or less stable landmarks and the need for multiple software programs and image printouts. Standardization of the methods for bone assessment in orthodontics is necessary; this method could be the answer to this need. 
Copyright © 2014 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
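Dahlberg's formula, used above to express the error of repeated measurements, is d = √(Σ(x₁ − x₂)² / 2n) for n pairs of repeated measurements. A minimal sketch follows; the bone-height values and the function name `dahlberg_error` are hypothetical.

```python
import math


def dahlberg_error(first, second):
    """Dahlberg's error for paired repeated measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty series of equal length")
    sum_sq = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_sq / (2 * len(first)))


# Hypothetical repeated alveolar crest height measurements in mm.
session_1 = [1.0, 2.0]
session_2 = [1.2, 1.8]
error_mm = dahlberg_error(session_1, session_2)
```

The statistic pools random and systematic disagreement into a single figure, which is why reliability studies often report it alongside a separate test for systematic bias.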
Between-day reliability of a method for non-invasive estimation of muscle composition.
Simunič, Boštjan
2012-08-01
Tensiomyography is a method for valid and non-invasive estimation of skeletal muscle fibre type composition. The validity of selected temporal tensiomyographic measures has been well established recently; there is, however, no evidence regarding the method's between-day reliability. Therefore it is the aim of this paper to establish the between-day repeatability of tensiomyographic measures in three skeletal muscles. For three consecutive days, 10 healthy male volunteers (mean±SD: age 24.6 ± 3.0 years; height 177.9 ± 3.9 cm; weight 72.4 ± 5.2 kg) were examined in a supine position. Four temporal measures (delay, contraction, sustain, and half-relaxation time) and maximal amplitude were extracted from the displacement-time tensiomyogram. A reliability analysis was performed with calculations of bias, random error, coefficient of variation (CV), standard error of measurement, and intra-class correlation coefficient (ICC) with a 95% confidence interval. An analysis of ICC demonstrated excellent agreement (ICC were over 0.94 in 14 out of 15 tested parameters). However, lower CV was observed in half-relaxation time, presumably because of the specifics of the parameter definition itself. These data indicate that for the three muscles tested, tensiomyographic measurements were reproducible across consecutive test days. Furthermore, we indicated the most likely origin of the lowest reliability detected in half-relaxation time. Copyright © 2012 Elsevier Ltd. All rights reserved.
Quantifying Error in Survey Measures of School and Classroom Environments
ERIC Educational Resources Information Center
Schweig, Jonathan David
2014-01-01
Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…
The MOBID-2 pain scale: Reliability and responsiveness to pain in patients with dementia
Husebo, BS; Ostelo, R; Strand, LI
2014-01-01
Background Mobilization-Observation-Behavior-Intensity-Dementia-2 (MOBID-2) pain scale is a staff-administered pain tool for patients with dementia. This study explores MOBID-2's test–retest reliability, measurement error and responsiveness to change. Methods Analyses are based upon data from a cluster randomized trial including 352 patients with advanced dementia from 18 Norwegian nursing homes. Test–retest reliability between baseline and week 2 (n = 163), and weeks 2 and 4 (n = 159) was examined in patients not expected to change (controls), using intraclass correlation coefficient (ICC2.1), standard error of measurement (SEM) and smallest detectable change (SDC). Responsiveness was examined by testing six a priori formulated hypotheses about the association between change scores on MOBID-2 and other outcome measures. Results ICCs of the total MOBID-2 scores were 0.81 (0–2 weeks) and 0.85 (2–4 weeks). SEM and SDC were 1.9 and 3.1 (0–2 weeks) and 1.4 and 2.3 (2–4 weeks), respectively. Five out of six hypotheses were confirmed: MOBID-2 discriminated (p < 0.001) between change in patients with and without a stepwise protocol for treatment of pain (SPTP). Moderate association (r = 0.35) was demonstrated with Cohen-Mansfield Agitation Inventory, and no association with Mini-Mental State Examination, Functional Assessment Staging and Activity of Daily Living. Expected associations between change scores of MOBID-2 and Neuropsychiatric Inventory – Nursing Home version were not confirmed. Conclusion The SEM and SDC in connection with the MOBID-2 pain scale indicate that the instrument is responsive to a decrease in pain after a SPTP. Satisfactory test–retest reliability across test periods was demonstrated. Change scores ≥ 3 on total and subscales are clinically relevant and are beyond measurement error. PMID:24799157
Intrarater and interrater reliability of the Anteromedial Reach Test in healthy participants
Bent, Nicholas P; Rushton, Alison B; Wright, Chris C; Petherick, Emma-Jane; Batt, Mark E
2014-01-01
Background The Anteromedial Reach Test is a performance-based outcome measure for evaluating dynamic knee stability in patients with anterior cruciate ligament injury. No previously published study has adequately evaluated intrarater or interrater reliability of the Anteromedial Reach Test, so the purpose of this study was to assess these measurement properties in healthy participants prior to their investigation in patients with anterior cruciate ligament injury. Methods Two raters (A and B) tested 39 healthy university staff and students (20 men, 19 women). For the intrarater reliability investigation, rater A tested participants on three separate test occasions (days 1, 2, and 3) at the same time of day. For the interrater reliability investigation, raters A and B independently tested participants on the same test occasion (day 3). Results There was no significant systematic bias between test occasions or raters. Values of the intraclass correlation coefficient (2,1) were 0.96 for intrarater reliability of both the dominant leg and nondominant leg and 0.97 (dominant leg) and 0.98 (nondominant leg) for interrater reliability. Values for the standard error of measurement were 1.46 (dominant leg) and 1.62 (nondominant leg) for the intrarater investigation, and 1.26 (dominant leg) and 1.04 (nondominant leg) for the interrater investigation. At the 90% confidence level, the minimum detectable change was 3.8% and the error in an individual’s score at a given point in time was ±2.7%. Conclusion The Anteromedial Reach Test demonstrated excellent intrarater and interrater reliability in healthy participants. This provides a basis for future investigation of the measurement properties of the Anteromedial Reach Test in patients with anterior cruciate ligament injury. PMID:24648776
Morin, Mélanie; Gravel, Denis; Bourbonnais, Daniel; Dumoulin, Chantale; Ouellet, Stéphane
2008-01-01
The passive properties of the pelvic floor muscles (PFM) might play a role in stress urinary incontinence (SUI) pathophysiology. To investigate the test-retest reliability of the dynamometric passive properties of the PFM in postmenopausal SUI women. Thirty-two SUI postmenopausal women were convened to two sessions 2 weeks apart. In each session, the measurements were repeated twice. The pelvic floor musculature was evaluated in four different conditions: (1) forces recorded at minimal aperture (initial passive resistance); (2) passive resistance at maximal aperture; (3) five lengthening and shortening cycles (Forces and passive elastic stiffness (PES) were evaluated at different vaginal apertures. Hysteresis was also calculated.); (4) Percentage of passive resistance loss after 1 min of sustained stretching was computed. The generalizability theory was used to calculate two reliability estimates, the dependability indices (Phi) and the standard error of measurement (SEM), for one session involving one measurement or the mean of two measurements. Overall, the reliability of the passive properties was good with indices of dependability of 0.75-0.93. The SEMs for forces and PES were 0.24-0.67 N and 0.03-0.10 N/mm, respectively, for mean, maximal and 20-mm apertures, representing an error between 13% and 23%. Passive forces at minimal aperture showed lower reliability (Phi = 0.51-0.57) compared with other vaginal openings. The aperture at a common force of 0.5 N was the only parameter demonstrating a poor reliability (Phi = 0.35). This new approach for assessing PFM passive properties showed enough reliability for highly recommending its inclusion in the PFM assessment of SUI postmenopausal women. (c) 2008 Wiley-Liss, Inc.
Ban, Ilija; Troelsen, Anders; Kristensen, Morten Tange
2016-10-01
The Constant score (CS) has been the primary endpoint in most studies on clavicle fractures. However, the CS was not developed to assess patients with clavicle fractures. Our aim was to examine inter-rater reliability and agreement of the CS in patients with clavicle fractures. The secondary aim was to estimate the correlation between the CS and the Disabilities of the Arm, Shoulder and Hand score and the internal consistency of the 2 scores. On the basis of sample sizing, 36 patients (31 male and 5 female patients; mean age, 41.3 years) with clavicle fractures underwent standardized CS assessment at a mean of 6.8 weeks (SD, 1.0 weeks) after injury. Reliability and agreement of the CS were determined by 2 raters. The intraclass correlation coefficient (ICC2,1), standard error of measurement, minimal detectable change, Cronbach α coefficient, and Pearson correlation coefficient were estimated. Inter-rater reliability of the total CS was excellent (intraclass correlation coefficient, 0.94; 95% confidence interval, 0.88-0.97), with no systematic difference between the 2 raters (P = .75). The standard error of measurement (measurement error at the group level) was 4.9, whereas the minimal detectable change (smallest change needed to indicate a real change for an individual) was 13.6 CS points. The internal consistency of the 10 CS items was good, with a Cronbach α of .85, and we found a strong correlation (r = -0.92) between the CS and Disabilities of the Arm, Shoulder and Hand score. The CS was found to be reliable for assessing patients with clavicle fractures, especially at the group level. With high inter-rater reliability and agreement, in addition to good internal consistency, the standardized CS used in this study can be used for comparison of results from different settings. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
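The relation between the two error statistics reported above can be checked with a short calculation. This is a minimal sketch, assuming the commonly used formula MDC = z × √2 × SEM with z = 1.96 for a 95% confidence level; the abstract does not state which formula the authors used, and the function name is illustrative only.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Return the minimal detectable change for a test-retest difference.

    Assumed formula: z * sqrt(2) * SEM. The sqrt(2) factor reflects that a
    change score combines the measurement error of two assessments.
    """
    return z * math.sqrt(2.0) * sem


# SEM of 4.9 CS points, as reported in the abstract above.
print(round(minimal_detectable_change(4.9), 1))  # 13.6
```

Under this assumption, the reported SEM of 4.9 reproduces the reported minimal detectable change of 13.6 CS points.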
Alghadir, Ahmad H; Anwer, Shahnawaz; Iqbal, Amir; Iqbal, Zaheen Ahmed
2018-01-01
Objective Several scales are commonly used for assessing pain intensity. Among them, the numerical rating scale (NRS), visual analog scale (VAS), and verbal rating scale (VRS) are often used in clinical practice. However, no study has performed psychometric analyses of their reliability and validity in the measurement of osteoarthritic (OA) pain. Therefore, the present study examined the test–retest reliability, validity, and minimum detectable change (MDC) of the VAS, NRS, and VRS for the measurement of OA knee pain. In addition, the correlations of VAS, NRS, and VRS with demographic variables were evaluated. Methods The study included 121 subjects (65 women, 56 men; aged 40–80 years) with OA of the knee. Test–retest reliability of the VAS, NRS, and VRS was assessed during two consecutive visits in a 24 h interval. The validity was tested using Pearson’s correlation coefficients between the baseline scores of VAS, NRS, and VRS and the demographic variables (age, body mass index [BMI], sex, and OA grade). The standard error of measurement (SEM) and the MDC were calculated to assess statistically meaningful changes. Results The intraclass correlation coefficients of the VAS, NRS, and VRS were 0.97, 0.95, and 0.93, respectively. VAS, NRS, and VRS were significantly related to demographic variables (age, BMI, sex, and OA grade). The SEM of VAS, NRS, and VRS was 0.03, 0.48, and 0.21, respectively. The MDC of VAS, NRS, and VRS was 0.08, 1.33, and 0.58, respectively. Conclusion All the three scales had excellent test–retest reliability. However, the VAS was the most reliable, with the smallest errors in the measurement of OA knee pain. PMID:29731662
Mass-balance measurements in Alaska and suggestions for simplified observation programs
Trabant, D.C.; March, R.S.
1999-01-01
US Geological Survey glacier fieldwork in Alaska includes repetitious measurements, corrections for leaning or bending stakes, an ability to reliably measure seasonal snow as deep as 10 m, absolute identification of summer surfaces in the accumulation area, and annual evaluation of internal accumulation, internal ablation, and glacier-thickness changes. Prescribed field measurement and note-taking techniques help eliminate field errors and expedite the interpretative process. In the office, field notes are transferred to computerized spread-sheets for analysis, release on the World Wide Web, and archival storage. The spreadsheets have error traps to help eliminate note-taking and transcription errors. Rigorous error analysis ends when mass-balance measurements are extrapolated and integrated with area to determine glacier and basin mass balances. Unassessable errors in the glacier and basin mass-balance data reduce the value of the data set for correlations with climate change indices. The minimum glacier mass-balance program has at least three measurement sites on a glacier and the measurements must include the seasonal components of mass balance as well as the annual balance.
Predictive models of safety based on audit findings: Part 1: Model development and reliability.
Hsiao, Yu-Lin; Drury, Colin; Wu, Changxu; Paquet, Victor
2013-03-01
This consecutive study was aimed at the quantitative validation of safety audit tools as predictors of safety performance, as we were unable to find prior studies that tested audit validity against safety outcomes. An aviation maintenance domain was chosen for this work as both audits and safety outcomes are currently prescribed and regulated. In Part 1, we developed a Human Factors/Ergonomics classification framework based on the HFACS model (Shappell and Wiegmann, 2001a,b), for the human errors detected by audits, because merely counting audit findings did not predict future safety. The framework was tested for measurement reliability using four participants, two of whom classified errors on 1238 audit reports. Kappa values leveled out after about 200 audits at between 0.5 and 0.8 for different tiers of error categories. This showed sufficient reliability to proceed with prediction validity testing in Part 2. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
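The kappa statistic mentioned above measures agreement between raters after removing the agreement expected by chance. The following is a minimal sketch, assuming Cohen's kappa for two raters (the abstract does not say which kappa variant was used); the category labels and ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    n = len(rater_a)
    # Proportion of items on which both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of six audit findings by two raters.
a = ["skill", "decision", "skill", "violation", "decision", "skill"]
b = ["skill", "decision", "decision", "violation", "decision", "skill"]
print(cohens_kappa(a, b))
```

Here the raters agree on five of six items, but kappa is lower than 5/6 because part of that agreement would be expected by chance alone.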
Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik
2016-05-01
Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability on item-level and total-score-level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. For relative reliability, we analyzed the total-score-level test-retest reliability with intraclass correlation coefficients (ICC2.1), the standard error of measurement (SEM), and the smallest detectable change (SDC) for absolute reliability/measurement-error assessment and Cronbach's alpha for internal consistency. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we observed an 11-13% higher chance for a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
Statistically Controlling for Confounding Constructs Is Harder than You Think
Westfall, Jacob; Yarkoni, Tal
2016-01-01
Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity. PMID:27031707
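The mechanism behind these inflated error rates can be illustrated without simulation. The sketch below is not the authors' analytical framework; it assumes a simple setup in which two standardized constructs A and B correlate rho, the outcome correlates beta with A and relates to B only through A, and both constructs are measured with the same reliability. All function and parameter names are illustrative.

```python
import math


def partial_corr(r_xy, r_zy, r_xz):
    """Correlation between x and y after partialling z out of both."""
    return (r_xy - r_zy * r_xz) / math.sqrt((1 - r_zy ** 2) * (1 - r_xz ** 2))


def spurious_partial(rho, beta, reliability):
    """Population partial correlation of measure b with y, controlling for measure a.

    Assumptions: corr(A, B) = rho, corr(A, y) = beta, B has no effect on y
    beyond A, and each observed measure correlates sqrt(reliability) with
    its own construct.
    """
    r_ay = beta * math.sqrt(reliability)        # attenuated corr(a, y)
    r_by = rho * beta * math.sqrt(reliability)  # attenuated corr(b, y)
    r_ab = rho * reliability                    # attenuated corr(a, b)
    return partial_corr(r_by, r_ay, r_ab)


print(spurious_partial(0.5, 0.5, 1.0))  # perfectly reliable measures
print(spurious_partial(0.5, 0.5, 0.7))  # moderately reliable measures
```

With perfect reliability the partial correlation is zero, as it should be. With reliability below 1 it is positive even though B adds nothing beyond A, so a sufficiently large sample will declare it significant; this is consistent with the abstract's observation that error rates rise with sample size.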
Error correcting coding-theory for structured light illumination systems
NASA Astrophysics Data System (ADS)
Porras-Aguilar, Rosario; Falaggis, Konstantinos; Ramos-Garcia, Ruben
2017-06-01
Intensity discrete structured light illumination systems project a series of projection patterns for the estimation of the absolute fringe order using only the temporal grey-level sequence at each pixel. This work proposes the use of error-correcting codes for pixel-wise correction of measurement errors. The use of an error-correcting code is advantageous in many ways: it allows reducing the effect of random intensity noise, it corrects outliers near the border of the fringe commonly present when using intensity discrete patterns, and it provides robustness in case of severe measurement errors (even for burst errors where whole frames are lost). The latter aspect is particularly interesting in environments with varying ambient light as well as in critical safety applications such as monitoring of deformations of components in nuclear power plants, where a high reliability is ensured even in case of short measurement disruptions. A special form of burst errors is the so-called salt and pepper noise, which can largely be removed with error-correcting codes using only the information of a given pixel. The performance of this technique is evaluated using both simulations and experiments.
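To make the idea of pixel-wise correction concrete, the sketch below uses a Hamming(7,4) code. This is an assumption chosen for illustration: the abstract does not name the code used, and real systems may use more than two grey levels. Here a 4-bit fringe order is spread over seven binary frames, and the sequence observed at one pixel is corrected if a single frame is misread.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(received):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Hypothetical pixel: fringe order bits 1011 projected over seven frames,
# with the fifth frame misread (for example, due to an ambient light change).
sequence = hamming74_encode([1, 0, 1, 1])
sequence[4] ^= 1
print(hamming74_decode(sequence))  # [1, 0, 1, 1]
```

A code of this kind corrects only one error per pixel sequence; handling lost frames or longer bursts, as described in the abstract, would require a code with a larger minimum distance.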
NASA Astrophysics Data System (ADS)
Próchniewicz, Dominik
2014-03-01
The reliability of precision GNSS positioning primarily depends on correct carrier-phase ambiguity resolution. An optimal estimation and correct validation of ambiguities necessitates a proper definition of the mathematical positioning model. Of particular importance in the model definition is accounting for the atmospheric errors (ionospheric and tropospheric refraction) as well as orbital errors. The use of the network of reference stations in kinematic positioning, known as the Network-based Real-Time Kinematic (Network RTK) solution, facilitates the modeling of such errors and their incorporation, in the form of correction terms, into the functional description of the positioning model. Lowered accuracy of corrections, especially during atmospheric disturbances, results in the occurrence of unaccounted biases, the so-called residual errors. Such errors can be taken into account in the Network RTK positioning model by incorporating the accuracy characteristics of the correction terms into the stochastic model of observations. In this paper we investigate the impact of the expansion of the stochastic model to include correction term variances on the reliability of the model solution. In particular, the results of the instantaneous solution, which utilizes only a single epoch of GPS observations, are analyzed. Owing to its low number of degrees of freedom, such a solution mode is very sensitive to an inappropriate mathematical model definition. Thus a high level of solution reliability is very difficult to achieve. Numerical tests performed for a test network located in a mountainous area during ionospheric disturbances allow the described method to be verified under poor measurement conditions. The results of the ambiguity resolution as well as the rover positioning accuracy show that the proposed method of stochastic modeling can increase the reliability of instantaneous Network RTK performance.
Reliability of diabetic patients' gait parameters in a challenging environment.
Allet, L; Armand, S; de Bie, R A; Golay, A; Monnin, D; Aminian, K; de Bruin, E D
2008-11-01
Activities of daily life require us to move about in challenging environments and to walk on varied surfaces. Irregular terrain has been shown to influence gait parameters, especially in a population at risk for falling. A precise portable measurement system would permit objective gait analysis under such conditions. The aims of this study are to (a) investigate the reliability of gait parameters measured with the Physilog in diabetic patients walking on different surfaces (tar, grass, and stones); (b) identify the measurement error (precision); (c) identify the minimal clinical detectable change. 16 patients with Type 2 diabetes were measured twice within 8 days. After clinical examination patients walked, equipped with a Physilog, on the three aforementioned surfaces. ICC for each surface was excellent for within-visit analyses (>0.938). Inter-visit ICC's (0.753) were excellent except for the knee range parameter (>0.503). The coefficient of variation (CV) was lower than 5% for most of the parameters. Bland and Altman Plots, SEM and SDC showed precise values, distributed around zero for all surfaces. Good reliability of Physilog measurements on different surfaces suggests that Physilog could facilitate the study of diabetic patients' gait in conditions close to real-life situations. Gait parameters during complex locomotor activities (e.g. stair-climbing, curbs, slopes) have not yet been extensively investigated. Good reliability, small measurement error and values of minimal clinical detectable change recommend the utilization of Physilog for the evaluation of gait parameters in diabetic patients.
Test-retest reliability of posture measurements in adolescents with idiopathic scoliosis.
Heitz, Pierre-Henri; Aubin-Fournier, Jean-François; Parent, Éric; Fortin, Carole
2018-05-07
Posture changes are a major consequence of idiopathic scoliosis (IS). Posture changes can lead to psychosocial and physical impairments in adolescents with IS. Therefore, it is important to assess posture but the test-retest reliability of posture measurements still remains unknown in this population. The primary objective was to determine the test-retest reliability of 25 head and trunk posture indices using the Clinical Photographic Postural Assessment Tool (CPPAT) in adolescents with IS. The secondary objective was to determine the standard error of measurement and the minimal detectable change. This is a prospective test-retest reliability study carried out at two tertiary university hospital centers. Forty-one adolescents with IS, aged 10 to 16 years old with curves of 10° to 45° and treated non-operatively were recruited. Two posture assessments were done using the CPPAT five to 10 days apart following a standardized procedure. Photographs were analyzed with the CPPAT software by digitizing reference landmarks placed on the participant by a physiotherapist evaluator. Generalizability theory was used to obtain a coefficient of dependability, standard error of measurement and the minimal detectable change at the 90% confidence interval. This project was supported by the Canadian Pediatric Spine Society (CPSS: 10000$). There are no study-specific conflicts of interest-associated biases. Fourteen of 25 posture indices had a good reliability (ϕ ≥ 0.78), ten of 25 had moderate reliability (ϕ = 0.55 to 0.74) and one had poor reliability (ϕ = 0.45). The most reliable posture indices were waist angles asymmetry (ϕ = 0.93), right waist angle (ϕ = 0.91) and frontal trunk list (ϕ = 0.92). Right sagittal trunk list was the least reliable posture index (ϕ = 0.45). The MDC90 values ranged from 2.6 to 10.3° for angular measurements and from 8.4 to 35.1 mm for linear measurements.
This study demonstrates that most posture indices, especially the trunk posture indices, are reproducible in time among adolescents with IS and provides reference values. Clinicians and researchers can use these reference values in order to assess change in posture over time attributable to treatment effectiveness. Copyright © 2018. Published by Elsevier Inc.
[Interpreting change scores of the Behavioural Rating Scale for Geriatric Inpatients (GIP)].
Diesfeldt, H F A
2013-09-01
The Behavioural Rating Scale for Geriatric Inpatients (GIP) consists of fourteen Rasch-modelled subscales, each measuring different aspects of behavioural, cognitive and affective disturbances in elderly patients. Four additional measures are derived from the GIP: care dependency, apathy, cognition and affect. The objective of the study was to determine the reproducibility of the 18 measures. A convenience sample of 56 patients in psychogeriatric day care was assessed twice by the same observer (a professional caregiver). The median time interval between rating occasions was 45 days (interquartile range 34-58 days). Reproducibility was determined by calculating intraclass correlation coefficients (ICC agreement) for test-retest reliability. The minimal detectable difference (MDD) was calculated based on the standard error of measurement (SEM agreement). Test-retest reliability expressed by the ICCs varied from 0.57 (incoherent behaviour) to 0.93 (anxious behaviour). Standard errors of measurement varied from 0.28 (anxious behaviour) to 1.63 (care dependency). The results show how the GIP can be applied when interpreting individual change in psychogeriatric day care participants.
Retrieving the Polar Mixed-Phase Cloud Liquid Water Path by Combining CALIOP and IIR Measurements
NASA Astrophysics Data System (ADS)
Luo, Tao; Wang, Zhien; Li, Xuebin; Deng, Shumei; Huang, Yong; Wang, Yingjian
2018-02-01
Mixed-phase cloud (MC) is the dominant cloud type over the polar region, and there are challenging conditions for remote sensing and in situ measurements. In this study, a new methodology of retrieving the stratiform MC liquid water path (LWP) by combining Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and infrared imaging radiometer (IIR) measurements was developed and evaluated. This new methodology takes advantage of reliable cloud-phase discrimination by combining lidar and radar measurements. An improved multiple-scattering effect correction method for lidar signals was implemented to provide reliable cloud extinction near cloud top. Then with the adiabatic cloud assumption, the MC LWP can be retrieved by a lookup-table-based method. Simulations with error-free inputs showed that the mean bias and the root mean squared error of the LWP derived from the new method are -0.23 ± 2.63 g/m2, with the mean absolute relative error of 4%. Simulations with erroneous inputs suggested that the new methodology could provide reliable retrieval of LWP to support statistical or climatological analysis. Two-month A-train satellite retrievals over the Arctic region showed that the new method can produce very similar cloud top temperature (CTT) dependence of LWP to the ground-based microwave radiometer measurements, with a bias of -0.78 g/m2 and a correlation coefficient of 0.95 between the two mean CTT-LWP relationships. The new approach can also produce reasonable patterns and values of LWP in spatial distribution over the Arctic region.
ERIC Educational Resources Information Center
Padilla, Miguel A.; Divers, Jasmin
2016-01-01
Coefficient omega and alpha are both measures of the composite reliability for a set of items. Unlike coefficient alpha, coefficient omega remains unbiased with congeneric items with uncorrelated errors. Despite this ability, coefficient omega is not as widely used and cited in the literature as coefficient alpha. Reasons for coefficient omega's…
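The difference between the two coefficients can be seen in a small population-level calculation. This is a minimal sketch, assuming a one-factor model with unit factor variance and uncorrelated errors, so that the item covariance matrix is the outer product of the loadings plus a diagonal of error variances. The loadings and function names are illustrative and not taken from the study.

```python
def model_covariance(loadings, error_vars):
    """Covariance matrix implied by a one-factor model with uncorrelated errors."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + (error_vars[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def coefficient_alpha(cov):
    """Cronbach's alpha computed from an item covariance matrix."""
    k = len(cov)
    total = sum(sum(row) for row in cov)
    trace = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - trace / total)


def coefficient_omega(loadings, error_vars):
    """Omega: share of composite variance due to the common factor."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(error_vars))


def compare(loadings):
    error_vars = [1 - l ** 2 for l in loadings]  # standardized items
    cov = model_covariance(loadings, error_vars)
    return coefficient_alpha(cov), coefficient_omega(loadings, error_vars)


print(compare([0.7, 0.7, 0.7, 0.7]))  # equal loadings (tau-equivalent items)
print(compare([0.9, 0.7, 0.5, 0.3]))  # unequal loadings (congeneric items)
```

With equal loadings the two coefficients coincide. With unequal loadings alpha falls below omega, which is the sense in which alpha understates composite reliability for congeneric items under this model.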
Effects of Differential Item Functioning on Examinees' Test Performance and Reliability of Test
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; Zhang, Jinming
2017-01-01
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Data driven CAN node reliability assessment for manufacturing system
NASA Astrophysics Data System (ADS)
Zhang, Leiming; Yuan, Yong; Lei, Yong
2017-01-01
The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data-driven node bus-off time assessment method for CAN networks is proposed by directly using network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero-inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. The accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer-controlled error injection system. Experimental results show that the MTTB of nodes predicted by the proposed method agrees well with observations in the case studies. The proposed data-driven node time to bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.
Intra- and interobserver reliability of quantitative ultrasound measurement of the plantar fascia.
Rathleff, Michael Skovdal; Moelgaard, Carsten; Lykkegaard Olesen, Jens
2011-01-01
To determine intra- and interobserver reliability and measurement precision of sonographic assessment of plantar fascia thickness when using one, the mean of two, or the mean of three measurements. Two experienced observers scanned 20 healthy subjects twice with 60 minutes between test and retest. A GE LOGIQe ultrasound scanner was used in the study. The built-in software in the scanner was used to measure the thickness of the plantar fascia (PF). Reliability was calculated using intraclass correlation coefficient (ICC) and limits of agreement (LOA). Intraobserver reliability (ICC) using one measurement was 0.50 for one observer and 0.52 for the other, and using the mean of three measurements intraobserver reliability increased up to 0.77 and 0.67, respectively. Interobserver reliability (ICC) when using one measurement was 0.62 and increased to 0.82 when using the average of three measurements. LOA showed that when using the average of three measurements, LOA decreased to 0.6 mm, corresponding to 17.5% of the mean thickness of the PF. The results showed that reliability increases when using the mean of three measurements compared with one. Limits of agreement based on intratester reliability show that changes in thickness that are larger than 0.6 mm can be considered actual changes in thickness and not a result of measurement error. Copyright © 2011 Wiley Periodicals, Inc.
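The gain from averaging repeated measurements can be anticipated with the Spearman-Brown prophecy formula. This is a sketch under the assumption that the repeated measurements are parallel (equal true-score and error variances); the study reports empirically estimated ICCs, which need not match the prediction exactly.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel measurements."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Single-measurement ICCs taken from the abstract above.
print(spearman_brown(0.50, 3))  # intraobserver, first observer
print(spearman_brown(0.62, 3))  # interobserver
```

For the interobserver case, the prediction from a single-measurement ICC of 0.62 is about 0.83, close to the reported 0.82 for the mean of three measurements.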
NASA Technical Reports Server (NTRS)
Simmons, D. B.
1975-01-01
The DOMONIC system has been modified to run on the Univac 1108 and the CDC 6600 as well as the IBM 370 computer system. The DOMONIC monitor system has been implemented to gather data which can be used to optimize the DOMONIC system and to predict the reliability of software developed using DOMONIC. The areas of quality metrics, error characterization, program complexity, program testing, validation and verification are analyzed. A software reliability model for estimating program completion levels and one on which to base system acceptance have been developed. The DAVE system which performs flow analysis and error detection has been converted from the University of Colorado CDC 6400/6600 computer to the IBM 360/370 computer system for use with the DOMONIC system.
Opar, David A; Piatkowski, Timothy; Williams, Morgan D; Shield, Anthony J
2013-09-01
Reliability and case-control injury study. To determine if a novel device designed to measure eccentric knee flexor strength via the Nordic hamstring exercise displays acceptable test-retest reliability; to determine normative values for eccentric knee flexor strength derived from the device in individuals without a history of hamstring strain injury (HSI); and to determine if the device can detect weakness in elite athletes with a previous history of unilateral HSI. HSI and reinjury are the most common cause of lost playing time in a number of sports. Eccentric knee flexor weakness is a major modifiable risk factor for future HSI. However, at present, there is a lack of easily accessible equipment to assess eccentric knee flexor strength. Thirty recreationally active males without a history of HSI completed the Nordic hamstring exercise on the device on 2 separate occasions. Intraclass correlation coefficients, typical error, typical error as a coefficient of variation, and minimal detectable change at a 95% confidence level were calculated. Normative strength data were determined using the most reliable measurement. An additional 20 elite athletes with a unilateral history of HSI within the previous 12 months performed the Nordic hamstring exercise on the device to determine if residual eccentric muscle weakness existed in the previously injured limb. The device displayed high to moderate reliability (intraclass correlation coefficient = 0.83-0.90; typical error, 21.7-27.5 N; typical error as a coefficient of variation, 5.8%-8.5%; minimal detectable change at a 95% confidence level, 60.1-76.2 N). Mean ± SD normative eccentric flexor strength in the uninjured group was 344.7 ± 61.1 N for the left and 361.2 ± 65.1 N for the right side. 
The previously injured limb was 15% weaker than the contralateral uninjured limb (mean difference, 50.3 N; 95% confidence interval: 25.7, 74.9; P<.01), 15% weaker than the normative left limb (mean difference, 50.0 N; 95% confidence interval: 1.4, 98.5; P = .04), and 18% weaker than the normative right limb (mean difference, 66.5 N; 95% confidence interval: 18.0, 115.1; P<.01). The experimental device offers a reliable method to measure eccentric knee flexor strength and strength asymmetry and to detect residual weakness in previously injured elite athletes.
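The reliability statistics reported in this abstract are linked by simple formulas. The following is a minimal sketch, assuming the common convention that typical error is the standard deviation of the test-retest differences divided by √2 and that the minimal detectable change at 95% confidence is typical error × 1.96 × √2; the abstract does not state which formulas were used. Under that assumption, the reported typical errors of 21.7 N and 27.5 N reproduce the reported minimal detectable changes of about 60.1 N and 76.2 N. The session data and function names are hypothetical.

```python
import math
import statistics


def typical_error(trial1, trial2):
    """Typical error: SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def mdc95(te):
    """Minimal detectable change at 95% confidence from a typical error."""
    return te * 1.96 * math.sqrt(2)


# Hypothetical test-retest forces (N) for five participants.
session1 = [310.0, 355.0, 402.0, 289.0, 367.0]
session2 = [325.0, 340.0, 410.0, 300.0, 350.0]
te = typical_error(session1, session2)
print(f"typical error = {te:.1f} N, MDC95 = {mdc95(te):.1f} N")

# Check against the range reported in the abstract.
print(round(mdc95(21.7), 1), round(mdc95(27.5), 1))  # 60.1 76.2
```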
Understanding seasonal variability of uncertainty in hydrological prediction
NASA Astrophysics Data System (ADS)
Li, M.; Wang, Q. J.
2012-04-01
Understanding uncertainty in hydrological prediction can be highly valuable for improving the reliability of streamflow prediction. In this study, a monthly water balance model, WAPABA, is combined with error models in a Bayesian joint probability framework to investigate the seasonal dependency of the prediction error structure. A seasonal invariant error model, analogous to traditional time series analysis, uses constant parameters for model error and accounts for no seasonal variation. In contrast, a seasonal variant error model uses a different set of parameters for bias, variance and autocorrelation for each individual calendar month. Potential connections among model parameters from similar months are not considered within the seasonal variant model, which could result in over-fitting and over-parameterization. A hierarchical error model further applies some distributional restrictions on model parameters within a Bayesian hierarchical framework. An iterative algorithm is implemented to expedite the maximum a posteriori (MAP) estimation of the hierarchical error model. The three error models are applied to forecasting streamflow at a catchment in southeast Australia in a cross-validation analysis. This study also presents a number of statistical measures and graphical tools to compare the predictive skills of the different error models. From probability integral transform histograms and other diagnostic graphs, the hierarchical error model conforms better to reliability when compared to the seasonal invariant error model. The hierarchical error model also generally provides the most accurate mean prediction in terms of the Nash-Sutcliffe model efficiency coefficient and the best probabilistic prediction in terms of the continuous ranked probability score (CRPS). The model parameters of the seasonal variant error model are very sensitive to each cross validation, while the hierarchical error model produces much more robust and reliable model parameters. 
Furthermore, the results of the hierarchical error model show that most of the model parameters are not seasonally variant, except for error bias. The seasonal variant error model is likely to use more parameters than necessary to maximize the posterior likelihood. The model flexibility and robustness indicate that the hierarchical error model has great potential for future streamflow predictions.
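Two of the quantities named in this abstract are easy to compute directly. The sketch below shows the Nash-Sutcliffe efficiency and a per-calendar-month mean error, which is the kind of seasonal bias the abstract reports as the one clearly seasonal parameter. It does not implement the Bayesian hierarchical error model itself, and the function names and flow values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def bias_by_month(months, observed, simulated):
    """Mean error (simulated minus observed) for each calendar month."""
    errors = {}
    for m, o, s in zip(months, observed, simulated):
        errors.setdefault(m, []).append(s - o)
    return {m: sum(e) / len(e) for m, e in errors.items()}


# Hypothetical flows: two years of data for three calendar months.
months = [1, 2, 3, 1, 2, 3]
observed = [10.0, 20.0, 30.0, 12.0, 22.0, 28.0]
simulated = [11.0, 19.0, 30.0, 13.0, 21.0, 28.0]

print(nash_sutcliffe(observed, simulated))
print(bias_by_month(months, observed, simulated))
```

A month whose mean error stays consistently away from zero across years is a candidate for a month-specific bias parameter, while the variance and autocorrelation parameters might still be shared.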
Liu, Yan; Salvendy, Gavriel
2009-05-01
This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies, and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on the five most widely used statistical analysis tools are discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It is shown that measurement errors can greatly attenuate correlations between variables, reduce the statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis, and depreciate the significance of the discriminant function and the discrimination abilities of individual variables in discriminant analysis. The discussion is restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experimental results, which the authors believe is critical to research progress in theory development and cumulative knowledge in the ergonomics field.
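The attenuation of correlations mentioned in this abstract can be illustrated with the classical (Spearman) attenuation formula, under which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. This is a minimal sketch of that textbook relation, not a reproduction of the paper's analysis; the function names and input values are hypothetical.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation given the true correlation and both reliabilities."""
    return true_r * math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Correction for attenuation: estimate of the true correlation."""
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical scales with reliabilities 0.70 and 0.80 and a true correlation of 0.50.
observed = attenuated_correlation(0.50, 0.70, 0.80)
print(round(observed, 3))  # 0.374
print(round(disattenuated_correlation(observed, 0.70, 0.80), 3))  # 0.5
```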
Brandmaier, Andreas M.; von Oertzen, Timo; Ghisletta, Paolo; Lindenberger, Ulman; Hertzog, Christopher
2018-01-01
Latent Growth Curve Models (LGCM) have become a standard technique to model change over time. Prediction and explanation of inter-individual differences in change are major goals in lifespan research. The major determinants of statistical power to detect individual differences in change are the magnitude of true inter-individual differences in linear change (LGCM slope variance), design precision, alpha level, and sample size. Here, we show that design precision can be expressed as the inverse of effective error. Effective error is determined by instrument reliability and the temporal arrangement of measurement occasions. However, it also depends on another central LGCM component, the variance of the latent intercept and its covariance with the latent slope. We derive a new reliability index for LGCM slope variance—effective curve reliability (ECR)—by scaling slope variance against effective error. ECR is interpretable as a standardized effect size index. We demonstrate how effective error, ECR, and statistical power for a likelihood ratio test of zero slope variance formally relate to each other and how they function as indices of statistical power. We also provide a computational approach to derive ECR for arbitrary intercept-slope covariance. With practical use cases, we argue for the complementary utility of the proposed indices of a study's sensitivity to detect slope variance when making a priori longitudinal design decisions or communicating study designs. PMID:29755377
Pant, Anup D; Dorairaj, Syril K; Amini, Rouzbeh
2018-07-01
Quantifying the mechanical properties of the iris is important, as it provides insight into the pathophysiology of glaucoma. Recent ex vivo studies have shown that the mechanical properties of the iris are different in glaucomatous eyes as compared to normal ones. Notwithstanding the importance of the ex vivo studies, such measurements are severely limited for diagnosis and preclude development of treatment strategies. With the advent of detailed imaging modalities, it is possible to determine the in vivo mechanical properties using inverse finite element (FE) modeling. An inverse modeling approach requires an appropriate objective function for reliable estimation of parameters. In the case of the iris, numerous measurements such as iris chord length (CL) and iris concavity (CV) are made routinely in clinical practice. In this study, we have evaluated five different objective functions chosen based on the iris biometrics (in the presence and absence of clinical measurement errors) to determine the appropriate criterion for inverse modeling. Our results showed that in the absence of experimental measurement error, a combination of iris CL and CV can be used as the objective function. However, with the addition of measurement errors, the objective functions that employ a large number of local displacement values provide more reliable outcomes.
Liu, Jin-Ya; Chen, Li-Da; Cai, Hua-Song; Liang, Jin-Yu; Xu, Ming; Huang, Yang; Li, Wei; Feng, Shi-Ting; Xie, Xiao-Yan; Lu, Ming-De; Wang, Wei
2016-01-01
AIM: To present our initial experience regarding the feasibility of ultrasound virtual endoscopy (USVE) and its measurement reliability for polyp detection in an in vitro study using pig intestine specimens. METHODS: Six porcine intestine specimens containing 30 synthetic polyps underwent USVE, computed tomography colonography (CTC) and optical colonoscopy (OC) for polyp detection. The polyp measurement, defined as the maximum polyp diameter on two-dimensional (2D) multiplanar reformatted (MPR) planes, was obtained by USVE, and the absolute measurement error was analyzed using the direct measurement as the reference standard. RESULTS: USVE detected 29 (96.7%) of 30 polyps, with one 7-mm polyp missed. There was one false-positive finding. Twenty-six (89.7%) of 29 reconstructed images were clearly depicted, while 29 (96.7%) of 30 polyps were displayed on CTC with one false-negative finding. In OC, all the polyps were detected. The intraclass correlation coefficient was 0.876 (95%CI: 0.745-0.940) for measurements obtained with USVE. The pooled absolute measurement errors ± the standard deviations of the depicted polyps with actual sizes ≤ 5 mm, 6-9 mm, and ≥ 10 mm were 1.9 ± 0.8 mm, 0.9 ± 1.2 mm, and 1.0 ± 1.4 mm, respectively. CONCLUSION: USVE is reliable for polyp detection and measurement in an in vitro study. PMID:27022217
Bisi-Balogun, Adebisi; Cassel, Michael; Mayer, Frank
2016-04-13
This study aimed to determine the relative and absolute reliability of ultrasound (US) measurements of the thickness and echogenicity of the plantar fascia (PF) at different measurement stations along its length using a standardized protocol. Twelve healthy subjects (24 feet) were enrolled. The PF was imaged in the longitudinal plane. Subjects were assessed twice to evaluate the intra-rater reliability. A quantitative evaluation of the thickness and echogenicity of the plantar fascia was performed using ImageJ, a digital image analysis and viewer software. Sonographic evaluation of the thickness and echogenicity of the PF showed a high relative reliability, with an intraclass correlation coefficient of ≥0.88 at all measurement stations. However, the measurement stations for both the PF thickness and echogenicity which showed the highest intraclass correlation coefficients (ICCs) did not have the highest absolute reliability. Compared to other measurement stations, measuring the PF thickness at 3 cm distal and the echogenicity at a region of interest 1 cm to 2 cm distal from its insertion at the medial calcaneal tubercle showed the highest absolute reliability with the least systematic bias and random error. Also, the reliability was higher using a mean of three measurements compared to one measurement. To reduce discrepancies in the interpretation of the thickness and echogenicity measurements of the PF, the absolute reliability of the different measurement stations should be considered in clinical practice and research rather than the relative reliability with the ICC. PMID:27089369
Conkle, Joel; Ramakrishnan, Usha; Flores-Ayala, Rafael; Suchdev, Parminder S; Martorell, Reynaldo
2017-01-01
Anthropometric data collected in clinics and surveys are often inaccurate and unreliable due to measurement error. The Body Imaging for Nutritional Assessment Study (BINA) evaluated the ability of 3D imaging to correctly measure stature, head circumference (HC) and arm circumference (MUAC) for children under five years of age. This paper describes the protocol for and the quality of manual anthropometric measurements in BINA, a study conducted in 2016-17 in Atlanta, USA. Quality was evaluated by examining digit preference, biological plausibility of z-scores, z-score standard deviations, and reliability. We calculated z-scores and analyzed plausibility based on the 2006 WHO Child Growth Standards (CGS). For reliability, we calculated intra- and inter-observer Technical Error of Measurement (TEM) and Intraclass Correlation Coefficient (ICC). We found low digit preference; 99.6% of z-scores were biologically plausible, with z-score standard deviations ranging from 0.92 to 1.07. Total TEM was 0.40 for stature, 0.28 for HC, and 0.25 for MUAC in centimeters. ICC ranged from 0.99 to 1.00. The quality of manual measurements in BINA was high and similar to that of the anthropometric data used to develop the WHO CGS. We attributed high quality to vigorous training, motivated and competent field staff, reduction of non-measurement error through the use of technology, and reduction of measurement error through adequate monitoring and supervision. Our anthropometry measurement protocol, which builds on and improves upon the protocol used for the WHO CGS, can be used to improve anthropometric data quality. The discussion illustrates the need to standardize anthropometric data quality assessment, and we conclude that BINA can provide a valuable evaluation of 3D imaging for child anthropometry because there is comparison to gold-standard, manual measurements.
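The Technical Error of Measurement reported in this abstract is commonly computed from paired repeat measurements. The sketch below assumes the usual two-measurement formula, TEM = sqrt(sum of squared differences / 2n), and assumes that total TEM combines the intra- and inter-observer components as the square root of the sum of their squares; the abstract does not give the formulas used in BINA. All data and function names are hypothetical.

```python
import math


def tem(first, second):
    """Technical error of measurement for paired repeat measurements."""
    n = len(first)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * n))


def total_tem(intra_tem, inter_tem):
    """Total TEM, assumed here to be the root sum of squares of the two components."""
    return math.sqrt(intra_tem ** 2 + inter_tem ** 2)


# Hypothetical statures (cm): the same observer twice, then two different observers.
observer_a_first = [85.2, 92.4, 101.0, 76.8]
observer_a_second = [85.5, 92.1, 101.3, 76.6]
observer_b = [85.0, 92.9, 100.6, 77.1]

intra = tem(observer_a_first, observer_a_second)
inter = tem(observer_a_first, observer_b)
print(f"intra = {intra:.2f} cm, inter = {inter:.2f} cm, total = {total_tem(intra, inter):.2f} cm")
```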
Zou, Yun; Han, Qing; Weng, Xisheng; Zou, Yongwei; Yang, Yingying; Zhang, Kesong; Yang, Kerong; Xu, Xiaolin; Wang, Chenyu; Qin, Yanguo; Wang, Jincheng
2018-01-01
Clinical application of 3D printed models has recently been increasing. However, there has been no systematic study confirming the precision and reliability of 3D printed models. Some senior clinical doctors mistrust their reliability in clinical application. The purpose of this study was to evaluate the precision and reliability of stereolithography appearance (SLA) 3D printed models. Several related parameters were selected to examine the reliability of SLA 3D printed models. The computed tomography (CT) data of bone/prosthesis and model were collected and reconstructed in 3D. Anatomical parameters were measured and statistical analysis was performed; the intraclass correlation coefficient (ICC) was used to evaluate the similarity between the model and the real bone/prosthesis, and the absolute difference (mm) and relative difference (%) were calculated. For the prosthesis model, the 3-dimensional error was measured. There was no significant difference in the anatomical parameters except the max height (MH) of long bone. All the ICCs were greater than 0.990. The maximum absolute and relative differences were 0.45 mm and 1.10%. The 3-dimensional error analysis showed that the positive/minus distances were 0.273 mm/0.237 mm. The application of SLA 3D printed models in the diagnosis and treatment of complex orthopedic disease was reliable and precise. PMID:29419675
Swanenburg, Jaap; Nevzati, Arian; Mittaz Hager, Anne Gabrielle; de Bruin, Eling D; Klipstein, Andreas
2013-01-01
The aim of this study was to test the reliability and validity of a preferred-standing test for measuring the risk of falling. The preferred-standing position of elderly fallers and non-fallers and healthy young adults was measured. The maximal BSW was measured. The absolute and relative reliability and discriminant validity were assessed. The expanded timed get-up-and-go test (ETGUG), one-leg stance test (OLS), tandem stance (TS), and falls efficacy scale international version (FES-I) were used to determine criterion validity. In total, 146 persons (102 females, 44 males; mean age 55±22 years, range 20-94) were recruited. Forty elderly community dwellers (8 fallers) and 26 young adults were tested twice to determine the test-retest reliability. The BSW showed acceptable test-retest reliability (intraclass correlation coefficient, ICC2,1=0.77-0.83) and inter-rater reliability (ICC3,1=0.77-0.95) for all groups. The standard error of measurement (SEM) was between 0.77 and 1.87, and the smallest detectable change (SDC) was between 2.14 cm and 5.19 cm. The Bland-Altman plot revealed no systematic errors. There was a significant difference between elderly fallers and non-fallers (F(1/75)=11.951; p=0.001). Spearman's rho coefficient values showed no correlation between the BSW and the ETGUG (-0.17, p=0.47), OLS (-0.04, p=0.65), TS (-0.11, p=0.21), and FES-I (-0.10; p=0.27). Only the BSW was a significant predictor for falling (odds ratio=0.736, p=0.007). The reliability and validity of the BSW protocol were acceptable overall. Prospective studies are warranted to evaluate the predictive value of the BSW for determining the risk of falling.
Nitschke, J E; Nattrass, C L; Disler, P B; Chou, M J; Ooi, K T
1999-02-01
Repeated measures design for intra- and interrater reliability. To determine the intra- and interrater reliability of the lumbar spine range of motion measured with a dual inclinometer, and the thoracolumbar spine range of motion measured with a long-arm goniometer, as recommended in the American Medical Association Guides. The American Medical Association Guides (2nd and 4th editions) recommend using measurements of thoracolumbar and lumbar range of movement, respectively, to estimate the percentage of permanent impairment in patients with chronic low back pain. However, the reliability of this method of estimating impairment has not been determined. In all, 34 subjects participated in the study, 21 women with a mean age of 40.1 years (SD, +/- 11.1) and 13 men with a mean age of 47.7 years (SD, +/- 12.1). Measures of thoracolumbar flexion, extension, lateral flexion, and rotation were obtained with a long-arm goniometer. Lumbar flexion, extension, and lateral flexion were measured with a dual inclinometer. Measurements were taken by two examiners on one occasion and by one examiner on two occasions approximately 1 week apart. The results showed poor intra- and interrater reliability for all measurements taken with both instruments. Measurement error expressed in degrees showed that measurements taken by different raters exhibited systematic as well as random differences. As a result, subjects measured by two different examiners on the same day, with either instrument, could give impairment ratings ranging between 0% and 18% of the whole person (excluding rotation), in which percentage impairment is calculated using the average range of motion and the average systematic and random error in degrees for the group for each movement (flexion, extension, and lateral flexion). The poor reliability of the American Medical Association Guides' spinal range of motion model can result in marked variation in the percentage of whole-body impairment. 
These findings have implications for compensation bodies in Australia and other countries that use the American Medical Association Guides' procedure to estimate impairment in chronic low back pain patients.
Wilharm, A.; Hurschler, Ch.; Dermitas, T.; Bohnsack, M.
2013-01-01
Pressure-sensitive K-Scan 4000 sensors (Tekscan, USA) provide new possibilities for the dynamic measurement of force and pressure in biomechanical investigations. We examined the sensors to determine in particular whether they are also suitable for reliable measurements of retropatellar forces and pressures. Insertion approaches were also investigated and a lateral parapatellar arthrotomy supplemented by parapatellar sutures proved to be the most reliable method. The ten human cadaver knees were tested in a knee-simulating machine at a torque of 30 and 40 Nm. Each test cycle involved a dynamic extension from 120° flexion. All recorded parameters showed a decrease of 1-2% per measurement cycle. Although we supplemented the sensors with a Teflon film, the decrease, which was likely caused by shear force, was significant. We evaluated 12 cycles and observed a linear decrease in parameters up to 17.2% (coefficient of regression 0.69–0.99). In our opinion, the linear decrease can be considered a systematic error and can therefore be quantified and accounted for in subsequent experiments. That will ensure reliable retropatellar usage of Tekscan sensors and distinguish the effects of knee joint surgeries from sensor wear-related effects. PMID:24369018
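The idea of treating the linear decrease as a quantifiable systematic error can be sketched as a two-step procedure: fit a straight line to readings taken under a constant reference load across cycles, then rescale later readings by the fitted loss at each cycle. This is one possible correction under the assumption that sensor output decays linearly with cycle number and that the loss is proportional; the abstract does not describe the authors' actual correction. The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return slope, mean_y - slope * mean_x


def correct_for_drift(readings, slope, intercept):
    """Rescale each reading to what the sensor would have reported at cycle 0."""
    return [
        reading * intercept / (intercept + slope * cycle)
        for cycle, reading in enumerate(readings)
    ]


# Hypothetical reference run: constant load, output falling 1.5 units per cycle.
cycles = list(range(12))
reference = [100.0 - 1.5 * c for c in cycles]
slope, intercept = fit_line(cycles, reference)

corrected = correct_for_drift(reference, slope, intercept)
print(slope, intercept)
print([round(v, 6) for v in corrected])
```

With the drift removed, any remaining difference between test conditions can more plausibly be attributed to the intervention rather than to sensor wear.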
The Trojan Lifetime Champions Health Survey: development, validity, and reliability.
Sorenson, Shawn C; Romano, Russell; Scholefield, Robin M; Schroeder, E Todd; Azen, Stanley P; Salem, George J
2015-04-01
Self-report questionnaires are an important method of evaluating lifespan health, exercise, and health-related quality of life (HRQL) outcomes among elite, competitive athletes. Few instruments, however, have undergone formal characterization of their psychometric properties within this population. To evaluate the validity and reliability of a novel health and exercise questionnaire, the Trojan Lifetime Champions (TLC) Health Survey. Descriptive laboratory study. A large National Collegiate Athletic Association Division I university. A total of 63 university alumni (age range, 24 to 84 years), including former varsity collegiate athletes and a control group of nonathletes. Participants completed the TLC Health Survey twice at a mean interval of 23 days with randomization to the paper or electronic version of the instrument. Content validity, feasibility of administration, test-retest reliability, parallel-form reliability between paper and electronic forms, and estimates of systematic and typical error versus differences of clinical interest were assessed across a broad range of health, exercise, and HRQL measures. Correlation coefficients, including intraclass correlation coefficients (ICCs) for continuous variables and κ agreement statistics for ordinal variables, for test-retest reliability averaged 0.86, 0.90, 0.80, and 0.74 for HRQL, lifetime health, recent health, and exercise variables, respectively. Correlation coefficients, again ICCs and κ, for parallel-form reliability (ie, equivalence) between paper and electronic versions averaged 0.90, 0.85, 0.85, and 0.81 for HRQL, lifetime health, recent health, and exercise variables, respectively. Typical measurement error was less than the a priori thresholds of clinical interest, and we found minimal evidence of systematic test-retest error. 
We found strong evidence of content validity, convergent construct validity with the Short-Form 12 Version 2 HRQL instrument, and feasibility of administration in an elite, competitive athletic population. These data suggest that the TLC Health Survey is a valid and reliable instrument for assessing lifetime and recent health, exercise, and HRQL, among elite competitive athletes. Generalizability of the instrument may be enhanced by additional, larger-scale studies in diverse populations.
Improved characterisation of measurement errors in electrical resistivity tomography (ERT) surveys
NASA Astrophysics Data System (ADS)
Tso, C. H. M.; Binley, A. M.; Kuras, O.; Graham, J.
2016-12-01
Measurement errors can play a pivotal role in geophysical inversion. Most inverse models require users to prescribe a statistical model of data errors before inversion. Wrongly prescribed error levels can lead to over- or under-fitting of data, yet commonly used models of measurement error are relatively simplistic. With heightening interest in uncertainty estimation across hydrogeophysics, better characterisation and treatment of measurement errors is needed to provide more reliable estimates of uncertainty. We have analysed two time-lapse electrical resistivity tomography (ERT) datasets; one contains 96 sets of direct and reciprocal data collected from a surface ERT line within a 24 h timeframe, while the other is a year-long cross-borehole survey at a UK nuclear site with over 50,000 daily measurements. Our study included the characterisation of the spatial and temporal behaviour of measurement errors using autocorrelation and covariance analysis. We find that, in addition to well-known proportionality effects, ERT measurements can also be sensitive to the combination of electrodes used. This agrees with reported speculation in previous literature that ERT errors could be somewhat correlated. Based on these findings, we develop a new error model that allows grouping based on electrode number in addition to fitting a linear model to transfer resistance. The new model fits the observed measurement errors better and shows superior inversion and uncertainty estimates in synthetic examples. It is robust, because it groups errors together based on the number of the four electrodes used to make each measurement. The new model can be readily applied to the diagonal data weighting matrix commonly used in classical inversion methods, as well as to the data covariance matrix in the Bayesian inversion framework. We demonstrate its application using extensive ERT monitoring datasets from the two aforementioned sites.
Aartun, Ellen; Degerfalk, Anna; Kentsdotter, Linn; Hestbaek, Lise
2014-02-10
Evidence on the reliability of clinical tests used for the spinal screening of children and adolescents is currently lacking. The aim of this study was to determine the inter- and intra-rater reliability and measurement error of clinical tests commonly used when screening young spines. Two experienced chiropractors independently assessed 111 adolescents aged 12-14 years who were recruited from a primary school in Denmark. A standardised examination protocol was used to test inter-rater reliability, including tests for scoliosis, hypermobility, general mobility, inter-segmental mobility and end range pain in the spine. Seventy-five of the 111 subjects were re-examined after one to four hours to test intra-rater reliability. Percentage agreement and Cohen's Kappa were calculated for binary variables, and intraclass correlation coefficients (ICC) and Bland-Altman plots with Limits of Agreement (LoA) were calculated for continuous measures. Inter-rater percentage agreement for binary data ranged from 59.5% to 100%. Kappa ranged from 0.06 to 1.00. Kappa ≥ 0.40 was seen for elbow, thumb, fifth finger and trunk/hip flexion hypermobility, pain response in inter-segmental mobility and end range pain in lumbar flexion and extension. For continuous data, ICCs ranged from 0.40 to 0.95. Only forward flexion as measured by finger-to-floor distance reached an acceptable ICC (≥ 0.75). Overall, results for intra-rater reliability were better than for inter-rater reliability, but for both components the LoA were quite wide compared with the range of assessments. Some clinical tests showed good, and some tests poor, reliability when applied in a spinal screening of adolescents. The results could probably be improved by additional training and further test standardization. This is the first step in evaluating the value of these tests for the spinal screening of adolescents. Future research should determine the association between these tests and current and/or future neck and back pain.
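The agreement statistics named in this abstract can be computed from first principles. The sketch below shows percentage agreement and Cohen's kappa for two raters' binary findings, and Bland-Altman limits of agreement (mean difference ± 1.96 standard deviations of the differences) for a continuous measure. The ratings, distances, and function names are hypothetical and are not the study's data.

```python
import statistics


def percentage_agreement(rater1, rater2):
    """Share of subjects on which both raters gave the same rating."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Agreement corrected for chance; undefined if chance agreement equals 1."""
    n = len(rater1)
    observed = percentage_agreement(rater1, rater2)
    categories = set(rater1) | set(rater2)
    expected = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


def limits_of_agreement(measure1, measure2):
    """Bland-Altman 95% limits of agreement for paired continuous measurements."""
    diffs = [a - b for a, b in zip(measure1, measure2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical binary findings (1 = positive test) from two raters.
rater_a = [1, 1, 0, 0]
rater_b = [1, 0, 0, 0]
print(percentage_agreement(rater_a, rater_b), cohens_kappa(rater_a, rater_b))  # 0.75 0.5

# Hypothetical finger-to-floor distances (cm) from the same two raters.
print(limits_of_agreement([10.0, 12.0, 14.0, 16.0], [9.0, 13.0, 13.0, 17.0]))
```

The example illustrates why the two binary statistics can diverge: 75% raw agreement corresponds to a kappa of only 0.5 once chance agreement is removed.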
Wavelet-based multiscale performance analysis: An approach to assess and improve hydrological models
NASA Astrophysics Data System (ADS)
Rathinasamy, Maheswaran; Khosa, Rakesh; Adamowski, Jan; ch, Sudheer; Partheepan, G.; Anand, Jatin; Narsimlu, Boini
2014-12-01
The temporal dynamics of hydrological processes are spread across different time scales and, as such, the performance of hydrological models cannot be estimated reliably from global performance measures that assign a single number to the fit of a simulated time series to an observed reference series. Accordingly, it is important to analyze model performance at different time scales. Wavelets have been used extensively in the area of hydrological modeling for multiscale analysis, and have been shown to be very reliable and useful in understanding dynamics across time scales and as these evolve in time. In this paper, wavelet-based multiscale performance measures for hydrological models are proposed and tested (i.e., Multiscale Nash-Sutcliffe Criteria and Multiscale Normalized Root Mean Square Error). The main advantage of this method is that it provides a quantitative measure of model performance across different time scales. In the proposed approach, model and observed time series are decomposed using the Discrete Wavelet Transform (known as the à trous wavelet transform), and performance measures of the model are obtained at each time scale. The applicability of the proposed method was explored using various case studies, both real and synthetic. The synthetic case studies included various kinds of errors (e.g., timing error, under and over prediction of high and low flows) in outputs from a hydrologic model. The real case studies investigated in this study included simulation results of both the process-based Soil and Water Assessment Tool (SWAT) model, as well as statistical models, namely the Coupled Wavelet-Volterra (WVC), Artificial Neural Network (ANN), and Auto Regressive Moving Average (ARMA) methods. For the SWAT model, data from Wainganga and Sind Basin (India) were used, while for the Wavelet Volterra, ANN and ARMA models, data from the Cauvery River Basin (India) and Fraser River (Canada) were used. 
The study also explored the effect of the choice of the wavelets in multiscale model evaluation. It was found that the proposed wavelet-based performance measures, namely the MNSC (Multiscale Nash-Sutcliffe Criteria) and MNRMSE (Multiscale Normalized Root Mean Square Error), are a more reliable measure than traditional performance measures such as the Nash-Sutcliffe Criteria (NSC), Root Mean Square Error (RMSE), and Normalized Root Mean Square Error (NRMSE). Further, the proposed methodology can be used to: i) compare different hydrological models (both physical and statistical models), and ii) help in model calibration.
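The idea of scoring a model scale by scale can be illustrated with a minimal sketch. It is not the authors' implementation: the two-tap averaging filter, the edge padding, and the function names `a_trous_details` and `multiscale_nse` are assumptions made for illustration, and the abstract does not say which wavelet filter was used.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of obs)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def a_trous_details(x, levels):
    """Additive a trous style decomposition (assumed two-tap averaging filter).

    Returns `levels` detail series followed by the final smooth series;
    the returned series sum back to the input exactly.
    """
    smooth = np.asarray(x, dtype=float)
    parts = []
    for j in range(levels):
        lag = 2 ** j  # filter taps are spread further apart at each level
        # Pair each sample with the one `lag` steps earlier; the first `lag`
        # positions are padded by repeating the first value (an assumption).
        shifted = np.concatenate([np.full(lag, smooth[0]), smooth[:-lag]])
        next_smooth = 0.5 * (smooth + shifted)
        parts.append(smooth - next_smooth)  # detail at this scale
        smooth = next_smooth
    parts.append(smooth)
    return parts

def multiscale_nse(obs, sim, levels=3):
    """NSE computed separately on each detail scale and on the final smooth."""
    obs_parts = a_trous_details(obs, levels)
    sim_parts = a_trous_details(sim, levels)
    return [nse(o, s) for o, s in zip(obs_parts, sim_parts)]

# Hypothetical example: a simulation that lags the observations by two steps.
t = np.arange(128)
observed = np.sin(2 * np.pi * t / 32) + 0.3 * np.sin(2 * np.pi * t / 4)
simulated = np.roll(observed, 2)
print(multiscale_nse(observed, simulated))
```

In a sketch like this, a timing error would be expected to lower the score mostly at the fine scales, which is the kind of information a single global NSE hides.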
The FLIR ONE thermal imager for the assessment of burn wounds: Reliability and validity study.
Jaspers, M E H; Carrière, M E; Meij-de Vries, A; Klaessens, J H G M; van Zuijlen, P P M
2017-11-01
Objective measurement tools may be of great value to provide early and reliable burn wound assessment. Thermal imaging is an easy, accessible and objective technique, which measures skin temperature as an indicator of tissue perfusion. These thermal images might be helpful in the assessment of burn wounds. However, before implementation of a novel measurement tool into clinical practice is considered, it is appropriate to test its clinimetric properties (i.e. reliability and validity). The objective of this study was to assess the reliability and validity of the recently introduced FLIR ONE thermal imager. Two observers obtained thermal images of burn wounds in adult patients at day 1-3, 4-7 and 8-10 after burn. Subsequently, temperature differences between the burn wound and healthy skin (ΔT) were calculated on an iPad mini containing the FLIR Tools app. To assess reliability, ΔT values of both observers were compared by calculating the intraclass correlation coefficient (ICC) and measurement error parameters. To assess validity, the ΔT values of the first observer were compared to the registered healing time of the burn wounds, which was specified into three categories: (I) ≤14 days, (II) 15-21 days and (III) >21 days. The ability of the FLIR ONE to discriminate between healing ≤21 days and >21 days was evaluated by means of a receiver operating characteristic curve and an optimal ΔT cut-off value. Reliability: ICCs were 0.99 for each time point, indicating excellent reliability up to 10 days after burn. The standard error of measurement varied between 0.17-0.22°C. Validity: the area under the curve was calculated at 0.69 (95% CI 0.54-0.84). A cut-off value of -1.15°C shows a moderate discrimination between burn wound healing ≤21 days and >21 days (46% sensitivity; 82% specificity). Our results show that the FLIR ONE thermal imager is highly reliable, but the moderate validity calls for additional research. However, the FLIR ONE is pre-eminently feasible, allowing easy and fast measurements in clinical burn practice. Copyright © 2017 Elsevier Ltd and ISBI. All rights reserved.
A Practical Method for Identifying Significant Change Scores
ERIC Educational Resources Information Center
Cascio, Wayne F.; Kurtines, William M.
1977-01-01
A test of significance for identifying individuals who are most influenced by an experimental treatment as measured by pre-post test change score is presented. The technique requires true difference scores, the reliability of obtained differences, and their standard error of measurement. (Author/JKS)
Palta, Mari; Chen, Han-Yang; Kaplan, Robert M.; Feeny, David; Cherepanov, Dasha; Fryback, Dennis
2011-01-01
Background: Standard errors of measurement (SEMs) of health related quality of life (HRQoL) indexes are not well characterized. SEM is needed to estimate responsiveness statistics and provides guidance on using indexes on the individual and group level. SEM is also a component of reliability. Purpose: To estimate SEM of five HRQoL indexes. Design: The National Health Measurement Study (NHMS) was a population based telephone survey. The Clinical Outcomes and Measurement of Health Study (COMHS) provided repeated measures 1 and 6 months post cataract surgery. Subjects: 3844 randomly selected adults from the non-institutionalized population 35 to 89 years old in the contiguous United States and 265 cataract patients. Measurements: The SF-36v2™, QWB-SA, EQ-5D, HUI2 and HUI3 were included. An item-response theory (IRT) approach captured joint variation in indexes into a composite construct of health (theta). We estimated: (1) the test-retest standard deviation (SEM-TR) from COMHS, (2) the structural standard deviation (SEM-S) around the composite construct from NHMS and (3) corresponding reliability coefficients. Results: SEM-TR was 0.068 (SF-6D), 0.087 (QWB-SA), 0.093 (EQ-5D), 0.100 (HUI2) and 0.134 (HUI3), while SEM-S was 0.071, 0.094, 0.084, 0.074 and 0.117, respectively. These translate into reliability coefficients for SF-6D: 0.66 (COMHS) and 0.71 (NHMS), for QWB: 0.59 and 0.64, for EQ-5D: 0.61 and 0.70, for HUI2: 0.64 and 0.80, and for HUI3: 0.75 and 0.77, respectively. The SEM varied considerably across levels of health, especially for HUI2, HUI3 and EQ-5D, and was strongly influenced by ceiling effects. Limitations: Repeated measures were five months apart and estimated theta contain measurement error. Conclusions: The two types of SEM are similar and substantial for all the indexes, and vary across the range of health. PMID:20935280
ERIC Educational Resources Information Center
Abry, Tashia; Cash, Anne H.; Bradshaw, Catherine P.
2014-01-01
Generalizability theory (GT) offers a useful framework for estimating the reliability of a measure while accounting for multiple sources of error variance. The purpose of this study was to use GT to examine multiple sources of variance in and the reliability of school-level teacher and high school student behaviors as observed using the tool,…
ERIC Educational Resources Information Center
Henson, Robin K.; Thompson, Bruce
Given the potential value of reliability generalization (RG) studies in the development of cumulative psychometric knowledge, the purpose of this paper is to provide a tutorial on how to conduct such studies and to serve as a guide for researchers wishing to use this methodology. After some brief comments on classical test theory, the paper…
Individual differences in the calibration of trust in automation.
Pop, Vlad L; Shrewsbury, Alex; Durso, Francis T
2015-06-01
The objective was to determine whether operators with an expectancy that automation is trustworthy are better at calibrating their trust to changes in the capabilities of automation, and if so, why. Studies suggest that individual differences in automation expectancy may be able to account for why changes in the capabilities of automation lead to a substantial change in trust for some, yet only a small change for others. In a baggage screening task, 225 participants searched for weapons in 200 X-ray images of luggage. Participants were assisted by an automated decision aid exhibiting different levels of reliability. Measures of expectancy that automation is trustworthy were used in conjunction with subjective measures of trust and perceived reliability to identify individual differences in trust calibration. Operators with high expectancy that automation is trustworthy were more sensitive to changes (both increases and decreases) in automation reliability. This difference was eliminated by manipulating the causal attribution of automation errors. Attributing the cause of automation errors to factors external to the automation fosters an understanding of tasks and situations in which automation differs in reliability and may lead to more appropriate trust. The development of interventions can lead to calibrated trust in automation. © 2014, Human Factors and Ergonomics Society.
Goede, Simon L; Leow, Melvin Khee-Shing
2013-01-01
This treatise investigates error sources in measurements applicable to the hypothalamus-pituitary-thyroid (HPT) system of analysis for homeostatic set point computation. The hypothalamus-pituitary transfer characteristic (HP curve) describes the relationship between plasma free thyroxine [FT4] and thyrotropin [TSH]. We define the origin, types, causes, and effects of errors that are commonly encountered in TFT measurements and examine how we can interpret these to construct a reliable HP function for set point establishment. The error sources in the clinical measurement procedures are identified and analyzed in relation to the constructed HP model. The main sources of measurement and interpretation uncertainties are (1) diurnal variations in [TSH], (2) TFT measurement variations influenced by timing of thyroid medications, (3) error sensitivity in ranges of [TSH] and [FT4] (laboratory assay dependent), (4) rounding/truncation of decimals in [FT4] which in turn amplify curve fitting errors in the [TSH] domain in the lower [FT4] range, (5) memory effects (rate-independent hysteresis effect). When the main uncertainties in thyroid function tests (TFT) are identified and analyzed, we can find the most acceptable model space with which we can construct the best HP function and the related set point area.
Hinman, Rana S; Dobson, Fiona; Takla, Amir; O'Donnell, John; Bennell, Kim L
2014-03-01
The most reliable patient-reported outcome (PRO) for people with femoroacetabular impingement (FAI) is unknown because there have been no direct comparisons of questionnaires. Thus, the aim was to evaluate the test-retest reliability of six existing PROs in a single cohort of young active people with hip/groin pain consistent with a clinical diagnosis of FAI. Young adults with clinical FAI completed six PRO questionnaires on two occasions, 1-2 weeks apart. The PROs were modified Harris Hip Score, Hip dysfunction and Osteoarthritis Score, Hip Outcome Score, Non-Arthritic Hip Score, International Hip Outcome Tool, Copenhagen Hip and Groin Outcome Score. 30 young adults (mean age 24 years, SD 4 years, range 18-30 years; 15 men) with stable symptoms participated. Intraclass correlation coefficient(3,1) values ranged from 0.73 to 0.93 (95% CI 0.38 to 0.98) indicating that most questionnaires reached minimal reliability benchmarks. Measurement error at the individual level was quite large for most questionnaires (minimal detectable change (MDC95) 12.4-35.6, 95% CI 8.7 to 54.0). In contrast, measurement error at the group level was quite small for most questionnaires (MDC95 2.2-7.3, 95% CI 1.6 to 11). The majority of the questionnaires were reliable and precise enough for use at the group level. Samples of only 23-30 individuals were required to achieve acceptable measurement variation at the group level. Further direct comparisons of these questionnaires are required to assess other measurement properties such as validity, responsiveness and meaningful change in young people with FAI.
Bania, Theofani
2014-09-01
We determined the criterion validity and the retest reliability of the ActivPAL™ monitor in young people with diplegic cerebral palsy (CP). Activity monitor data were compared with the criterion of video recording for 10 participants. For the retest reliability, activity monitor data were collected from 24 participants on two occasions. Participants had to have diplegic CP and be between 14 and 22 years of age. They also had to be of Gross Motor Function Classification System level II or III. Outcomes were time spent in standing, number of steps (physical activity) and time spent in sitting (sedentary behaviour). For criterion validity, coefficients of determination were all high (r(2) ≥ 0.96), and limits of group agreement were relatively narrow, but limits of agreement for individuals were narrow only for number of steps (≥5.5%). Relative reliability was high for number of steps (intraclass correlation coefficient = 0.87) and moderate for time spent in sitting and lying, and time spent in standing (intraclass correlation coefficients = 0.60-0.66). For groups, changes of up to 7% could be due to measurement error with 95% confidence, but for individuals, changes as high as 68% could be due to measurement error. The results support the criterion validity and the retest reliability of the ActivPAL™ to measure physical activity and sedentary behaviour in groups of young people with diplegic CP but not in individuals. Copyright © 2014 John Wiley & Sons, Ltd.
Reliability of real-time ultrasound for the assessment of transversus abdominis function.
Kidd, Adrian W; Magee, Scott; Richardson, Carolyn A
2002-07-01
Transversus abdominis (TrA) has now been established as a key muscle for the stabilization of the lumbar spine and sacroiliac joints. Significantly, dysfunction of this muscle has also been implicated in low back pain. Real-time ultrasound (US) is a non-invasive procedure that has the potential to evaluate objectively the function of TrA. To investigate M-mode US as a reliable method of assessing TrA function. M-mode US was used to measure the width of TrA as subjects drew in their lower abdominal wall at a controlled speed to a target depth. Eleven subjects were imaged. The measures of TrA width were reliable and ranged between 3.14mm relaxed and 6.35mm contracted. The standard error of measurement ranged between 0.18mm and 0.57mm. M-mode US provides a reliable non-invasive measure of a controlled contraction of TrA.
Errors in MR-based attenuation correction for brain imaging with PET/MR scanners
NASA Astrophysics Data System (ADS)
Rota Kops, Elena; Herzog, Hans
2013-02-01
Aim: Attenuation correction of PET data acquired by hybrid MR/PET scanners remains a challenge, even if several methods for brain and whole-body measurements have been developed recently. A template-based attenuation correction for brain imaging proposed by our group is easy to handle and delivers reliable attenuation maps in a short time. However, some potential error sources are analyzed in this study. We investigated the choice of template reference head among all the available data (error A), and possible skull anomalies of the specific patient, such as discontinuities due to surgery (error B). Materials and methods: An anatomical MR measurement and a 2-bed-position transmission scan covering the whole head and neck region were performed in eight normal subjects (4 females, 4 males). Error A: Taking alternatively one of the eight heads as reference, eight different templates were created by nonlinearly registering the images to the reference and calculating the average. Eight patients (4 females, 4 males; 4 with brain lesions, 4 w/o brain lesions) were measured in the Siemens BrainPET/MR scanner. The eight templates were used to generate the patients' attenuation maps required for reconstruction. ROI and VOI atlas-based comparisons were performed employing all the reconstructed images. Error B: CT-based attenuation maps of two volunteers were manipulated by manually inserting several skull lesions and filling a nasal cavity. The corresponding attenuation coefficients were substituted with the water's coefficient (0.096/cm). Results: Error A: The mean SUVs over the eight template pairs for all eight patients and all VOIs did not differ significantly from one another. Standard deviations up to 1.24% were found. Error B: After reconstruction of the volunteers' BrainPET data with the CT-based attenuation maps without and with skull anomalies, a VOI-atlas analysis was performed revealing very little influence of the skull lesions (less than 3%), while the filled nasal cavity yielded an overestimation in cerebellum up to 5%. Conclusions: The present error analysis confirms that our template-based attenuation method provides reliable attenuation corrections of PET brain imaging measured in PET/MR scanners.
Mazaheri, Masood; Negahban, Hossein; Salavati, Mahyar; Sanjari, Mohammad Ali; Parnianpour, Mohamad
2010-09-01
Although the application of nonlinear tools including recurrence quantification analysis (RQA) has grown in recent years, especially in balance-disordered populations, few studies have determined their measurement properties. Therefore, a methodological study was performed to estimate the intersession and intrasession reliability of some dynamic features provided by RQA for nonlinear analysis of center of pressure (COP) signals recorded during quiet standing in a sample of patients with musculoskeletal disorders (MSDs) including low back pain (LBP), anterior cruciate ligament (ACL) injury and functional ankle instability (FAI). The subjects completed postural measurements with three levels of difficulty (rigid surface-eyes open, rigid surface-eyes closed, and foam surface-eyes closed). Four RQA measures (% recurrence, % determinism, entropy, and trend) were extracted from the recurrence plot. Relative reliability of these measures was assessed using intraclass correlation coefficient and absolute reliability using standard error of measurement and coefficient of variation. % Determinism and entropy were the most reliable features of RQA for both intersession and intrasession reliability measures. The high level of reliability of % determinism and entropy in this preliminary investigation may show their clinical promise for discriminative and evaluative purposes of balance performance. 2010 IPEM. Published by Elsevier Ltd. All rights reserved.
Bayes Error Rate Estimation Using Classifier Ensembles
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep
2003-01-01
The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error, in general, yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated for artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
Mihm, F G; Feeley, T W; Jamieson, S W
1987-01-01
The thermal dye double indicator dilution technique for estimating lung water was compared with gravimetric analyses in nine human subjects who were organ donors. As observed in animal studies, the thermal dye measurement of extravascular thermal volume (EVTV) consistently overestimated gravimetric extravascular lung water (EVLW), the mean (SEM) difference being 3.43 (0.59) ml/kg. In eight of the nine subjects the EVTV -3.43 ml/kg would yield an estimate of EVLW that would be from 3.23 ml/kg under to 3.37 ml/kg over the actual value EVLW at the 95% confidence limits. Reproducibility, assessed with the standard error of the mean percentage, suggested that a 15% change in EVTV can be reliably detected with repeated measurements. One subject was excluded from analysis because the EVTV measurement grossly underestimated its actual EVLW. This error was associated with regional injury observed on gross examination of the lung. Experimental and clinical evidence suggest that the thermal dye measurement provides a reliable estimate of lung water in diffuse pulmonary oedema states. PMID:3616974
Analysis of the impact of error detection on computer performance
NASA Technical Reports Server (NTRS)
Shin, K. C.; Lee, Y. H.
1983-01-01
Conventionally, reliability analyses either assume that a fault/error is detected immediately following its occurrence, or neglect damages caused by latent errors. Though unrealistic, this assumption was imposed in order to avoid the difficulty of determining the respective probabilities that a fault induces an error and the error is then detected in a random amount of time after its occurrence. As a remedy for this problem a model is proposed to analyze the impact of error detection on computer performance under moderate assumptions. Error latency, the time interval between occurrence and the moment of detection, is used to measure the effectiveness of a detection mechanism. This model is used to: (1) predict the probability of producing an unreliable result, and (2) estimate the loss of computation due to fault and/or error.
Test-retest reliability of the Progressive Isoinertial Lifting Evaluation (PILE).
Lygren, Hildegunn; Dragesund, Tove; Joensen, Jón; Ask, Tove; Moe-Nilssen, Rolf
2005-05-01
A repeated measures single group design. To investigate test-retest reliability of Progressive Isoinertial Lifting Evaluation on patients with long lasting musculoskeletal problems related to the lumbar spine. Test-retest reliability has been satisfactory in healthy men. Test-retest reliability for clinical populations has not been reported. A total of 31 patients (17 women and 14 men) with long lasting low back pain participated in the study. The patients were tested twice at an interval of 2 days and at the same time of the day. The heaviest load that the patient could lift 4 times was used as outcome measure. The error of measurement indicates that the true result in 95% of cases will be within +/-4.5 kg from the measured value, while the difference between 2 measurements in 95% of cases will be less than 6.4 kg. Intra-class correlation (1,1) was 0.91. Relative test-retest reliability was high assessed by intra-class correlation, but absolute measurement variability reported as the smallest detectable difference has relevance for the interpretation of clinical test results and should also be considered.
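The two figures quoted in this abstract are linked by a standard relation: if a single measurement lies within ±1.96·SEM of the true value, the difference between two measurements has standard error √2·SEM. A minimal sketch of that arithmetic follows; the function names are hypothetical, the 1.96 multiplier assumes normally distributed errors, and the formula SEM = SD·√(1 − ICC) is the common textbook form rather than something stated in the abstract.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (assumes normally distributed error)

def sem_from_icc(sd, icc):
    """Common textbook form: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_difference(sem, z=Z_95):
    """95% bound on the difference between two measurements: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem

# The abstract reports a 95% interval of +/-4.5 kg around a single measurement,
# so the implied SEM is 4.5 / 1.96 kg.
sem = 4.5 / Z_95
sdd = smallest_detectable_difference(sem)
print(round(sdd, 1))  # 6.4, matching the 6.4 kg reported for two measurements
```

This is why a high relative reliability (ICC 0.91) can coexist with a smallest detectable difference that is large in clinical terms.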
Wollin, Martin; Purdam, Craig; Drew, Michael K
2016-01-01
To investigate inter and intra-tester reliability of an externally fixed dynamometry unilateral hamstring strength test, in the elite sports setting. Reliability study. Sixteen, injury-free, elite male youth football players (age=16.81±0.54 years, height=180.22±5.29cm, weight 73.88±6.54kg, BMI=22.57±1.42) gave written informed consent. Unilateral maximum isometric peak hamstring force was evaluated by externally fixed dynamometry for inter-tester, intra-day and intra-tester, inter-week reliability. The test position was standardised to correlate with the terminal swing phase of the gait running cycle. Inter and intra-tester values demonstrated good to high levels of reliability. The intra-class coefficient (ICC) for inter-tester, intra-day reliability was 0.87 (95% CI=0.75-0.93) with standard error of measure percentage (SEM%) 4.7 and minimal detectable change percentage (MDC%) 12.9. Intra-tester, inter-week reliability results were ICC 0.86 (95% CI, 0.74-0.93), SEM% 5.0 and MDC% 14.0. This study demonstrates good to high inter and intra-tester reliability of isometric externally fixed dynamometry unilateral hamstring strength testing in the regular elite sport setting involving elite male youth football players. The intra-class coefficient in association with the low standard error of measure and minimal detectable change percentages suggest that this procedure is appropriate for clinical and academic use as well as monitoring hamstring strength in the elite sport setting. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Evaluation of the CEAS model for barley yields in North Dakota and Minnesota
NASA Technical Reports Server (NTRS)
Barnett, T. L. (Principal Investigator)
1981-01-01
The CEAS yield model is based upon multiple regression analysis at the CRD and state levels. For the historical time series, yield is regressed on a set of variables derived from monthly mean temperature and monthly precipitation. Technological trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-79) demonstrated that biases are small and that performance, as indicated by the root mean square errors, is acceptable for the intended application; however, model response for individual years, particularly unusual years, is not very reliable and shows some large errors. The model is objective, adequate, timely, simple and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.
NASA Technical Reports Server (NTRS)
LaValley, Brian W.; Little, Phillip D.; Walter, Chris J.
2011-01-01
This report documents the capabilities of the EDICT tools for error modeling and error propagation analysis when operating with models defined in the Architecture Analysis & Design Language (AADL). We discuss our experience using the EDICT error analysis capabilities on a model of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER) architecture that uses the Reliable Optical Bus (ROBUS). Based on these experiences we draw some initial conclusions about model based design techniques for error modeling and analysis of highly reliable computing architectures.
System reliability and recovery.
DOT National Transportation Integrated Search
1971-06-01
The paper exhibits a variety of reliability techniques applicable to future ATC data processing systems. Presently envisioned schemes for error detection, error interrupt and error analysis are considered, along with methods of retry, reconfiguration...
Bayesian Meta-Analysis of Coefficient Alpha
ERIC Educational Resources Information Center
Brannick, Michael T.; Zhang, Nanhua
2013-01-01
The current paper describes and illustrates a Bayesian approach to the meta-analysis of coefficient alpha. Alpha is the most commonly used estimate of the reliability or consistency (freedom from measurement error) for educational and psychological measures. The conventional approach to meta-analysis uses inverse variance weights to combine…
Turner, T H; Renfroe, J B; Elm, J; Duppstadt-Delambo, A; Hinson, V K
2016-01-01
Ability to identify change is crucial for measuring response to interventions and tracking disease progression. Beyond psychometrics, investigations of Parkinson's disease with mild cognitive impairment (PD-MCI) must consider fluctuating medication, motor, and mental status. One solution is to employ 90% reliable change indices (RCIs) from test manuals to account for measurement error and practice effects. The current study examined robustness of 90% RCIs for 19 commonly used executive function tests in 14 PD-MCI subjects assigned to the placebo arm of a 10-week randomized controlled trial of atomoxetine in PD-MCI. Using 90% RCIs, the typical participant showed spurious improvement on one measure, and spurious decline on another. Reliability estimates from healthy adult standardization samples and PD-MCI were similar. In contrast to healthy adult samples, practice effects were minimal in this PD-MCI group. Separate 90% RCIs based on the PD-MCI sample did not further reduce error rate. In the present study, application of 90% RCIs based on healthy adults in standardization samples effectively reduced misidentification of change in a sample of PD-MCI. Our findings support continued application of 90% RCIs when using executive function tests to assess change in neurological populations with fluctuating status.
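A 90% reliable change index of the kind described here is often computed as the retest difference, adjusted for the expected practice effect, divided by the standard error of the difference. The sketch below follows that common form; it is not taken from the study or from any test manual, the function names and example numbers are hypothetical, and the labels assume that higher scores mean better performance.

```python
import math

def reliable_change_index(score_1, score_2, sd_baseline, retest_reliability,
                          practice_effect=0.0):
    """RCI = (retest - baseline - expected practice gain) / SE of the difference.

    SE of the difference is sqrt(2) * SEM, with SEM = SD * sqrt(1 - reliability).
    """
    sem = sd_baseline * math.sqrt(1.0 - retest_reliability)
    se_diff = math.sqrt(2.0) * sem
    return (score_2 - score_1 - practice_effect) / se_diff

def classify_change(rci, z=1.645):
    """Two-sided 90% criterion: |RCI| must exceed 1.645 to count as reliable."""
    if rci > z:
        return "improved"  # assumes higher scores are better
    if rci < -z:
        return "declined"
    return "stable"

# Hypothetical numbers: baseline 50, retest 60, SD 10, retest reliability 0.75.
rci = reliable_change_index(50, 60, sd_baseline=10, retest_reliability=0.75)
print(round(rci, 2), classify_change(rci))
```

With 19 tests per participant and a 90% criterion, roughly two spurious classifications per person would be expected by chance alone, which is consistent with the one spurious improvement and one spurious decline reported for the typical participant.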
Accounting for uncertainty in DNA sequencing data.
O'Rawe, Jason A; Ferson, Scott; Lyon, Gholson J
2015-02-01
Science is defined in part by an honest exposition of the uncertainties that arise in measurements and propagate through calculations and inferences, so that the reliabilities of its conclusions are made apparent. The recent rapid development of high-throughput DNA sequencing technologies has dramatically increased the number of measurements made at the biochemical and molecular level. These data come from many different DNA-sequencing technologies, each with their own platform-specific errors and biases, which vary widely. Several statistical studies have tried to measure error rates for basic determinations, but there are no general schemes to project these uncertainties so as to assess the surety of the conclusions drawn about genetic, epigenetic, and more general biological questions. We review here the state of uncertainty quantification in DNA sequencing applications, describe sources of error, and propose methods that can be used for accounting and propagating these errors and their uncertainties through subsequent calculations. Copyright © 2014 Elsevier Ltd. All rights reserved.
Robust dynamic 3-D measurements with motion-compensated phase-shifting profilometry
NASA Astrophysics Data System (ADS)
Feng, Shijie; Zuo, Chao; Tao, Tianyang; Hu, Yan; Zhang, Minliang; Chen, Qian; Gu, Guohua
2018-04-01
Phase-shifting profilometry (PSP) is a widely used approach to high-accuracy three-dimensional shape measurements. However, when it comes to moving objects, phase errors induced by the movement often result in severe artifacts even though a high-speed camera is in use. From our observations, there are three kinds of motion artifacts: motion ripples, motion-induced phase unwrapping errors, and motion outliers. We present a novel motion-compensated PSP to remove the artifacts for dynamic measurements of rigid objects. The phase error of motion ripples is analyzed for the N-step phase-shifting algorithm and is compensated using the statistical nature of the fringes. The phase unwrapping errors are corrected exploiting adjacent reliable pixels, and the outliers are removed by comparing the original phase map with a smoothed phase map. Compared with the three-step PSP, our method can improve the accuracy by more than 95% for objects in motion.
Gagné, Myriam; Boulet, Louis-Philippe; Pérez, Norma; Moisan, Jocelyne
2018-04-30
To systematically identify the measurement properties of patient-reported outcome instruments (PROs) that evaluate adherence to inhaled maintenance medication in adults with asthma. We conducted a systematic review of six databases. Two reviewers independently included studies on the measurement properties of PROs that evaluated adherence in asthmatic participants aged ≥18 years. Based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN), the reviewers (1) extracted data on internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross-cultural validity, criterion validity, and responsiveness; (2) assessed the methodological quality of the included studies; (3) assessed the quality of the measurement properties (positive or negative); and (4) summarised the level of evidence (limited, moderate, or strong). We screened 6,068 records and included 15 studies (14 PROs). No studies evaluated measurement error or responsiveness. Based on methodological and measurement property quality assessments, we found limited positive evidence of: (a) internal consistency of the Adherence Questionnaire, Refined Medication Adherence Reason Scale (MAR-Scale), Medication Adherence Report Scale for Asthma (MARS-A), and Test of the Adherence to Inhalers (TAI); (b) reliability of the TAI; and (c) structural validity of the Adherence Questionnaire, MAR-Scale, MARS-A, and TAI. We also found limited negative evidence of: (d) hypotheses testing of Adherence Questionnaire; (e) reliability of the MARS-A; and (f) criterion validity of the MARS-A and TAI. Our results highlighted the need to conduct further high-quality studies that will positively evaluate the reliability, validity, and responsiveness of the available PROs. This article is protected by copyright. All rights reserved.
Poleti, Marcelo Lupion; Fernandes, Thais Maria Freire; Pagin, Otávio; Moretti, Marcela Rodrigues; Rubira-Bullen, Izabel Regina Fischer
2016-01-01
The aim of this in vitro study was to evaluate the reliability and accuracy of linear measurements on three-dimensional (3D) surface models obtained by standard pre-set thresholds in two segmentation software programs. Ten mandibles with 17 silica markers were scanned for 0.3-mm voxels in the i-CAT Classic (Imaging Sciences International, Hatfield, PA, USA). Twenty linear measurements were carried out by two observers two times on the 3D surface models: the Dolphin Imaging 11.5 (Dolphin Imaging & Management Solutions, Chatsworth, CA, USA), using two filters (Translucent and Solid-1), and in the InVesalius 3.0.0 (Centre for Information Technology Renato Archer, Campinas, SP, Brazil). The physical measurements were made by another observer two times using a digital caliper on the dry mandibles. Excellent intra- and inter-observer reliability was found for the markers, physical measurements, and 3D surface models (intra-class correlation coefficient (ICC) and Pearson's r ≥ 0.91). The linear measurements on 3D surface models by Dolphin and InVesalius software programs were accurate (Dolphin Solid-1 > InVesalius > Dolphin Translucent). The highest absolute and percentage errors were obtained for the variables R1-R1 (1.37 mm) and MF-AC (2.53%) in the Dolphin Translucent and InVesalius software, respectively. Linear measurements on 3D surface models obtained by standard pre-set thresholds in the Dolphin and InVesalius software programs are reliable and accurate compared with physical measurements. Studies that evaluate the reliability and accuracy of the 3D models are necessary to ensure error predictability and to establish diagnosis, treatment plan, and prognosis in a more realistic way.
Analysis and discussion on the experimental data of electrolyte analyzer
NASA Astrophysics Data System (ADS)
Dong, XinYu; Jiang, JunJie; Liu, MengJun; Li, Weiwei
2018-06-01
In subsequent verification of electrolyte analyzers, we found that the instruments achieve good repeatability and stability in repeated measurements over a short period of time and meet the verification regulation's requirements for linear error and cross-contamination rate. However, large indication errors are very common, and measurement results differ greatly between instruments from different manufacturers. To identify and solve this problem, help manufacturers improve product quality, and obtain accurate and reliable measurement data, we conducted an experimental evaluation of electrolyte analyzers and analyzed the data statistically.
Moewis, P; Boeth, H; Heller, M O; Yntema, C; Jung, T; Doyscher, R; Ehrig, R M; Zhong, Y; Taylor, W R
2014-07-01
The in vivo quantification of rotational laxity of the knee joint is of importance for monitoring changes in joint stability or the outcome of therapies. While invasive assessments have been used to study rotational laxity, non-invasive methods are attractive particularly for assessing young cohorts. This study aimed to determine the conditions under which tibio-femoral rotational laxity can be assessed reliably and accurately in a non-invasive manner. The reliability and error of non-invasive examinations of rotational joint laxity were determined by comparing the artefact associated with surface mounted markers against simultaneous measurements using fluoroscopy in five knees including healthy and ACL deficient joints. The knees were examined at 0°, 30°, 60° and 90° flexion using a device that allows manual axial rotation of the joint. With a mean RMS error of 9.6°, the largest inaccuracy using non-invasive assessment was present at 0° knee flexion, whereas at 90° knee flexion, a smaller RMS error of 5.7° was found. A Bland and Altman assessment indicated that a proportional bias exists between the non-invasive and fluoroscopic approaches, with limits of agreement that exceeded 20°. Correction using average linear regression functions resulted in a reduction of the RMS error to below 1° and limits of agreement to less than ±1° across all knees and flexion angles. Given the excellent reliability and the fact that a correction of the surface mounted marker based rotation values can be achieved, non-invasive evaluation of tibio-femoral rotation could offer opportunities for simplified devices for use in clinical settings in cases where invasive assessments are not justified. Although surface mounted marker based measurements tend to overestimate joint rotation, and therefore joint laxity, our results indicate that it is possible to correct for this error. Copyright © 2014 IPEM. Published by Elsevier Ltd. All rights reserved.
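The two analysis steps named above, Bland-Altman limits of agreement and a linear-regression correction, can be sketched as follows. The data are invented for illustration (marker readings that overestimate the reference proportionally). The helper names are hypothetical, and a single fitted line stands in for the "average linear regression functions" of the study.

```python
import statistics

def bland_altman(test, reference):
    """Return (bias, lower limit, upper limit) using bias +/- 1.96 SD of differences."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented example: rotation in degrees from the reference method (fluoroscopy)
# and from skin markers, which here read 1.5 * reference + 2.
fluoro = [10.0, 15.0, 20.0, 25.0, 30.0]
marker = [17.0, 24.5, 32.0, 39.5, 47.0]

bias, lower, upper = bland_altman(marker, fluoro)

# Map marker readings back onto the reference scale.
slope, intercept = fit_line(marker, fluoro)
corrected = [slope * m + intercept for m in marker]
```

A difference that grows with the magnitude of the measurement, as in this toy data, is what a proportional bias looks like in a Bland-Altman plot.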
NASA Astrophysics Data System (ADS)
Chen, Yuan-Liu; Niu, Zengyuan; Matsuura, Daiki; Lee, Jung Chul; Shimizu, Yuki; Gao, Wei; Oh, Jeong Seok; Park, Chun Hong
2017-10-01
In this paper, a four-probe measurement system is implemented and verified for the carriage slide motion error measurement of a large-scale roll lathe used in hybrid manufacturing where a laser machining probe and a diamond cutting tool are placed on two sides of a roll workpiece for manufacturing. The motion error of the carriage slide of the roll lathe is composed of two straightness motion error components and two parallelism motion error components in the vertical and horizontal planes. Four displacement measurement probes, which are mounted on the carriage slide with respect to four opposing sides of the roll workpiece, are employed for the measurement. Firstly, based on the reversal technique, the four probes are moved by the carriage slide to scan the roll workpiece before and after a 180-degree rotation of the roll workpiece. Taking into consideration the fact that the machining accuracy of the lathe is influenced by not only the carriage slide motion error but also the gravity deformation of the large-scale roll workpiece due to its heavy weight, the vertical motion error is thus characterized relating to the deformed axis of the roll workpiece. The horizontal straightness motion error can also be synchronously obtained based on the reversal technique. In addition, based on an error separation algorithm, the vertical and horizontal parallelism motion error components are identified by scanning the rotating roll workpiece at the start and the end positions of the carriage slide, respectively. The feasibility and reliability of the proposed motion error measurement system are demonstrated by the experimental results and the measurement uncertainty analysis.
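As a rough illustration of the reversal idea, the sketch below separates a slide error from a workpiece profile using two scans. It assumes the classical single-probe reversal model, in which the slide error changes sign in the second scan. This sign convention and the function name `reversal_separate` are assumptions, and the four-probe arrangement and gravity-deformation treatment of the paper are not modelled.

```python
def reversal_separate(scan_before, scan_after):
    """Split two scans into (slide_error, profile).

    Assumed model (the sign convention is an assumption):
        scan_before[i] = profile[i] + slide_error[i]
        scan_after[i]  = profile[i] - slide_error[i]
    """
    slide_error = [(b - a) / 2 for b, a in zip(scan_before, scan_after)]
    profile = [(b + a) / 2 for b, a in zip(scan_before, scan_after)]
    return slide_error, profile

# Invented values in micrometres, chosen to be exactly representable.
true_profile = [0.0, 1.0, 0.5, -0.5]
true_error = [0.25, -0.25, 0.5, 0.0]
before = [p + e for p, e in zip(true_profile, true_error)]
after = [p - e for p, e in zip(true_profile, true_error)]

recovered_error, recovered_profile = reversal_separate(before, after)
```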
Goulet, Eric D B; Baker, Lindsay B
2017-12-01
The B-722 Laqua Twin is a low-cost, portable, battery-operated sodium analyzer that can be used to assess sweat sodium concentration. The Laqua Twin is reliable and provides a degree of accuracy similar to more expensive analyzers; however, its interunit measurement error remains unknown. The purpose of this study was to compare the sodium concentration values of 70 sweat samples measured using three different Laqua Twin units. Mean absolute errors, random errors and constant errors among the different Laqua Twins ranged from 1.7 mmol/L to 3.5 mmol/L, 2.5 mmol/L to 3.7 mmol/L and -0.6 mmol/L to 3.9 mmol/L, respectively. Proportional errors among Laqua Twins were all < 2%. Based on a within-subject biological variability in sweat sodium concentration of ± 12%, the maximal allowable imprecision among instruments was considered to be ≤ 6%. In that respect, the within (2.9%), between (4.5%), and total (5.4%) measurement error coefficients of variation were all < 6%. For a given sweat sodium concentration value, the largest observed difference in mean and lower and upper bound error of measurements among instruments were, respectively, 4.7 mmol/L, 2.3 mmol/L, and 7.0 mmol/L. In conclusion, our findings show that the interunit measurement error of the B-722 Laqua Twin is low and methodologically acceptable.
Interhemispheric Inhibition Measurement Reliability in Stroke: A Pilot Study
Cassidy, Jessica M.; Chu, Haitao; Chen, Mo; Kimberley, Teresa J.; Carey, James R.
2016-01-01
Objective: Reliable transcranial magnetic stimulation (TMS) measures for probing corticomotor excitability are important when assessing the physiological effects of non-invasive brain stimulation. The primary objective of this study was to examine test-retest reliability of an interhemispheric inhibition (IHI) index measurement in stroke. Materials and Methods: Ten subjects with chronic stroke (≥ 6 months) completed two IHI testing sessions per week for three weeks (six testing sessions total). A single investigator measured IHI in the contra-to-ipsilesional primary motor cortex direction and in the opposite direction using bilateral paired-pulse TMS. Weekly sessions were separated by 24 hours with a 1-week washout period separating testing weeks. To determine if motor-evoked potential (MEP) quantification method affected measurement reliability, IHI indices were computed from both MEP amplitude and area responses. Reliability was assessed with two-way, mixed intraclass correlation coefficients (ICC(3,k)). Standard error of measurement and minimal detectable difference statistics were also determined. Results: With the exception of the initial testing week, IHI indices measured in the contra-to-ipsilesional hemisphere direction demonstrated moderate to excellent reliability (ICC = 0.725 – 0.913). Ipsi-to-contralesional IHI indices depicted poor or invalid reliability estimates throughout the three-week testing duration (ICC = −1.153 – 0.105). The overlap of ICC 95% confidence intervals suggested that IHI indices using MEP amplitude vs. area measures did not differ with respect to reliability. Conclusions: IHI indices demonstrated varying magnitudes of reliability irrespective of MEP quantification method. Several strategies for improving IHI index measurement reliability are discussed. PMID:27333364
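The ICC(3,k) statistic used above can be computed from two-way ANOVA mean squares, which also shows how estimates below zero (and even below −1) arise when within-subject error exceeds between-subject variance. The sketch below follows the standard Shrout-Fleiss consistency forms; the function name `icc3` and both data sets are invented for illustration.

```python
def icc3(scores):
    """Return (ICC(3,1), ICC(3,k)) for a subjects x sessions table of scores."""
    n = len(scores)          # subjects (rows)
    k = len(scores[0])       # sessions or raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-subjects mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    single = (bms - ems) / (bms + (k - 1) * ems)
    average = (bms - ems) / bms
    return single, average

# Subjects keep their rank order across sessions: high reliability.
consistent = icc3([[1, 2], [3, 4], [5, 6]])
# Subjects barely differ on average, sessions disagree: negative estimate.
inconsistent = icc3([[1, 4], [4, 2], [2, 3]])
```

Because the average-measures form divides by the between-subjects mean square alone, it has no lower bound of −1, which is why such values are usually reported as invalid rather than interpreted.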
Test-retest reliability of kinetic variables measured on campus board in sport climbers.
Abreu, Edgardo Alvares de Campos; Araújo, Sílvia Ribeiro Santos; Cançado, Gustavo Henrique da Cunha Peixoto; Andrade, André Gustavo Pereira de; Chagas, Mauro Heleno; Menzel, Hans-Joachim Karl
2018-05-16
Sport climbers frequently use a campus board (CB) to improve their upper limb strength under conditions similar to those of high-difficulty sport climbing routes. The objective of this study was to assess the test-retest reliability of peak force and impulse measured using a CB instrumented with two load cells on the starting holds. The same evaluator examined 22 climbers on two days with 48 h between the assessments. The participants performed five concentric lunges (CL) and five lunges with stretch-shortening cycle, with 1 min intervals between repetitions and 10 min between exercises. All variables were associated with significant intraclass correlation coefficient (ICC) values (p = 0.001), and no variable showed systematic errors (p > 0.05). Peak force ICC was higher than 0.88, and the standard error of measurement (SEM%) was less than 5%. Impulse ICC for the CL was greater than 0.90, and the SEM% was less than 14%. We conclude that the kinetic variables measured using the CB were reliable. The ability of the hands to maintain contact with the holds (peak force) and the ability of the arms and shoulders to move the centre of mass vertically (impulse) should be taken into account by coaches when prescribing CB training, as well as in further research.
Lourenço, Ana S; Lameiras, Carina; Silva, Anabela G
2016-01-01
The aims of this study were to assess intrarater reliability and to calculate the standard error of measurement (SEM) and minimal detectable change (MDC) for deep neck flexor and neck extensor muscle endurance tests, and compare the results between individuals with and without subclinical neck pain. Participants were students of the University of Aveiro reporting subclinical neck pain and asymptomatic participants matched for sex and age to the neck pain group. Data on endurance capacity of the deep neck flexors and neck extensors were collected by a blinded assessor using the deep neck flexor endurance test and the extensor endurance test, respectively. Intraclass correlation coefficients (ICCs), SEM, and MDC were calculated for measurements taken within a session by the same assessor. Differences between groups for endurance capacity were investigated using a Mann-Whitney U test. The deep neck flexor endurance test (ICC = 0.71; SEM = 6.91 seconds; MDC = 19.15 seconds) and neck extensor endurance test (ICC = 0.73; SEM = 9.84 minutes; MDC = 2.34 minutes) are reliable. No significant differences were found between participants with and without neck pain for both tests of muscle endurance (P > .05). The endurance capacity of the deep neck flexors and neck extensors can be reliably measured in participants with subclinical neck pain. However, the wide SEM and MDC might limit the sensitivity of these tests. Copyright © 2016. Published by Elsevier Inc.
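The relationship between SEM and MDC can be checked with a short calculation. The sketch assumes the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the abstract does not state which formulas were applied, but the reported flexor values are consistent with the second one. The function names and the SD/ICC pair in the usage lines are illustrative.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem

# Deep neck flexor test: the reported SEM of 6.91 s gives an MDC of about 19.15 s.
flexor_mdc = round(mdc95(6.91), 2)

# Illustrative only: SD = 10 and ICC = 0.75 give SEM = 5.
example_sem = sem_from_icc(10.0, 0.75)
```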
Medina-Mirapeix, Francesc; Bernabeu-Mora, Roberto; Llamazares-Herrán, Eduardo; Sánchez-Martínez, Ma Piedad; García-Vidal, José Antonio; Escolar-Reina, Pilar
2016-11-01
To evaluate the interobserver reliability of the Short Physical Performance Battery (SPPB) and hand dynamometry when measuring isometric muscle strength in people with chronic obstructive pulmonary disease (COPD). Reliability study. Each patient was assessed by a pulmonology physician and a physical therapist in 2 separate sessions 7 to 14 days apart (mean, 9.8±0.8d). Each rater was blinded to the other's results. Pneumology unit of a public hospital. Random sample of outpatients with stable COPD (N=30). Not applicable. SPPB and muscle strength (kg) using electronic handgrip and handheld dynamometers. Reliability was assessed with intraclass correlation coefficients (ICCs), standard error of measurement values, and Bland-Altman plots. ICCs were calculated for the SPPB summary score and for its 3 subscales. The ICCs for the overall reliability of the SPPB summary score and for grip and quadriceps strength were .82 (95% confidence interval [CI], .62-.91), .97 (95% CI, .93-.98), and .76 (95% CI, .49-.88), respectively. The standard error of measurement values were .55 points, 1.30kg, and 1.22kg, respectively. The mean differences between the rater's scores were near zero for grip strength and SPPB summary score measures. The ICCs for the SPPB subscales were .84 (95% CI, .66-.92) for the chair subscale, .75 (95% CI, .48-.88) for gait, and .33 (95% CI, -.42 to .68) for balance. Interobserver reliability was good for quadriceps and handgrip dynamometry and for the SPPB summary score and its chair stand and gait speed subscales. Both pulmonary physicians and physical therapists can obtain and exchange the scores. Because the reliability of the balance subscale was questionable, it is better to use the SPPB summary score. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Romero-Franco, Natalia; Montaño-Munuera, Juan Antonio; Jiménez-Reyes, Pedro
2017-01-01
Knee joint position sense (JPS) is a key parameter for optimum performance in many sports but is frequently negatively affected by injuries and/or fatigue during training sessions. Although evaluation of JPS may provide key information to reduce the risk of injury, it often requires expensive and/or complex tools that make monitoring proprioceptive deterioration difficult. To analyze the validity and reliability of a digital inclinometer to measure knee JPS in a closed kinetic chain (CKC). The validity and intertester and intratester reliability of a digital inclinometer for measuring knee JPS were assessed. Biomechanics laboratory. 10 athletes (5 men and 5 women; 26.2 ± 1.3 y; 71.7 ± 12.4 kg; 1.75 ± 0.09 m; 23.5 ± 3.9 kg/m²). Knee JPS was measured in a CKC. Absolute angular error (AAE) of knee JPS in a CKC. Intraclass correlation coefficient (ICC) and standard error of the mean (SEM) were calculated to determine the validity and reliability of the inclinometer. Data showed that the inclinometer had a high level of validity compared with an isokinetic dynamometer (ICC = 1.0, SEM = 1.39, p < 0.001), and there was very good intra- and inter-tester reliability for reading the inclinometer (ICC = 1.0, SEM = 0.85, p < 0.001). Compared with AutoCAD video analysis, inclinometer validity was very high (ICC = 0.980, SEM = 3.46, p < 0.001) for measuring AAE during knee JPS in a CKC. In addition, the intertester reliability of the inclinometer for obtaining AAE was very high (ICC = 0.994, SEM = 1.67, p < 0.001). The inclinometer provides a valid and reliable method for assessing knee JPS in a CKC. Health and sports professionals could take advantage of this tool to monitor proprioceptive deterioration in athletes.
47 CFR 101.91 - Involuntary relocation procedures.
Code of Federal Regulations, 2010 CFR
2010-10-01
... engineering, equipment, site and FCC fees, as well as any legitimate and prudent transaction expenses incurred..., reliability is measured by the percent of time the bit error rate (BER) exceeds a desired value, and for analog or digital voice transmissions, it is measured by the percent of time that audio signal quality...
Paksi, Borbala; Demetrovics, Zsolt; Magi, Anna; Felvinczi, Katalin
2017-06-01
This paper introduces the methods and methodological findings of the National Survey on Addiction Problems in Hungary (NSAPH 2015). Use patterns of smoking, alcohol use and other psychoactive substances were measured as well as that of certain behavioural addictions (problematic gambling - PGSI, DSM-V, eating disorders - SCOFF, problematic internet use - PIUQ, problematic on-line gaming - POGO, problematic social media use - FAS, exercise addictions - EAI-HU, work addiction - BWAS, compulsive buying - CBS). The paper describes the applied measurement techniques, sample selection, recruitment of respondents and the data collection strategy as well. Methodological results of the survey including reliability and validity of the measures are reported. The NSAPH 2015 research was carried out on a nationally representative sample of the Hungarian adult population aged 16-64 yrs (gross sample 2477, net sample 2274 persons) with the age group of 18-34 being overrepresented. Statistical analysis of the weight-distribution suggests that weighting did not create any artificial distortion in the database leaving the representativeness of the sample unaffected. The size of the weighted sample of the 18-64 years old adult population is 1490 persons. The extent of the theoretical margin of error in the weighted sample is ±2.5% at a 95% confidence level, which is in line with the original data collection plans. Based on the analysis of reliability and the extent of errors beyond sampling within the context of the database we conclude that inconsistencies create relatively minor distortions in cumulative prevalence rates; consequently the database makes possible the reliable estimation of risk factors related to different substance use behaviours. The reliability indexes of measurements used for prevalence estimates of behavioural addictions proved to be appropriate, though the psychometric features in some cases suggest the presence of redundant items.
The comparison of parameters of errors beyond sample selection in the current and previous data collections indicates that trend estimates and their interpretation require particular attention and that, in some cases, correction procedures might become necessary.
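The theoretical margin of error quoted for the weighted sample can be reproduced with the usual formula for a proportion. The sketch assumes simple random sampling, the worst-case proportion p = 0.5 and z = 1.96 for 95% confidence, and it ignores any design effect from weighting. The function name is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Weighted sample of 1490 persons: about 2.5 percentage points.
moe_percent = round(100 * margin_of_error(1490), 1)
```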
Validation of instrumentation to monitor dynamic performance of Olympic weightlifters.
Bruenger, Adam J; Smith, Sarah L; Sands, William A; Leigh, Michael R
2007-05-01
The purpose of this study was to validate the accuracy and reliability of the Weightlifting Video Overlay System (WVOS) used by coaches and sport biomechanists at the United States Olympic Training Center. Static trials with the bar set at specific positions and dynamic trials of a power snatch were performed. Static and dynamic values obtained by the WVOS were compared with values obtained by tape measure and standard video kinematic analysis. Coordinate positions (horizontal [X] and vertical [Y]) were compared on both ends (left and right) of the bar. Absolute technical error of measurement between WVOS and kinematic values were calculated (0.97 cm [left X], 0.98 cm [right X], 0.88 cm [left Y], and 0.53 cm [right Y]) for the static data. Pearson correlations for all dynamic trials exceeded r = 0.88. The greatest discrepancies between the 2 measuring systems were found to occur when there was twisting of the bar during the performance. This error was probably due to the location on the bar where the coordinates were measured. The WVOS appears to provide accurate position information when compared with standard kinematics; however, care must be taken in evaluating position measurements if there is a significant amount of twisting in the movement. The WVOS appears to be reliable and valid within reasonable error limits for the determination of weightlifting movement technique.
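For readers unfamiliar with the statistic, the absolute technical error of measurement (TEM) for paired readings is commonly computed as √(Σd² / 2n), where d is the difference between the two methods for each trial. The abstract does not give its formula, so this definition is an assumption, and the positions below are invented.

```python
import math

def technical_error_of_measurement(first, second):
    """Absolute TEM for paired measurements: sqrt(sum(d^2) / (2 * n))."""
    diffs = [a - b for a, b in zip(first, second)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))

# Invented bar positions in cm from two measuring systems.
system_a = [101.0, 99.0, 151.0, 149.0]
system_b = [100.0, 100.0, 150.0, 150.0]
tem_cm = technical_error_of_measurement(system_a, system_b)
```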
Software reliability: Application of a reliability model to requirements error analysis
NASA Technical Reports Server (NTRS)
Logan, J.
1980-01-01
The application of a software reliability model having a well defined correspondence of computer program properties to requirements error analysis is described. Requirements error categories which can be related to program structural elements are identified and their effect on program execution considered. The model is applied to a hypothetical B-5 requirement specification for a program module.
NASA Technical Reports Server (NTRS)
Haas, Evan; DeLuccia, Frank
2016-01-01
In evaluating GOES-R Advanced Baseline Imager (ABI) image navigation quality, upsampled sub-images of ABI images are translated against downsampled Landsat 8 images of localized, high contrast earth scenes to determine the translations in the East-West and North-South directions that provide maximum correlation. The native Landsat resolution is much finer than that of ABI, and Landsat navigation accuracy is much better than ABI required navigation accuracy and expected performance. Therefore, Landsat images are considered to provide ground truth for comparison with ABI images, and the translations of ABI sub-images that produce maximum correlation with Landsat localized images are interpreted as ABI navigation errors. The measured local navigation errors from registration of numerous sub-images with the Landsat images are averaged to provide a statistically reliable measurement of the overall navigation error of the ABI image. The dispersion of the local navigation errors is also of great interest, since ABI navigation requirements are specified as bounds on the 99.73rd percentile of the magnitudes of per pixel navigation errors. However, the measurement uncertainty inherent in the use of image registration techniques tends to broaden the dispersion in measured local navigation errors, masking the true navigation performance of the ABI system. We have devised a novel and simple method for estimating the magnitude of the measurement uncertainty in registration error for any pair of images of the same earth scene. We use these measurement uncertainty estimates to filter out the higher quality measurements of local navigation error for inclusion in statistics. In so doing, we substantially reduce the dispersion in measured local navigation errors, thereby better approximating the true navigation performance of the ABI system.
Correction of stream quality trends for the effects of laboratory measurement bias
Alexander, Richard B.; Smith, Richard A.; Schwarz, Gregory E.
1993-01-01
We present a statistical model relating measurements of water quality to associated errors in laboratory methods. Estimation of the model allows us to correct trends in water quality for long-term and short-term variations in laboratory measurement errors. An illustration of the bias correction method for a large national set of stream water quality and quality assurance data shows that reductions in the bias of estimates of water quality trend slopes are achieved at the expense of increases in the variance of these estimates. Slight improvements occur in the precision of estimates of trend in bias by using correlative information on bias and water quality to estimate random variations in measurement bias. The results of this investigation stress the need for reliable, long-term quality assurance data and efficient statistical methods to assess the effects of measurement errors on the detection of water quality trends.
Walton, David M; Macdermid, Joy C; Nielson, Warren; Teasell, Robert W; Chiasson, Marco; Brown, Lauren
2011-09-01
Clinical measurement. To evaluate the intrarater, interrater, and test-retest reliability of an accessible digital algometer, and to determine the minimum detectable change in normal healthy individuals and a clinical population with neck pain. Pressure pain threshold testing may be a valuable assessment and prognostic indicator for people with neck pain. To date, most of this research has been completed using algometers that are too resource intensive for routine clinical use. Novice raters (physiotherapy students or clinical physiotherapists) were trained to perform algometry testing over 2 clinically relevant sites: the angle of the upper trapezius and the belly of the tibialis anterior. A convenience sample of normal healthy individuals and a clinical sample of people with neck pain were tested by 2 different raters (all participants) and on 2 different days (healthy participants only). Intraclass correlation coefficient (ICC), standard error of measurement, and minimum detectable change were calculated. A total of 60 healthy volunteers and 40 people with neck pain were recruited. Intrarater reliability was almost perfect (ICC = 0.94-0.97), interrater reliability was substantial to near perfect (ICC = 0.79-0.90), and test-retest reliability was substantial (ICC = 0.76-0.79). Smaller change was detectable in the trapezius compared to the tibialis anterior. This study provides evidence that novice raters can perform digital algometry with adequate reliability for research and clinical use in people with and without neck pain.
Raabe, A; Stöckel, R; Hohrein, D; Schöche, J
1998-01-01
The failure of intraventricular pressure measurement in cases of catheter blockage or dislodgement is thought to be eliminated by using intraventricular microtransducers. We report on an avoidable methodological error that may affect the reliability of intraventricular pressure measurement with these devices. Intraventricular fiberoptic or solid-state devices were implanted in 43 patients considered to be at risk for developing catheter occlusion. Two different types were used, i.e., devices in which the transducer is placed inside the ventriculostomy catheter (Type A) and devices in which the transducer is integrated in the external surface of the catheter (Type B). Type A devices were used in 15 patients and Type B devices in 28 patients. Pressure recordings were checked at bedside for the validity and reliability of the measurement. Of the 15 patients treated with Type A devices, no reliable pressure recordings were able to be obtained in three patients in whom ventricular punctures were not successful. In 4 of the remaining 12 patients, periods of erroneous pressure readings were detected. After opening of cerebrospinal fluid drainage, all Type A devices failed to reflect real intraventricular pressure. In patients treated with Type B devices, no erroneous pressure recordings were able to be identified, irrespective of whether cerebrospinal fluid drainage was performed. Even when ventricular puncture failed, pressure measurement was correct each time. Transducers that are simply placed inside the ventriculostomy catheter require fluid-coupling. They may fail, either during cerebrospinal fluid drainage or when the catheter is blocked or placed within the parenchyma.
NASA Astrophysics Data System (ADS)
Fukumori, Ichiro; Raghunath, Ramanujam; Fu, Lee-Lueng; Chao, Yi
1999-11-01
The feasibility of assimilating satellite altimetry data into a global ocean general circulation model is studied. Three years of TOPEX/Poseidon data are analyzed using a global, three-dimensional, nonlinear primitive equation model. The assimilation's success is examined by analyzing its consistency and reliability, measured by formal error estimates, with respect to independent measurements. Improvements in the model solution are demonstrated, in particular for properties not directly measured. Comparisons are performed with sea level measured by tide gauges, subsurface temperatures and currents from moorings, and bottom pressure measurements. Model representation errors dictate what can and cannot be resolved by assimilation, and their identification is emphasized.
Hydrologic Design in the Anthropocene
NASA Astrophysics Data System (ADS)
Vogel, R. M.; Farmer, W. H.; Read, L.
2014-12-01
In an era dubbed the Anthropocene, the natural world is being transformed by a myriad of human influences. As anthropogenic impacts permeate hydrologic systems, hydrologists are challenged to fully account for such changes and develop new methods of hydrologic design. Deterministic watershed models (DWM), which can account for the impacts of changes in land use, climate and infrastructure, are becoming increasingly popular for the design of flood and/or drought protection measures. As with all models that are calibrated to existing datasets, DWMs are subject to model error or uncertainty. In practice, the model error component of DWM predictions is typically ignored, yet DWM simulations that ignore model error produce output that cannot reproduce the statistical properties of the observations they are intended to replicate. In the context of hydrologic design, we demonstrate how ignoring model error can lead to systematic downward bias in flood quantiles, upward bias in drought quantiles and upward bias in water supply yields. By reincorporating model error, we document how DWMs can be used to generate results that mimic actual observations and preserve their statistical behavior. In addition to use of DWMs for improved predictions in a changing world, improved communication of risk and reliability is also needed. Traditional statements of risk and reliability in hydrologic design have been characterized by return periods, but such statements often assume that the annual probability of experiencing a design event remains constant throughout the project horizon. We document the general impact of nonstationarity on the average return period and reliability in the context of hydrologic design. Our analyses reveal that return periods do not provide meaningful expressions of the likelihood of future hydrologic events.
Instead, knowledge of system reliability over future planning horizons can more effectively prepare society and communicate the likelihood of future hydrologic events of interest.
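The contrast between a fixed return period and reliability over a planning horizon can be made concrete with a small calculation. Reliability is taken here as the probability of no design-event exceedance in any year of the horizon, assuming independent years. The 2% annual growth in exceedance probability is an invented scenario and not a result from the study.

```python
def reliability(annual_exceedance_probs):
    """Probability that no exceedance occurs over the horizon (independent years)."""
    result = 1.0
    for p in annual_exceedance_probs:
        result *= (1.0 - p)
    return result

horizon_years = 50

# Stationary case: a "100-year" event has p = 0.01 every year.
stationary = reliability([0.01] * horizon_years)

# Hypothetical nonstationary case: p starts at 0.01 and grows 2% per year.
nonstationary = reliability([0.01 * 1.02 ** t for t in range(horizon_years)])
```

Under the stationary assumption the reliability is simply (1 − 1/T)^n, so a single return period T summarizes the risk; once the annual probability drifts, the whole sequence of probabilities is needed.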
Nibali, Maria L; Tombleson, Tom; Brady, Philip H; Wagner, Phillip
2015-10-01
Understanding typical variation of vertical jump (VJ) performance and confounding sources of its typical variability (i.e., familiarization and competitive level) is pertinent in the routine monitoring of athletes. We evaluated the presence of systematic error (learning effect) and nonuniformity of error (heteroscedasticity) across VJ performances of athletes that differ in competitive level and quantified the reliability of VJ kinetic and kinematic variables relative to the smallest worthwhile change (SWC). One hundred thirteen high school athletes, 30 college athletes, and 35 professional athletes completed repeat VJ trials. Average eccentric rate of force development (RFD), average concentric (CON) force, CON impulse, and jump height measurements were obtained from vertical ground reaction force (VGRF) data. Systematic error was assessed by evaluating changes in the mean of repeat trials. Heteroscedasticity was evaluated by plotting the difference score (trial 2 - trial 1) against the mean of the trials. Variability of jump variables was calculated as the typical error (TE) and coefficient of variation (%CV). No substantial systematic error (effect size range: -0.07 to 0.11) or heteroscedasticity was present for any of the VJ variables. Vertical jump can be performed without the need for familiarization trials, and the variability can be conveyed as either the raw TE or the %CV. Assessment of VGRF variables is an effective and reliable means of assessing VJ performance. Average CON force and CON impulse are highly reliable (%CV: 2.7% ×/÷ 1.10), although jump height was the only variable to display a %CV ≤SWC. Eccentric RFD is highly variable yet should not be discounted from VJ assessments on this factor alone because it may be sensitive to changes in response to training or fatigue that exceed the TE.
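The typical error and %CV described above are often computed from the difference scores of two trials: TE = SD(differences) / √2, with %CV expressing TE relative to the grand mean. The sketch assumes this common definition, which the abstract does not spell out, and uses invented jump heights.

```python
import math
import statistics

def typical_error(trial1, trial2):
    """Typical error: SD of the trial-to-trial differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2.0)

def percent_cv(trial1, trial2):
    """Typical error expressed as a percentage of the grand mean."""
    grand_mean = statistics.mean(list(trial1) + list(trial2))
    return 100.0 * typical_error(trial1, trial2) / grand_mean

# Invented jump heights in cm for four athletes on two trials.
trial_1 = [30.0, 32.0, 34.0, 36.0]
trial_2 = [31.0, 32.0, 36.0, 37.0]

te = typical_error(trial_1, trial_2)
cv = percent_cv(trial_1, trial_2)

# Systematic error (learning effect) is the change in the mean between trials.
mean_change = statistics.mean(trial_2) - statistics.mean(trial_1)
```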
Validation of TRMM precipitation radar monthly rainfall estimates over Brazil
NASA Astrophysics Data System (ADS)
Franchito, Sergio H.; Rao, V. Brahmananda; Vasques, Ana C.; Santo, Clovis M. E.; Conforte, Jorge C.
2009-01-01
In an attempt to validate the Tropical Rainfall Measuring Mission (TRMM) precipitation radar (PR) over Brazil, TRMM PR estimates are compared with rain gauge station data from Agência Nacional de Energia Elétrica (ANEEL). The analysis is conducted on a seasonal basis and considers five geographic regions with different precipitation regimes. The results showed that TRMM PR seasonal rainfall is well correlated with ANEEL rainfall (correlation coefficients are significant at the 99% confidence level) over most of Brazil. The random and systematic errors of TRMM PR are sensitive to seasonal and regional differences. During December to February and March to May, TRMM PR rainfall is reliable over Brazil. In June to August (September to November) TRMM PR estimates are only reliable in the Amazonian and southern (Amazonian and southeastern) regions. In the other regions the relative RMS errors are larger than 50%, indicating that the random errors are high.
An examination of the interrater reliability between practitioners and researchers on the Static-99.
Quesada, Stephen P; Calkins, Cynthia; Jeglic, Elizabeth L
2014-11-01
Many studies have validated the psychometric properties of the Static-99, the most widely used measure of sexual offender recidivism risk. However, much of this research relied on instrument coding completed by well-trained researchers. This study is the first to examine the interrater reliability (IRR) of the Static-99 between practitioners in the field and researchers. Using archival data from a sample of 1,973 formerly incarcerated sex offenders, field raters' scores on the Static-99 were compared with those of researchers. Overall, clinicians and researchers had excellent IRR on Static-99 total scores, with IRR coefficients ranging from "substantial" to "outstanding" for the 10 individual items of the scale. The most common causes of discrepancies were coding manual errors, followed by item subjectivity, inaccurate item scoring, and calculation errors. These results offer important data with regard to the frequency and perceived nature of scoring errors. © The Author(s) 2013.
[Reliability of iWitness photogrammetry in maxillofacial application].
Jiang, Chengcheng; Song, Qinggao; He, Wei; Chen, Shang; Hong, Tao
2015-06-01
This study aims to test the accuracy and precision of iWitness photogrammetry for measuring the facial tissues of a mannequin head. Under ideal circumstances, the 3D landmark coordinates were repeatedly obtained from a mannequin head using the iWitness photogrammetric system with different parameters, to examine the precision of this system. The differences between the 3D data and the true distance values of the mannequin head were computed. Operator errors of the 3D system in non-zoom and zoom status were 0.20 mm and 0.09 mm, respectively, and the difference was significant (P<0.05). Image capture error of the 3D system was 0.283 mm, and there was no significant difference compared with the same group of images (P>0.05). Error of the 3D system with recalibration was 0.251 mm, and the difference was not statistically significant compared with image capture error (P>0.05). Good congruence was observed between means derived from the 3D photos and direct anthropometry, with differences ranging from -0.4 mm to +0.4 mm. This study provides further evidence of the high reliability of iWitness photogrammetry for several craniofacial measurements, including landmarks and inter-landmark distances. The evaluated system can be recommended for the evaluation and documentation of the facial surface.
Parrett, Charles; Johnson, D.R.; Hull, J.A.
1989-01-01
Estimates of streamflow characteristics (monthly mean flow that is exceeded 90, 80, 50, and 20 percent of the time for all years of record and mean monthly flow) were made and are presented in tabular form for 312 sites in the Missouri River basin in Montana. Short-term gaged records were extended to the base period of water years 1937-86, and were used to estimate monthly streamflow characteristics at 100 sites. Data from 47 gaged sites were used in regression analysis relating the streamflow characteristics to basin characteristics and to active-channel width. The basin-characteristics equations, with standard errors of 35% to 97%, were used to estimate streamflow characteristics at 179 ungaged sites. The channel-width equations, with standard errors of 36% to 103%, were used to estimate characteristics at 138 ungaged sites. Streamflow measurements were correlated with concurrent streamflows at nearby gaged sites to estimate streamflow characteristics at 139 ungaged sites. In a test using 20 pairs of gages, the standard errors ranged from 31% to 111%. At 139 ungaged sites, the estimates from two or more of the methods were weighted and combined in accordance with the variance of individual methods. When estimates from three methods were combined, the standard errors ranged from 24% to 63%. A drainage-area-ratio adjustment method was used to estimate monthly streamflow characteristics at seven ungaged sites. The reliability of the drainage-area-ratio adjustment method was estimated to be about equal to that of the basin-characteristics method. The estimates were checked for reliability. Estimates of monthly streamflow characteristics from gaged records were considered to be most reliable, and estimates at sites with actual flow record from 1937-86 were considered to be completely reliable (zero error). Weighted-average estimates were considered to be the most reliable estimates made at ungaged sites. (USGS)
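One way to picture combining estimates "in accordance with the variance of individual methods" is inverse-variance weighting. The sketch below assumes the methods' errors are independent and that each weight is the reciprocal of the method's error variance; the abstract does not give the actual weighting formula, so this is an illustration rather than the report's procedure. The function name and the numbers are hypothetical.

```python
def combine_estimates(estimates, variances):
    """Inverse-variance weighted mean of independent estimates.

    Returns the combined estimate and its variance (1 / sum of weights).
    Assumes the errors of the individual methods are uncorrelated.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * x for w, x in zip(weights, estimates)) / total
    return combined, 1.0 / total

# Hypothetical flow estimates from two methods with error variances 1 and 4
combined, variance = combine_estimates([10.0, 20.0], [1.0, 4.0])
print(combined, variance)
```

Under these assumptions the combined variance is always smaller than the smallest input variance, which is consistent with the lower standard errors reported for the combined estimates.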
Advanced error-prediction LDPC with temperature compensation for highly reliable SSDs
NASA Astrophysics Data System (ADS)
Tokutomi, Tsukasa; Tanakamaru, Shuhei; Iwasaki, Tomoko Ogura; Takeuchi, Ken
2015-09-01
To improve the reliability of NAND Flash memory based solid-state drives (SSDs), error-prediction LDPC (EP-LDPC) has been proposed for multi-level-cell (MLC) NAND Flash memory (Tanakamaru et al., 2012, 2013), which is effective for long retention times. However, EP-LDPC is not as effective for triple-level cell (TLC) NAND Flash memory, because TLC NAND Flash has higher error rates and is more sensitive to program-disturb error. Therefore, advanced error-prediction LDPC (AEP-LDPC) has been proposed for TLC NAND Flash memory (Tokutomi et al., 2014). AEP-LDPC can correct errors more accurately by precisely describing the error phenomena. In this paper, the effects of AEP-LDPC are investigated in a 2×nm TLC NAND Flash memory with temperature characterization. Compared with LDPC-with-BER-only, the SSD's data-retention time is increased by 3.4× and 9.5× at room-temperature (RT) and 85 °C, respectively. Similarly, the acceptable BER is increased by 1.8× and 2.3×, respectively. Moreover, AEP-LDPC can correct errors with pre-determined tables made at higher temperatures to shorten the measurement time before shipping. Furthermore, it is found that one table can cover behavior over a range of temperatures in AEP-LDPC. As a result, the total table size can be reduced to 777 kBytes, which makes this approach more practical.
López-Pascual, Juan; Cáceres, Magda Liliana; De Rosario, Helios; Page, Álvaro
2016-02-08
The reliability of joint rotation measurements is an issue of major interest, especially in clinical applications. The effect of instrumental errors and soft tissue artifacts on the variability of human motion measures is well known, but the influence of the representation of joint motion has not yet been studied. The aim of the study was to compare the within-subject reliability of three rotation formalisms for the calculation of the shoulder elevation joint angles. Five repetitions of humeral elevation in the scapular plane of 27 healthy subjects were recorded using a stereophotogrammetry system. The humerothoracic joint angles were calculated using the YX'Y" and XZ'Y" Euler angle sequences and the attitude vector. A within-subject repeatability study was performed for the three representations. ICC, SEM and CV were the indices used to estimate the error in the calculation of the angle amplitudes and the angular waveforms with each method. Excellent results were obtained in all representations for the main angle (elevation), but there were remarkable differences for axial rotation and plane of elevation. The YX'Y" sequence generally had the poorest reliability in the secondary angles. The XZ'Y" sequence proved to be the most reliable representation of axial rotation, whereas the attitude vector had the highest reliability in the plane of elevation. These results highlight the importance of selecting the method used to describe the joint motion when within-subject reliability is an important issue of the experiment. This may be of particular importance when the secondary angles of motion are being studied. Copyright © 2016 Elsevier Ltd. All rights reserved.
Evaluation of TRMM Ground-Validation Radar-Rain Errors Using Rain Gauge Measurements
NASA Technical Reports Server (NTRS)
Wang, Jianxin; Wolff, David B.
2009-01-01
Ground-validation (GV) radar-rain products are often utilized for validation of the Tropical Rainfall Measuring Mission (TRMM) space-based rain estimates, and hence, quantitative evaluation of the GV radar-rain product error characteristics is vital. This study uses quality-controlled gauge data to compare with TRMM GV radar rain rates in an effort to provide such error characteristics. The results show that significant differences of concurrent radar-gauge rain rates exist at various time scales ranging from 5 min to 1 day, despite lower overall long-term bias. However, the differences between the radar area-averaged rain rates and gauge point rain rates cannot be explained as due to radar error only. The error variance separation method is adapted to partition the variance of radar-gauge differences into the gauge area-point error variance and radar rain estimation error variance. The results provide relatively reliable quantitative uncertainty evaluation of TRMM GV radar rain estimates at various time scales, and are helpful to better understand the differences between measured radar and gauge rain rates. It is envisaged that this study will contribute to better utilization of GV radar rain products to validate versatile space-based rain estimates from TRMM, as well as the proposed Global Precipitation Measurement, and other satellites.
The Vocal Cord Dysfunction Questionnaire: Validity and Reliability of the Persian Version.
Ghaemi, Hamide; Khoddami, Seyyedeh Maryam; Soleymani, Zahra; Zandieh, Fariborz; Jalaie, Shohreh; Ahanchian, Hamid; Khadivi, Ehsan
2017-12-25
The aim of this study was to develop, validate, and assess the reliability of the Persian version of Vocal Cord Dysfunction Questionnaire (VCDQ P). The study design was cross-sectional or cultural survey. Forty-four patients with vocal fold dysfunction (VFD) and 40 healthy volunteers were recruited for the study. To assess the content validity, the prefinal questions were given to 15 experts to comment on their essentiality. Ten patients with VFD rated the importance of VCDQ P in detecting face validity. Eighteen of the patients with VFD completed the VCDQ 1 week later for test-retest reliability. To detect absolute reliability, standard error of measurement and smallest detected change were calculated. Concurrent validity was assessed by completing the Persian Chronic Obstructive Pulmonary Disease (COPD) Assessment Test (CAT) by 34 patients with VFD. Discriminant validity was measured from 34 participants. The VCDQ was further validated by administering the questionnaire to 40 healthy volunteers. Validation of the VCDQ as a treatment outcome tool was conducted in 18 patients with VFD using pre- and posttreatment scores. The internal consistency was confirmed (Cronbach α = 0.78). The test-retest reliability was excellent (intraclass correlation coefficient = 0.97). The standard error of measurement and smallest detected change values were acceptable (0.39 and 1.08, respectively). There was a significant correlation between the VCDQ P and the CAT total scores (P < 0.05). Discriminative validity was significantly different. The VCDQ scores in patients with VFD before and after treatment were significantly different (P < 0.001). The VCDQ was cross-culturally adapted to Persian and demonstrated to be a valid and reliable self-administered questionnaire in Persian-speaking population. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
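The relationship between the two absolute-reliability figures reported above can be sketched as follows. The sketch assumes the widely used formulas SEM = SD × sqrt(1 − ICC) and smallest detectable change = 1.96 × sqrt(2) × SEM; the abstract does not state which formulas were used, although the reported values (0.39 and 1.08) are consistent with the second one. Function names are hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_change(sem, z=1.96):
    """Smallest detectable change at the 95% level: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem

# Using the SEM reported in the abstract
sdc = smallest_detectable_change(0.39)
print(round(sdc, 2))  # 1.08, matching the reported smallest detected change
```

The sqrt(2) factor reflects that a change score carries measurement error from both the test and the retest.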
ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda
2015-07-01
The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscle activity when supine lying and during two isometric endurance tests in subjects with and without low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements at supine lying and the two isometric endurance tests was assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscle activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
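The Bland-Altman limits of agreement referred to above can be computed with a short sketch. It assumes the standard definition (mean difference ± 1.96 × SD of the differences) and uses hypothetical thickness values; none of the numbers come from the study.

```python
import statistics

def bland_altman_limits(test, retest, z=1.96):
    """Return (lower limit, bias, upper limit) for paired test-retest measurements."""
    diffs = [a - b for a, b in zip(test, retest)]
    bias = statistics.mean(diffs)          # systematic difference between sessions
    sd = statistics.stdev(diffs)           # spread of the individual differences
    return bias - z * sd, bias, bias + z * sd

# Hypothetical muscle thickness readings (mm) from two sessions
test = [5.0, 6.0, 7.0]
retest = [4.0, 6.0, 8.0]
lower, bias, upper = bland_altman_limits(test, retest)
print(lower, bias, upper)
```

If the differences are roughly normally distributed, about 95% of them are expected to fall between the two limits, which is the pattern the abstract reports.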
Reliable Channel-Adapted Error Correction: Bacon-Shor Code Recovery from Amplitude Damping
NASA Astrophysics Data System (ADS)
Piedrafita, Álvaro; Renes, Joseph M.
2017-12-01
We construct two simple error correction schemes adapted to amplitude damping noise for Bacon-Shor codes and investigate their prospects for fault-tolerant implementation. Both consist solely of Clifford gates and require far fewer qubits, relative to the standard method, to achieve exact correction to a desired order in the damping rate. The first, employing one-bit teleportation and single-qubit measurements, needs only one-fourth as many physical qubits, while the second, using just stabilizer measurements and Pauli corrections, needs only half. The improvements stem from the fact that damping events need only be detected, not corrected, and that effective phase errors arising due to undamped qubits occur at a lower rate than damping errors. For error correction that is itself subject to damping noise, we show that existing fault-tolerance methods can be employed for the latter scheme, while the former can be made to avoid potential catastrophic errors and can easily cope with damping faults in ancilla qubits.
Probabilistic confidence for decisions based on uncertain reliability estimates
NASA Astrophysics Data System (ADS)
Reid, Stuart G.
2013-05-01
Reliability assessments are commonly carried out to provide a rational basis for risk-informed decisions concerning the design or maintenance of engineering systems and structures. However, calculated reliabilities and associated probabilities of failure often have significant uncertainties associated with the possible estimation errors relative to the 'true' failure probabilities. For uncertain probabilities of failure, a measure of 'probabilistic confidence' has been proposed to reflect the concern that uncertainty about the true probability of failure could result in a system or structure that is unsafe and could subsequently fail. The paper describes how the concept of probabilistic confidence can be applied to evaluate and appropriately limit the probabilities of failure attributable to particular uncertainties such as design errors that may critically affect the dependability of risk-acceptance decisions. This approach is illustrated with regard to the dependability of structural design processes based on prototype testing with uncertainties attributable to sampling variability.
Soft error evaluation and vulnerability analysis in Xilinx Zynq-7010 system-on chip
NASA Astrophysics Data System (ADS)
Du, Xuecheng; He, Chaohui; Liu, Shuhuan; Zhang, Yao; Li, Yonghong; Xiong, Ceng; Tan, Pengkang
2016-09-01
Radiation-induced soft errors are an increasingly important threat to the reliability of modern electronic systems. To evaluate the reliability and soft-error susceptibility of a system-on-chip, the fault tree analysis method was used in this work. The system fault tree was constructed based on the Xilinx Zynq-7010 All Programmable SoC. The soft error rates of different components in the Zynq-7010 SoC were tested using an americium-241 alpha radiation source. Furthermore, some parameters that are used to evaluate the system's reliability and safety, such as failure rate, unavailability and mean time to failure (MTTF), were calculated using Isograph Reliability Workbench 11.0. Based on the fault tree analysis of the system-on-chip, the critical blocks and system reliability were evaluated through qualitative and quantitative analysis.
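The quantities named in the abstract above can be illustrated with a minimal fault-tree sketch. It assumes independent basic events, standard OR/AND gate probability rules, and a constant failure rate (so that MTTF = 1/λ); it is not a reproduction of the authors' fault tree or of the commercial tool they used, and all numbers and names are hypothetical.

```python
import math

def or_gate(probs):
    """Top event occurs if any independent input event occurs."""
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

def and_gate(probs):
    """Top event occurs only if all independent input events occur."""
    return math.prod(probs)

def mttf(failure_rate):
    """Mean time to failure for a constant failure rate (exponential model)."""
    return 1.0 / failure_rate

# Hypothetical failure probabilities of two blocks feeding one gate
print(or_gate([0.1, 0.2]), and_gate([0.1, 0.2]), mttf(0.001))
```

Blocks whose failure alone triggers the top event (inputs to OR gates near the root) are the natural candidates for the "critical blocks" a qualitative analysis would flag.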
Systematic behavioural observation of executive performance after brain injury.
Lewis, Mark W; Babbage, Duncan R; Leathem, Janet M
2017-01-01
To develop an ecologically valid measure of executive functioning (i.e. Planning and Organization, Executive Memory, Initiation, Cognitive Shifting, Impulsivity, Sustained and Directed Attention, Error Detection, Error Correction and Time Management) during a functional chocolate brownie cooking task. In Study 1, the inter-rater reliability of a novel behavioural observation assessment method was assessed with 10 people with traumatic brain injury (TBI). In Study 2, 27 people with TBI and 16 healthy controls completed the functional task along with other measures of executive functioning to assess validity. Intraclass correlation coefficients for six of the nine aspects of executive functioning ranged from .54 to 1.00. Percentage agreements for the remaining aspects ranged from 70% to 90%. Significant and non-significant, moderate, correlations were found between the functional cooking task and standard neuropsychological measures. The healthy control group performed better than the TBI group in six areas (d = 0.56 to 1.23). In this initial trial of a novel assessment method, adequate inter-rater reliability was found. The measure was associated with standard neuropsychological measures, and our healthy control group performed better than the TBI group. The measure appears to be an ecologically valid measure of executive functioning.
Uncertainty Analysis of Sonic Boom Levels Measured in a Simulator at NASA Langley
NASA Technical Reports Server (NTRS)
Rathsam, Jonathan; Ely, Jeffry W.
2012-01-01
A sonic boom simulator has been constructed at NASA Langley Research Center for testing the human response to sonic booms heard indoors. Like all measured quantities, sonic boom levels in the simulator are subject to systematic and random errors. To quantify these errors, and their net influence on the measurement result, a formal uncertainty analysis is conducted. Knowledge of the measurement uncertainty, or range of values attributable to the quantity being measured, enables reliable comparisons among measurements at different locations in the simulator as well as comparisons with field data or laboratory data from other simulators. The analysis reported here accounts for acoustic excitation from two sets of loudspeakers: one loudspeaker set at the facility exterior that reproduces the exterior sonic boom waveform and a second set of interior loudspeakers for reproducing indoor rattle sounds. The analysis also addresses the effect of pressure fluctuations generated when exterior doors of the building housing the simulator are opened. An uncertainty budget is assembled to document each uncertainty component, its sensitivity coefficient, and the combined standard uncertainty. The latter quantity will be reported alongside measurement results in future research reports to indicate data reliability.
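An uncertainty budget of the kind described above is typically rolled up with the root-sum-of-squares rule. The sketch below assumes uncorrelated components, each given as a (sensitivity coefficient, standard uncertainty) pair; the component values are hypothetical and are not taken from the report.

```python
import math

def combined_standard_uncertainty(budget):
    """Root-sum-of-squares of (sensitivity coefficient * standard uncertainty).

    Valid for uncorrelated uncertainty components.
    """
    return math.sqrt(sum((c * u) ** 2 for c, u in budget))

# Hypothetical budget: (sensitivity coefficient, standard uncertainty in dB)
budget = [(1.0, 0.3), (1.0, 0.4)]
print(combined_standard_uncertainty(budget))
```

Correlated components would need additional covariance terms, which this sketch deliberately omits.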
Human Reliability and the Cost of Doing Business
NASA Technical Reports Server (NTRS)
DeMott, Diana
2014-01-01
Most businesses recognize that people will make mistakes and assume errors are just part of the cost of doing business, but do they need to be? Companies with high risk, or major consequences, should consider the effect of human error. In a variety of industries, human errors have caused costly failures and workplace injuries. Airline mishaps, medical malpractice, medication administration errors, and major oil spills have all been blamed on human error. A technique to mitigate or even eliminate some of these costly human errors is the use of Human Reliability Analysis (HRA). Various methodologies are available to perform Human Reliability Assessments, ranging from identifying the most likely areas for concern to detailed assessments with calculated human error failure probabilities. Which methodology to use depends on a variety of factors, including: 1) how people react and act in different industries, and differing expectations based on industry standards, 2) factors that influence how human errors could occur, such as tasks, tools, environment, workplace, support, training and procedure, 3) type and availability of data, and 4) how the industry's view of risk and reliability influences its priorities (types of emergencies, contingencies and routine tasks versus cost-based concerns). Human Reliability Assessments should be the first step to reduce, mitigate or eliminate costly mistakes or catastrophic failures. Using Human Reliability techniques to identify and classify human error risks allows a company more opportunities to mitigate or eliminate these risks and prevent costly failures.
The Combined Effects of Measurement Error and Omitting Confounders in the Single-Mediator Model
Fritz, Matthew S.; Kenny, David A.; MacKinnon, David P.
2016-01-01
Mediation analysis requires a number of strong assumptions be met in order to make valid causal inferences. Failing to account for violations of these assumptions, such as not modeling measurement error or omitting a common cause of the effects in the model, can bias the parameter estimates of the mediated effect. When the independent variable is perfectly reliable, for example when participants are randomly assigned to levels of treatment, measurement error in the mediator tends to underestimate the mediated effect, while the omission of a confounding variable of the mediator to outcome relation tends to overestimate the mediated effect. Violations of these two assumptions often co-occur, however, in which case the mediated effect could be overestimated, underestimated, or even, in very rare circumstances, unbiased. In order to explore the combined effect of measurement error and omitted confounders in the same model, the impact of each violation on the single-mediator model is first examined individually. Then the combined effect of having measurement error and omitted confounders in the same model is discussed. Throughout, an empirical example is provided to illustrate the effect of violating these assumptions on the mediated effect. PMID:27739903
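The underestimation caused by measurement error in the mediator can be made concrete with a small sketch. It assumes a linear single-mediator model with an error-free (e.g., randomized) X, classical measurement error in M that is independent of everything else, and no omitted confounders; under those assumptions the b path shrinks by the mediator's reliability conditional on X, while the a path is unaffected. The function and parameter names are hypothetical, and the confounding case discussed in the abstract is not modeled here.

```python
def attenuated_mediated_effect(a, b, resid_var_m, error_var):
    """Expected mediated effect (a*b) when M is observed with classical error.

    resid_var_m: variance of the true mediator after conditioning on X
    error_var:   variance of the measurement error added to M
    """
    partial_reliability = resid_var_m / (resid_var_m + error_var)
    return a * (b * partial_reliability)

# Hypothetical paths a = b = 0.5; error variance equal to the residual variance of M
print(attenuated_mediated_effect(0.5, 0.5, 1.0, 0.0))  # no error: true effect
print(attenuated_mediated_effect(0.5, 0.5, 1.0, 1.0))  # error halves the estimate
```

An omitted confounder of the M-to-Y relation would push the b estimate in the other direction, which is why the net bias in the abstract can be positive, negative, or, rarely, zero.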
Interobserver error involved in independent attempts to measure cusp base areas of Pan M1s
Bailey, Shara E; Pilbrow, Varsha C; Wood, Bernard A
2004-01-01
Cusp base areas measured from digitized images increase the amount of detailed quantitative information one can collect from post-canine crown morphology. Although this method is gaining wide usage for taxonomic analyses of extant and extinct hominoids, the techniques for digitizing images and taking measurements differ between researchers. The aim of this study was to investigate interobserver error in order to help assess the reliability of cusp base area measurement within extant and extinct hominoid taxa. Two of the authors measured individual cusp base areas and total cusp base area of 23 maxillary first molars (M1) of Pan. From these, relative cusp base areas were calculated. No statistically significant interobserver differences were found for either absolute or relative cusp base areas. On average the hypocone and paracone showed the least interobserver error (< 1%) whereas the protocone and metacone showed the most (2.6–4.5%). We suggest that the larger measurement error in the metacone/protocone is due primarily to either weakly defined fissure patterns and/or the presence of accessory occlusal features. Overall, levels of interobserver error are similar to those found for intraobserver error. The results of our study suggest that if certain prescribed standards are employed then cusp and crown base areas measured by different individuals can be pooled into a single database. PMID:15447691
ERIC Educational Resources Information Center
Kim, Young-Suk Grace; Schatschneider, Christopher; Wanzek, Jeanne; Gatlin, Brandy; Al Otaiba, Stephanie
2017-01-01
We examined how raters and tasks influence measurement error in writing evaluation and how many raters and tasks are needed to reach a desirable level of 0.90 and 0.80 reliabilities for children in Grades 3 and 4. A total of 211 children (102 boys) were administered three tasks in narrative and expository genres, respectively, and their written…
2016-01-01
Background: It is often thought that random measurement error has a minor effect upon the results of an epidemiological survey. Theoretically, errors of measurement should always increase the spread of a distribution. Defining an illness by having a measurement outside an established healthy range will lead to an inflated prevalence of that condition if there are measurement errors. Methods and results: A Monte Carlo simulation was conducted of anthropometric assessment of children with malnutrition. Random errors of increasing magnitude were imposed upon the populations and showed that there was an increase in the standard deviation with each of the errors that became exponentially greater with the magnitude of the error. The potential magnitude of the resulting error of reported prevalence of malnutrition was compared with published international data and found to be of sufficient magnitude to make a number of surveys and the numerous reports and analyses that used these data unreliable. Conclusions: The effect of random error in public health surveys and the data upon which diagnostic cut-off points are derived to define “health” has been underestimated. Even quite modest random errors can more than double the reported prevalence of conditions such as malnutrition. Increasing sample size does not address this problem, and may even result in less accurate estimates. More attention needs to be paid to the selection, calibration and maintenance of instruments, measurer selection, training & supervision, routine estimation of the likely magnitude of errors using standardization tests, use of statistical likelihood of error to exclude data from analysis and full reporting of these procedures in order to judge the reliability of survey reports. PMID:28030627
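The mechanism described above, in which random error widens the distribution and inflates the share of values beyond a cut-off, can be sketched analytically rather than by Monte Carlo. The sketch assumes true z-scores that are standard normal, independent normally distributed measurement error, and a cut-off of -2 z-scores; these are illustrative assumptions, not the simulation settings of the study.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def observed_prevalence(cutoff_z, error_sd):
    """Share of observed values below the cut-off when error widens the distribution.

    True values have SD 1; adding independent error gives SD sqrt(1 + error_sd^2).
    """
    observed_sd = math.sqrt(1.0 + error_sd ** 2)
    return normal_cdf(cutoff_z / observed_sd)

true_prev = observed_prevalence(-2.0, 0.0)    # about 2.3% with no error
noisy_prev = observed_prevalence(-2.0, 0.7)   # hypothetical error SD of 0.7 z-units
print(f"{true_prev:.3%} -> {noisy_prev:.3%}")
```

Because the inflation comes from the wider distribution itself, a larger sample only estimates the inflated prevalence more precisely, in line with the abstract's point that increasing sample size does not address the problem.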
Scaglioni-Solano, Pietro; Aragón-Vargas, Luis F
2014-06-01
Standing balance is an important motor task. Postural instability associated with age typically arises from deterioration of peripheral sensory systems. The modified Clinical Test of Sensory Integration for Balance and the Tandem test have been used to screen for balance. Timed tests present some limitations, whereas quantification of the motions of the center of pressure (CoP) with portable and inexpensive equipment may help to improve the sensitivity of these tests and give the possibility of widespread use. This study determines the validity and reliability of the Wii Balance Board (Wii BB) to quantify CoP motions during the mentioned tests. Thirty-seven older adults completed three repetitions of five balance conditions: eyes open, eyes closed, eyes open on a compliant surface, eyes closed on a compliant surface, and tandem stance, all performed on a force plate and a Wii BB simultaneously. Twenty participants repeated the trials for reliability purposes. CoP displacement was the main outcome measure. Regression analysis indicated that the Wii BB has excellent concurrent validity, and Bland-Altman plots showed good agreement between devices with small mean differences and no relationship between the difference and the mean. Intraclass correlation coefficients (ICCs) indicated modest-to-excellent test-retest reliability (ICC=0.64-0.85). Standard error of measurement and minimal detectable change were similar for both devices, except the 'eyes closed' condition, with greater standard error of measurement for the Wii BB. In conclusion, the Wii BB is shown to be a valid and reliable method to quantify CoP displacement in older adults.
Accuracy and reliability of forensic latent fingerprint decisions
Ulery, Bradford T.; Hicklin, R. Austin; Buscaglia, JoAnn; Roberts, Maria Antonia
2011-01-01
The interpretation of forensic fingerprint evidence relies on the expertise of latent print examiners. The National Research Council of the National Academies and the legal and forensic sciences communities have called for research to measure the accuracy and reliability of latent print examiners’ decisions, a challenging and complex problem in need of systematic analysis. Our research is focused on the development of empirical approaches to studying this problem. Here, we report on the first large-scale study of the accuracy and reliability of latent print examiners’ decisions, in which 169 latent print examiners each compared approximately 100 pairs of latent and exemplar fingerprints from a pool of 744 pairs. The fingerprints were selected to include a range of attributes and quality encountered in forensic casework, and to be comparable to searches of an automated fingerprint identification system containing more than 58 million subjects. This study evaluated examiners on key decision points in the fingerprint examination process; procedures used operationally include additional safeguards designed to minimize errors. Five examiners made false positive errors for an overall false positive rate of 0.1%. Eighty-five percent of examiners made at least one false negative error for an overall false negative rate of 7.5%. Independent examination of the same comparisons by different participants (analogous to blind verification) was found to detect all false positive errors and the majority of false negative errors in this study. Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion. PMID:21518906
Reliable estimation of orbit errors in spaceborne SAR interferometry. The network approach
NASA Astrophysics Data System (ADS)
Bähr, Hermann; Hanssen, Ramon F.
2012-12-01
An approach to improve orbital state vectors by orbit error estimates derived from residual phase patterns in synthetic aperture radar interferograms is presented. For individual interferograms, an error representation by two parameters is motivated: the baseline error in cross-range and the rate of change of the baseline error in range. For their estimation, two alternatives are proposed: a least squares approach that requires prior unwrapping and a less reliable gridsearch method handling the wrapped phase. In both cases, reliability is enhanced by mutual control of error estimates in an overdetermined network of linearly dependent interferometric combinations of images. Thus, systematic biases, e.g., due to unwrapping errors, can be detected and iteratively eliminated. Regularising the solution by a minimum-norm condition results in quasi-absolute orbit errors that refer to particular images. For the 31 images of a sample ENVISAT dataset, orbit corrections with a mutual consistency on the millimetre level have been inferred from 163 interferograms. The method itself qualifies by reliability and rigorous geometric modelling of the orbital error signal but does not consider interfering large scale deformation effects. However, a separation may be feasible in a combined processing with persistent scatterer approaches or by temporal filtering of the estimates.
NASA Astrophysics Data System (ADS)
Winiarek, Victor; Bocquet, Marc; Duhanyan, Nora; Roustan, Yelva; Saunier, Olivier; Mathieu, Anne
2014-01-01
Inverse modelling techniques can be used to estimate the amount of radionuclides and the temporal profile of the source term released in the atmosphere during the accident of the Fukushima Daiichi nuclear power plant in March 2011. In Winiarek et al. (2012b), the lower bounds of the caesium-137 and iodine-131 source terms were estimated with such techniques, using activity concentration measurements. The importance of an objective assessment of prior errors (the observation errors and the background errors) was emphasised for a reliable inversion. In such critical context where the meteorological conditions can make the source term partly unobservable and where only a few observations are available, such prior estimation techniques are mandatory, the retrieved source term being very sensitive to this estimation. We propose to extend the use of these techniques to the estimation of prior errors when assimilating observations from several data sets. The aim is to compute an estimate of the caesium-137 source term jointly using all available data about this radionuclide, such as activity concentrations in the air, but also daily fallout measurements and total cumulated fallout measurements. It is crucial to properly and simultaneously estimate the background errors and the prior errors relative to each data set. A proper estimation of prior errors is also a necessary condition to reliably estimate the a posteriori uncertainty of the estimated source term. Using such techniques, we retrieve a total released quantity of caesium-137 in the interval 11.6-19.3 PBq with an estimated standard deviation range of 15-20% depending on the method and the data sets. The “blind” time intervals of the source term have also been strongly mitigated compared to the first estimations with only activity concentration data.
ERIC Educational Resources Information Center
Onwuegbuzie, Anthony J.; Daniel, Larry G.
The purposes of this paper are to identify common errors made by researchers when dealing with reliability coefficients and to outline best practices for reporting and interpreting reliability coefficients. Common errors that researchers make are: (1) stating that the instruments are reliable; (2) incorrectly interpreting correlation coefficients;…
On the Reliability of Photovoltaic Short-Circuit Current Temperature Coefficient Measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osterwald, Carl R.; Campanelli, Mark; Kelly, George J.
2015-06-14
The changes in short-circuit current of photovoltaic (PV) cells and modules with temperature are routinely modeled through a single parameter, the temperature coefficient (TC). This parameter is vital for the translation equations used in system sizing, yet in practice is very difficult to measure. In this paper, we discuss these inherent problems and demonstrate how they can introduce unacceptably large errors in PV ratings. A method for quantifying the spectral dependence of TCs is derived, and then used to demonstrate that databases of module parameters commonly contain values that are physically unreasonable. Possible ways to reduce measurement errors are also discussed.
Reliable and accurate extraction of Hamaker constants from surface force measurements.
Miklavcic, S J
2018-08-15
A simple and accurate closed-form expression for the Hamaker constant that best represents experimental surface force data is presented. Numerical comparisons are made with the current standard least squares approach, which falsely assumes error-free separation measurements, and with a nonlinear version that assumes independent measurements of force and separation are both subject to error. The comparisons demonstrate that the proposed formula is not only easily implemented but also considerably more accurate. This option is appropriate for any value of Hamaker constant, high or low, and certainly for any interacting system exhibiting an inverse square distance dependent van der Waals force. Copyright © 2018 Elsevier Inc. All rights reserved.
On the Power of Multivariate Latent Growth Curve Models to Detect Correlated Change
ERIC Educational Resources Information Center
Hertzog, Christopher; Lindenberger, Ulman; Ghisletta, Paolo; Oertzen, Timo von
2006-01-01
We evaluated the statistical power of single-indicator latent growth curve models (LGCMs) to detect correlated change between two variables (covariance of slopes) as a function of sample size, number of longitudinal measurement occasions, and reliability (measurement error variance). Power approximations following the method of Satorra and Saris…
Modeling of a bubble-memory organization with self-checking translators to achieve high reliability.
NASA Technical Reports Server (NTRS)
Bouricius, W. G.; Carter, W. C.; Hsieh, E. P.; Wadia, A. B.; Jessep, D. C., Jr.
1973-01-01
Study of the design and modeling of a highly reliable bubble-memory system that has the capabilities of: (1) correcting a single 16-adjacent bit-group error resulting from failures in a single basic storage module (BSM), and (2) detecting with a probability greater than 0.99 any double errors resulting from failures in BSM's. The results of the study justify the design philosophy adopted of employing memory data encoding and a translator to correct single group errors and detect double group errors to enhance the overall system reliability.
Instrumental variables vs. grouping approach for reducing bias due to measurement error.
Batistatou, Evridiki; McNamee, Roseanne
2008-01-01
Attenuation of the exposure-response relationship due to exposure measurement error is often encountered in epidemiology. Given that error cannot be totally eliminated, bias correction methods of analysis are needed. Many methods require more than one exposure measurement per person to be made, but the 'group mean OLS method,' in which subjects are grouped into several a priori defined groups followed by ordinary least squares (OLS) regression on the group means, can be applied with one measurement. An alternative approach is to use an instrumental variable (IV) method in which both the single error-prone measure and an IV are used in IV analysis. In this paper we show that the 'group mean OLS' estimator is equal to an IV estimator with the group mean used as IV, but that the variance estimators for the two methods are different. We derive a simple expression for the bias in the common estimator which is a simple function of group size, reliability and contrast of exposure between groups, and show that the bias can be very small when group size is large. We compare this method with a new proposal (group mean ranking method), also applicable with a single exposure measurement, in which the IV is the rank of the group means. When there are two independent exposure measurements per subject, we propose a new IV method (EVROS IV) and compare it with Carroll and Stefanski's (CS IV) proposal in which the second measure is used as an IV; the new IV estimator combines aspects of the 'group mean' and 'CS' strategies. All methods are evaluated in terms of bias, precision and root mean square error via simulations and a dataset from occupational epidemiology. The 'group mean ranking method' does not offer much improvement over the 'group mean method.' Compared with the 'CS' method, the 'EVROS' method is less affected by low reliability of exposure.
We conclude that the group IV methods we propose may provide a useful way to handle mismeasured exposures in epidemiology with or without replicate measurements. Our finding may also have implications for the use of aggregate variables in epidemiology to control for unmeasured confounding.
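The equivalence stated in the abstract above (group mean OLS equals an IV estimator that uses the group mean as instrument) can be checked numerically. The following is a minimal sketch, not the authors' code: the simulated data, group structure, error variances, and function names are all illustrative assumptions, and equal group sizes are assumed so that unweighted OLS on the group means matches the IV ratio exactly.

```python
import numpy as np

def group_mean_ols_slope(w, y, groups):
    """Slope from OLS of group-mean outcome on group-mean measured exposure."""
    labels = np.unique(groups)
    w_bar = np.array([w[groups == g].mean() for g in labels])
    y_bar = np.array([y[groups == g].mean() for g in labels])
    return np.polyfit(w_bar, y_bar, 1)[0]

def iv_slope(w, y, z):
    """Simple IV slope: cov(z, y) / cov(z, w) for a single instrument z."""
    return np.cov(z, y)[0, 1] / np.cov(z, w)[0, 1]

# Hypothetical simulation settings (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n_groups, group_size, true_slope = 5, 200, 2.0
groups = np.repeat(np.arange(n_groups), group_size)

true_x = rng.normal(loc=groups.astype(float), scale=1.0)  # true exposure
w = true_x + rng.normal(scale=1.5, size=true_x.size)      # error-prone measure
y = true_slope * true_x + rng.normal(scale=1.0, size=true_x.size)

# Instrument: each subject is assigned the mean measured exposure of its group.
z = np.array([w[groups == g].mean() for g in range(n_groups)])[groups]

naive = np.polyfit(w, y, 1)[0]                 # attenuated individual-level OLS
grouped = group_mean_ols_slope(w, y, groups)   # group mean OLS
iv = iv_slope(w, y, z)                         # IV with group mean as instrument

print(f"naive={naive:.3f} grouped={grouped:.3f} iv={iv:.3f}")
```

In this setup the naive slope is pulled toward zero by the measurement error, while the grouped and IV slopes coincide and sit close to the simulated true slope. With unequal group sizes the match would require weighting the group-level regression by group size.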
Reliability and coverage analysis of non-repairable fault-tolerant memory systems
NASA Technical Reports Server (NTRS)
Cox, G. W.; Carroll, B. D.
1976-01-01
A method was developed for the construction of probabilistic state-space models for nonrepairable systems. Models were developed for several systems which achieved reliability improvement by means of error-coding, modularized sparing, massive replication and other fault-tolerant techniques. From the models developed, sets of reliability and coverage equations for the systems were developed. Comparative analyses of the systems were performed using these equation sets. In addition, the effects of varying subunit reliabilities on system reliability and coverage were described. The results of these analyses indicated that a significant gain in system reliability may be achieved by use of combinations of modularized sparing, error coding, and software error control. For sufficiently reliable system subunits, this gain may far exceed the reliability gain achieved by use of massive replication techniques, yet result in a considerable saving in system cost.
Measurement of the True Dynamic and Static Pressures in Flight
NASA Technical Reports Server (NTRS)
Kiel, Georg
1939-01-01
In this report, two reliable methods are presented, with the aid of which the undisturbed flight dynamic pressure and the true static pressure may be determined without error. These problems were solved chiefly through practical flight tests.
Online beam energy measurement of Beijing electron positron collider II linear accelerator
NASA Astrophysics Data System (ADS)
Wang, S.; Iqbal, M.; Liu, R.; Chi, Y.
2016-02-01
This paper describes online beam energy measurement for the linear accelerator (linac) of the Beijing Electron Positron Collider II, the upgraded version of the collider. It presents the calculation formula, gives a detailed error analysis, discusses the practical implementation, and reports some verification. The method measures the beam energy by acquiring the horizontal beam position with three beam position monitors (BPMs), which eliminates the effect of orbit fluctuation and performs much better than a method using a single BPM. The error analysis indicates that this online measurement has further potential uses, for example as part of a beam energy feedback system. The reliability of this method is also discussed and demonstrated in this paper.
Gatti, Anthony A; Stratford, Paul W; Brenneman, Elora C; Maly, Monica R
2016-01-01
Accelerometers provide a measure of step-count. Data on the reliability and validity of step-count and pedal-revolution count measurements by the GT3X+ accelerometer, placed at different anatomical locations, are absent from the literature. The purpose of this study was to investigate the reliability and validity of step and pedal-revolution counts produced by the GT3X+ placed at different anatomical locations during running and bicycling. Twenty-two healthy adults (14 men and 8 women) completed running and bicycling activity bouts (5 minutes each) while wearing 6 accelerometers: 2 each at the waist, thigh and shank. Accelerometer and video data were collected during activity. Excellent reliability and validity were found for measurements taken from accelerometers mounted at the waist and shank during running (Reliability: intraclass correlation (ICC) ≥ 0.99; standard error of measurement (SEM) ≤ 1.0 steps; Pearson ≥ 0.99) and at the thigh and shank during bicycling (Reliability: ICC ≥ 0.99; SEM ≤ 1.0 revolutions; Pearson ≥ 0.99). Excellent reliability was found between measurements taken at the waist and shank during running (ICC ≥ 0.98; SEM ≤ 1.6 steps) and between measurements taken at the thigh and shank during bicycling (ICC ≥ 0.99; SEM ≤ 1.0 revolutions). These data suggest that the GT3X+ can be used for measuring step-count during running and pedal-revolution count during bicycling. Only shank placement is recommended for both activities.
Short version of the Depression Anxiety Stress Scale-21: is it valid for Brazilian adolescents?
da Silva, Hítalo Andrade; dos Passos, Muana Hiandra Pereira; de Oliveira, Valéria Mayaly Alves; Palmeira, Aline Cabral; Pitangui, Ana Carolina Rodarti; de Araújo, Rodrigo Cappato
2016-01-01
ABSTRACT Objective To evaluate the interday reproducibility, agreement and validity of the construct of short version of the Depression Anxiety Stress Scale-21 applied to adolescents. Methods The sample consisted of adolescents of both sexes, aged between 10 and 19 years, who were recruited from schools and sports centers. The validity of the construct was performed by exploratory factor analysis, and reliability was calculated for each construct using the intraclass correlation coefficient, standard error of measurement and the minimum detectable change. Results The factor analysis combining the items corresponding to anxiety and stress in a single factor, and depression in a second factor, showed a better match of all 21 items, with higher factor loadings in their respective constructs. The reproducibility values for depression were intraclass correlation coefficient with 0.86, standard error of measurement with 0.80, and minimum detectable change with 2.22; and, for anxiety/stress: intraclass correlation coefficient with 0.82, standard error of measurement with 1.80, and minimum detectable change with 4.99. Conclusion The short version of the Depression Anxiety Stress Scale-21 showed excellent values of reliability, and strong internal consistency. The two-factor model with condensation of the constructs anxiety and stress in a single factor was the most acceptable for the adolescent population. PMID:28076595
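The abstract above reports the standard error of measurement (SEM) and the minimum detectable change (MDC) but not the formula linking them. A minimal sketch, assuming the conventional 95% definition MDC = 1.96 × √2 × SEM (an assumption about the authors' method, although it reproduces the reported values); the function name is hypothetical:

```python
import math

def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z, for a test-retest difference.

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two assessments.
    """
    return z * math.sqrt(2.0) * sem

# SEM values taken from the abstract above.
mdc_depression = minimal_detectable_change(0.80)
mdc_anxiety_stress = minimal_detectable_change(1.80)

print(round(mdc_depression, 2), round(mdc_anxiety_stress, 2))
```

Under this assumption the computed values round to 2.22 and 4.99, matching the figures reported for depression and anxiety/stress.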
Using meta-quality to assess the utility of volunteered geographic information for science.
Langley, Shaun A; Messina, Joseph P; Moore, Nathan
2017-11-06
Volunteered geographic information (VGI) has strong potential to be increasingly valuable to scientists in collaboration with non-scientists. The abundance of mobile phones and other wireless forms of communication open up significant opportunities for the public to get involved in scientific research. As these devices and activities become more abundant, questions of uncertainty and error in volunteer data are emerging as critical components for using volunteer-sourced spatial data. Here we present a methodology for using VGI and assessing its sensitivity to three types of error. More specifically, this study evaluates the reliability of data from volunteers based on their historical patterns. The specific context is a case study in surveillance of tsetse flies, a health concern for being the primary vector of African Trypanosomiasis. Reliability, as measured by a reputation score, determines the threshold for accepting the volunteered data for inclusion in a tsetse presence/absence model. Higher reputation scores are successful in identifying areas of higher modeled tsetse prevalence. A dynamic threshold is needed but the quality of VGI will improve as more data are collected and the errors in identifying reliable participants will decrease. This system allows for two-way communication between researchers and the public, and a way to evaluate the reliability of VGI. Boosting the public's ability to participate in such work can improve disease surveillance and promote citizen science. In the absence of active surveillance, VGI can provide valuable spatial information given that the data are reliable.
Hajcak, Greg; Meyer, Alexandria; Kotov, Roman
2017-08-01
In the clinical neuroscience literature, between-subjects differences in neural activity are presumed to reflect reliable measures-even though the psychometric properties of neural measures are almost never reported. The current article focuses on the critical importance of assessing and reporting internal consistency reliability-the homogeneity of "items" that comprise a neural "score." We demonstrate how variability in the internal consistency of neural measures limits between-subjects (i.e., individual differences) effects. To this end, we utilize error-related brain activity (i.e., the error-related negativity or ERN) in both healthy and generalized anxiety disorder (GAD) participants to demonstrate options for psychometric analyses of neural measures; we examine between-groups differences in internal consistency, between-groups effect sizes, and between-groups discriminability (i.e., ROC analyses)-all as a function of increasing items (i.e., number of trials). Overall, internal consistency should be used to inform experimental design and the choice of neural measures in individual differences research. The internal consistency of neural measures is necessary for interpreting results and guiding progress in clinical neuroscience-and should be routinely reported in all individual differences studies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
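The dependence of internal consistency on the number of trials can be illustrated with the Spearman-Brown prophecy formula. This is a minimal sketch under the assumption of parallel (equally reliable, equally correlated) trials; the article itself may use other estimators, and the single-trial reliability of 0.10 below is a hypothetical value, not a result from the study.

```python
def spearman_brown(single_item_reliability, n_items):
    """Predicted reliability of a score averaged over n_items parallel items."""
    r = single_item_reliability
    return n_items * r / (1.0 + (n_items - 1) * r)

# Hypothetical single-trial reliability; trial counts chosen for illustration.
for n_trials in (6, 10, 20, 40):
    print(n_trials, round(spearman_brown(0.10, n_trials), 3))
```

The predicted reliability rises quickly at first and then flattens, which is why the number of trials retained per participant matters when a neural measure is used for individual differences research.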
Gasq, David; Labrunée, Marc; Amarantini, David; Dupui, Philippe; Montoya, Richard; Marque, Philippe
2014-03-21
Stroke patients have impaired postural balance that increases the risk of falls and impairs their mobility. Assessment of postural balance is commonly carried out by recording centre of pressure (CoP) displacements, but the lack of data concerning reliability of these measures compromises their interpretation. The purpose of this study was to investigate the between-day reliability of six CoP-based variables, in order to provide i) reliability data for monitoring postural sway and weight-bearing asymmetry of stroke patients in clinical practice and ii) a consistent method for assessing measurement error for applications in physical medicine and rehabilitation. Postural balance of 20 stroke patients was assessed in quiet standing on a force platform, in two sessions, 7 days apart. Six CoP-based variables were collected in eyes open and eyes closed conditions: postural sway was assessed with mean and standard deviation of CoP-velocity, CoP-velocity along the mediolateral and anteroposterior axes, and confidence ellipse area (CE(AREA)); weight-bearing asymmetry was assessed with mean CoP position along the mediolateral axis (CoP(ML)). The intraclass correlation coefficient (ICC) was used to determine the level of test-retest agreement. Small real difference (SRD), corresponding to the smallest change that indicates a real improvement for a single individual, was used to determine the extent of measurement error. ICCs were satisfactory (>0.9) for all CoP-based variables, except for CE(AREA) in eyes open condition and CoP(ML) (<0.8). The SRDs (eyes open/closed conditions) were: 6.1/9.5 mm/s for mean velocity; 12.3/12.2 mm/s for standard deviation of CoP-velocity; 3.6/5.5 mm/s and 4.9/7.3 mm/s for CoP-velocity in mediolateral and anteroposterior axes, respectively; 17.4/21.4 mm for CoP(ML).
Because CE(AREA) showed heteroscedasticity of measurement error distribution, SRD (eyes open/closed conditions) was expressed as a percentage (121/75%) and a ratio (3.68/2.16) obtained after log-antilog procedure. In clinical practice, the CoP-based velocity variables should be preferred to CE(AREA) to assess and monitor postural sway over time in hemiplegic stroke patients. The poor reliability of CoP(ML) compromises its use to assess weight-bearing asymmetry. The procedure we used could be applied in reliability studies concerning other CoP-based variables or other biological variables in the field of physical medicine and rehabilitation.
Failure analysis and modeling of a VAXcluster system
NASA Technical Reports Server (NTRS)
Tang, Dong; Iyer, Ravishankar K.; Subramani, Sujatha S.
1990-01-01
This paper discusses the results of a measurement-based analysis of real error data collected from a DEC VAXcluster multicomputer system. In addition to evaluating basic system dependability characteristics such as error and failure distributions and hazard rates for both individual machines and for the VAXcluster, reward models were developed to analyze the impact of failures on the system as a whole. The results show that more than 46 percent of all failures were due to errors in shared resources. This is despite the fact that these errors have a recovery probability greater than 0.99. The hazard rate calculations show that not only errors, but also failures occur in bursts. Approximately 40 percent of all failures occur in bursts and involved multiple machines. This result indicates that correlated failures are significant. Analysis of rewards shows that software errors have the lowest reward (0.05 vs 0.74 for disk errors). The expected reward rate (reliability measure) of the VAXcluster drops to 0.5 in 18 hours for the 7-out-of-7 model and in 80 days for the 3-out-of-7 model.
Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J
2014-05-01
Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
NASA Astrophysics Data System (ADS)
Sousa, Andre R.; Schneider, Carlos A.
2001-09-01
A touch probe is used on a 3-axis vertical machine center to check against a hole plate, calibrated on a coordinate measuring machine (CMM). By comparing the results obtained from the machine tool and CMM, the main machine tool error components are measured, attesting the machine accuracy. The error values can also be used to update the error compensation table at the CNC, enhancing the machine accuracy. The method is easy to use, has a lower cost than classical test techniques, and preliminary results have shown that its uncertainty is comparable to well established techniques. In this paper the method is compared with the laser interferometric system, regarding reliability, cost and time efficiency.
Inter-rater Reliability of Real-Time Ultrasound to Measure Acromiohumeral Distance.
Mackenzie, Tanya Anne; Bdaiwi, Alya H; Herrington, Lee; Cools, Ann
2016-07-01
Real-time ultrasound (RTUS) has been suggested as a reliable measure of acromiohumeral distance. However, to date, no vigorous assessment and reporting of inter-rater reliability of this method has been performed with the shoulder in a neutral position or with active and passive arm abduction. To assess intrasession inter-rater reliability of using RTUS to measure acromiohumeral distance with the shoulder in a neutral position and with 60° active and passive abduction. Inter-rater intrasession reliability of repeated measures. Human performance laboratory. Twenty persons (12 male and 8 female) with an average age of 29.86 years (standard deviation, 7.8). In an inter-rater, intrasession study, RTUS was used to measure the acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive abduction. Acromiohumeral distance. Intraclass correlation coefficient (ICC)2.1 scores ranged between 0.65-0.88 (standard error of the mean = 0.81-1.2 mm and minimal detectable differences with 95% confidence = 2.2-2.3 mm) for inter-rater intrasession reliability. RTUS was found to have fair to good inter-rater reliability as a tool to measure acromiohumeral distance with the shoulder in a neutral position and with 60° of both active and passive arm abduction. Copyright © 2016 American Academy of Physical Medicine and Rehabilitation. Published by Elsevier Inc. All rights reserved.
2017-01-01
Anthropometric data collected in clinics and surveys are often inaccurate and unreliable due to measurement error. The Body Imaging for Nutritional Assessment Study (BINA) evaluated the ability of 3D imaging to correctly measure stature, head circumference (HC) and arm circumference (MUAC) for children under five years of age. This paper describes the protocol for and the quality of manual anthropometric measurements in BINA, a study conducted in 2016–17 in Atlanta, USA. Quality was evaluated by examining digit preference, biological plausibility of z-scores, z-score standard deviations, and reliability. We calculated z-scores and analyzed plausibility based on the 2006 WHO Child Growth Standards (CGS). For reliability, we calculated intra- and inter-observer Technical Error of Measurement (TEM) and Intraclass Correlation Coefficient (ICC). We found low digit preference; 99.6% of z-scores were biologically plausible, with z-score standard deviations ranging from 0.92 to 1.07. Total TEM was 0.40 for stature, 0.28 for HC, and 0.25 for MUAC in centimeters. ICC ranged from 0.99 to 1.00. The quality of manual measurements in BINA was high and similar to that of the anthropometric data used to develop the WHO CGS. We attributed high quality to vigorous training, motivated and competent field staff, reduction of non-measurement error through the use of technology, and reduction of measurement error through adequate monitoring and supervision. Our anthropometry measurement protocol, which builds on and improves upon the protocol used for the WHO CGS, can be used to improve anthropometric data quality. The discussion illustrates the need to standardize anthropometric data quality assessment, and we conclude that BINA can provide a valuable evaluation of 3D imaging for child anthropometry because there is comparison to gold-standard, manual measurements. PMID:29240796
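For readers unfamiliar with the Technical Error of Measurement, the intra-observer TEM for two repeated measurements per child is conventionally computed as the square root of the sum of squared differences divided by twice the number of subjects. The sketch below assumes that conventional definition; the stature values are hypothetical and are not data from BINA.

```python
import math

def technical_error_of_measurement(first, second):
    """TEM for paired repeat measurements: sqrt(sum(d^2) / (2 * n))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally sized, non-empty measurement lists")
    squared_diffs = [(a - b) ** 2 for a, b in zip(first, second)]
    return math.sqrt(sum(squared_diffs) / (2 * len(first)))

# Hypothetical repeated stature measurements in centimeters.
stature_first = [85.2, 92.0, 78.5, 101.4]
stature_second = [85.0, 92.4, 78.5, 101.0]

tem = technical_error_of_measurement(stature_first, stature_second)
print(f"TEM = {tem:.3f} cm")
```

The result is in the same unit as the measurements, which is why the abstract reports TEM in centimeters alongside the unitless ICC.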
Ahlström, Isabell; Hellström, Karin; Emtner, Margareta; Anens, Elisabeth
2015-03-01
To examine the test-retest reliability of the Swedish translated version of the Exercise Self-Efficacy Scale (S-ESES) in people with neurological disease and to examine internal consistency. Test-retest study. A total of 30 adults with neurological diseases including: Parkinson's disease; Multiple Sclerosis; Cervical Dystonia; and Charcot-Marie-Tooth disease. The S-ESES was sent twice by surface mail. The mean interval between completions was 16 days. Weighted kappa, intraclass correlation coefficient 2,1 [ICC (2,1)], standard error of measurement (SEM), also expressed as a percentage value (SEM%), and Cronbach's alpha were calculated. The relative reliability of the test-retest results showed substantial agreement measured using weighted kappa (MD = 0.62) and very high reliability measured using ICC (2,1) (0.92). Absolute reliability measured using SEM was 5.3 and SEM% was 20.7. Excellent internal consistency was shown, with an alpha coefficient of 0.91 (test 1) and 0.93 (test 2). The S-ESES is recommended for use in research and in clinical work for people with neurological diseases. The low absolute reliability, however, indicates a limited ability to measure changes on an individual level.
Oliveira, Ana; Cruz, Joana; Jácome, Cristina; Marques, Alda
2018-01-01
Purpose: To estimate the within-day test-retest reliability and standard error of measurement (SEM) of the unsupported upper limb exercise test (UULEX) in adults without disabilities and to determine the effects of age and gender on performance of the UULEX. Method: A cross-sectional study was conducted with 100 adults without disabilities (44 men, mean age 44.2 [SD 26] y; 56 women, mean age 38.1 [SD 24.1] y). Participants performed three UULEX tests to establish within-day reliability, measured using an intra-class correlation coefficient (ICC) model 2 (two-way random effects) with a single rater (ICC[2,1]) and SEM. The effects of age and gender were examined using two-factor mixed-design analysis of variance (ANOVA) and one-way repeated-measures ANOVA. For analysis purposes, four sub-groups were created: younger adults, older adults, men, and women. Results: Excellent within-day reliability and a small SEM were found in the four sub-groups (younger adults: ICC[2,1] = 0.88; 95% CI: 0.82, 0.92; SEM ≈ 40 s; older adults: ICC[2,1] = 0.82; 95% CI: 0.72, 0.90; SEM ≈ 50 s; men: ICC[2,1] = 0.93; 95% CI: 0.88, 0.96; SEM ≈ 30 s; women: ICC[2,1] = 0.85; 95% CI: 0.78, 0.91; SEM ≈ 45 s). Younger adults took, on average, 308.24 seconds longer than older adults to perform the test; older adults performed significantly better on the third test (p < 0.0001; η² = 0.096). Gender effects were not found (p > 0.05). Conclusion: The within-day test-retest reliability and SEM values of the UULEX may be used to define the magnitude of the error obtained with repeated measures. One UULEX test seems to be adequate for younger adults to achieve reliable results, whereas three tests seem to be needed for older adults.
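The ICC[2,1] named in the abstract above (two-way random effects, absolute agreement, single measurement) can be computed from the mean squares of a two-way ANOVA. The sketch below follows the standard Shrout and Fleiss formulation; the score matrix is purely illustrative (subjects in rows, repeated trials or raters in columns) and is not data from the UULEX study.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: 2-D array-like, one row per subject, one column per trial/rater.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between trials
    ss_total = np.sum((x - grand) ** 2)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Illustrative scores only: 6 subjects measured on 4 occasions.
example_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example_scores)
print(f"ICC(2,1) = {icc:.2f}")
```

Because this form penalizes systematic differences between columns, a learning effect across repeated trials lowers ICC[2,1] even when subjects keep the same rank order.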
Nkenke, Emeka; Lehner, Bernhard; Kramer, Manuel; Haeusler, Gerd; Benz, Stefanie; Schuster, Maria; Neukam, Friedrich W; Vairaktaris, Eleftherios G; Wurm, Jochen
2006-03-01
To assess measurement errors of a novel technique for the three-dimensional determination of the degree of facial symmetry in patients suffering from unilateral cleft lip and palate malformations. Technical report, reliability study. Cleft Lip and Palate Center of the University of Erlangen-Nuremberg, Erlangen, Germany. The three-dimensional facial surface data of five 10-year-old unilateral cleft lip and palate patients were subjected to the analysis. Distances, angles, surface areas, and volumes were assessed twice. Calculations were made for method error, intraclass correlation coefficient, and repeatability of the measurements of distances, angles, surface areas, and volumes. The method errors were less than 1 mm for distances and less than 1.5 degrees for angles. The intraclass correlation coefficients showed values greater than .90 for all parameters. The repeatability values were comparable for cleft and noncleft sides. The small method errors, high intraclass correlation coefficients, and comparable repeatability values for cleft and noncleft sides reveal that the new technique is appropriate for clinical use.
Pearson, Richard
2011-03-01
To assess the possibility of estimating the refractive index of rigid contact lenses on the basis of measurements of their back vertex power (BVP) in air and when immersed in liquid. First, a spreadsheet model was used to quantify the magnitude of errors arising from simulated inaccuracies in the variables required to calculate refractive index. Then, refractive index was calculated from in-air and in-liquid measurements of BVP of 21 lenses that had been made in three negative BVPs from materials with seven different nominal refractive index values. The power measurements were made by two operators on two occasions. Intraobserver reliability showed a mean difference of 0.0033±0.0061 (t = 0.544, P = 0.59), interobserver reliability showed a mean difference of 0.0043±0.0061 (t = 0.707, P = 0.48), and the mean difference between the nominal and calculated refractive index values was -0.0010±0.0111 (t = -0.093, P = 0.93). The spreadsheet prediction that low-powered lenses might be subject to greater errors in the calculated values of refractive index was substantiated by the experimental results. This method shows good intra and interobserver reliabilities and can be used easily in a clinical setting to provide an estimate of the refractive index of rigid contact lenses having a BVP of 3 D or more.
A systematic review of the measurement properties of the Body Image Scale (BIS) in cancer patients.
Melissant, Heleen C; Neijenhuijs, Koen I; Jansen, Femke; Aaronson, Neil K; Groenvold, Mogens; Holzner, Bernhard; Terwee, Caroline B; van Uden-Kraan, Cornelia F; Cuijpers, Pim; Verdonck-de Leeuw, Irma M
2018-06-01
Body image is acknowledged as an important aspect of health-related quality of life in cancer patients. The Body Image Scale (BIS) is a patient-reported outcome measure (PROM) to evaluate body image in cancer patients. The aim of this study was to systematically review measurement properties of the BIS among cancer patients. A search in Embase, MEDLINE, PsycINFO, and Web of Science was performed to identify studies that investigated measurement properties of the BIS (Prospero ID 42017057237). Study quality was assessed (excellent, good, fair, poor), and data were extracted and analyzed according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology on structural validity, internal consistency, reliability, measurement error, hypothesis testing for construct validity, and responsiveness. Evidence was categorized into sufficient, insufficient, inconsistent, or indeterminate. Nine studies were included. Evidence was sufficient for structural validity (one factor solution), internal consistency (α = 0.86-0.96), and reliability (r > 0.70); indeterminate for measurement error (information on minimal important change lacked) and responsiveness (increasing body image disturbance in only one study); and inconsistent for hypothesis testing (conflicting results). Quality of the evidence was moderate to low. No studies reported on cross-cultural validity. The BIS is a PROM with good structural validity, internal consistency, and test-retest reliability, but good quality studies on the other measurement properties are needed to optimize evidence. It is recommended to include a wider variety of cancer diagnoses and treatment modalities in these future studies.
Stolinski, L; Kozinoga, M; Czaprowski, D; Tyrakowski, M; Cerny, P; Suzuki, N; Kotwicki, T
2017-01-01
Digital photogrammetry provides measurements of body angles or distances which allow for quantitative posture assessment with or without the use of external markers. It is becoming an increasingly popular tool for the assessment of the musculoskeletal system. The aim of this paper is to present a structured method for the analysis of posture and its changes using a standardized digital photography technique. The study had two parts. The first comprised 91 children (44 girls and 47 boys) aged 7-10 (8.2 ± 1.0), i.e., primary school students; its aim was to develop the photographic method, choose the quantitative parameters, and determine the intraobserver reliability (repeatability) and interobserver reliability (reproducibility) of sagittal-plane measurements made with digital photography, as well as to compare Rippstein plurimeter and digital photography measurements. The second involved 7782 children (3804 girls, 3978 boys) aged 7-10 (8.4 ± 0.5), who underwent digital photography postural screening. The methods consisted of measuring and calculating selected parameters, establishing the normal ranges of photographic parameters, presenting percentile charts, and noting common pitfalls and possible sources of error in digital photography. A standardized procedure for the photographic evaluation of child body posture was presented. The photographic measurements revealed very good intra- and inter-rater reliability for the five sagittal parameters and good reliability against Rippstein plurimeter measurements. The parameters displayed insignificant variability over time. Normative data were calculated based on photographic assessment, while the percentile charts were provided to serve as reference values. The technical errors observed during photogrammetry are carefully discussed in this article. Technical developments allow for the regular use of digital photogrammetry in body posture assessment.
Specific child positioning (described above) enables us to avoid incidentally modified posture. Image registration is simple, quick, harmless, and cost-effective. The semi-automatic image analysis, together with the normal values and percentile charts, makes the technique reliable in terms of child's posture documentation and corrective therapy effects' monitoring.
Kofman, Rianne; Beekman, Anna M; Emmelot, Cornelis H; Geertzen, Jan H B; Dijkstra, Pieter U
2018-06-01
Non-contact scanners may have potential for measurement of residual limb volume. Different non-contact scanners have been introduced during the last decades. Reliability and usability (practicality and user friendliness) should be assessed before introducing these systems in clinical practice. The aim of this study was to analyze the measurement properties and usability of four non-contact scanners (TT Design, Omega Scanner, BioSculptor Bioscanner, and Rodin4D Scanner). Quasi-experimental design. Nine (geometric and residual limb) models were measured on two occasions, each consisting of two sessions, for a total of four sessions. In each session, four observers used the four systems for volume measurement. Mean for each model, repeatability coefficients for each system, variance components, and their two-way interactions of measurement conditions were calculated. User satisfaction was evaluated with the Post-Study System Usability Questionnaire. Systematic differences between the systems were found in volume measurements. Most of the variance was explained by the model (97%), while error variance was 3%. Measurement system and the interaction between system and model explained 44% of the error variance. Repeatability coefficients of the systems ranged from 0.101 L (Omega Scanner) to 0.131 L (Rodin4D). Differences in Post-Study System Usability Questionnaire scores between the systems were small and not significant. The systems were reliable in determining residual limb volume. Measurement systems and the interaction between system and residual limb model explained most of the error variance. The differences in repeatability coefficient and usability between the four CAD/CAM systems were small. Clinical relevance: If accurate measurements of residual limb volume are required (as in research), modern non-contact scanners should be taken into consideration.
Error Estimation and Uncertainty Propagation in Computational Fluid Mechanics
NASA Technical Reports Server (NTRS)
Zhu, J. Z.; He, Guowei; Bushnell, Dennis M. (Technical Monitor)
2002-01-01
Numerical simulation has now become an integral part of engineering design process. Critical design decisions are routinely made based on the simulation results and conclusions. Verification and validation of the reliability of the numerical simulation is therefore vitally important in the engineering design processes. We propose to develop theories and methodologies that can automatically provide quantitative information about the reliability of the numerical simulation by estimating numerical approximation error, computational model induced errors and the uncertainties contained in the mathematical models so that the reliability of the numerical simulation can be verified and validated. We also propose to develop and implement methodologies and techniques that can control the error and uncertainty during the numerical simulation so that the reliability of the numerical simulation can be improved.
Towards automatic Markov reliability modeling of computer architectures
NASA Technical Reports Server (NTRS)
Liceaga, C. A.; Siewiorek, D. P.
1986-01-01
The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation.
NASA Astrophysics Data System (ADS)
Utegulov, B. B.
2018-02-01
This work assesses the reliability of the developed method by analyzing the error in the indirect determination of insulation parameters in an asymmetric network with an isolated neutral at voltages above 1000 V. The studies of the random relative mean square errors show that the accuracy of indirect measurements in the developed method can be effectively regulated not only by selecting the additional capacitive conductance connected between the phases of the electrical network and the ground, but also by selecting measuring instruments according to their accuracy class. When meters of accuracy class 0.5 are chosen and the additional capacitive conductance connected between the phases of the electrical network and the ground is selected correctly, the errors in measuring the insulation parameters will not exceed 10%.
The Truth about Scores Children Achieve on Tests.
ERIC Educational Resources Information Center
Brown, Jonathan R.
1989-01-01
The importance of using the standard error of measurement (SEm) in determining reliability in test scores is emphasized. The SEm is compared to the hypothetical true score for standardized tests, and procedures for calculation of the SEm are explained. (JDD)
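The calculation this abstract refers to can be sketched as follows, assuming the usual classical test theory formula SEm = SD × sqrt(1 − reliability). The function names and the choice of a z-based band around the observed score are illustrative assumptions, not the procedure from the article itself.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEm = SD * sqrt(1 - reliability), the classical test theory formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Band of +/- z * SEm around an observed score (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test with SD = 15 and reliability = 0.91
low, high = score_band(observed=100, sd=15, reliability=0.91)
print(f"SEm = {standard_error_of_measurement(15, 0.91):.2f}, band = {low:.1f} to {high:.1f}")
```

Reporting the band rather than the single score makes explicit how much of a child's score could plausibly be measurement error.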
Between-day reliability of the trapezius muscle H-reflex and M-wave.
Vangsgaard, Steffen; Hansen, Ernst A; Madeleine, Pascal
2015-12-01
The aim of this study was to investigate the between-day reliability of the trapezius muscle H-reflex and M-wave. Sixteen healthy subjects were studied on 2 consecutive days. Trapezius muscle H-reflexes were evoked by electrical stimulation of the C3/4 cervical nerves; M-waves were evoked by electrical stimulation of the accessory nerve. Relative reliability was estimated by intraclass correlation coefficients (ICC2,1). Absolute reliability was estimated by computing the standard error of measurement (SEM) and the smallest real difference (SRD). Bland-Altman plots were constructed to detect any systematic bias. Variables showed substantial to excellent relative reliability (ICC = 0.70-0.99). The relative SEM ranged from 1.4% to 34.8%; relative SRD ranged from 3.8% to 96.5%. No systematic bias was present in the data. The amplitude and latency of the trapezius muscle H-reflex and M-wave in healthy young subjects can be measured reliably across days. © 2015 Wiley Periodicals, Inc.
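The relation between the SEM and SRD ranges quoted above can be checked with a short sketch, assuming the commonly used convention SRD = 1.96 × sqrt(2) × SEM. The abstract does not state which formula was used, so this is an assumption, although the reported endpoints are close to what it gives.

```python
import math

def smallest_real_difference(sem, z=1.96):
    """SRD for the difference between two measurements: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem

# Relative SEM endpoints quoted in the abstract, in percent of the mean
for rel_sem in (1.4, 34.8):
    rel_srd = smallest_real_difference(rel_sem)
    print(f"relative SEM {rel_sem}% -> relative SRD {rel_srd:.1f}%")
```

The factor sqrt(2) appears because a change score combines the error of two measurements, so a between-day difference must exceed roughly 2.77 SEM before it is treated as real at the 95% level.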
Foster, J D; Miskovic, D; Allison, A S; Conti, J A; Ockrim, J; Cooper, E J; Hanna, G B; Francis, N K
2016-06-01
Laparoscopic rectal resection is technically challenging, with outcomes dependent upon technical performance. No robust objective assessment tool exists for laparoscopic rectal resection surgery. This study aimed to investigate the application of the objective clinical human reliability analysis (OCHRA) technique for assessing technical performance of laparoscopic rectal surgery and explore the validity and reliability of this technique. Laparoscopic rectal cancer resection operations were described in the format of a hierarchical task analysis. Potential technical errors were defined. The OCHRA technique was used to identify technical errors enacted in videos of twenty consecutive laparoscopic rectal cancer resection operations from a single site. The procedural task, spatial location, and circumstances of all identified errors were logged. Clinical validity was assessed through correlation with clinical outcomes; reliability was assessed by test-retest. A total of 335 execution errors were identified, with a median of 15 per operation. More errors were observed during pelvic tasks compared with abdominal tasks (p < 0.001). Within the pelvis, more errors were observed during dissection on the right side than the left (p = 0.03). Test-retest confirmed reliability (r = 0.97, p < 0.001). A significant correlation was observed between error frequency and mesorectal specimen quality (r_s = 0.52, p = 0.02) and with blood loss (r_s = 0.609, p = 0.004). OCHRA offers a valid and reliable method for evaluating technical performance of laparoscopic rectal surgery.
Reliability and Validity of a New Test of Agility and Skill for Female Amateur Soccer Players
Kutlu, Mehmet; Yapici, Hakan; Yilmaz, Abdullah
2017-01-01
The aim of this study was to evaluate the Agility and Skill Test, which had been recently developed to assess agility and skill in female athletes. Following a 10 min warm-up, two trials to test the reliability and validity of the test were conducted one week apart. Measurements were collected to compare soccer players’ physical performance in a 20 m sprint, a T-Drill test, the Illinois Agility Run Test, change-of-direction and acceleration, as well as agility and skill. All tests were completed in the same order. Thirty-four amateur female soccer players were recruited (age = 20.8 ± 1.9 years; body height = 166 ± 6.9 cm; body mass = 55.5 ± 5.8 kg). To determine the reliability and usefulness of these tests, paired sample t-tests, intra-class correlation coefficients, typical error, coefficient of variation, and differences between the typical error and smallest worthwhile change statistics were computed. Test results showed no significant differences between the two sessions (p > 0.01). There were high intra-class correlations between the test and retest values (r = 0.94–0.99) for all tests. Typical error values were below the smallest worthwhile change, indicating ‘good’ usefulness for these tests. A near perfect Pearson correlation between the Agility and Skill Test (r = 0.98) was found, and there were moderate-to-large levels of correlation between the Agility and Skill Test and other measures (r = 0.37 to r = 0.56). The results of this study suggest that the Agility and Skill Test is a reliable and valid test for female soccer players and has significant value for assessing the integrative agility and skill capability of soccer players. PMID:28469760
Rosales, Roberto S; Martin-Hidalgo, Yolanda; Reboso-Morales, Luis; Atroshi, Isam
2016-03-03
The purpose of this study was to assess the reliability and construct validity of the Spanish version of the 6-item carpal tunnel syndrome (CTS) symptoms scale (CTS-6). In this cross-sectional study, 40 patients diagnosed with CTS based on clinical and neurophysiologic criteria completed the standard Spanish versions of the CTS-6 and the disabilities of the arm, shoulder and hand (QuickDASH) scales on two occasions with a 1-week interval. Internal-consistency reliability was assessed with the Cronbach alpha coefficient and test-retest reliability with the intraclass correlation coefficient, two-way random effects model and absolute agreement definition (ICC2,1). Cross-sectional precision was analyzed with the Standard Error of the Measurement (SEM). Longitudinal precision for the test-retest reliability coefficient was assessed with the Standard Error of the Measurement difference (SEMdiff) and the Minimal Detectable Change at 95% confidence level (MDC95). For assessing construct validity it was hypothesized that the CTS-6 would have a strong positive correlation with the QuickDASH, analyzed with the Pearson correlation coefficient (r). The standard Spanish version of the CTS-6 presented a Cronbach alpha of 0.81 with a SEM of 0.3. Test-retest reliability showed an ICC of 0.85 with a SEMdiff of 0.36 and a MDC95 of 0.7. The correlation between CTS-6 and the QuickDASH was concordant with the a priori formulated construct hypothesis (r = 0.69). Conclusions: The standard Spanish version of the 6-item CTS symptoms scale showed good internal consistency, test-retest reliability and construct validity for outcomes assessment in CTS. The CTS-6 will be useful to clinicians and researchers in Spanish speaking parts of the world. The use of standardized outcome measures across countries also will facilitate comparison of research results in carpal tunnel syndrome.
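Two of the statistics reported here can be sketched briefly. The first function is the standard Cronbach alpha formula applied to a respondents-by-items matrix; the second assumes MDC95 = 1.96 × SEMdiff, a convention that is consistent with the reported values (0.36 and 0.7) but is not stated explicitly in the abstract. Function names and the data layout are illustrative assumptions.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an array of shape (n_respondents, n_items)."""
    x = np.asarray(items, dtype=float)
    n_items = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the summed score
    return n_items / (n_items - 1) * (1.0 - item_vars.sum() / total_var)

def mdc95(sem_diff, z=1.96):
    """Minimal detectable change at 95% confidence, assuming MDC95 = z * SEMdiff."""
    return z * sem_diff

# Check against the SEMdiff reported in the abstract
print(f"MDC95 from SEMdiff 0.36: {mdc95(0.36):.2f}")
```

A change in CTS-6 score smaller than the MDC95 cannot be distinguished from measurement error for an individual patient, which is why the two precision statistics are reported together.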
Measuring Fisher Information Accurately in Correlated Neural Populations
Kohn, Adam; Pouget, Alexandre
2015-01-01
Neural responses are known to be variable. In order to understand how this neural variability constrains behavioral performance, we need to be able to measure the reliability with which a sensory stimulus is encoded in a given population. However, such measures are challenging for two reasons: First, they must take into account noise correlations which can have a large influence on reliability. Second, they need to be as efficient as possible, since the number of trials available in a set of neural recording is usually limited by experimental constraints. Traditionally, cross-validated decoding has been used as a reliability measure, but it only provides a lower bound on reliability and underestimates reliability substantially in small datasets. We show that, if the number of trials per condition is larger than the number of neurons, there is an alternative, direct estimate of reliability which consistently leads to smaller errors and is much faster to compute. The superior performance of the direct estimator is evident both for simulated data and for neuronal population recordings from macaque primary visual cortex. Furthermore we propose generalizations of the direct estimator which measure changes in stimulus encoding across conditions and the impact of correlations on encoding and decoding, typically denoted by Ishuffle and Idiag respectively. PMID:26030735
Wallwork, Tracy L; Hides, Julie A; Stanton, Warren R
2007-10-01
Within-session intrarater and interrater reliability study. To establish the intrarater and interrater reliability of thickness measurements of the multifidus muscle in a parasagittal plane, conducted by an experienced ultrasound operator and a novice assessor. There is considerable evidence for the important role of the multifidus muscle in segmental stabilization of the lumbar spine. The cross-sectional area of the multifidus muscle has been assessed in healthy subjects and patients with low back pain using real-time ultrasound imaging. However, few studies have measured the thickness of the multifidus muscle using a parasagittal view. The thickness of the multifidus muscle was measured at rest, using real-time ultrasound imaging, in 10 subjects without a history of low back pain, at the levels of the L2-3 and L4-5 zygapophyseal joints. The measure was carried out 3 times at each level by 2 assessors (1 experienced, 1 novice). Intrarater (model 3) and interrater (model 2) reliability was assessed by calculation of an F statistic (analysis of variance), the intraclass correlation coefficient (ICC), and the standard error of measurement (SEM). On the basis of an average of 3 trials, the 2 operators showed very high interrater agreement on the measurement of thicknesses at the L2-3 level (ICC2,3 = 0.96; 95% CI: 0.84 to 0.99) and the L4-5 vertebral level (ICC2,3 = 0.97; 95% CI: 0.87 to 0.99), with no systematic differences in muscle size across operators (P > .05). Interrater reliability was relatively lower for the L2-3 level (ICC2,1 = 0.85; 95% CI: 0.51 to 0.96) than the L4-5 level (ICC2,1 = 0.87; 95% CI: 0.52 to 0.97) when a single trial per rater was used, but these values still indicated a high level of agreement. 
In addition, the novice and experienced operator produced reliable intrarater measurements at L2-3 (ICC3,1 = 0.89; 95% CI: 0.72 to 0.97 and 0.94; 95% CI: 0.86 to 0.99) and at L4-5 (ICC3,1 = 0.88; 95% CI: 0.68 to 0.97 and 0.95; 95% CI: 0.86 to 0.99), with no systematic differences in muscle size across trials (P > .05). The consistently low SEM values also indicate low measurement error. A novice and an experienced assessor were both able to reliably perform this measure at rest for 2 vertebral levels using real-time ultrasound imaging. An average of 3 trials produced higher interrater reliability scores, though using a single trial per rater was also reliable.
Reliability analysis of the objective structured clinical examination using generalizability theory.
Trejo-Mejía, Juan Andrés; Sánchez-Mendiola, Melchor; Méndez-Ramírez, Ignacio; Martínez-González, Adrián
2016-01-01
The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies of OSCE reliability measurement with generalizability theory (G-theory) in Latin America. The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and explore its usefulness for quality improvement. An observational cross-sectional study was conducted at National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossover random effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered as a single facet of analysis. The exam was applied to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. The sites and the versions of the tests had minimum variance. Our study achieved a G coefficient similar to that found in other reports, which is acceptable for summative tests. G-theory allows the estimation of the magnitude of multiple sources of error and helps decision makers to determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
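The last sentence describes what is usually called a decision study. A minimal sketch follows, assuming a simple persons-by-stations design in which the relative G coefficient is σ²(person) / (σ²(person) + σ²(residual) / n_stations). The variance components used in the example are hypothetical and are not taken from this study, whose design included additional facets.

```python
import math

def g_coefficient(var_person, var_residual, n_stations):
    """Relative G coefficient for a persons-by-stations design."""
    return var_person / (var_person + var_residual / n_stations)

def stations_needed(var_person, var_residual, target_g):
    """Smallest number of stations whose G coefficient reaches target_g."""
    exact = target_g / (1.0 - target_g) * var_residual / var_person
    return math.ceil(round(exact, 9))  # rounding guards against float noise

# Hypothetical variance components, for illustration only
for n in (6, 12, 18, 24):
    print(n, "stations -> G =", round(g_coefficient(1.0, 2.0, n), 3))
```

Because the residual term is divided by the number of stations, adding stations raises the coefficient with diminishing returns, which is the trade-off such an analysis lets examiners quantify.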
Traceability of On-Machine Tool Measurement: A Review.
Mutilba, Unai; Gomez-Acedo, Eneko; Kortaberria, Gorka; Olarra, Aitor; Yagüe-Fabra, Jose A
2017-07-11
Nowadays, errors during the manufacturing process of high value components are not acceptable in driving industries such as energy and transportation. Sectors such as aerospace, automotive, shipbuilding, nuclear power, large science facilities or wind power need complex and accurate components that demand close measurements and fast feedback into their manufacturing processes. New measuring technologies are already available in machine tools, including integrated touch probes and fast interface capabilities. They provide the possibility to measure the workpiece in-machine during or after its manufacture, maintaining the original setup of the workpiece and avoiding the manufacturing process from being interrupted to transport the workpiece to a measuring position. However, the traceability of the measurement process on a machine tool is not ensured yet and measurement data is still not fully reliable enough for process control or product validation. The scientific objective is to determine the uncertainty on a machine tool measurement and, therefore, convert it into a machine integrated traceable measuring process. For that purpose, an error budget should consider error sources such as the machine tools, components under measurement and the interactions between both of them. This paper reviews all those uncertainty sources, being mainly focused on those related to the machine tool, either on the process of geometric error assessment of the machine or on the technology employed to probe the measurand.
High test-retest-reliability of pain-related evoked potentials (PREP) in healthy subjects.
Özgül, Özüm Simal; Maier, Christoph; Enax-Krumova, Elena K; Vollert, Jan; Fischer, Marc; Tegenthoff, Martin; Höffken, Oliver
2017-04-24
Pain-related evoked potentials (PREP) is an established electrophysiological method to evaluate the signal transmission of electrically stimulated A-delta fibres. Although prerequisite for its clinical use, test-retest-reliability and side-to-side differences of bilateral stimulation in healthy subjects have not been examined yet. We performed PREP twice within 3-14 days in 33 healthy subjects bilaterally by stimulating the dorsal hand. Detection (DT) and pain thresholds (PT) after electrical stimulation, the corresponding pain ratings, latencies of P0, N1, P1 and N2 components and the corresponding amplitudes were assessed. Impact of electrically induced pain intensity, age, sex, and arm length on PREP was analysed. MANOVA, t-test, intraclass correlation coefficient (ICC), standard error of measurement (SEM), smallest real difference (SRD), Bland-Altman analysis as well as ANCOVA were used for statistical analysis. Measurement from both sides on both days resulted in mean N1-latencies from 142.39±18.12 ms to 144.03±16.62 ms and in mean N1P1-amplitudes from 39.04±12.26 μV to 40.53±12.9 μV. Analysis of a side-to-side effect showed an F-value of 0.038 for the N1-latency and of 0.004 for the N1P1-amplitude (p>0.8). We found intraclass correlation coefficients (ICC) from 0.88 to 0.93 and a standard error of measurement (SEM) <10% of mean values for all measurements concerning the N1-latency and N1P1-amplitude. Intraclass correlation coefficients, standard error of measurement and Bland-Altman analyses revealed excellent test-retest-reliability for N1-latency and N1P1-amplitude without systematic error and there was no side-to-side effect on PREP. N1-latency (r=0.35, p<0.05) and N1P1-amplitude (r=-0.45, p<0.05) correlated with age and additionally N1-latency correlated with arm length (r=0.45, p<0.001). In contrast, pain intensity during the stimulation had no effect on either N1-latency or N1P1-amplitude.
In summary, PREP showed high test-retest-reliability and negligible side-to-side differences concerning the commonly used parameters N1-latency and N1P1-amplitude. Copyright © 2017 Elsevier B.V. All rights reserved.
What Randomized Benchmarking Actually Measures
Proctor, Timothy; Rudinger, Kenneth; Young, Kevin; ...
2017-09-28
Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not a well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. Here, these theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.
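How r is extracted from an RB decay curve can be sketched as follows, assuming the conventional model P(m) = A + B·p^m and the conventional definition r = (d − 1)(1 − p)/d for Hilbert-space dimension d. To keep the example dependency-light, the asymptote A is held fixed (defaulting to 1/d) so that the fit becomes a straight line in log space; practical analyses usually fit A, B and p jointly. The function name is hypothetical and this is not the authors' code.

```python
import numpy as np

def rb_error_rate(lengths, survival, dim=2, asymptote=None):
    """Estimate the RB decay constant p and error rate r.

    Fits log(P(m) - A) = log(B) + m * log(p) by least squares, with the
    asymptote A fixed (an assumption of this sketch), then converts p to
    r = (dim - 1) * (1 - p) / dim.
    """
    m = np.asarray(lengths, dtype=float)
    prob = np.asarray(survival, dtype=float)
    a = 1.0 / dim if asymptote is None else asymptote
    y = prob - a
    if np.any(y <= 0):
        raise ValueError("survival probabilities must exceed the asymptote")
    slope, _intercept = np.polyfit(m, np.log(y), 1)
    p = float(np.exp(slope))
    r = (dim - 1) * (1.0 - p) / dim
    return p, r

# Synthetic, noiseless single-qubit decay with p = 0.99
lengths = np.arange(1, 200, 10)
survival = 0.5 + 0.5 * 0.99 ** lengths
print(rb_error_rate(lengths, survival))
```

The sketch only recovers the decay rate of the curve; the abstract's point is that interpreting that number as an average gate infidelity requires care.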
Application of psychometric theory to the measurement of voice quality using rating scales.
Shrivastav, Rahul; Sapienza, Christine M; Nandur, Vuday
2005-04-01
Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the mid-range of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have been interpreted to show that listeners perceive voice quality in an idiosyncratic manner. Based on psychometric theory, the present research explored an alternative explanation for the poor interlistener agreement observed in previous research. This approach suggests that poor agreement between listeners may result, in part, from measurement errors related to a variety of factors rather than true differences in the perception of voice quality. In this study, 10 listeners rated breathiness for 27 vowel stimuli using a 5-point rating scale. Each stimulus was presented to the listeners 10 times in random order. Interlistener agreement and reliability were calculated from these ratings. Agreement and reliability were observed to improve when multiple ratings of each stimulus from each listener were averaged and when standardized scores were used instead of absolute ratings. The probability of exact agreement was found to be approximately .9 when using averaged ratings and standardized scores. In contrast, the probability of exact agreement was only .4 when a single rating from each listener was used to measure agreement. These findings support the hypothesis that poor agreement reported in past research partly arises from errors in measurement rather than individual differences in the perception of voice quality.
Improving lidar turbulence estimates for wind energy
NASA Astrophysics Data System (ADS)
Newman, J. F.; Clifton, A.; Churchfield, M. J.; Klein, P.
2016-09-01
Remote sensing devices (e.g., lidars) are quickly becoming a cost-effective and reliable alternative to meteorological towers for wind energy applications. Although lidars can measure mean wind speeds accurately, these devices measure different values of turbulence intensity (TI) than an instrument on a tower. In response to these issues, a lidar TI error reduction model was recently developed for commercially available lidars. The TI error model first applies physics-based corrections to the lidar measurements, then uses machine-learning techniques to further reduce errors in lidar TI estimates. The model was tested at two sites in the Southern Plains where vertically profiling lidars were collocated with meteorological towers. Results indicate that the model works well under stable conditions but cannot fully mitigate the effects of variance contamination under unstable conditions. To understand how variance contamination affects lidar TI estimates, a new set of equations was derived in previous work to characterize the actual variance measured by a lidar. Terms in these equations were quantified using a lidar simulator and modeled wind field, and the new equations were then implemented into the TI error model.
Improving Lidar Turbulence Estimates for Wind Energy: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Newman, Jennifer; Clifton, Andrew; Churchfield, Matthew
2016-10-01
Remote sensing devices (e.g., lidars) are quickly becoming a cost-effective and reliable alternative to meteorological towers for wind energy applications. Although lidars can measure mean wind speeds accurately, these devices measure different values of turbulence intensity (TI) than an instrument on a tower. In response to these issues, a lidar TI error reduction model was recently developed for commercially available lidars. The TI error model first applies physics-based corrections to the lidar measurements, then uses machine-learning techniques to further reduce errors in lidar TI estimates. The model was tested at two sites in the Southern Plains where vertically profiling lidars were collocated with meteorological towers. Results indicate that the model works well under stable conditions but cannot fully mitigate the effects of variance contamination under unstable conditions. To understand how variance contamination affects lidar TI estimates, a new set of equations was derived in previous work to characterize the actual variance measured by a lidar. Terms in these equations were quantified using a lidar simulator and modeled wind field, and the new equations were then implemented into the TI error model.
Improving Lidar Turbulence Estimates for Wind Energy
Newman, Jennifer F.; Clifton, Andrew; Churchfield, Matthew J.; ...
2016-10-03
Remote sensing devices (e.g., lidars) are quickly becoming a cost-effective and reliable alternative to meteorological towers for wind energy applications. Although lidars can measure mean wind speeds accurately, these devices measure different values of turbulence intensity (TI) than an instrument on a tower. In response to these issues, a lidar TI error reduction model was recently developed for commercially available lidars. The TI error model first applies physics-based corrections to the lidar measurements, then uses machine-learning techniques to further reduce errors in lidar TI estimates. The model was tested at two sites in the Southern Plains where vertically profiling lidars were collocated with meteorological towers. Results indicate that the model works well under stable conditions but cannot fully mitigate the effects of variance contamination under unstable conditions. To understand how variance contamination affects lidar TI estimates, a new set of equations was derived in previous work to characterize the actual variance measured by a lidar. Terms in these equations were quantified using a lidar simulator and modeled wind field, and the new equations were then implemented into the TI error model.
Big Data and Large Sample Size: A Cautionary Note on the Potential for Bias
Chambers, David A.; Glasgow, Russell E.
2014-01-01
A number of commentaries have suggested that large studies are more reliable than smaller studies and there is a growing interest in the analysis of “big data” that integrates information from many thousands of persons and/or different data sources. We consider a variety of biases that are likely in the era of big data, including sampling error, measurement error, multiple comparisons errors, aggregation error, and errors associated with the systematic exclusion of information. Using examples from epidemiology, health services research, studies on determinants of health, and clinical trials, we conclude that it is necessary to exercise greater caution to be sure that big sample size does not lead to big inferential errors. Despite the advantages of big studies, large sample size can magnify the bias associated with error resulting from sampling or study design. Clin Trans Sci 2014; Volume #: 1–5 PMID:25043853
The influence of phonological context on the sound errors of a speaker with Wernicke's aphasia.
Goldmann, R E; Schwartz, M F; Wilshire, C E
2001-09-01
A corpus of phonological errors produced in narrative speech by a Wernicke's aphasic speaker (R.W.B.) was tested for context effects using two new methods for establishing chance baselines. A reliable anticipatory effect was found using the second method, which estimated chance from the distance between phoneme repeats in the speech sample containing the errors. Relative to this baseline, error-source distances were shorter than expected for anticipations, but not perseverations. R.W.B.'s anticipation/perseveration ratio measured intermediate between a nonaphasic error corpus and that of a more severe aphasic speaker (both reported in Schwartz et al., 1994), supporting the view that the anticipatory bias correlates with severity. Finally, R.W.B.'s anticipations favored word-initial segments, although errors and sources did not consistently share word or syllable position. Copyright 2001 Academic Press.
An experiment in software reliability: Additional analyses using data from automated replications
NASA Technical Reports Server (NTRS)
Dunham, Janet R.; Lauterbach, Linda A.
1988-01-01
A study undertaken to collect software error data of laboratory quality for use in the development of credible methods for predicting the reliability of software used in life-critical applications is summarized. The software error data reported were acquired through automated repetitive run testing of three independent implementations of a launch interceptor condition module of a radar tracking problem. The results are based on 100 test applications to accumulate a sufficient sample size for error rate estimation. The data collected are used to confirm the results of two Boeing studies reported in NASA-CR-165836 Software Reliability: Repetitive Run Experimentation and Modeling, and NASA-CR-172378 Software Reliability: Additional Investigations into Modeling With Replicated Experiments, respectively. That is, the results confirm the log-linear pattern of software error rates and reject the hypothesis of equal error rates per individual fault. This rejection casts doubt on the assumption that the program's failure rate is a constant multiple of the number of residual bugs, an assumption which underlies some of the current models of software reliability. The data raise new questions concerning the phenomenon of interacting faults.
Bois, Aaron J; Fening, Stephen D; Polster, Josh; Jones, Morgan H; Miniaci, Anthony
2012-11-01
Glenoid support is critical for stability of the glenohumeral joint. An accepted noninvasive method of quantifying glenoid bone loss does not exist. To perform independent evaluations of the reliability and accuracy of standard 2-dimensional (2-D) and 3-dimensional (3-D) computed tomography (CT) measurements of glenoid bone deficiency. Descriptive laboratory study. Two sawbone models were used; one served as a model for 2 anterior glenoid defects and the other for 2 anteroinferior defects. For each scapular model, predefect and defect data were collected for a total of 6 data sets. Each sample underwent 3-D laser scanning followed by CT scanning. Six physicians measured linear indicators of bone loss (defect length and width-to-length ratio) on both 2-D and 3-D CT and quantified bone loss using the glenoid index method on 2-D CT and using the glenoid index, ratio, and Pico methods on 3-D CT. The intraclass correlation coefficient (ICC) was used to assess agreement, and percentage error was used to compare radiographic and true measurements. With use of 2-D CT, the glenoid index and defect length measurements had the least percentage error (-4.13% and 7.68%, respectively); agreement was very good (ICC, .81) for defect length only. With use of 3-D CT, defect length (0.29%) and the Pico(1) method (4.93%) had the least percentage error. Agreement was very good for all linear indicators of bone loss (range, .85-.90) and for the ratio linear and Pico surface area methods used to quantify bone loss (range, .84-.98). Overall, 3-D CT results demonstrated better agreement and accuracy compared to 2-D CT. None of the methods assessed in this study using 2-D CT was found to be valid, and therefore, 2-D CT is not recommended for these methods. However, the length of glenoid defects can be reliably and accurately measured on 3-D CT. 
The Pico and ratio techniques are most reliable; however, the Pico(1) method accurately quantifies glenoid bone loss in both the anterior and anteroinferior locations. Future work is required to implement valid imaging techniques of glenoid bone loss into clinical practice. This is one of the only studies to date that has investigated both the reliability and accuracy of multiple indicators and quantification methods that evaluate glenoid bone loss in anterior glenohumeral instability. These data are critical to ensure valid methods are used for preoperative assessment and to determine when a glenoid bone augmentation procedure is indicated.
Wonnapinij, Passorn; Chinnery, Patrick F.; Samuels, David C.
2010-01-01
In cases of inherited pathogenic mitochondrial DNA (mtDNA) mutations, a mother and her offspring generally have large and seemingly random differences in the amount of mutated mtDNA that they carry. Comparisons of measured mtDNA mutation level variance values have become an important issue in determining the mechanisms that cause these large random shifts in mutation level. These variance measurements have been made with samples of quite modest size, which should be a source of concern because higher-order statistics, such as variance, are poorly estimated from small sample sizes. We have developed an analysis of the standard error of variance from a sample of size n, and we have defined error bars for variance measurements based on this standard error. We calculate variance error bars for several published sets of measurements of mtDNA mutation level variance and show how the addition of the error bars alters the interpretation of these experimental results. We compare variance measurements from human clinical data and from mouse models and show that the mutation level variance is clearly higher in the human data than it is in the mouse models at both the primary oocyte and offspring stages of inheritance. We discuss how the standard error of variance can be used in the design of experiments measuring mtDNA mutation level variance. Our results show that variance measurements based on fewer than 20 measurements are generally unreliable and ideally more than 50 measurements are required to reliably compare variances with less than a 2-fold difference. PMID:20362273
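The sample-size warning in the abstract above can be illustrated with a short calculation. This is a sketch under a stated assumption: for normally distributed data the standard error of a sample variance s² is s²·sqrt(2/(n − 1)). The paper's own expression is not given in the abstract and may be more general, and the function names are hypothetical.

```python
import math

def variance_standard_error(sample_variance, n):
    """Standard error of a sample variance, assuming normally distributed data."""
    return sample_variance * math.sqrt(2.0 / (n - 1))

def relative_variance_error(n):
    """Standard error of the variance as a fraction of the variance itself."""
    return math.sqrt(2.0 / (n - 1))

# Relative uncertainty of a variance estimate at several sample sizes.
for n in (5, 10, 20, 50, 100):
    print(f"n = {n:3d}: relative SE of variance = {relative_variance_error(n):.2f}")
```

Under this assumption the relative error is still about a third of the variance at n = 20 and about a fifth at n = 50, which is in line with the abstract's caution about comparing variances from small samples.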
Reliability of the Dutch translation of the Kujala Patellofemoral Score Questionnaire.
Ummels, P E J; Lenssen, A F; Barendrecht, M; Beurskens, A J H M
2017-01-01
There are no Dutch-language disease-specific questionnaires available for patients with patellofemoral pain syndrome that could help Dutch physiotherapists to assess and monitor these symptoms and functional limitations. The aim of this study was to translate the original disease-specific Kujala Patellofemoral Score into Dutch and evaluate its reliability. The questionnaire was translated from English into Dutch in accordance with internationally recommended guidelines. Reliability was determined in 50 stable subjects with an interval of 1 week. The patient inclusion criteria were age between 14 and 60 years; knowledge of the Dutch language; and the presence of at least three of the following symptoms: pain while taking the stairs, pain when squatting, pain when running, pain when cycling, pain when sitting with knees flexed for a prolonged period, grinding of the patella and a positive clinical patella test. The internal consistency, test-retest reliability, measurement error and limits of agreement were calculated. Internal consistency was 0.78 for the first assessment and 0.80 for the second assessment. The intraclass correlation coefficient (ICC agreement) between the first and second assessments was 0.98. The mean difference between the first and second measurements was 0.64, and the standard deviation was 5.51. The standard error of measurement was 3.9, and the smallest detectable change was 11. The Bland and Altman plot shows that the limits of agreement are -10.37 and 11.65. The results of the present study indicate that the test-retest reliability of the translated Dutch version of the Kujala Patellofemoral Score questionnaire is equivalent to that of the original English-language version and that the Dutch version has good internal consistency. Trial registration NTR (TC = 3258). Copyright © 2015 John Wiley & Sons, Ltd.
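The measurement-error figures reported above can be approximately reproduced from the mean and standard deviation of the test-retest differences. The formulas below are the commonly used ones (SEM = SD_diff/√2, SDC = 1.96·√2·SEM, limits of agreement = mean ± k·SD_diff); the abstract does not state which formulas the authors applied, so this is a sketch under that assumption, and the variable and function names are hypothetical.

```python
import math

mean_diff = 0.64  # mean test-retest difference, from the abstract
sd_diff = 5.51    # SD of the differences, from the abstract

# Standard error of measurement from the SD of the differences (assumed formula).
sem = sd_diff / math.sqrt(2)

# Smallest detectable change at the individual level (assumed formula).
sdc = 1.96 * math.sqrt(2) * sem

def limits_of_agreement(mean_difference, sd_difference, k=1.96):
    """Bland-Altman limits: mean difference plus or minus k SDs of the differences."""
    half_width = k * sd_difference
    return mean_difference - half_width, mean_difference + half_width

lower, upper = limits_of_agreement(mean_diff, sd_diff)
print(f"SEM = {sem:.2f}, SDC = {sdc:.2f}, LoA = ({lower:.2f}, {upper:.2f})")
```

These formulas give an SEM of about 3.9 and an SDC of about 10.8, matching the reported 3.9 and 11. The reported limits of agreement are slightly wider than the 1.96 multiplier yields, which would be consistent with a multiplier of about 2 and rounded inputs.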
Establishing the reliability of rhesus macaque social network assessment from video observations
Feczko, Eric; Mitchell, Thomas A. J.; Walum, Hasse; Brooks, Jenna M.; Heitz, Thomas R.; Young, Larry J.; Parr, Lisa A.
2015-01-01
Understanding the properties of a social environment is important for understanding the dynamics of social relationships. Understanding such dynamics is relevant for multiple fields, ranging from animal behaviour to social and cognitive neuroscience. To quantify social environment properties, recent studies have incorporated social network analysis. Social network analysis quantifies both the global and local properties of a social environment, such as social network efficiency and the roles played by specific individuals, respectively. Despite the plethora of studies incorporating social network analysis, methods to determine the amount of data necessary to derive reliable social networks are still being developed. Determining the amount of data necessary for a reliable network is critical for measuring changes in the social environment, for example following an experimental manipulation, and therefore may be critical for using social network analysis to statistically assess social behaviour. In this paper, we extend methods for measuring error in acquired data and for determining the amount of data necessary to generate reliable social networks. We derived social networks from a group of 10 male rhesus macaques, Macaca mulatta, for three behaviours: spatial proximity, grooming and mounting. Behaviours were coded using a video observation technique, where video cameras recorded the compound where the 10 macaques resided. We collected, coded and used 10 h of video data to construct these networks. Using the methods described here, we found in our data that 1 h of spatial proximity observations produced reliable social networks. However, this may not be true for other studies due to differences in data acquisition. Our results have broad implications for measuring and predicting the amount of error in any social network, regardless of species. PMID:26392632
Ultrasound measures of tendon thickness: Intra-rater, Inter-rater and Inter-machine reliability.
Del Baño-Aledo, María Elena; Martínez-Payá, Jacinto Javier; Ríos-Díaz, José; Mejías-Suárez, Silvia; Serrano-Carmona, Sergio; de Groot-Ferrando, Ana
2017-01-01
Ultrasound imaging is often used by physiotherapists and other healthcare professionals but the reliability of image acquisition with different ultrasound machines is unknown. The objective was to compare the intra-rater, inter-rater and inter-machine reliability of thickness measurements of the plantar fascia (PF), Achilles tendon (AT), patellar tendon (PT) and elbow common extensor tendon (ECET) with musculoskeletal ultrasound imaging (MSUS). Tendon thickness was measured in four anatomical structures (14 participants, 28 images per tendon) by two sonographers and with two different ultrasound machines. Intraclass Correlation Coefficients (ICCs) and Bland-Altman plots were calculated. The standard error of measurement (SEM) and minimum detectable difference (MDD) were calculated. Inter-rater reliability was excellent for AT (ICC=0.98; 95% CI= 0.96-0.99) and very good for PT (ICC=0.85; 95% CI = 0.67-0.93) and ECET (ICC=0.81; 95% CI= 0.72-0.94). Reliability for PF was moderate, with an ICC of 0.63 (95% CI= 0.20-0.83). Bland-Altman plot for inter-machine reliability showed a mean difference of 1 µm for PF measurements and a mean difference of 4 µm and 20 µm for AT and PT. The relative SEMs were below 7% and the MDDs were below 0.7 mm. The reliability of MSUS in measuring the thickness of the four tendons is confirmed by the homogeneous readings within sonographers, between operators and between different machines. Tendon thickness can be measured reliably on different ultrasound devices, which is an important step forward in the use of this technique in daily clinical practice and research. Level of evidence: III.
The multidriver: A reliable multicast service using the Xpress Transfer Protocol
NASA Technical Reports Server (NTRS)
Dempsey, Bert J.; Fenton, John C.; Weaver, Alfred C.
1990-01-01
A reliable multicast facility extends traditional point-to-point virtual circuit reliability to one-to-many communication. Such services can provide more efficient use of network resources, a powerful distributed name binding capability, and reduced latency in multidestination message delivery. These benefits will be especially valuable in real-time environments where reliable multicast can enable new applications and increase the availability and the reliability of data and services. We present a unique multicast service that exploits features in the next-generation, real-time transfer layer protocol, the Xpress Transfer Protocol (XTP). In its reliable mode, the service offers error, flow, and rate-controlled multidestination delivery of arbitrary-sized messages, with provision for the coordination of reliable reverse channels. Performance measurements on a single-segment Proteon ProNET-4 4 Mbps 802.5 token ring with heterogeneous nodes are discussed.
Neijenhuijs, Koen I; Jansen, Femke; Aaronson, Neil K; Brédart, Anne; Groenvold, Mogens; Holzner, Bernhard; Terwee, Caroline B; Cuijpers, Pim; Verdonck-de Leeuw, Irma M
2018-05-07
The EORTC IN-PATSAT32 is a patient-reported outcome measure (PROM) to assess cancer patients' satisfaction with in-patient health care. The aim of this study was to investigate whether the initial good measurement properties of the IN-PATSAT32 are confirmed in new studies. Within the scope of a larger systematic review study (Prospero ID 42017057237), a systematic search was performed of Embase, Medline, PsycINFO, and Web of Science for studies that investigated measurement properties of the IN-PATSAT32 up to July 2017. Study quality was assessed, and data were extracted and synthesized according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) methodology. Nine studies were included in this review. The evidence on reliability and construct validity was rated as sufficient and the quality of the evidence as moderate. The evidence on structural validity was rated as insufficient and of low quality. The evidence on internal consistency was indeterminate. Measurement error, responsiveness, criterion validity, and cross-cultural validity were not reported in the included studies. Measurement error could be calculated for two studies and was judged indeterminate. In summary, the IN-PATSAT32 performs as expected with respect to reliability and construct validity. No firm conclusions can yet be drawn as to whether the IN-PATSAT32 also performs as well with respect to structural validity and internal consistency. Further research on these measurement properties of the PROM is therefore needed, as well as on measurement error, responsiveness, criterion validity, and cross-cultural validity. For future studies, it is recommended to take the COSMIN methodology into account.
Catching errors with patient-specific pretreatment machine log file analysis.
Rangaraj, Dharanipathy; Zhu, Mingyao; Yang, Deshan; Palaniswaamy, Geethpriya; Yaddanapudi, Sridhar; Wooten, Omar H; Brame, Scott; Mutic, Sasa
2013-01-01
A robust, efficient, and reliable quality assurance (QA) process is highly desired for modern external beam radiation therapy treatments. Here, we report the results of a semiautomatic, pretreatment, patient-specific QA process based on dynamic machine log file analysis clinically implemented for intensity modulated radiation therapy (IMRT) treatments delivered by high energy linear accelerators (Varian 2100/2300 EX, Trilogy, iX-D, Varian Medical Systems Inc, Palo Alto, CA). The multileaf collimator machine (MLC) log files are called Dynalog by Varian. Using an in-house developed computer program called "Dynalog QA," we automatically compare the beam delivery parameters in the log files that are generated during pretreatment point dose verification measurements, with the treatment plan to determine any discrepancies in IMRT deliveries. Fluence maps are constructed and compared between the delivered and planned beams. Since clinical introduction in June 2009, 912 machine log file analyses QA were performed by the end of 2010. Among these, 14 errors causing dosimetric deviation were detected and required further investigation and intervention. These errors were the result of human operating mistakes, flawed treatment planning, and data modification during plan file transfer. Minor errors were also reported in 174 other log file analyses, some of which stemmed from false positives and unreliable results; the origins of these are discussed herein. It has been demonstrated that the machine log file analysis is a robust, efficient, and reliable QA process capable of detecting errors originating from human mistakes, flawed planning, and data transfer problems. The possibility of detecting these errors is low using point and planar dosimetric measurements. Copyright © 2013 American Society for Radiation Oncology. Published by Elsevier Inc. All rights reserved.
Standards and reliability in evaluation: when rules of thumb don't apply.
Norcini, J J
1999-10-01
The purpose of this paper is to identify situations in which two rules of thumb in evaluation do not apply. The first rule is that all standards should be absolute. When selection decisions are being made or when classroom tests are given, however, relative standards may be better. The second rule of thumb is that every test should have a reliability of .80 or better. Depending on the circumstances, though, the standard error of measurement, the consistency of pass/fail classifications, and the domain-referenced reliability coefficients may be better indicators of reproducibility.
Feeney, Joanne; Savva, George M; O'Regan, Claire; King-Kallimanis, Bellinda; Cronin, Hilary; Kenny, Rose Anne
2016-05-31
Knowing the reliability of cognitive tests, particularly those commonly used in clinical practice, is important in order to interpret the clinical significance of a change in performance or a low score on a single test. To report the intra-class correlation (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), and Color Trails Test (CTT) among community dwelling older adults. 130 participants aged 55 and older without severe cognitive impairment underwent two cognitive assessments between two and four months apart. Half the group changed rater between assessments and half changed time of day. Mean (standard deviation) MMSE was 28.1 (2.1) at baseline and 28.4 (2.1) at repeat. Mean (SD) MoCA increased from 24.8 (3.6) to 25.2 (3.6). There was a rater effect on CTT, but not on the MMSE or MoCA. The SEM of the MMSE was 1.0, leading to an MDC (based on a 95% confidence interval) of 3 points. The SEM of the MoCA was 1.5, implying an MDC95 of 4 points. MoCA (ICC = 0.81) was more reliable than MMSE (ICC = 0.75), but all tests examined showed substantial within-patient variation. An individual's score would have to change by greater than or equal to 3 points on the MMSE and 4 points on the MoCA for the rater to be confident that the change was not due to measurement error. This has important implications for epidemiologists and clinicians in dementia screening and diagnosis.
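The SEM and MDC values in the abstract above are close to what the standard formulas give from the reported standard deviations and ICCs. The sketch below assumes SEM = SD·sqrt(1 − ICC) and MDC95 = 1.96·√2·SEM; the abstract does not state the formulas used, and the function names are hypothetical.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Minimum detectable change at 95% confidence for a difference of two scores."""
    return 1.96 * math.sqrt(2) * sem

# Baseline SDs and ICCs as reported in the abstract.
mmse_sem = sem_from_icc(sd=2.1, icc=0.75)
moca_sem = sem_from_icc(sd=3.6, icc=0.81)

print(f"MMSE: SEM = {mmse_sem:.2f}, MDC95 = {mdc95(mmse_sem):.2f}")
print(f"MoCA: SEM = {moca_sem:.2f}, MDC95 = {mdc95(moca_sem):.2f}")
```

Under these assumptions the MMSE values come out near 1.0 and 3 points and the MoCA values near 1.5 and 4 points, roughly as reported; small differences may reflect rounding or the authors' exact method.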
Lewis, Matthew S; Maruff, Paul; Silbert, Brendan S; Evered, Lis A; Scott, David A
2007-02-01
The reliable change index (RCI) expresses change relative to its associated error, and is useful in the identification of postoperative cognitive dysfunction (POCD). This paper examines four common RCIs that each account for error in different ways. Three rules incorporate a constant correction for practice effects and are contrasted with the standard RCI that had no correction for practice. These rules are applied to 160 patients undergoing coronary artery bypass graft (CABG) surgery who completed neuropsychological assessments preoperatively and 1 week postoperatively using error and reliability data from a comparable healthy nonsurgical control group. The rules all identify POCD in a similar proportion of patients, but the use of the within-subject standard deviation (WSD), expressing the effects of random error, as an error estimate is a theoretically appropriate denominator when a constant error correction, removing the effects of systematic error, is deducted from the numerator in a RCI.
Reliability of cervical lordosis measurement techniques on long-cassette radiographs.
Janusz, Piotr; Tyrakowski, Marcin; Yu, Hailong; Siemionow, Kris
2016-11-01
Lateral radiographs are commonly used to assess cervical sagittal alignment. Three assessment methods have been described and are commonly utilized in clinical practice. These methods are described for perfect lateral cervical radiographs, however in everyday practice radiograph quality varies. The aim of this study was to compare the reliability and reproducibility of 3 cervical lordosis (CL) measurement methods. Forty-four standing lateral radiographs were randomly chosen from a lateral long-cassette radiograph database. Measurements of CL were performed with: Cobb method C2-C7 (CM), C2-C7 posterior tangent method (PTM), sum of posterior tangent method for each segment (SPTM). Three independent orthopaedic surgeons measured CL using the three methods on 44 lateral radiographs. One researcher used the three methods to measure CL three times at 4-week time intervals. Agreement between the methods as well as their intra- and interobserver reliability were tested and quantified by intraclass correlation coefficient (ICC) and median error for a single measurement (SEM). ICC of 0.75 or more reflected an excellent agreement/reliability. The results were compared with repeated ANOVA test, with p < 0.05 considered as significant. All methods revealed excellent intra- and interobserver reliability. Agreement (ICC, SEM) between three methods was (0.89°, 3.44°), between CM and SPTM was (0.82°, 4.42°), between CM and PTM was (0.80°, 4.80°) and between PTM and SPTM was (0.99°, 1.10°). Mean values CL for a CM, PTM, SPTM were 10.5° ± 13.9°, 17.5° ± 15.6° and 17.7° ± 15.9° (p < 0.0001), respectively. The significant difference was between CM vs PTM (p < 0.0001) and CM vs SPTM (p < 0.0001), but not between PTM vs SPTM (p > 0.05). All three methods appeared to be highly reliable.
Although high agreement between all measurement methods was shown, we do not recommend using the Cobb measurement method interchangeably with PTM or SPTM within a single study, as this could lead to error; such a comparison between the tangent methods, however, can be considered.
Awareness of deficits and error processing after traumatic brain injury.
Larson, Michael J; Perlstein, William M
2009-10-28
Severe traumatic brain injury is frequently associated with alterations in performance monitoring, including reduced awareness of physical and cognitive deficits. We examined the relationship between awareness of deficits and electrophysiological indices of performance monitoring, including the error-related negativity and posterror positivity (Pe) components of the scalp-recorded event-related potential, in 16 traumatic brain injury survivors who completed a Stroop color-naming task while event-related potential measurements were recorded. Awareness of deficits was measured as the discrepancy between patient and significant-other ratings on the Frontal Systems Behavior Scale. The amplitude of the Pe, but not error-related negativity, was reliably associated with decreased awareness of deficits. Results indicate that Pe amplitude may serve as an electrophysiological indicator of awareness of abilities and deficits.
Predicting Software Assurance Using Quality and Reliability Measures
2014-12-01
errors are not found in unit testing. The rework effort to correct requirement and design problems in later phases can be as high as 300 to 1,000... [Report CMU/SEI-2014-TN-026]
Baur, Heiner; Groppa, Alessia Severina; Limacher, Regula; Radlinger, Lorenz
2016-02-02
Maximum strength and rate of force development (RFD) are 2 important strength characteristics for everyday tasks and athletic performance. Measurements of both parameters must be reliable. Expensive isokinetic devices with isometric modes are often used. The possibility of cost-effective measurements in a practical setting would facilitate quality control. The purpose of this study was to assess the reliability of measurements of maximum isometric strength (Fmax) and RFD on a conventional leg press. Sixteen subjects (23 ± 2 y, 1.68 ± 0.05 m, 59 ± 5 kg) were tested twice within 1 session. After warm-up, subjects performed 2 times 5 trials eliciting maximum voluntary isometric contractions on an instrumented leg press (1- and 2-legged randomized). Fmax (N) and RFD (N/s) were extracted from force-time curves. Reliability was determined for Fmax and RFD by calculating the intraclass correlation coefficient (ICC), the test-retest variability (TRV), and the bias and limits of agreement. Reliability measures revealed good to excellent ICCs of .80-.93. TRV showed mean differences between measurement sessions of 0.4-6.9%. The systematic error was low compared with the absolute mean values (Fmax 5-6%, RFD 1-4%). The implementation of a force transducer into a conventional leg press provides a viable procedure to assess Fmax and RFD. Both performance parameters can be assessed with good to excellent reliability allowing quality control of interventions.
A Study on the Reliability of Sasang Constitutional Body Trunk Measurement
Jang, Eunsu; Kim, Jong Yeol; Lee, Haejung; Kim, Honggie; Baek, Younghwa; Lee, Siwoo
2012-01-01
Objective. Body trunk measurement plays an important diagnostic role not only in conventional medicine but also in Sasang constitutional medicine (SCM). The Sasang constitutional body trunk measurement (SCBTM) consists of 5 widths and 8 circumferences taken at the standard locations currently employed in the SCM community. This study examines the extent to which comprehensive training can improve the reliability of the SCBTM. Methods. We recruited 10 male subjects and 5 male observers with no experience of anthropometric measurement. We conducted measurements twice before and after comprehensive training. The relative technical error of measurement (%TEM) was calculated to assess intra- and inter-observer reliability. Results. Post-training intra-observer %TEMs of the SCBTM ranged from 0.27% to 1.85%, reduced from 0.27% to 6.26% before training. Post-training inter-observer %TEMs ranged from 0.56% to 1.66%, reduced from 1.00% to 9.60% before training. Post-training total %TEMs, which represent overall reliability, ranged from 0.68% to 2.18%, reduced from a maximum value of 10.18%. Conclusion. Comprehensive training makes the SCBTM more reliable and therefore a sufficiently dependable diagnostic tool. Comprehensive training is strongly recommended before taking the SCBTM. PMID:21822442
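The abstract above reports %TEM values but does not state the formula used. As a minimal sketch, assuming the commonly used two-measurement definition (TEM = sqrt(sum of squared differences / 2n), with %TEM expressed relative to the grand mean), the calculation could look as follows. The function name and the sample values are hypothetical and are not data from the study.

```python
import math

def technical_error_of_measurement(first, second):
    """Return (TEM, %TEM) for two repeated measurements of the same subjects.

    Assumes the common two-trial formula TEM = sqrt(sum(d^2) / (2 n)),
    where d is the per-subject difference between the two trials.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty measurement series")
    n = len(first)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    tem = math.sqrt(sum_sq_diff / (2 * n))
    # %TEM relates the absolute error to the average size of the measurement.
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return tem, 100.0 * tem / grand_mean

# Hypothetical circumferences in cm (illustration only).
trial_1 = [100.0, 102.0]
trial_2 = [102.0, 100.0]
tem, pct_tem = technical_error_of_measurement(trial_1, trial_2)
```

Because %TEM is unit-free, widths and circumferences of different sizes can be compared on one scale, which is presumably why the study reports it rather than the absolute TEM.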
NASA Astrophysics Data System (ADS)
Fernandez, Alvaro; Müller, Inigo A.; Rodríguez-Sanz, Laura; van Dijk, Joep; Looser, Nathan; Bernasconi, Stefano M.
2017-12-01
Carbonate clumped isotopes offer a potentially transformational tool to interpret Earth's history, but the proxy is still limited by poor interlaboratory reproducibility. Here, we focus on the uncertainties that result from the analysis of only a few replicate measurements to understand the extent to which unconstrained errors affect calibration relationships and paleoclimate reconstructions. We find that highly precise data can be routinely obtained with multiple replicate analyses, but this is not always done in many laboratories. For instance, using published estimates of external reproducibilities we find that typical clumped isotope measurements (three replicate analyses) have margins of error at the 95% confidence level (CL) that are too large for many applications. These errors, however, can be systematically reduced with more replicate measurements. Second, using a Monte Carlo-type simulation we demonstrate that the degree of disagreement on published calibration slopes is about what we should expect considering the precision of Δ47 data, the number of samples and replicate analyses, and the temperature range covered in published calibrations. Finally, we show that the way errors are typically reported in clumped isotope data can be problematic and lead to the impression that data are more precise than warranted. We recommend that uncertainties in Δ47 data should no longer be reported as the standard error of a few replicate measurements. Instead, uncertainties should be reported as margins of error at a specified confidence level (e.g., 68% or 95% CL). These error bars are a more realistic indication of the reliability of a measurement.
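The recommendation above, to report a margin of error at a stated confidence level instead of the standard error of a few replicates, can be illustrated with a short sketch. It assumes the margin is computed from the replicates themselves with a two-sided Student t critical value; the abstract does not give the authors' exact procedure, and the replicate values below are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student t critical values by degrees of freedom
# (standard table values, rounded to three decimals).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def margin_of_error_95(replicates):
    """Return (standard error, 95% margin of error) for 2 to 10 replicates."""
    n = len(replicates)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch tabulates 2 to 10 replicates only")
    std_error = statistics.stdev(replicates) / math.sqrt(n)
    return std_error, T_CRIT_95[n - 1] * std_error

# Hypothetical triplicate measurement (illustration only).
replicates = [0.600, 0.610, 0.620]
std_error, margin = margin_of_error_95(replicates)
```

With three replicates the 95% margin is about 4.3 times the standard error, which shows how a bare standard error can make a measurement look more precise than it is.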
Reliability of Phase Velocity Measurements of Flexural Acoustic Waves in the Human Tibia In-Vivo.
Vogl, Florian; Schnüriger, Karin; Gerber, Hans; Taylor, William R
2016-01-01
Axial-transmission acoustics have been shown to be a promising technique to measure individual bone properties and detect bone pathologies. With the ultimate goal being the in-vivo application of such systems, quantification of the key aspects governing the reliability is crucial to bring this method towards clinical use. This work presents a systematic reliability study quantifying the sources and magnitudes of variability in in-vivo measurements using axial-transmission acoustics. 42 healthy subjects were measured by an experienced operator twice per week, over a four-month period, resulting in over 150000 wave measurements. In a complementary study to assess the influence of different operators performing the measurements, 10 novice operators were trained, and each measured 5 subjects on a single occasion, using the same measurement protocol as in the first part of the study. The estimated standard error for the measurement protocol used to collect the study data was ∼ 17 m/s (∼ 4% of the grand mean) and the index of dependability, as a measure of reliability, was Φ = 0.81. It was shown that the method is suitable for multi-operator use and that the reliability can be improved efficiently by additional measurements with device repositioning, while additional measurements without repositioning cannot improve the reliability substantially. Phase velocity values were found to be significantly higher in males than in females (p < 10⁻⁵) and an intra-class correlation coefficient of r = 0.70 was found between the legs of each subject. The high reliability of this non-invasive approach and its intrinsic sensitivity to mechanical properties open perspectives for the rapid and inexpensive clinical assessment of bone pathologies, as well as for monitoring programmes without any radiation exposure for the patient.
Real-time line-width measurements: a new feature for reticle inspection systems
NASA Astrophysics Data System (ADS)
Eran, Yair; Greenberg, Gad; Joseph, Amnon; Lustig, Cornel; Mizrahi, Eyal
1997-07-01
The significance of line width control in mask production has become greater with the lessening of defect size. Two conventional methods are employed to control line width dimensions in the manufacturing of masks for sub-micron devices: critical dimension (CD) measurement and the detection of edge defects. Achieving reliable and accurate control of line width errors is one of the most challenging tasks in mask production. Neither of these two methods guarantees the detection of line width errors with good sensitivity over the whole mask area. This stems from the fact that CD measurement provides only statistical data on the mask features, whereas edge defect detection checks defects on each edge by itself and does not supply information on the combined result of error detection on two adjacent edges. For example, a combination of a small edge defect together with a CD non-uniformity, both within the allowed tolerance, may yield a significant line width error that will not be detected using the conventional methods (see figure 1). A new approach for the detection of line width errors which overcomes this difficulty is presented. Based on this approach, a new sensitive line width error detector was developed and added to Orbot's RT-8000 die-to-database reticle inspection system. This innovative detector operates continuously during the mask inspection process and scans (inspects) the entire area of the reticle for line width errors. The detection is based on a comparison of line widths measured on both the design database and the scanned image of the reticle. In section 2, the motivation for developing this new detector is presented. The section covers an analysis of various defect types, which are difficult to detect using conventional edge detection methods or, alternatively, CD measurements.
In section 3, the basic concept of the new approach is introduced together with a description of the new detector and its characteristics. In section 4, the calibration process that took place in order to achieve reliable and repeatable line width measurements is presented. A description of the experiments conducted to evaluate the sensitivity of the new detector is given in section 5, followed by a report of the results of this evaluation. The conclusions are presented in section 6.
NASA Astrophysics Data System (ADS)
Demidov, V. I.; Koepke, M. E.; Kurlyandskaya, I. P.; Malkov, M. A.
2018-02-01
This paper reviews existing theories for interpreting probe measurements of electron distribution functions (EDF) at high gas pressure when collisions of electrons with atoms and/or molecules near the probe are pervasive. An explanation of whether or not the measurements are realizable and reliable, an enumeration of the most common sources of measurement error, and an outline of proper probe-experiment design elements that inherently limit or avoid error is presented. Additionally, we describe recent expanded plasma-condition compatibility for EDF measurement, including in applications of large wall probe plasma diagnostics. This summary of the authors’ experiences gained over decades of practicing and developing probe diagnostics is intended to inform, guide, suggest, and detail the advantages and disadvantages of probe application in plasma research.
Does Field Reliability for Static-99 Scores Decrease as Scores Increase?
Rice, Amanda K.; Boccaccini, Marcus T.; Harris, Paige B.; Hawes, Samuel W.
2015-01-01
This study examined the field reliability of Static-99 (Hanson & Thornton, 2000) scores among 21,983 sex offenders and focused on whether rater agreement decreased as scores increased. As expected, agreement was lowest for high-scoring offenders. Initial and most recent Static-99 scores were identical for only about 40% of offenders who had been assigned a score of 6 during their initial evaluations, but for more than 60% of offenders who had been assigned a score of 2 or lower. In addition, the size of the difference between scores increased as scores increased, with pairs of scores differing by 2 or more points for about 30% of offenders scoring in the high-risk range. Because evaluators and systems use high Static-99 scores to identify sexual offenders who may require intensive supervision or even postrelease civil commitment, it is important to recognize that there may be more measurement error for high scores than low scores and to consider adopting procedures for minimizing or accounting for measurement error. PMID:24932647
Kwon, Heon-Ju; Kim, Bohyun; Kim, So Yeon; Lee, Chul Seung; Lee, Jeongjin; Song, Gi Won; Lee, Sung Gyu
2018-01-01
Background/Aims: Computed tomography (CT) hepatic volumetry is currently accepted as the most reliable method for preoperative estimation of graft weight in living donor liver transplantation (LDLT). However, several factors can cause inaccuracies in CT volumetry compared to real graft weight. The purpose of this study was to determine the frequency and degree of resection plane-dependent error in CT volumetry of the right hepatic lobe in LDLT. Methods: Forty-six living liver donors underwent CT before donor surgery and on postoperative day 7. Prospective CT volumetry (VP) was measured via the assumptive hepatectomy plane. Retrospective liver volume (VR) was measured using the actual plane by comparing preoperative and postoperative CT. Percentage (%) errors in VP and VR were evaluated against intraoperatively measured weight (W). Plane-dependent error in VP was defined as the absolute difference between VP and VR. % plane-dependent error was defined as follows: |VP–VR|/W∙100. Results: Mean VP, VR, and W were 761.9 mL, 755.0 mL, and 696.9 g. Mean error and % error in VP were 73.3 mL and 10.7%. Mean error and % error in VR were 64.4 mL and 9.3%. Mean plane-dependent error in VP was 32.4 mL. Mean % plane-dependent error was 4.7%. Plane-dependent error in VP exceeded 10% of W in approximately 10% of the subjects in our study. Conclusions: There was approximately 5% plane-dependent error in liver VP on CT volumetry. Plane-dependent error in VP exceeded 10% of W in approximately 10% of LDLT donors in our study. This error should be considered, especially when CT volumetry is performed by a less experienced operator who is not well acquainted with the donor hepatectomy plane. PMID:28759989
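The % plane-dependent error defined above, |VP–VR|/W∙100, is a per-donor quantity. The sketch below applies it to one hypothetical donor; the function name and input values are illustrative assumptions and are not taken from the study data.

```python
def percent_plane_dependent_error(vp_ml, vr_ml, weight_g):
    """Return |VP - VR| / W * 100 for one donor.

    vp_ml: prospective CT volume (assumptive hepatectomy plane), in mL
    vr_ml: retrospective CT volume (actual plane), in mL
    weight_g: intraoperatively measured graft weight, in g
    """
    if weight_g <= 0:
        raise ValueError("graft weight must be positive")
    return abs(vp_ml - vr_ml) / weight_g * 100.0

# Hypothetical donor (illustration only).
pct_error = percent_plane_dependent_error(vp_ml=800.0, vr_ml=760.0, weight_g=700.0)
exceeds_ten_percent = pct_error > 10.0
```

Note that the reported mean of 4.7% is an average of per-donor absolute values, so it cannot be recovered by inserting the mean VP, VR, and W into the formula.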
Validity and Reliability of 2 Goniometric Mobile Apps: Device, Application, and Examiner Factors.
Wellmon, Robert H; Gulick, Dawn T; Paterson, Mark L; Gulick, Colleen N
2016-12-01
Smartphones are being used in a variety of practice settings to measure joint range of motion (ROM). A number of factors can affect the validity of the measurements generated. However, there are no studies examining smartphone-based goniometer applications focusing on measurement variability and error arising from the electromechanical properties of the device being used. To examine the concurrent validity and interrater reliability of 2 goniometric mobile applications (Goniometer Records, Goniometer Pro), an inclinometer, and a universal goniometer (UG). Nonexperimental, descriptive validation study. University laboratory. 3 physical therapists having an average of 25 y of experience. Three standardized angles (acute, right, obtuse) were constructed to replicate the movement of a hinge joint in the human body. Angular changes were measured and compared across 3 raters who used 3 different devices (UG, inclinometer, and 2 goniometric apps installed on 3 different smartphones: Apple iPhone 5, LG Android, and Samsung SIII Android). Intraclass correlation coefficients (ICCs) and Bland-Altman plots were used to examine interrater reliability and concurrent validity. Interrater reliability for each of the smartphone apps, inclinometer and UG were excellent (ICC = .995-1.000). Concurrent validity was also good (ICC = .998-.999). Based on the Bland-Altman plots, the means of the differences between the devices were low (range = -0.4° to 1.2°). This study identifies the error inherent in measurement that is independent of patient factors and due to the smartphone, the installed apps, and examiner skill. Less than 2° of measurement variability was attributable to those factors alone. The data suggest that 3 smartphones with the 2 installed apps are a viable substitute for using a UG or an inclinometer when measuring angular changes that typically occur when examining ROM and demonstrate the capacity of multiple examiners to accurately use smartphone-based goniometers.
Evaluation of the 3dMDface system as a tool for soft tissue analysis.
Hong, C; Choi, K; Kachroo, Y; Kwon, T; Nguyen, A; McComb, R; Moon, W
2017-06-01
To evaluate the accuracy of three-dimensional stereophotogrammetry by comparing values obtained from direct anthropometry and the 3dMDface system. To achieve a more comprehensive evaluation of the reliability of 3dMD, both linear and surface measurements were examined. UCLA Section of Orthodontics. Mannequin head as model for anthropometric measurements. Image acquisition and analysis were carried out on a mannequin head using 16 anthropometric landmarks and 21 measured parameters for linear and surface distances. 3D images using 3dMDface system were made at 0, 1 and 24 hours; 1, 2, 3 and 4 weeks. Error magnitude statistics used include mean absolute difference, standard deviation of error, relative error magnitude and root mean square error. Intra-observer agreement for all measurements was attained. Overall mean errors were lower than 1.00 mm for both linear and surface parameter measurements, except in 5 of the 21 measurements. The three longest parameter distances showed increased variation compared to shorter distances. No systematic errors were observed for all performed paired t tests (P<.05). Agreement values between two observers ranged from 0.91 to 0.99. Measurements on a mannequin confirmed the accuracy of all landmarks and parameters analysed in this study using the 3dMDface system. Results indicated that 3dMDface system is an accurate tool for linear and surface measurements, with potentially broad-reaching applications in orthodontics, surgical treatment planning and treatment evaluation. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Gray, Vicki L; Ivanova, Tanya D; Garland, S Jayne
2014-01-01
Knowing the reliability of the center of pressure (COP) is important for interpreting balance deficits post-stroke, especially when the balance deficits can necessitate the use of short duration trials. The novel aspect of this reliability study was to examine the center of pressure measures using two adjacent force platforms between and within sessions in stroke and controls. After stroke, it is important to understand the contribution of the paretic and non-paretic leg to the motor control of standing balance. Because there is a considerable body of knowledge on COP reliability on a single platform, we chose to examine reliability using two adjacent platforms, which has not been examined previously in stroke. Twenty participants post-stroke and 22 controls performed arm raise, load drop, and quiet stance balance tasks while standing on two adjacent force platforms, on two separate days. The intraclass correlation coefficient (ICC2,1) and percentage standard error of measurement (SEM%) were calculated for COP velocity, ellipse area, anterior-posterior (AP) displacement, and medial-lateral (ML) displacement. Between sessions, COP velocity was the most reliable, with high ICCs and low SEM% across groups and tasks, and ellipse area was less reliable, with low ICCs across groups and tasks. COP measures were less reliable during the arm raise than during the load drop post-stroke. Within-session reliability was high for COP velocity and ML displacement, requiring no more than six trials across tasks. The COP velocity was the most reliable measure, with high ICCs between sessions, and the high reliability was achieved with fewer trials in both groups in a single session. Copyright © 2014 Elsevier B.V. All rights reserved.
The acceleration dependent validity and reliability of 10 Hz GPS.
Akenhead, Richard; French, Duncan; Thompson, Kevin G; Hayes, Philip R
2014-09-01
To examine the validity and inter-unit reliability of 10 Hz GPS for measuring instantaneous velocity during maximal accelerations. Experimental. Two 10 Hz GPS devices secured to a sliding platform mounted on a custom built monorail were towed whilst sprinting maximally over 10 m. Displacement of GPS devices was measured using a laser sampling at 2000 Hz, from which velocity and mean acceleration were derived. Velocity data was pooled into acceleration thresholds according to mean acceleration. Agreement between laser and GPS measures of instantaneous velocity within each acceleration threshold was examined using least squares linear regression and Bland-Altman limits of agreement (LOA). Inter-unit reliability was expressed as typical error (TE) and a Pearson correlation coefficient. Mean bias ± 95% LOA during accelerations of 0-0.99 ms(-2) was 0.12 ± 0.27 ms(-1), decreasing to -0.40 ± 0.67 ms(-1) during accelerations >4 ms(-2). Standard error of the estimate ± 95% CI (SEE) increased from 0.12 ± 0.02 ms(-1) during accelerations of 0-0.99 ms(-2) to 0.32 ± 0.06 ms(-1) during accelerations >4 ms(-2). TE increased from 0.05 ± 0.01 to 0.12 ± 0.01 ms(-1) during accelerations of 0-0.99 ms(-2) and >4 ms(-2) respectively. The validity and reliability of 10 Hz GPS for the measurement of instantaneous velocity has been shown to be inversely related to acceleration. Those using 10 Hz GPS should be aware that during accelerations of over 4 ms(-2), accuracy is compromised. Copyright © 2013 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
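The "mean bias ± 95% LOA" figures above follow the Bland-Altman approach. A minimal sketch of that calculation is given below, assuming the usual definition (bias = mean of the paired differences, limits = bias ± 1.96 × SD of the differences). The velocity values are hypothetical and are not the study's data.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) of agreement for paired data."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    # 1.96 sample standard deviations cover about 95% of differences
    # if the differences are roughly normally distributed.
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width

# Hypothetical instantaneous velocities in m/s (illustration only).
gps_velocity = [1.0, 2.0, 3.0, 4.0]
laser_velocity = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper = bland_altman(gps_velocity, laser_velocity)
```

In the study the comparison was made separately within each acceleration band, which would correspond to calling such a function once per band.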
Flosadottir, Vala; Roos, Ewa M; Ageberg, Eva
2017-09-01
The Activity Rating Scale (ARS) for disorders of the knee evaluates the level of activity by the frequency of participation in 4 separate activities with high demands on knee function, with a score ranging from 0 (none) to 16 (pivoting activities 4 times/wk). To translate and cross-culturally adapt the ARS into Swedish and to assess measurement properties of the Swedish version of the ARS. Cohort study (diagnosis); Level of evidence, 2. The COSMIN guidelines were followed. Participants (N = 100 [55 women]; mean age, 27 years) who were undergoing rehabilitation for a knee injury completed the ARS twice for test-retest reliability. The Knee injury and Osteoarthritis Outcome Score (KOOS), Tegner Activity Scale (TAS), and modernized Saltin-Grimby Physical Activity Level Scale (SGPALS) were administered at baseline to validate the ARS. Construct validity and responsiveness of the ARS were evaluated by testing predefined hypotheses regarding correlations between the ARS, KOOS, TAS, and SGPALS. The Cronbach alpha, intraclass correlation coefficients, absolute reliability, standard error of measurement, smallest detectable change, and Spearman rank-order correlation coefficients were calculated. The ARS showed good internal consistency (α ≈ 0.96), good test-retest reliability (intraclass correlation coefficient >0.9), and no systematic bias between measurements. The standard error of measurement was less than 2 points, and the smallest detectable change was less than 1 point at the group level and less than 5 points at the individual level. More than 75% of the hypotheses were confirmed, indicating good construct validity and good responsiveness of the ARS. The Swedish version of the ARS is valid, reliable, and responsive for evaluating the level of activity based on the frequency of participation in high-demand knee sports activities in young adults with a knee injury.
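The abstract above reports a standard error of measurement (SEM) and smallest detectable change (SDC) at the individual and group level without giving formulas. The sketch below assumes the formulas commonly used alongside the COSMIN guidelines: SEM = SD × sqrt(1 − ICC), SDC(individual) = 1.96 × sqrt(2) × SEM, and SDC(group) = SDC(individual) / sqrt(n). The SD and ICC inputs are hypothetical.

```python
import math

def sem_and_sdc(sd, icc, n_group):
    """Return (SEM, SDC at individual level, SDC at group level).

    sd: standard deviation of the scores
    icc: test-retest intraclass correlation coefficient (0..1)
    n_group: number of participants in the group
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    sem = sd * math.sqrt(1.0 - icc)
    # sqrt(2) accounts for error in both the test and the retest score.
    sdc_individual = 1.96 * math.sqrt(2.0) * sem
    sdc_group = sdc_individual / math.sqrt(n_group)
    return sem, sdc_individual, sdc_group

# Hypothetical inputs (illustration only); n_group matches the N = 100 reported.
sem, sdc_individual, sdc_group = sem_and_sdc(sd=4.0, icc=0.91, n_group=100)
```

Under these assumed formulas, a group of 100 has an SDC ten times smaller than an individual, which is consistent in scale with the reported "less than 1 point" versus "less than 5 points".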
DOE Office of Scientific and Technical Information (OSTI.GOV)
Proctor, Timothy; Rudinger, Kenneth; Young, Kevin
Randomized benchmarking (RB) is widely used to measure an error rate of a set of quantum gates, by performing random circuits that would do nothing if the gates were perfect. In the limit of no finite-sampling error, the exponential decay rate of the observable survival probabilities, versus circuit length, yields a single error metric r. For Clifford gates with arbitrary small errors described by process matrices, r was believed to reliably correspond to the mean, over all Clifford gates, of the average gate infidelity between the imperfect gates and their ideal counterparts. We show that this quantity is not a well-defined property of a physical gate set. It depends on the representations used for the imperfect and ideal gates, and the variant typically computed in the literature can differ from r by orders of magnitude. We present new theories of the RB decay that are accurate for all small errors describable by process matrices, and show that the RB decay curve is a simple exponential for all such errors. Here, these theories allow explicit computation of the error rate that RB measures (r), but as far as we can tell it does not correspond to the infidelity of a physically allowed (completely positive) representation of the imperfect gates.
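To make the decay-rate idea above concrete, the following sketch fits a simple exponential P(m) = A + B·p^m to survival probabilities and converts p to r using the standard RB convention r = (d − 1)(1 − p)/d for gate-set dimension d. It assumes the asymptote A is known in advance (an illustrative simplification) and uses noiseless synthetic data; none of the names or numbers come from the paper.

```python
import math

def fit_rb_decay(lengths, survival, asymptote):
    """Least-squares fit of log(P - A) = log(B) + m * log(p); returns (p, B)."""
    xs = list(lengths)
    ys = [math.log(s - asymptote) for s in survival]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(slope), math.exp(intercept)

def rb_error_rate(p, dim):
    """Convert the decay constant p to the RB error rate r."""
    return (dim - 1) * (1.0 - p) / dim

# Synthetic single-qubit example (dim = 2) with an assumed asymptote of 0.5.
lengths = [1, 10, 50, 100, 200]
true_p, A, B = 0.99, 0.5, 0.5
survival = [A + B * true_p ** m for m in lengths]
p_fit, b_fit = fit_rb_decay(lengths, survival, asymptote=A)
r = rb_error_rate(p_fit, dim=2)
```

With real data the asymptote is usually fitted as well and finite-sampling noise matters; the log-linear fit here is only meant to show how a single number r falls out of the decay curve.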
A new approach to the measurement of pelvic asymmetry: proposed methods and reliability.
Gnat, Rafael; Biały, Maciej
2015-05-01
This is a methodological study presenting a novel method of pelvic asymmetry (PA) measurement for use in the research laboratory setting. The purpose of the study is (1) to establish intrarater and interrater reliability of the proposed measures of PA, (2) to verify the influence of repeated measurements on the reliability, and (3) to assess correlation between the proposed measures of PA. Twelve healthy volunteers participated, and 2 teams of raters were involved. Registration of anatomic landmarks' positions in the optical motion capture system was repeated 3 times. Two asymmetry indexes were calculated: for pelvic torsion and for lateral pelvic tilt. Interclass correlation coefficients (ICCs), standard errors of measurement, and smallest detectable differences were used to describe the intrarater and interrater reliability of the 2 indexes. After 2 repeated registrations of pelvic landmarks' positions, the reliability of our asymmetry indexes was good and excellent. The ICCs for intrarater reliability ranged from 0.96 to 0.97; the ICCs for interrater reliability ranged 0.81 to 0.90. There was moderate, nonsignificant correlation between asymmetry indexes for pelvis torsion and for lateral pelvic tilt (r = 0.45, P = .14). The 2 proposed asymmetry indexes showed good and excellent intrarater and interrater reliability after 2 repeated registrations of pelvic landmarks' positions and thus may be useful in the research laboratory setting. However, these indexes are not strongly correlated, which suggests that the 2 types of PA may constitute different clinical entities. Copyright © 2015 National University of Health Sciences. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Platt, M. E.; Lewis, E. E.; Boehm, F.
1991-01-01
A Monte Carlo Fortran computer program was developed that uses two variance reduction techniques for computing system reliability, applicable to very large, highly reliable fault-tolerant systems. The program is consistent with the hybrid automated reliability predictor (HARP) code, which employs behavioral decomposition and complex fault-error handling models. This new capability, called MC-HARP, efficiently solves reliability models with non-constant (Weibull) failure rates. Common mode failure modeling is also a specialty.
Reliability study of biometrics "do not contact" in myopia.
Migliorini, R; Fratipietro, M; Comberiati, A M; Pattavina, L; Arrico, L
The aim of the study is to compare the refractive condition actually achieved after surgery with the expected refractive condition of the eye as calculated via a biometer. The study was conducted in a random group of 38 eyes of patients undergoing surgery by phacoemulsification. The mean absolute error between the values predicted from the optical biometer measurements and those obtained post-operatively was calculated and was around 0.47%. Our study shows results not far from those reported in the literature, and the mean absolute error, at 0.47 ± 0.11 SEM, is among the lowest values.
Coding for reliable satellite communications
NASA Technical Reports Server (NTRS)
Gaarder, N. T.; Lin, S.
1986-01-01
This research project was set up to study various kinds of coding techniques for error control in satellite and space communications for NASA Goddard Space Flight Center. During the project period, researchers investigated the following areas: (1) decoding of Reed-Solomon codes in terms of dual basis; (2) concatenated and cascaded error control coding schemes for satellite and space communications; (3) use of hybrid coding schemes (error correction and detection incorporated with retransmission) to improve system reliability and throughput in satellite communications; (4) good codes for simultaneous error correction and error detection, and (5) error control techniques for ring and star networks.
Error control for reliable digital data transmission and storage systems
NASA Technical Reports Server (NTRS)
Costello, D. J., Jr.; Deng, R. H.
1985-01-01
A problem in designing semiconductor memories is to provide some measure of error control without requiring excessive coding overhead or decoding time. In LSI and VLSI technology, memories are often organized on a multiple bit (or byte) per chip basis. For example, some 256K-bit DRAMs are organized as 32K x 8-bit bytes. Byte-oriented codes such as Reed-Solomon (RS) codes can provide efficient low-overhead error control for such memories. However, the standard iterative algorithm for decoding RS codes is too slow for these applications. In this paper we present some special decoding techniques for extended single- and double-error-correcting RS codes which are capable of high-speed operation. These techniques are designed to find the error locations and the error values directly from the syndrome without having to use the iterative algorithm to find the error locator polynomial. Two codes are considered: (1) a d_min = 4 single-byte-error-correcting (SBEC), double-byte-error-detecting (DBED) RS code; and (2) a d_min = 6 double-byte-error-correcting (DBEC), triple-byte-error-detecting (TBED) RS code.
Analysis of real-time numerical integration methods applied to dynamic clamp experiments.
Butera, Robert J; McCarthy, Maeve L
2004-12-01
Real-time systems are frequently used as an experimental tool, whereby simulated models interact in real time with neurophysiological experiments. The most demanding of these techniques is known as the dynamic clamp, where simulated ion channel conductances are artificially injected into a neuron via intracellular electrodes for measurement and stimulation. Methodologies for implementing the numerical integration of the gating variables in real time typically employ first-order numerical methods, either Euler or exponential Euler (EE). EE is often used for rapidly integrating ion channel gating variables. We find via simulation studies that for small time steps, both methods are comparable, but at larger time steps, EE performs worse than Euler. We derive error bounds for both methods, and find that the error can be characterized in terms of two ratios: time step over time constant, and voltage measurement error over the slope factor of the steady-state activation curve of the voltage-dependent gating variable. These ratios reliably bound the simulation error and yield results consistent with the simulation analysis. Our bounds quantitatively illustrate how measurement error restricts the accuracy that can be obtained by using smaller step sizes. Finally, we demonstrate that Euler can be computed with identical computational efficiency as EE.
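The two first-order schemes named above can be sketched for a single gating variable obeying dx/dt = (x_inf − x)/tau. In this sketch x_inf and tau are held constant, which is an assumption made for illustration; in that special case exponential Euler reproduces the analytic solution, so the example does not reproduce the abstract's finding about larger time steps, which concerns real-time operation with changing voltage and measurement error. All parameter values are hypothetical.

```python
import math

def euler_step(x, x_inf, tau, dt):
    """Forward Euler update for dx/dt = (x_inf - x) / tau."""
    return x + dt * (x_inf - x) / tau

def exp_euler_step(x, x_inf, tau, dt):
    """Exponential Euler update: exact if x_inf and tau are constant over dt."""
    return x_inf + (x - x_inf) * math.exp(-dt / tau)

def integrate(step, x0, x_inf, tau, dt, n_steps):
    """Apply the given one-step method n_steps times."""
    x = x0
    for _ in range(n_steps):
        x = step(x, x_inf, tau, dt)
    return x

# Hypothetical gating variable relaxing from 0 toward 1 with a large step.
x0, x_inf, tau = 0.0, 1.0, 1.0
dt, n_steps = 0.5, 4
exact = x_inf + (x0 - x_inf) * math.exp(-dt * n_steps / tau)
euler_result = integrate(euler_step, x0, x_inf, tau, dt, n_steps)
exp_euler_result = integrate(exp_euler_step, x0, x_inf, tau, dt, n_steps)
```

Both updates cost a handful of arithmetic operations per step once exp(−dt/tau) is precomputed or tabulated, and the ratio dt/tau that appears in both is one of the two ratios the abstract uses to bound the error.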
Chen, Kuan-Wei; Lee, Shih-Chieh; Chiang, Hsin-Yu; Syu, Ya-Cing; Yu, Xiao-Xuan; Hsieh, Ching-Lin
2017-11-01
Patients with schizophrenia tend to have deficits in advanced Theory of Mind (ToM). The "Reading the mind in the eyes" test (RMET), the Faux Pas Task, and the Strange Stories are commonly used for assessing advanced ToM. However, most of the psychometric properties of these 3 measures in patients with schizophrenia are unknown. The aims of this study were to validate the psychometric properties of the 3 advanced ToM measures in patients with schizophrenia, including: (1) test-retest reliability; (2) random measurement error; (3) practice effect; (4) concurrent validity; and (5) ecological validity. We recruited 53 patients with schizophrenia, who completed the 3 measures twice, 4 weeks apart. The Revised Social Functioning Scale-Taiwan short version (R-SFST) was completed within 3 days of first session of assessments. We found that the intraclass correlation coefficients of the RMET, Strange Stories, and Faux Pas Task were 0.24, 0.5, and 0.76. All 3 advanced ToM measures had large random measurement error, trivial to small practice effects, poor concurrent validity, and low ecological validity. We recommend that the scores of the 3 advanced ToM measures be interpreted with caution because these measures may not provide reliable and valid results on patients' advanced ToM abilities. Copyright © 2017 Elsevier B.V. All rights reserved.
Clinical height measurements are unreliable: a call for improvement.
Mikula, A L; Hetzel, S J; Binkley, N; Anderson, P A
2016-10-01
Height measurements are currently used to guide imaging decisions that assist in osteoporosis care, but their clinical reliability is largely unknown. We found both clinical height measurements and electronic health record height data to be unreliable. Improvement in height measurement is needed to improve osteoporosis care. The aim of this study is to assess the accuracy and reliability of clinical height measurement in a university healthcare clinical setting. Electronic health record (EHR) review, direct measurement of clinical stadiometer accuracy, and observation of staff height measurement technique at outpatient facilities of the University of Wisconsin Hospital and Clinics. We examined 32 clinical stadiometers for reliability and observed 34 clinic staff perform height measurements at 12 outpatient primary care and specialty clinics. An EHR search identified 4711 men and women age 43 to 89 with no known metabolic bone disease who had more than one height measurement over 3 months. The short study period and exclusion were selected to evaluate change in recorded height not due to pathologic processes. Mean EHR recorded height change (first to last measurement) was -0.02 cm (SD 1.88 cm). Eighteen percent of patients had height measurement differences noted in the EHR of ≥2 cm over 3 months. The technical error of measurement (TEM) was 1.77 cm with a relative TEM of 1.04 %. None of the staff observed performing height measurements followed all recommended height measurement guidelines. Fifty percent of clinic staff reported they on occasion enter patient reported height into the EHR rather than performing a measurement. When performing direct measurements on stadiometers, the mean difference from a gold standard length was 0.24 cm (SD 0.80). Nine percent of stadiometers examined had an error of >1.5 cm. Clinical height measurements and EHR recorded height results are unreliable. Improvement in this measure is needed as an adjunct to improve osteoporosis care.
Using generalizability theory to develop clinical assessment protocols.
Preuss, Richard A
2013-04-01
Clinical assessment protocols must produce data that are reliable, with a clinically attainable minimal detectable change (MDC). In a reliability study, generalizability theory has 2 advantages over classical test theory. These advantages provide information that allows assessment protocols to be adjusted to match individual patient profiles. First, generalizability theory allows the user to simultaneously consider multiple sources of measurement error variance (facets). Second, it allows the user to generalize the findings of the main study across the different study facets and to recalculate the reliability and MDC based on different combinations of facet conditions. In doing so, clinical assessment protocols can be chosen based on minimizing the number of measures that must be taken to achieve a realistic MDC, using repeated measures to minimize the MDC, or simply based on the combination that best allows the clinician to monitor an individual patient's progress over a specified period of time.
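The idea of recalculating reliability and MDC for different facet conditions can be sketched in Python as a simple decision study. This is a minimal illustration assuming a single-facet persons-by-trials design with absolute decisions; the variance components are hypothetical, and the paper's own designs may involve more facets.

```python
import math

def d_study(var_person, var_trial, var_residual, n_trials):
    """Project dependability, SEM and MDC95 when scores are averaged over n_trials.

    Assumes a single-facet persons x trials design with absolute decisions.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    abs_error_var = (var_trial + var_residual) / n_trials
    phi = var_person / (var_person + abs_error_var)
    sem = math.sqrt(abs_error_var)
    mdc95 = 1.96 * math.sqrt(2) * sem
    return phi, sem, mdc95

# Hypothetical variance components (not taken from the paper)
components = {"var_person": 9.0, "var_trial": 1.0, "var_residual": 3.0}
for n in (1, 2, 4):
    phi, sem, mdc95 = d_study(n_trials=n, **components)
    print(f"trials={n}: phi={phi:.2f}, SEM={sem:.2f}, MDC95={mdc95:.2f}")
```

Averaging over more trials shrinks the error variance, which raises the dependability coefficient and lowers the MDC; this is the trade-off a clinician would weigh against measurement burden.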
Murray, Nicholas P.; Hunfalvay, Melissa; Bolte, Takumi
2017-01-01
Purpose The purpose of this study was to determine the reliability of interpupillary distance (IPD) and pupil diameter (PD) measures using an infrared eye tracker and central point stimuli. Validity of the test compared to known clinical tools was determined, and normative data was established against which individuals can measure themselves. Methods Participants (416) across various demographics were examined for normative data. Of these, 50 were examined for reliability and validity. Validity for IPD measured the test (RightEye IPD/PD) against the PL850 Pupilometer and the Essilor Digital CRP. For PD, the test was measured against the Rosenbaum Pocket Vision Screener (RPVS). Reliability was analyzed with intraclass correlation coefficients (ICC) between trials with Cronbach's alpha (CA) and the standard error of measurement for each ICC. Convergent validity was investigated by calculating the bivariate correlation coefficient. Results Reliability results were strong (CA > 0.7) for all measures. High positive significant correlations were found between the RightEye IPD test and the PL850 Pupilometer (P < 0.001) and Essilor Digital CRP (P < 0.001) and for the RightEye PD test and the RPVS (P < 0.001). Conclusions Using infrared eye tracking and the RightEye IPD/PD test stimuli, reliable and accurate measures of IPD and PD were found. Results from normative data showed an adequate comparison for people with normal vision development. Translational Relevance Results revealed a central point of fixation may remove variability in examining PD reliably using infrared eye tracking when consistent environmental and experimental procedures are conducted. PMID:28685104
The inter and intra rater reliability of the Netball Movement Screening Tool.
Reid, Duncan A; Vanweerd, Rebecca J; Larmer, Peter J; Kingstone, Rachel
2015-05-01
To establish the inter- and intra-rater reliability of the Netball Movement Screening Tool (NMST) for screening adolescent female netball players. Inter- and intra-rater reliability study. Forty secondary school netball players were recruited to take part in the study. Twenty subjects were screened simultaneously and independently by two raters to ascertain inter-rater agreement. Twenty subjects were scored by rater one on two occasions, separated by a week, to ascertain intra-rater agreement. Inter- and intra-rater agreement was assessed utilising the two-way mixed intraclass correlation coefficient (ICC) and weighted kappa statistics. No significant demographic differences were found between the inter- and intra-rater groups of subjects. ICCs demonstrated excellent inter-rater (two-way mixed ICC 0.84, standard error of measurement 0.25) and intra-rater (two-way mixed ICC 0.96, standard error of measurement 0.13) reliability for the overall NMST score, and substantial-excellent inter-rater (two-way mixed ICC 1.0-0.65) and substantial-excellent intra-rater (two-way mixed ICC 0.96-0.79) reliability for the component scores of the NMST. The kappa statistic showed substantial to poor inter-rater (k=0.75-0.32) and intra-rater (k=0.77-0.27) agreement for individual tests of the NMST. The NMST may be a reliable screening tool for adolescent netball players; however, the individual test scores have low reliability. The screening tool can be administered reliably by raters with similar levels of training in the tool but variable clinical experience. Ongoing research needs to be undertaken to ascertain whether the NMST is a valid tool for ascertaining increased injury risk in netball players.
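The weighted kappa used for the individual test scores can be sketched in Python as follows. The abstract does not state the weighting scheme, so linear disagreement weights are assumed here, and the ratings are hypothetical ordinal scores coded 0 to k-1.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linear-weighted Cohen's kappa for ordinal ratings coded 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = n_categories
    n = len(rater_a)
    # Observed joint proportions
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected disagreement
    obs_dis = 0.0
    exp_dis = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            obs_dis += weight * observed[i][j]
            exp_dis += weight * row[i] * col[j]
    if exp_dis == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_dis / exp_dis

# Hypothetical scores on a 3-level ordinal scale (illustration only)
rater_1 = [0, 1, 2, 1, 2, 0]
rater_2 = [0, 1, 1, 1, 2, 1]
print(weighted_kappa(rater_1, rater_2, 3))
```

Kappa corrects for chance agreement, which is one reason it can look poor for individual items even when ICCs for summed scores are high.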
Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Lidar Based Emissions Measurement at the Whole Facility Scale: Method and Error Analysis
USDA-ARS's Scientific Manuscript database
Particulate emissions from agricultural sources vary from dust created by operations and animal movement to the fine secondary particulates generated from ammonia and other emitted gases. The development of reliable facility emission data using point sampling methods designed to characterize regiona...
Ability Self-Estimates and Self-Efficacy: Meaningfully Distinct?
ERIC Educational Resources Information Center
Bubany, Shawn T.; Hansen, Jo-Ida C.
2010-01-01
Conceptual differences between self-efficacy and ability self-estimate scores, used in vocational psychology and career counseling, were examined with confirmatory factor analysis, discriminant relations, and reliability analysis. Results suggest that empirical differences may be due to measurement error or scale content, rather than due to the…
de Carvalho, Rogério Mendonca; Perez, Maria Del Carmen Janerio; Miranda, Fausto
2012-10-01
Traditional volumetry based on Archimedes' principle is the gold standard for the measurement of limb volume, but the routine use of this technique is discouraged because of several disadvantages. The purpose of this study was to evaluate intraobserver and interobserver reliability of direct measurements of wrist-hand volume using a new communicating vessels volumeter based on Pascal's law. A reliability study was conducted. To evaluate the reliability of the communicating vessels volumeter in generating measurements, 30 hands of 15 participants (9 women, 6 men) were measured 3 times each by 3 observers, totaling 270 volumetric results. Measurement time was short (mean = 3 minutes 42 seconds). The intraclass correlation coefficient (ICC) was .9977 for observer 1 and .9976 for observers 2 and 3. The interobserver ICC was .9998. The standard error of measurement was about 3 mL for all observers; the interobserver result was 1 mL. The interrater coefficient of variation (CV) was 1.15% for the series of 9 measurements collected for each segment; the intrarater CV was 1.20%. Limitations: No swollen hands were measured, and measurements were not compared with the gold standard technique. Thus, accuracy of the new volumeter was not determined in this study. A new device has been developed for plethysmography of the extremities, and the results of its use to measure the volume of the wrist-hand segment were reliable in both intraobserver and interobserver analyses.
Extensive validation of the pain disability index in 3 groups of patients with musculoskeletal pain.
Soer, Remko; Köke, Albère J A; Vroomen, Patrick C A J; Stegeman, Patrick; Smeets, Rob J E M; Coppes, Maarten H; Reneman, Michiel F
2013-04-20
A cross-sectional study design was used. The aim was to validate the pain disability index (PDI) extensively in 3 groups of patients with musculoskeletal pain. The PDI is a widely used and studied instrument for disability related to various pain syndromes, although there is conflicting evidence concerning factor structure, test-retest reliability, and missing items. Additionally, an official translation of the Dutch language version has never been performed. For reliability, internal consistency, factor structure, test-retest reliability and measurement error were calculated. Validity was tested with hypothesized correlations with pain intensity, kinesiophobia, Rand-36 subscales, Depression, Roland-Morris Disability Questionnaire, Quality of Life, and Work Status. Structural validity was tested with independent backward translation and approval from the original authors. One hundred seventy-eight patients with acute back pain, 425 patients with chronic low back pain and 365 with widespread pain were included. Internal consistency of the PDI was good. One factor was identified with factor analyses. Test-retest reliability was good for the PDI (intraclass correlation coefficient, 0.76). Standard error of measurement was 6.5 points and smallest detectable change was 17.9 points. Weak correlations were observed between the PDI and kinesiophobia and depression, fair correlations with pain intensity, work status, and vitality, and moderate correlations with the Rand-36 subscales and the Roland-Morris Disability Questionnaire. The PDI-Dutch language version is internally consistent as a 1-factor structure, and test-retest reliable. Missing responses appear to be frequent for the sexual and professional items. Using the PDI as a 2-factor questionnaire has no additional value and is unreliable.
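The relation between the standard error of measurement and the smallest detectable change can be sketched in Python. The abstract does not state the formulas used, so the conventional ones are assumed here: SEM = SD * sqrt(1 - ICC) and SDC = 1.96 * sqrt(2) * SEM. The standard deviation in the second call is a hypothetical value.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_change(sem, z=1.96):
    """SDC at the confidence level implied by z (1.96 for 95%)."""
    return z * math.sqrt(2) * sem

# SEM of 6.5 points as reported in the abstract
print(round(smallest_detectable_change(6.5), 1))
# Hypothetical SD of 13.0 points combined with the reported ICC of 0.76
print(round(sem_from_icc(13.0, 0.76), 2))
```

With the reported SEM of 6.5 points this conventional formula gives about 18.0 points, close to the reported 17.9; the small gap is plausibly rounding of the SEM, though that is an assumption.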
Terry, Leann; Kelley, Ken
2012-11-01
Composite measures play an important role in psychology and related disciplines. Composite measures almost always have error. Correspondingly, it is important to understand the reliability of the scores from any particular composite measure. However, the point estimates of the reliability of composite measures are fallible and thus all such point estimates should be accompanied by a confidence interval. When confidence intervals are wide, there is much uncertainty in the population value of the reliability coefficient. Given the importance of reporting confidence intervals for estimates of reliability, coupled with the undesirability of wide confidence intervals, we develop methods that allow researchers to plan sample size in order to obtain narrow confidence intervals for population reliability coefficients. We first discuss composite reliability coefficients and then provide a discussion on confidence interval formation for the corresponding population value. Using the accuracy in parameter estimation approach, we develop two methods to obtain accurate estimates of reliability by planning sample size. The first method provides a way to plan sample size so that the expected confidence interval width for the population reliability coefficient is sufficiently narrow. The second method ensures that the confidence interval width will be sufficiently narrow with some desired degree of assurance (e.g., 99% assurance that the 95% confidence interval for the population reliability coefficient will be less than W units wide). The effectiveness of our methods was verified with Monte Carlo simulation studies. We demonstrate how to easily implement the methods with easy-to-use and freely available software. ©2011 The British Psychological Society.
Gao, Zhongyang; Song, Hui; Ren, Fenggang; Li, Yuhuan; Wang, Dong; He, Xijing
2017-12-01
The aim of the present study was to evaluate the reliability of the Cartesian Optoelectronic Dynamic Anthropometer (CODA) motion system in measuring the cervical range of motion (ROM) and verify the construct validity of the CODA motion system. A total of 26 patients with cervical spondylosis and 22 patients with anterior cervical fusion were enrolled and the CODA motion analysis system was used to measure the three-dimensional cervical ROM. Intra- and inter-rater reliability was assessed by intraclass correlation coefficients (ICCs), standard error of measurement (SEm), limits of agreement (LOA) and minimal detectable change (MDC). Independent samples t-tests were performed to examine the differences of cervical ROM between cervical spondylosis and anterior cervical fusion patients. The results revealed that in the cervical spondylosis group, the reliability was almost perfect (intra-rater reliability: ICC, 0.87-0.95; LOA, -12.86-13.70; SEm, 2.97-4.58; inter-rater reliability: ICC, 0.84-0.95; LOA, -13.09-13.48; SEm, 3.13-4.32). In the anterior cervical fusion group, the reliability was high (intra-rater reliability: ICC, 0.88-0.97; LOA, -10.65-11.08; SEm, 2.10-3.77; inter-rater reliability: ICC, 0.86-0.96; LOA, -10.91-13.66; SEm, 2.20-4.45). The cervical ROM in the cervical spondylosis group was significantly higher than that in the anterior cervical fusion group in all directions except for left rotation. In conclusion, the CODA motion analysis system is highly reliable in measuring cervical ROM and the construct validity was verified, as the system was sufficiently sensitive to distinguish between the cervical spondylosis and anterior cervical fusion groups based on their ROM.
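The limits of agreement reported above can be illustrated with a Bland-Altman style sketch in Python. It assumes the usual definition (mean difference plus or minus 1.96 standard deviations of the paired differences); the range-of-motion values are hypothetical, not data from the study.

```python
import statistics

def limits_of_agreement(first, second):
    """Return (lower, bias, upper) 95% limits of agreement for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)  # sample SD of the differences
    return bias - spread, bias, bias + spread

# Hypothetical cervical flexion ROM in degrees from two sessions (illustration only)
session_1 = [50.0, 55.0, 60.0, 65.0]
session_2 = [48.0, 56.0, 59.0, 67.0]
lower, bias, upper = limits_of_agreement(session_1, session_2)
print(lower, bias, upper)
```

Unlike the ICC, the limits are expressed in the units of the measurement, so they can be compared directly with a clinically meaningful change.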
Traceability of On-Machine Tool Measurement: A Review
Gomez-Acedo, Eneko; Kortaberria, Gorka; Olarra, Aitor
2017-01-01
Nowadays, errors during the manufacturing process of high value components are not acceptable in driving industries such as energy and transportation. Sectors such as aerospace, automotive, shipbuilding, nuclear power, large science facilities or wind power need complex and accurate components that demand close measurements and fast feedback into their manufacturing processes. New measuring technologies are already available in machine tools, including integrated touch probes and fast interface capabilities. They make it possible to measure the workpiece in-machine during or after its manufacture, maintaining the original setup of the workpiece and avoiding the need to interrupt the manufacturing process to transport the workpiece to a measuring position. However, the traceability of the measurement process on a machine tool is not yet ensured, and measurement data are still not reliable enough for process control or product validation. The scientific objective is to determine the uncertainty of a machine tool measurement and, therefore, convert it into a machine-integrated traceable measuring process. For that purpose, an error budget should consider error sources such as the machine tools, the components under measurement and the interactions between them. This paper reviews all those uncertainty sources, focusing mainly on those related to the machine tool, either in the process of geometric error assessment of the machine or in the technology employed to probe the measurand. PMID:28696358
Skin Friction at Very High Reynolds Numbers in the National Transonic Facility
NASA Technical Reports Server (NTRS)
Watson, Ralph D.; Anders, John B.; Hall, Robert M.
2006-01-01
Skin friction coefficients were derived from measurements using standard measurement technologies on an axisymmetric cylinder in the NASA Langley National Transonic Facility (NTF) at Mach numbers from 0.2 to 0.85. The pressure gradient was nominally zero, the wall temperature was nominally adiabatic, and the ratio of boundary layer thickness to model diameter within the measurement region was 0.10 to 0.14, varying with distance along the model. Reynolds numbers based on momentum thicknesses ranged from 37,000 to 605,000. The measurements approximately doubled the range of available data for flat plate skin friction coefficients. Three different techniques were used to measure surface shear. The maximum error of Preston tube measurements was estimated to be 2.5 percent, while that of Clauser derived measurements was estimated to be approximately 5 percent. Direct measurements by skin friction balance proved to be subject to large errors and were not considered reliable.
Duruturk, Neslihan; Tonga, Eda; Gabel, Charles Philip; Acar, Manolya; Tekindal, Agah
2015-07-26
This study aims to culturally adapt a Turkish version of the Lower Limb Functional Index (LLFI) and to determine its validity, reliability, internal consistency, measurement sensitivity and factor structure in lower limb problems. The LLFI was translated into Turkish and cross-culturally adapted with a double forward-backward protocol that determined face and content validity. Individuals (n = 120) with lower limb musculoskeletal disorders completed the LLFI and Short Form-36 questionnaires and the Timed Up and Go physical test. The psychometric properties were evaluated for all participants from patient-reported outcome measures made at baseline and repeated at day 3 to determine criterion validity between scores (Pearson's r), internal consistency (Cronbach's α) and test-retest reliability (intraclass correlation coefficient, ICC(2,1)). Error was determined using the standard error of measurement (SEM) and minimal detectable change at the 90% level (MDC90), while factor structure was determined using exploratory factor analysis with maximum likelihood extraction and Varimax rotation. The psychometric characteristics showed strong criterion validity (r = 0.74-0.76), high internal consistency (α = 0.82) and high test-retest reliability (ICC(2,1) = 0.97). The SEM of 3.2% gave an MDC90 = 5.8%. The factor structure was uni-dimensional. The Turkish version of the LLFI was found to be valid and reliable for the measurement of lower limb function in a Turkish population. Implications for Rehabilitation: Lower extremity musculoskeletal disorders are common and greatly impact activities among the affected individuals pertaining to daily living, work, leisure and quality of life. Patient-reported outcome (PRO) measures have advantages as they are practical, cost-effective and clinically convenient for use in patient-centered care. The Lower Limb Functional Index is a recently validated PRO measure shown to have strong clinimetric properties.
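The internal-consistency statistic mentioned above, Cronbach's alpha, can be sketched in Python from its standard definition: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The questionnaire responses below are hypothetical.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*scores))  # transpose to one tuple per item
    item_variance_sum = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)

# Hypothetical responses: 4 respondents x 2 items (illustration only)
responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
print(cronbach_alpha(responses))  # 0.75 for this toy data set
```

Alpha reflects only how consistently the items vary together within one administration; it says nothing about stability over time, which is why the study also reports a test-retest ICC.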
Cann, A P; Connolly, M; Ruuska, R; MacNeil, M; Birmingham, T B; Vandervoort, A A; Callaghan, J P
2008-04-01
Despite the ongoing health problem of repetitive strain injuries, there are few tools currently available for ergonomic applications evaluating cumulative loading that have well-documented evidence of reliability and validity. The purpose of this study was to determine the inter-rater reliability of a posture matching based analysis tool (3DMatch, University of Waterloo) for predicting cumulative and peak spinal loads. A total of 30 food service workers were each videotaped for a 1-h period while performing typical work activities and a single work task was randomly selected from each for analysis by two raters. Inter-rater reliability was determined using intraclass correlation coefficients (ICC) model 2,1 and standard errors of measurement for cumulative and peak spinal and shoulder loading variables across all subjects. Overall, 85.5% of variables had moderate to excellent inter-rater reliability, with ICCs ranging from 0.30-0.99 for all cumulative and peak loading variables. 3DMatch was found to be a reliable ergonomic tool when more than one rater is involved.
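The ICC model 2,1 named above can be sketched in Python from two-way ANOVA mean squares, using the standard formula ICC(2,1) = (MSR - MSE) / (MSR + (k-1)MSE + k(MSC - MSE)/n) for n subjects and k raters. The ratings below are illustrative values, not data from the study, which used two raters.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Illustrative ratings: 6 subjects scored by 4 raters
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Because ICC(2,1) penalizes systematic differences between raters, it comes out low here even though the raters rank subjects similarly; a consistency-type ICC would be higher on the same data.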
Sun, Xiao-Gang; Tang, Hong; Yuan, Gui-Bin
2008-05-01
For the total light scattering particle sizing technique, an inversion and classification method was proposed based on the dependent model algorithm. The measured particle system was inverted simultaneously with different particle distribution functions whose mathematical models were known in advance, and then classified according to the inversion errors. The simulation experiments illustrated that it is feasible to use the inversion errors to determine the particle size distribution. The particle size distribution function was obtained accurately at only three wavelengths in the visible light range with the genetic algorithm, and the inversion results were steady and reliable, which reduced the number of wavelengths required to the greatest extent and increased the selectivity of the light source. The single peak distribution inversion error was less than 5% and the bimodal distribution inversion error was less than 10% when 5% stochastic noise was added to the transmission extinction measurement values at two wavelengths. The running time of this method was less than 2 s. The method has the advantages of simplicity, rapidity, and suitability for on-line particle size measurement.
Hwang, Seonhong; Tsai, Chung-Ying; Koontz, Alicia M
2017-05-24
The purpose of this study was to test the concurrent validity and test-retest reliability of the Kinect skeleton tracking algorithm for measuring trunk, shoulder, and elbow joint angles during a wheelchair transfer task. Eight wheelchair users were recruited for this study. Joint positions were recorded simultaneously by the Kinect and Vicon motion capture systems while subjects transferred from their wheelchairs to a level bench. Shoulder, elbow, and trunk angles recorded with the Kinect system followed a similar trajectory to the angles recorded with the Vicon system, with correlation coefficients larger than 0.71 on both sides (leading arm and trailing arm). The root mean square errors (RMSEs) ranged from 5.18 to 22.46 for the shoulder, elbow, and trunk angles. The 95% limits of agreement (LOA) for the discrepancy between the two systems exceeded the clinically significant level of 5°. For the trunk, shoulder, and elbow angles, the Kinect had very good relative reliability for the measurement of sagittal, frontal and horizontal trunk angles, as indicated by the high intraclass correlation coefficient (ICC) values (>0.90). Small standard error of the measure (SEM) values, indicating good absolute reliability, were observed for all joints except for the leading arm's shoulder joint. Relatively large minimal detectable changes (MDCs) were observed in all joint angles. The Kinect motion tracking has promising performance levels for some upper limb joints. However, more accurate measurement of the joint angles may be required. Therefore, understanding the limitations in precision and accuracy of the Kinect is imperative before it is used.
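The root mean square error between two simultaneously recorded angle series can be sketched in Python as below. This is a minimal illustration assuming the two systems are already time-aligned and sampled at the same instants; the angle values are hypothetical, and the function name is an illustrative choice.

```python
import math

def rmse(reference, candidate):
    """Root mean square error between two equally long, time-aligned series."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two equally long, non-empty series")
    squared = [(r - c) ** 2 for r, c in zip(reference, candidate)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical elbow angles in degrees (illustration only)
reference_angles = [10.0, 20.0, 30.0]   # e.g. marker-based system
candidate_angles = [12.0, 18.0, 33.0]   # e.g. depth-camera system
print(rmse(reference_angles, candidate_angles))
```

A high correlation and a large RMSE can coexist, as in the abstract: the trajectories have the same shape while the absolute angles are offset.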
Educational Testing and Validity of Conclusions in the Scholarship of Teaching and Learning
Beltyukova, Svetlana A.; Martin, Beth A.
2013-01-01
Validity and its integral evidence of reliability are fundamentals for educational and psychological measurement, and standards of educational testing. Herein, we describe these standards of educational testing, along with their subtypes including internal consistency, inter-rater reliability, and inter-rater agreement. Next, related issues of measurement error and effect size are discussed. This article concludes with a call for future authors to improve reporting of psychometrics and practical significance with educational testing in the pharmacy education literature. By increasing the scientific rigor of educational research and reporting, the overall quality and meaningfulness of SoTL will be improved. PMID:24249848
Intraday and Interday Reliability of Ultra-Short-Term Heart Rate Variability in Rugby Union Players.
Nakamura, Fábio Y; Pereira, Lucas A; Esco, Michael R; Flatt, Andrew A; Moraes, José E; Cal Abad, Cesar C; Loturco, Irineu
2017-02-01
Nakamura, FY, Pereira, LA, Esco, MR, Flatt, AA, Moraes, JE, Cal Abad, CC, and Loturco, I. Intraday and interday reliability of ultra-short-term heart rate variability in rugby union players. J Strength Cond Res 31(2): 548-551, 2017-The aim of this study was to examine the intraday and interday reliability of ultra-short-term vagal-related heart rate variability (HRV) in elite rugby union players. Forty players from the Brazilian National Rugby Team volunteered to participate in this study. The natural log of the root mean square of successive RR interval differences (lnRMSSD) assessments were performed on 4 different days. The HRV was assessed twice (intraday reliability) on the first day and once per day on the following 3 days (interday reliability). The RR interval recordings were obtained from 2-minute recordings using a portable heart rate monitor. The relative reliability of intraday and interday lnRMSSD measures was analyzed using the intraclass correlation coefficient (ICC). The typical error of measurement (absolute reliability) of intraday and interday lnRMSSD assessments was analyzed using the coefficient of variation (CV). Both intraday (ICC = 0.96; CV = 3.99%) and interday (ICC = 0.90; CV = 7.65%) measures were highly reliable. The ultra-short-term lnRMSSD is a consistent measure for evaluating elite rugby union players, in both intraday and interday settings. This study provides further validity to using this shortened method in practical field conditions with highly trained team sports athletes.
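The lnRMSSD index described above can be sketched in Python from its definition: the natural log of the root mean square of successive RR interval differences. The coefficient of variation shown here is the plain SD-over-mean form, which is an assumption; the typical-error CV reported in the study may be computed differently. All values are hypothetical.

```python
import math
import statistics

def ln_rmssd(rr_intervals_ms):
    """Natural log of the RMSSD for a sequence of RR intervals in milliseconds."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least 2 RR intervals")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return math.log(rmssd)

def coefficient_of_variation(values):
    """Sample SD as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical RR intervals in ms and hypothetical daily lnRMSSD values
rr = [800, 810, 790, 805]
print(ln_rmssd(rr))
print(coefficient_of_variation([4.0, 4.2, 3.8]))
```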
Reliability analysis of a sensitive and independent stabilometry parameter set
Nagymáté, Gergely; Orlovits, Zsanett
2018-01-01
Recent studies have suggested reduced independent and sensitive parameter sets for stabilometry measurements based on correlation and variance analyses. However, the reliability of these recommended parameter sets has not been studied in the literature or not in every stance type used in stabilometry assessments, for example, single leg stances. The goal of this study is to evaluate the test-retest reliability of different time-based and frequency-based parameters that are calculated from the center of pressure (CoP) during bipedal and single leg stance for 30- and 60-second measurement intervals. Thirty healthy subjects performed repeated standing trials in a bipedal stance with eyes open and eyes closed conditions and in a single leg stance with eyes open for 60 seconds. A force distribution measuring plate was used to record the CoP. The reliability of the CoP parameters was characterized by using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), minimal detectable change (MDC), coefficient of variation (CV) and CV compliance rate (CVCR). Based on the ICC, SEM and MDC results, many parameters yielded fair to good reliability values, while the CoP path length yielded the highest reliability (smallest ICC > 0.67 (0.54–0.79), largest SEM% = 19.2%). Usually, frequency type parameters and extreme value parameters yielded poor reliability values. There were differences in the reliability of the maximum CoP velocity (better with 30 seconds) and mean power frequency (better with 60 seconds) parameters between the different sampling intervals. PMID:29664938
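The CoP path length, the most reliable parameter in the study, can be sketched in Python as the summed distance between consecutive CoP samples. This is a minimal illustration with hypothetical coordinates; any filtering or resampling the authors applied is not represented.

```python
import math

def cop_path_length(samples):
    """Total path length of a center-of-pressure trajectory.

    samples is a sequence of (x, y) positions in consistent units (e.g. mm).
    """
    if len(samples) < 2:
        return 0.0
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(samples, samples[1:])
    )

# Hypothetical CoP positions in mm (illustration only)
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
print(cop_path_length(trajectory))  # 5 + 4 = 9.0
```

Because path length accumulates over every sample, it depends on recording duration and sampling rate, so comparisons such as 30-second versus 60-second intervals need matching acquisition settings.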
Finkbeiner, Kristin M; Wilson, Kyle M; Russell, Paul N; Helton, William S
2015-04-01
Performance on the sustained attention to response task (SART) is often characterized by a speed-accuracy trade-off, and SART performance may be influenced by strategic factors (Head and Helton Conscious Cogn 22: 913-919, 2013). Previous research indicates a significant difference between reliable and unreliable warning cues on response times and errors (commission and omission), suggesting that SART tasks are influenced by strategic factors (Helton et al. Conscious Cogn 20: 1732-1737, 2011; Exp Brain Res 209: 401-407, 2011). With regards to warning stimuli, we chose to use cute images (exhibiting infantile features) during a SART, as previous literature indicates cute images cause participants to engage attention. If viewing cute things makes the viewer exert more attention than normal, then exposure to cute stimuli during the SART should improve performance if SART performance is a measure of perceptual coupling. Reliable warning cues were shown to reduce both response time and errors of commission, and increase errors of omission, relative to unreliable warning cues. Cuteness of the warning stimuli, however, had no significant effect on SART performance. These results suggest the importance of strategic factors in SART performance, not increased attention, and add to the growing literature which suggests the SART is not a good measure of sustained attention, vigilance or perceptual coupling.
Development of a refractive error quality of life scale for Thai adults (the REQ-Thai).
Sukhawarn, Roongthip; Wiratchai, Nonglak; Tatsanavivat, Pyatat; Pitiyanuwat, Somwung; Kanato, Manop; Srivannaboon, Sabong; Guyatt, Gordon H
2011-08-01
To develop a scale for measuring refractive error quality of life (QOL) for Thai adults. The full survey comprised 424 respondents from 5 medical centers in Bangkok and from 3 medical centers in Chiangmai, Songkla and KhonKaen provinces. Participants were emmetropes and persons with refractive correction with visual acuity of 20/30 or better. An item reduction process was employed by combining 3 methods: expert opinion, the impact method and item-total correlation. Classical reliability testing and validity testing, including convergent, discriminative and construct validity, were performed. The developed questionnaire comprised 87 items in 6 dimensions: 1) quality of vision, 2) visual function, 3) social function, 4) psychological function, 5) symptoms and 6) refractive correction problems. It uses a 5-level Likert scale. The Cronbach's alpha coefficients of its dimensions ranged from 0.756 to 0.979. All validity tests supported the validity of the scale. The construct validity was confirmed by confirmatory factor analysis. A short-version questionnaire comprising 48 items with good reliability and validity was also developed. This is the first validated instrument for measuring refractive error quality of life for Thai adults that was developed with strong research methodology and a large sample size.
Determination of Earth orientation using the Global Positioning System
NASA Technical Reports Server (NTRS)
Freedman, A. P.
1989-01-01
Modern spacecraft tracking and navigation require highly accurate Earth-orientation parameters. For near-real-time applications, errors in these quantities and their extrapolated values are a significant error source. A globally distributed network of high-precision receivers observing the full Global Positioning System (GPS) configuration of 18 or more satellites may be an efficient and economical method for the rapid determination of short-term variations in Earth orientation. A covariance analysis using the JPL Orbit Analysis and Simulation Software (OASIS) was performed to evaluate the errors associated with GPS measurements of Earth orientation. These GPS measurements appear to be highly competitive with those from other techniques and can potentially yield frequent and reliable centimeter-level Earth-orientation information while simultaneously allowing the oversubscribed Deep Space Network (DSN) antennas to be used more for direct project support.
Zhu, Mengshi; Murayama, Hideaki; Wada, Daichi
2017-10-12
A novel method is introduced in this work for effectively evaluating the performance of the PANDA type polarization-maintaining fiber Bragg grating (PANDA-FBG) distributed dynamic strain and temperature sensing system. Conventionally, the errors during the measurement are unknown or evaluated by using other sensors such as strain gauge and thermocouples. This will make the sensing system complicated and decrease the efficiency since more than one kind of sensor is applied for the same measurand. In this study, we used the approximately constant ratio of primary errors in strain and temperature measurement and realized the self-evaluation of the sensing system, which can significantly enhance the applicability, as well as the reliability in strategy making.
Reliability of anthropometric measurements in young male and female artistic gymnasts.
Siatras, Theophanis; Skaperda, Malamati; Mameletzi, Dimitra
2010-12-01
Body dimensions and body composition of children participating in artistic activities, such as gymnastics and many types of dancing, are important factors in performance improvement. The present study aimed to determine the reliability of a series of selected anthropometric measurements in young male and female gymnasts. Segment lengths, body breadths, circumferences, and skinfold thickness were measured in 20 young gymnasts by the same experienced examiner, using portable and easy-to-use instruments. All parameters were measured twice (test-retest) under the same conditions within a week's period. The high intra-class correlation coefficient (ICC) values ranging from 0.87 to 0.99, as well as the low coefficient of variation (CV) values (<5.3%), affirmed that the selected measurements were highly reliable. The technical error of measurement (TEM) values for lengths and breadths were 0.15 to 0.80 cm, for circumferences 0.22 to 1 cm, and for skinfold thickness 0.33 to 0.58 mm. The high test-retest ICC and the low CV and TEM values confirmed the reliability of all anthropometric measurements in young artistic gymnasts. Therefore, these measurements could contribute to further research in this field of investigation, helping to monitor young artistic gymnasts' growth status and identify specific characteristics for increased performance in this sport.
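The technical error of measurement (TEM) quoted above is commonly computed for two repeated measurements as the square root of the summed squared differences divided by twice the number of subjects. The sketch below assumes that conventional two-trial formula; the abstract does not state the exact formula used, and the measurement values are hypothetical.

```python
import math

def technical_error_of_measurement(first, second):
    """Two-trial TEM: sqrt(sum(d_i^2) / (2 * n)), in the units of the measurement."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty series")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))

def relative_tem(first, second):
    """TEM expressed as a percentage of the grand mean of both trials."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100.0 * technical_error_of_measurement(first, second) / grand_mean

# Hypothetical test and retest circumferences in cm (not data from the study).
test_cm = [25.0, 27.5, 30.0, 26.0]
retest_cm = [25.5, 27.0, 30.5, 26.0]
tem_cm = technical_error_of_measurement(test_cm, retest_cm)
```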
An FEC Adaptive Multicast MAC Protocol for Providing Reliability in WLANs
NASA Astrophysics Data System (ADS)
Basalamah, Anas; Sato, Takuro
For wireless multicast applications like multimedia conferencing, voice over IP and video/audio streaming, a reliable transmission of packets within short delivery delay is needed. Moreover, reliability is crucial to the performance of error intolerant applications like file transfer, distributed computing, chat and whiteboard sharing. Forward Error Correction (FEC) is frequently used in wireless multicast to enhance Packet Error Rate (PER) performance, but cannot assure full reliability unless coupled with Automatic Repeat Request, forming what is known as Hybrid-ARQ. While reliable FEC can be deployed at different levels of the protocol stack, it cannot be deployed on the MAC layer of the unreliable IEEE802.11 WLAN due to its inability to exchange ACKs with multiple recipients. In this paper, we propose a Multicast MAC protocol that enhances WLAN reliability by using Adaptive FEC and study its performance through mathematical analysis and simulation. Our results show that our protocol can deliver high reliability and throughput performance.
Measuring Viscosities of Gases at Atmospheric Pressure
NASA Technical Reports Server (NTRS)
Singh, Jag J.; Mall, Gerald H.; Hoshang, Chegini
1987-01-01
Variant of general capillary method for measuring viscosities of unknown gases based on use of thermal mass-flowmeter section for direct measurement of pressure drops. In technique, flowmeter serves dual role, providing data for determining volume flow rates and serving as well-characterized capillary-tube section for measurement of differential pressures across it. New method simple, sensitive, and adaptable for absolute or relative viscosity measurements of low-pressure gases. Suited for very complex hydrocarbon mixtures where limitations of classical theory and compositional errors make theoretical calculations less reliable.
Ruiz, Jonatan R; Ortega, Francisco B; Castro-Piñero, Jose
2014-11-30
We investigated the criterion-related validity and the reliability of the 1/4 mile run-walk test (MRWT) in children and adolescents. A total of 86 children (n=42 girls) completed a maximal graded treadmill test using a gas analyzer and the 1/4MRW test. We investigated the test-retest reliability of the 1/4MRWT in a different group of children and adolescents (n=995, n=418 girls). The 1/4MRWT time, sex, and BMI significantly contributed to predict measured VO2peak (R2= 0.32). There was no systematic bias in the cross-validation group (P>0.1). The root mean sum of squared errors (RMSE) and the percentage error were 6.9 ml/kg/min and 17.7%, respectively, and the accurate prediction (i.e. the percentage of estimations within ±4.5 ml/kg/min of VO2peak) was 48.8%. The reliability analysis showed that the mean inter-trial difference ranged from 0.6 seconds in children aged 6-11 years to 1.3 seconds in adolescents aged 12-17 years (all P. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
Powell, Adam C; Torous, John; Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B
2016-02-10
There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves. We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability. We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff's alpha was calculated for each of the measures and reported by app category and in aggregate. The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps. We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. 
Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews.
Chan, Steven; Raynor, Geoffrey Stephen; Shwarts, Erik; Shanahan, Meghan; Landman, Adam B
2016-01-01
Background There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves. Objective We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability. Methods We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff’s alpha was calculated for each of the measures and reported by app category and in aggregate. Results The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking apps. Conclusions We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. 
Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews. PMID:26863986
Clark, S; Rose, D J
2001-04-01
To establish reliability estimates of the 75% Limits of Stability Test (75% LOS test) when administered to community-dwelling older adults with a history of falls. Generalizability theory was used to estimate both the relative contribution of identified error sources to the total measurement error and generalizability coefficients. A random effects repeated-measures analysis of variance (ANOVA) was used to assess consistency of LOS test movement variables across both days and targets. A motor control research laboratory in a university setting. Fifty community-dwelling older adults with 2 or more falls in the previous year. Spatial and temporal measures of dynamic balance derived from the 75% LOS test included average movement velocity, maximum center of gravity (COG) excursion, end-point COG excursion, and directional control. Estimated generalizability coefficients for 2 testing days ranged from .58 to .87. Total variance in LOS test measures attributable to inconsistencies in day-to-day test performance (Day and Subject x Day facets) ranged from 2.5% to 8.4%. The ANOVA results indicated that no significant differences were observed in the LOS test variables across the 2 testing days. The 75% LOS test administered to older adult fallers on 2 consecutive days provides consistent and reliable measures of dynamic balance.
Muir-Hunter, Susan W; Graham, Laura; Montero Odasso, Manuel
2015-08-01
To measure test-retest and interrater reliability of the Berg Balance Scale (BBS) in community-dwelling adults with mild to moderate Alzheimer disease (AD). Method: A sample of 15 adults (mean age 80.20 [SD 5.03] years) with AD performed three balance tests: the BBS, timed up-and-go test (TUG), and Functional Reach Test (FRT). Both relative reliability, using the intra-class correlation coefficient (ICC), and absolute reliability, using standard error of measurement (SEM) and minimal detectable change (MDC95) values, were calculated; Bland-Altman plots were constructed to evaluate inter-tester agreement. The test-retest interval was 1 week. Results: For the BBS, relative reliability values were 0.95 (95% CI, 0.85-0.98) for test-retest reliability and 0.72 (95% CI, 0.31-0.91) for interrater reliability; SEM was 6.01 points and MDC95 was 16.66 points; and interrater agreement was 16.62 points. The BBS performed better in test-retest reliability than the TUG and FRT, tests with established reliability in AD. Between 33% and 50% of participants required cueing beyond standardized instructions because they were unable to remember test instructions. Conclusions: The BBS achieved relative reliability values that support its clinical utility, but MDC95 and agreement values indicate the scale has performance limitations in AD. Further research to optimize balance assessment for people with AD is required.
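The relationship between the SEM and MDC95 values reported above can be sketched with the conventional formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The abstract does not state which formulas the authors used, so this is an assumption; it is, however, consistent with the reported figures, since an SEM of 6.01 points yields an MDC95 of about 16.66 points.

```python
import math

def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def mdc95(sem):
    """Minimal detectable change at 95% confidence for a difference of two measurements."""
    return 1.96 * math.sqrt(2.0) * sem

# Reported SEM for the BBS in the abstract above.
bbs_mdc95 = mdc95(6.01)   # about 16.66 points, matching the reported MDC95
```

The sqrt(2) factor reflects that a change score carries measurement error from both the first and the second assessment.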
Inter-Rater Reliability of Cyclotorsion Measurements Using Fundus Photography.
Dysli, Muriel; Kanku, Madeleine; Traber, Ghislaine L
2018-04-01
The foveo-papillary angle (FPA) on fundus photographs is the accepted standard for the measurement of ocular cyclotorsion. We assessed the inter-rater reliability of this method in healthy subjects and in patients with trochlear nerve palsies. In this methodological study, fundus photographs of healthy subjects and of patients with trochlear nerve palsies were made with a fundus camera (Zeiss Fundus Camera FF 450 plus, Jena, Germany). Three independent observers measured the FPA on the fundus photographs of all subjects in synedra View (synedra View 16, Version 16.0.0.11, Innsbruck, Austria). One hundred and four eyes of 52 subjects (26 healthy controls and 26 patients) were assessed. The mean FPA of the healthy controls was 5.80 degrees (°) [± 0.44 standard error of the mean (SEM)] compared to 11.55° (± 0.80 SEM) for patients with trochlear nerve palsies. The inter-rater reliability of all measured FPAs showed an intraclass correlation coefficient (ICC) of 0.98 (95% CI 0.97 - 0.98). The inter-rater reliability of objective cyclotorsion measurements using fundus photographs was very high. Georg Thieme Verlag KG Stuttgart · New York.
NASA Astrophysics Data System (ADS)
Fisher, W. P., Jr.; Elbaum, B.; Coulter, A.
2010-07-01
Reliability coefficients indicate the proportion of total variance attributable to differences among measures separated along a quantitative continuum by a testing, survey, or assessment instrument. Reliability is usually considered to be influenced by both the internal consistency of a data set and the number of items, though textbooks and research papers rarely evaluate the extent to which these factors independently affect the data in question. Probabilistic formulations of the requirements for unidimensional measurement separate consistency from error by modelling individual response processes instead of group-level variation. The utility of this separation is illustrated via analyses of small sets of simulated data, and of subsets of data from a 78-item survey of over 2,500 parents of children with disabilities. Measurement reliability ultimately concerns the structural invariance specified in models requiring sufficient statistics, parameter separation, unidimensionality, and other qualities that historically have made quantification simple, practical, and convenient for end users. The paper concludes with suggestions for a research program aimed at focusing measurement research more on the calibration and wide dissemination of tools applicable to individuals, and less on the statistical study of inter-variable relations in large data sets.
Chen, Chia Lin; Lo, Chu Ling; Huang, Kai Chu; Huang, Chen Fu
2017-10-01
[Purpose] The aim of this study was to determine the intrarater reliability of using ultrasonography as a measurement tool to assess the patella position in a weight-bearing condition. [Subjects and Methods] Ten healthy adults participated in this study. Ultrasonography was used to assess the patella position during step down with the loading knee in flexion (0° and 20°). The distance between the patella and lateral condyle was measured to represent the patella position on the condylar groove. Two measurements were obtained on the first day and the day after 1 week by the same investigator. [Results] Excellent intrarater reliability, ranging from 0.83 to 0.93, was shown in both conditions. Standard errors of the measurements were 0.5 mm in the straight knee and 0.7 mm in the knee flexion at 20°. Minimal differences in knee flexion at 0° and knee flexion at 20° were 1.5 mm and 1.9 mm, respectively. [Conclusion] Ultrasonography is a reliable assessment tool for evaluating the positional changes of the patella in weight-bearing activities, and it can be easily used by practitioners in the clinical setting.
Metsavaht, Leonardo; Leporace, Gustavo; Riberto, Marcelo; Sposito, Maria Matilde M; Del Castillo, Letícia N C; Oliveira, Liszt P; Batista, Luiz Alberto
2012-11-01
Clinical measurement. To translate and culturally adapt the Lower Extremity Functional Scale (LEFS) into a Brazilian Portuguese version, and to test the construct and content validity and reliability of this version in patients with knee injuries. There is no Brazilian Portuguese version of an instrument to assess the function of the lower extremity after orthopaedic injury. The translation of the original English version of the LEFS into a Brazilian Portuguese version was accomplished using standard guidelines and tested in 31 patients with knee injuries. Subsequently, 87 patients with a variety of knee disorders completed the Brazilian Portuguese LEFS, the Medical Outcomes Study 36-Item Short-Form Health Survey, the Western Ontario and McMaster Universities Osteoarthritis Index, and the International Knee Documentation Committee Subjective Knee Evaluation Form and a visual analog scale for pain. All patients were retested within 2 days to determine reliability of these measures. Validation was assessed by determining the level of association between the Brazilian Portuguese LEFS and the other outcome measures. Reliability was documented by calculating internal consistency, test-retest reliability, and standard error of measurement. The Brazilian Portuguese LEFS had a high level of association with the physical component of the Medical Outcomes Study 36-Item Short-Form Health Survey (r = 0.82), the Western Ontario and McMaster Universities Osteoarthritis Index (r = 0.87), the International Knee Documentation Committee Subjective Knee Evaluation Form (r = 0.82), and the pain visual analog scale (r = -0.60) (all, P<.05). The Brazilian Portuguese LEFS had a low level of association with the mental component of the Medical Outcomes Study 36-Item Short-Form Health Survey (r = 0.38, P<.05). The internal consistency (Cronbach α = .952) and test-retest reliability (intraclass correlation coefficient = 0.957) of the Brazilian Portuguese version of the LEFS were high. 
The standard error of measurement was low (3.6) and the agreement was considered high, demonstrated by the small differences between test and retest and the narrow limit of agreement, as observed in Bland-Altman and survival-agreement plots. The translation of the LEFS into a Brazilian Portuguese version was successful in preserving the semantic and measurement properties of the original version and was shown to be valid and reliable in a Brazilian population with knee injuries.
L. R. Auchmoody
1976-01-01
A study to determine the reliability of first-year growth measurements obtained from aluminum band dendrometers showed that growth was underestimated for black cherry trees growing less than 0.5 inch in diameter or accumulating less than 0.080 square foot of basal area. Prediction equations to correct for these errors are given.
Using Analysis of Covariance (ANCOVA) with Fallible Covariates
ERIC Educational Resources Information Center
Culpepper, Steven Andrew; Aguinis, Herman
2011-01-01
Analysis of covariance (ANCOVA) is used widely in psychological research implementing nonexperimental designs. However, when covariates are fallible (i.e., measured with error), which is the norm, researchers must choose from among 3 inadequate courses of action: (a) know that the assumption that covariates are perfectly reliable is violated but…
ERIC Educational Resources Information Center
Demir, Ergul
2018-01-01
Purpose: The answer-copying tendency has the potential to detect suspicious answer patterns for prior distributions of statistical detection techniques. The aim of this study is to develop a valid and reliable measurement tool as a scale in order to observe the tendency of university students' copying of answers. Also, it is aimed to provide…
Phase measurement error in summation of electron holography series.
McLeod, Robert A; Bergen, Michael; Malac, Marek
2014-06-01
Off-axis electron holography is a method for the transmission electron microscope (TEM) that measures the electric and magnetic properties of a specimen. The electrostatic and magnetic potentials modulate the electron wavefront phase. The error in measurement of the phase therefore determines the smallest observable changes in electric and magnetic properties. Here we explore the summation of a hologram series to reduce the phase error and thereby improve the sensitivity of electron holography. Summation of hologram series requires independent registration and correction of image drift and phase wavefront drift, the consequences of which are discussed. Optimization of the electro-optical configuration of the TEM for the double biprism configuration is examined. An analytical model of image and phase drift, composed of a combination of linear drift and Brownian random-walk, is derived and experimentally verified. The accuracy of image registration via cross-correlation and phase registration is characterized by simulated hologram series. The model of series summation errors allows the optimization of phase error as a function of exposure time and fringe carrier frequency for a target spatial resolution. An experimental example of hologram series summation is provided on WS2 fullerenes. A metric is provided to measure the object phase error from experimental results and compared to analytical predictions. The ultimate experimental object root-mean-square phase error is 0.006 rad (2π/1050) at a spatial resolution less than 0.615 nm and a total exposure time of 900 s. The ultimate phase error in vacuum adjacent to the specimen is 0.0037 rad (2π/1700). The analytical prediction of phase error differs with the experimental metrics by +7% inside the object and -5% in the vacuum, indicating that the model can provide reliable quantitative predictions. Crown Copyright © 2014. Published by Elsevier B.V. All rights reserved.
Boeschen Hospers, J Mirjam; Smits, Niels; Smits, Cas; Stam, Mariska; Terwee, Caroline B; Kramer, Sophia E
2016-04-01
We reevaluated the psychometric properties of the Amsterdam Inventory for Auditory Disability and Handicap (AIADH; Kramer, Kapteyn, Festen, & Tobi, 1995) using item response theory. Item response theory describes item functioning along an ability continuum. Cross-sectional data from 2,352 adults with and without hearing impairment, ages 18-70 years, were analyzed. They completed the AIADH in the web-based prospective cohort study "Netherlands Longitudinal Study on Hearing." A graded response model was fitted to the AIADH data. Category response curves, item information curves, and the standard error as a function of self-reported hearing ability were plotted. The graded response model showed a good fit. Item information curves were most reliable for adults who reported having hearing disability and less reliable for adults with normal hearing. The standard error plot showed that self-reported hearing ability is most reliably measured for adults reporting mild up to moderate hearing disability. This is one of the few item response theory studies on audiological self-reports. All AIADH items could be hierarchically placed on the self-reported hearing ability continuum, meaning they measure the same construct. This provides a promising basis for developing a clinically useful computerized adaptive test, where item selection adapts to the hearing ability of individuals, resulting in efficient assessment of hearing disability.
Di Stefano, Danilo Alessio; Arosio, Paolo; Gastaldi, Giorgio; Gherlone, Enrico
2017-07-08
Recent research has shown that dynamic parameters correlated with insertion energy (that is, the total work needed to place an implant into its site) might convey more reliable information concerning immediate implant primary stability at insertion than the commonly used insertion torque (IT), the reverse torque (RT), or the implant stability quotient (ISQ). Yet knowledge on these dynamic parameters is still limited. The purpose of this in vitro study was to evaluate whether an energy-related parameter, the torque-depth curve integral (I), could be a reliable measure of primary stability. This was done by assessing if (I) measurement was operator-independent, by investigating its correlation with other known primary stability parameters (IT, RT, or ISQ), by quantifying the (I) average error, and by correlating (I), IT, RT, and ISQ variations with bone density. Five operators placed 200 implants in polyurethane foam blocks of different densities using a micromotor that calculated the (I) during implant placement. Primary implant stability was assessed by measuring the ISQ, IT, and RT. ANOVA tests were used to evaluate whether measurements were operator independent (P>.05 in all cases). A correlation analysis was performed between (I) and IT, ISQ, and RT. The (I) average error was calculated and compared with that of the other parameters by ANOVA. (I)-density, IT-density, ISQ-density, and RT-density plots were drawn, and their slopes were compared by ANCOVA. The (I) measurements were operator independent and correlated with IT, ISQ, and RT. The average error of these parameters was not significantly different (P>.05 in all cases). The (I)-density, IT-density, ISQ-density, and RT-density curves were linear in the 0.16 to 0.49 g/cm³ range, with the (I)-density curves having a significantly greater slope than those regarding the other parameters (P≤.001 in all cases). 
The torque-depth curve integral (I) provides a reliable assessment of primary stability and shows a greater sensitivity to density variations than other known primary stability parameters. Copyright © 2017 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
Use of CCSDS and OSI Protocols on the Advanced Communications Technology Satellite
NASA Technical Reports Server (NTRS)
Chirieleison, Don
1996-01-01
Although ACTS (Advanced Communications Technology Satellite) provides an almost error-free channel during much of the day and under most conditions, there are times when it is not suitable for reliably error-free data communications when operating in the uncoded mode. Because coded operation is not always available to every earth station, measures must be taken in the end system to maintain adequate throughput when transferring data under adverse conditions. The most effective approach that we tested to improve performance was the addition of an 'outer' Reed-Solomon code through use of CCSDS (Consultative Committee for Space Data Systems) GOS 2 (a forward error correcting code). This addition can benefit all users of an ACTS channel including those applications that do not require totally reliable transport, but it is somewhat expensive because additional hardware is needed. Although we could not characterize the link noise statistically (it appeared to resemble uncorrelated white noise, the type that block codes are least effective in correcting), we did find that CCSDS GOS 2 gave an essentially error-free link at bit error rates (BERs) as high as 6x10(exp -4). For users that demand reliable transport, an ARQ (Automatic Repeat reQuest) protocol such as TCP (Transmission Control Protocol) or TP4 (Transport Protocol, Class 4) will probably be used. In this category, it comes as no surprise that the best choice of the protocol suites tested over ACTS was TP4 using CCSDS GOS 2. TP4 behaves very well over an error-free link, which GOS 2 provides up to a point. Without forward error correction, however, TP4 service begins to degrade in the 10(exp -7)-10(exp -6) range and by 4x10(exp -6), it barely gives any throughput at all. If Congestion Avoidance is used in TP4, the degradation is even more pronounced. Fortunately, as demonstrated here, this effect can be more than compensated for by choosing the Selective Acknowledgment option. 
In fact, this option can enable TP4 to deliver some throughput at error rates as high as 10(exp -5).
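To illustrate why an ARQ transport is sensitive to the bit error rate, the sketch below computes the probability that a frame contains at least one bit error, 1 - (1 - BER)^n, under the assumption of independent bit errors. That assumption is loosely in line with the remark above that the noise resembled uncorrelated white noise, but the report did not characterize the link statistically. The 8000-bit frame size is hypothetical and is not taken from the report.

```python
def frame_error_probability(ber, frame_bits):
    """Probability that at least one of frame_bits independent bits is in error."""
    if not 0.0 <= ber <= 1.0:
        raise ValueError("ber must be between 0 and 1")
    if frame_bits < 0:
        raise ValueError("frame_bits must be non-negative")
    return 1.0 - (1.0 - ber) ** frame_bits

# Hypothetical frame size; the BER values are those mentioned in the abstract above.
FRAME_BITS = 8000
for ber in (1e-7, 1e-6, 4e-6, 1e-5):
    p = frame_error_probability(ber, FRAME_BITS)
    print(f"BER {ber:.0e}: frame error probability {p:.4f}")
```

Every corrupted frame must be retransmitted, so throughput falls as this probability grows; how steeply depends on window size, timers, and acknowledgment strategy, which this sketch does not model.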
Bourne, Richard S; Shulman, Rob; Tomlin, Mark; Borthwick, Mark; Berry, Will; Mills, Gary H
2017-04-01
To identify between and within profession-rater reliability of clinical impact grading for common critical care prescribing error and optimisation cases. To identify representative clinical impact grades for each individual case. Electronic questionnaire. 5 UK NHS Trusts. 30 Critical care healthcare professionals (doctors, pharmacists and nurses). Participants graded severity of clinical impact (5-point categorical scale) of 50 error and 55 optimisation cases. Case between and within profession-rater reliability and modal clinical impact grading. Between and within profession rater reliability analysis used linear mixed model and intraclass correlation, respectively. The majority of error and optimisation cases (both 76%) had a modal clinical severity grade of moderate or higher. Error cases: doctors graded clinical impact significantly lower than pharmacists (-0.25; P < 0.001) and nurses (-0.53; P < 0.001), with nurses significantly higher than pharmacists (0.28; P < 0.001). Optimisation cases: doctors graded clinical impact significantly lower than nurses and pharmacists (-0.39 and -0.5; P < 0.001, respectively). Within profession reliability grading was excellent for pharmacists (0.88 and 0.89; P < 0.001) and doctors (0.79 and 0.83; P < 0.001) but only fair to good for nurses (0.43 and 0.74; P < 0.001), for optimisation and error cases, respectively. Representative clinical impact grades for over 100 common prescribing error and optimisation cases are reported for potential clinical practice and research application. The between professional variability highlights the importance of multidisciplinary perspectives in assessment of medication error and optimisation cases in clinical practice and research. © The Author 2017. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Mills, Kathryn; Idris, Aula; Pham, Thu-An; Porte, John; Wiggins, Mark; Kavakli, Manolya
2017-12-18
To determine the validity and reliability of the peak frontal plane knee angle evaluated by a virtual reality (VR) netball game when landing from a drop vertical jump (DVJ). Laboratory setting. Forty participants performed 3 DVJs evaluated by 3-dimensional (3D) motion analysis and 3 DVJs evaluated by the VR game. Limits of agreement for the peak projected frontal plane knee angle and peak knee abduction were determined. Participants were given a consensus category of "Above threshold" or "Below threshold" based on a pre-specified threshold angle of 9˚ during landing. Classification agreement was determined using the kappa coefficient, and accuracy was determined using specificity and sensitivity. Ten participants returned 1 week later to determine intra-rater reliability, standard error of the measure and typical error. The mean difference in detected frontal plane knee angle was 3.39˚ (1.03˚, 5.74˚). Limits of agreement were -10.27˚ (-14.36˚, -6.19˚) to 17.05˚ (12.97˚, 21.14˚). Substantial agreement, specificity and sensitivity were observed for the threshold classification (ĸ = 0.66 [0.42, 0.88], specificity = 0.96 [0.78, 1.0], sensitivity = 0.75 [0.43, 0.95]). The game exhibited acceptable reliability over time (ICC (3,1) = 0.844) and error was approximately 2˚. The VR game reliably evaluated a projected frontal plane knee angle. While the knee angle detected by the VR game is strongly related to peak knee abduction, the accuracy of detecting the exact angle was limited. A threshold approach may be a more accurate approach for gaming technology to evaluate frontal plane knee angles when landing from a jump.
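The limits of agreement reported above follow the Bland-Altman convention of mean difference plus or minus 1.96 standard deviations of the paired differences; the reported limits (-10.27˚ and 17.05˚) are indeed symmetric about the reported mean difference of 3.39˚. A minimal sketch of that calculation follows, using hypothetical paired angles rather than the study's data.

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the differences a - b; the limits are bias -/+ 1.96 * SD
    of those differences (sample SD).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical knee angles in degrees from two measurement systems.
vr_game = [12.0, 8.5, 15.0, 6.0, 10.5]
motion_capture = [9.0, 7.0, 10.0, 6.5, 5.0]
bias, lower, upper = limits_of_agreement(vr_game, motion_capture)
```

Wide limits alongside a small bias, as in the abstract, indicate that the two methods agree on average but can differ substantially for an individual measurement.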
Rahnama, Leila; Rezasoltani, Asghar; Khalkhali-Zavieh, Minoo; Rahnama, Behnam; Noori-Kochi, Farhang
2015-01-01
OBJECTIVES: This study was conducted with the purpose of evaluating the inter-session reliability of new software to measure the diameters of the cervical multifidus muscle (CMM), both at rest and during isometric contractions of the shoulder abductors in subjects with neck pain and in healthy individuals. METHOD: In the present study, the reliability of measuring the diameters of the CMM with the Sonosynch software was evaluated by using 24 participants, including 12 subjects with chronic neck pain and 12 healthy individuals. The anterior-posterior diameter (APD) and the lateral diameter (LD) of the CMM were measured in a resting state and then repeated during isometric contraction of the shoulder abductors. Measurements were taken on separate occasions 3 to 7 days apart in order to determine inter-session reliability. Intraclass correlation coefficient (ICC), standard error of measurement (SEM), and smallest detectable difference (SDD) were used to evaluate the relative and absolute reliability, respectively. RESULTS: The Sonosynch software has been shown to be highly reliable in measuring the diameters of the CMM both in healthy subjects and in those with neck pain. The ICC 95% CI for APD ranged from 0.84 to 0.94 in subjects with neck pain and from 0.86 to 0.94 in healthy subjects. For LD, the ICC 95% CI ranged from 0.64 to 0.95 in subjects with neck pain and from 0.82 to 0.92 in healthy subjects. CONCLUSIONS: Ultrasonographic measurement of the diameters of the CMM using Sonosynch has proved to be reliable, especially for APD, in healthy subjects as well as subjects with neck pain. PMID:26443975
Stochastic Models of Human Errors
NASA Technical Reports Server (NTRS)
Elshamy, Maged; Elliott, Dawn M. (Technical Monitor)
2002-01-01
Humans play an important role in the overall reliability of engineering systems. More often accidents and systems failure are traced to human errors. Therefore, in order to have meaningful system risk analysis, the reliability of the human element must be taken into consideration. Describing the human error process by mathematical models is a key to analyzing contributing factors. Therefore, the objective of this research effort is to establish stochastic models substantiated by sound theoretic foundation to address the occurrence of human errors in the processing of the space shuttle.
Newman, Craig G J; Bevins, Adam D; Zajicek, John P; Hodges, John R; Vuillermoz, Emil; Dickenson, Jennifer M; Kelly, Denise S; Brown, Simona; Noad, Rupert F
2018-01-01
Ensuring reliable administration and reporting of cognitive screening tests is fundamental in establishing good clinical practice and research. This study captured the rate and type of errors in clinical practice, using the Addenbrooke's Cognitive Examination-III (ACE-III), and then the reduction in error rate using a computerized alternative, the ACEmobile app. In study 1, we evaluated ACE-III assessments completed in National Health Service (NHS) clinics (n = 87) for administrator error. In study 2, ACEmobile and ACE-III were then evaluated for their ability to capture accurate measurement. In study 1, 78% of clinically administered ACE-IIIs were either scored incorrectly or had arithmetical errors. In study 2, error rates seen in the ACE-III were reduced by 85%-93% using ACEmobile. Error rates are ubiquitous in routine clinical use of cognitive screening tests and the ACE-III. ACEmobile provides a framework for supporting reduced administration, scoring, and arithmetical error during cognitive screening.
Chiang, Hsin-Yu; Lu, Wen-Shian; Yu, Wan-Hui; Hsueh, I-Ping; Hsieh, Ching-Lin
2018-04-11
To examine the interrater and intrarater reliability of the Balance Computerized Adaptive Test (Balance CAT) in patients with chronic stroke having a wide range of balance functions. Repeated assessments design (1wk apart). Seven teaching hospitals. A pooled sample (N=102) including 2 independent groups of outpatients (n=50 for the interrater reliability study; n=52 for the intrarater reliability study) with chronic stroke. Not applicable. Balance CAT. For the interrater reliability study, the values of intraclass correlation coefficient, minimal detectable change (MDC), and percentage of MDC (MDC%) for the Balance CAT were .84, 1.90, and 31.0%, respectively. For the intrarater reliability study, the values of intraclass correlation coefficient, MDC, and MDC% ranged from .89 to .91, from 1.14 to 1.26, and from 17.1% to 18.6%, respectively. The Balance CAT showed sufficient intrarater reliability in patients with chronic stroke having balance functions ranging from sitting with support to independent walking. Although the Balance CAT may have good interrater reliability, we found substantial random measurement error between different raters. Accordingly, if the Balance CAT is used as an outcome measure in clinical or research settings, same raters are suggested over different time points to ensure reliable assessments. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
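The minimal detectable change (MDC) figures in the abstract above are conventionally obtained from the standard error of measurement (SEM). The sketch below assumes the widely used formulas SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM; the abstract does not state which formulas the authors applied, and the function names and the SD value in the usage note are illustrative assumptions only.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level.

    The sqrt(2) factor reflects that a change score combines the
    measurement error of two assessments.
    """
    return 1.96 * math.sqrt(2.0) * sem


def mdc_percent(mdc, mean_score):
    """MDC expressed as a percentage of the mean score (MDC%)."""
    return 100.0 * mdc / mean_score
```

For example, a hypothetical SD of 2.0 with ICC = .84 gives SEM = 0.8 and MDC95 of roughly 2.22, which shows how even a "good" ICC can coexist with a sizeable random error between raters.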
Impaired limb position sense after stroke: a quantitative test for clinical use.
Carey, L M; Oke, L E; Matyas, T A
1996-12-01
A quantitative measure of wrist position sense was developed to advance clinical measurement of proprioceptive limb sensibility after stroke. Test-retest reliability, normative standards, and ability to discriminate impaired and unimpaired performance were investigated. Retest reliability was assessed over three sessions, and a matched-pairs study compared stroke and unimpaired subjects. Both wrists were tested, in counterbalanced order. Patients were tested in hospital-based rehabilitation units. Reliability was investigated on a consecutive sample of 35 adult stroke patients with a range of proprioceptive discrimination abilities and no evidence of neglect. A consecutive sample of 50 stroke patients and convenience sample of 50 healthy volunteers, matched for age, sex, and hand dominance, were tested in the normative-discriminative study. Age and sex were representative of the adult stroke population. The test required matching of imposed wrist positions using a pointer aligned with the axis of movement and a protractor scale. The test was reliable (r = .88 and .92) and observed changes of 8 degrees can be interpreted, with 95% confidence, as genuine. Scores of healthy volunteers ranged from 3.1 degrees to 10.9 degrees average error. The criterion of impairment was conservatively defined as 11 degrees (+/-4.8 degrees) average error. Impaired and unimpaired performance were well differentiated. Clinicians can confidently and quantitatively sample one aspect of proprioceptive sensibility in stroke patients using the wrist position sense test. Development of tests on other joints using the present approach is supported by our findings.
Deng, Zhimin; Tian, Tianhai
2014-07-29
The advances of systems biology have raised a large number of sophisticated mathematical models for describing the dynamic property of complex biological systems. One of the major steps in developing mathematical models is to estimate unknown parameters of the model based on experimentally measured quantities. However, experimental conditions limit the amount of data that is available for mathematical modelling. The number of unknown parameters in mathematical models may be larger than the number of observation data. The imbalance between the number of experimental data and number of unknown parameters makes reverse-engineering problems particularly challenging. To address the issue of inadequate experimental data, we propose a continuous optimization approach for making reliable inference of model parameters. This approach first uses a spline interpolation to generate continuous functions of system dynamics as well as the first and second order derivatives of continuous functions. The expanded dataset is the basis to infer unknown model parameters using various continuous optimization criteria, including the error of simulation only, error of both simulation and the first derivative, or error of simulation as well as the first and second derivatives. We use three case studies to demonstrate the accuracy and reliability of the proposed new approach. Compared with the corresponding discrete criteria using experimental data at the measurement time points only, numerical results of the ERK kinase activation module show that the continuous absolute-error criteria using both function and high order derivatives generate estimates with better accuracy. This result is also supported by the second and third case studies for the G1/S transition network and the MAP kinase pathway, respectively. This suggests that the continuous absolute-error criteria lead to more accurate estimates than the corresponding discrete criteria. 
We also study the robustness property of these three models to examine the reliability of estimates. Simulation results show that the models with estimated parameters using continuous fitness functions have better robustness properties than those using the corresponding discrete fitness functions. The inference studies and robustness analysis suggest that the proposed continuous optimization criteria are effective and robust for estimating unknown parameters in mathematical models.
1997-06-27
This is a computer generated model of a ground based casting. The objective of the thermophysical properties program is to measure thermal physical properties of commercial casting alloys for use in computer programs that predict solidification behavior. This could reduce trial and error in casting design and promote less scrap, sounder castings, and less weight. In order for the computer models to reliably simulate the details of industrial alloy solidification, the input thermophysical property data must be absolutely reliable. Recently Auburn University and TPRL Inc. formed a teaming relationship to establish reliable measurement techniques for the most critical properties of commercially important alloys: transformation temperatures, thermal conductivity, electrical conductivity, specific heat, latent heat, density, solid fraction evolution, surface tension, and viscosity. A new initiative with the American Foundrymen's Society has been started to measure the thermophysical properties of commercial ferrous and non-ferrous casting alloys and make the thermophysical property data widely available. Development of casting processes for the new gamma titanium aluminide alloys as well as existing titanium alloys will remain a trial-and-error procedure until accurate thermophysical properties can be obtained. These molten alloys react with their containers on earth and change their composition, invalidating the measurements even while the data are being acquired in terrestrial laboratories. However, measurements on the molten alloys can be accomplished in space using freely floating droplets which are completely untouched by any container. These data are expected to be exceptionally precise because of the absence of impurity contamination and buoyancy convection effects. 
Although long duration orbital experiments will be required for the large scale industrial alloy measurement program that results from this research, short duration experiments on NASA's KC-135 low-g aircraft are already providing preliminary data and experience.
NASA Astrophysics Data System (ADS)
González, Pablo J.; Fernández, José
2011-10-01
Interferometric Synthetic Aperture Radar (InSAR) is a reliable technique for measuring crustal deformation. However, despite its long application in geophysical problems, its error estimation has been largely overlooked. Currently, the largest problem with InSAR is still the atmospheric propagation errors, which is why multitemporal interferometric techniques have been successfully developed using a series of interferograms. However, none of the standard multitemporal interferometric techniques, namely PS or SB (Persistent Scatterers and Small Baselines, respectively), provide an estimate of their precision. Here, we present a method to compute reliable estimates of the precision of the deformation time series. We implement it for the SB multitemporal interferometric technique (a favorable technique for natural terrains, the most usual target of geophysical applications). We describe the method that uses a properly weighted scheme that allows us to compute estimates for all interferogram pixels, enhanced by a Monte Carlo resampling technique that properly propagates the interferogram errors (variance-covariances) into the unknown parameters (estimated errors for the displacements). We apply the multitemporal error estimation method to Lanzarote Island (Canary Islands), where no active magmatic activity has been reported in the last decades. We detect deformation around Timanfaya volcano (lengthening of line-of-sight ~ subsidence), where the last eruption in 1730-1736 occurred. Deformation closely follows the surface temperature anomalies indicating that magma crystallization (cooling and contraction) of the 300-year shallow magmatic body under Timanfaya volcano is still ongoing.
Johannsen, Finn; Jensen, Signe; Stallknecht, Sandra E; Olsen, Lars Otto; Magnusson, S Peter
2016-10-01
To determine intra- and interobserver reliability and precision of sonographic (US) scanning in measuring thickness of the Achilles tendon, plantar fascia, and heel fat pad in patients with heel pain. Seventeen consecutive patients referred with heel pain were included. Two evaluators blinded to the diagnosis performed independently US scanning of both feet without any dialogue with the patient. The examiner left the room, and the next examiner entered. All patients had two US scans performed by each examiner. Two months later, the US images were randomly presented to the evaluators for measurements. Reliability and agreement were assessed by calculation of intraclass correlation coefficient (ICC), 95% limits of agreement (LOA), and typical error (TE). LOA was calculated as a percentage of the mean thickness of each structure to obtain a unitless parameter. We found excellent intratester reliability (ICC 0.78-0.98) and good intertester reliability using one measurement (ICC 0.72-0.91) and excellent (ICC 0.85-0.95) when using average of two measurements. The intratester agreements were good with LOA: 9.5-23.4% and TE: 3.4-8.4%. The intertester agreements were acceptable using one measurement with LOA: 16.1-36.4%, and better using two measurements with LOA: 14.4-33.2%. US is a reliable technique of measurement in the daily clinic, and one single measurement is sufficient. In research, we recommend that the same observer performs the US measurements, if one single scanning is preferred; if more researchers are involved, the average measurement of two US scans is recommended. © 2016 Wiley Periodicals, Inc. J Clin Ultrasound 44:480-486, 2016. © 2016 Wiley Periodicals, Inc.
Stochastic estimation of plant-available soil water under fluctuating water table depths
NASA Astrophysics Data System (ADS)
Or, Dani; Groeneveld, David P.
1994-12-01
Preservation of native valley-floor phreatophytes while pumping groundwater for export from Owens Valley, California, requires reliable predictions of plant water use. These predictions are compared with stored soil water within well field regions and serve as a basis for managing groundwater resources. Soil water measurement errors, variable recharge, unpredictable climatic conditions affecting plant water use, and modeling errors make soil water predictions uncertain and error-prone. We developed and tested a scheme based on soil water balance coupled with implementation of Kalman filtering (KF) for (1) providing physically based soil water storage predictions with prediction errors projected from the statistics of the various inputs, and (2) reducing the overall uncertainty in both estimates and predictions. The proposed KF-based scheme was tested using experimental data collected at a location on the Owens Valley floor where the water table was artificially lowered by groundwater pumping and later allowed to recover. Vegetation composition and per cent cover, climatic data, and soil water information were collected and used for developing a soil water balance. Predictions and updates of soil water storage under different types of vegetation were obtained for a period of 5 years. The main results show that: (1) the proposed predictive model provides reliable and resilient soil water estimates under a wide range of external conditions; (2) the predicted soil water storage and the error bounds provided by the model offer a realistic and rational basis for decisions such as when to curtail well field operation to ensure plant survival. The predictive model offers a practical means for accommodating simple aspects of spatial variability by considering the additional source of uncertainty as part of modeling or measurement uncertainty.
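The predict/update cycle of a Kalman filter applied to a soil water balance, as described in the abstract above, can be illustrated with a single state variable. This is a minimal sketch under strong simplifying assumptions: one scalar storage state, a direct measurement of that state, and known noise variances. The function names, arguments and numbers are hypothetical and do not represent the authors' model.

```python
def kf_predict(storage, variance, inflow, outflow, process_var):
    """Project soil water storage forward with a simple water balance.

    The prediction variance grows by the process noise variance, which
    stands in for uncertainty in recharge, plant water use and the model.
    """
    return storage + inflow - outflow, variance + process_var


def kf_update(storage_pred, variance_pred, measurement, measurement_var):
    """Blend the prediction with a soil water measurement."""
    # Kalman gain: weight given to the measurement relative to the prediction.
    gain = variance_pred / (variance_pred + measurement_var)
    storage = storage_pred + gain * (measurement - storage_pred)
    variance = (1.0 - gain) * variance_pred
    return storage, variance, gain
```

Starting from a storage of 100 (variance 4) with inflow 5, outflow 8 and process variance 1, the prediction is 97 with variance 5. A measurement of 100 with variance 5 then yields a gain of 0.5, an updated estimate of 98.5 and a reduced variance of 2.5, which is the sense in which the update step lowers the overall uncertainty.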
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gopan, O; Kalet, A; Smith, W
2016-06-15
Purpose: A standard tool for ensuring the quality of radiation therapy treatments is the initial physics plan review. However, little is known about its performance in practice. The goal of this study is to measure the effectiveness of physics plan review by introducing simulated errors into “mock” treatment plans and measuring the performance of plan review by physicists. Methods: We generated six mock treatment plans containing multiple errors. These errors were based on incident learning system data both within the department and internationally (SAFRON). These errors were scored for severity and frequency. Those with the highest scores were included in the simulations (13 errors total). Observer bias was minimized using a multiple co-correlated distractor approach. Eight physicists reviewed these plans for errors, with each physicist reviewing, on average, 3/6 plans. The confidence interval for the proportion of errors detected was computed using the Wilson score interval. Results: Simulated errors were detected in 65% of reviews [51–75%] (95% confidence interval [CI] in brackets). The following error scenarios had the highest detection rates: incorrect isocenter in DRRs/CBCT (91% [73–98%]) and a planned dose different from the prescribed dose (100% [61–100%]). Errors with low detection rates involved incorrect field parameters in record and verify system (38%, [18–61%]) and incorrect isocenter localization in planning system (29% [8–64%]). Though pre-treatment QA failure was reliably identified (100%), less than 20% of participants reported the error that caused the failure. Conclusion: This is one of the first quantitative studies of error detection. Although physics plan review is a key safety measure and can identify some errors with high fidelity, other errors are more challenging to detect. This data will guide future work on standardization and automation. 
Creating new checks or improving existing ones (i.e., via automation) will help in detecting those errors with low detection rates.
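The Wilson score interval mentioned in the abstract above is a standard confidence interval for a binomial proportion that remains sensible when the observed proportion is 0% or 100%. A minimal sketch follows; the function name and the counts in the usage note are assumptions for illustration, since the abstract does not report the underlying numbers of reviews.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z is the standard normal quantile; the default corresponds to a
    two-sided 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)
```

For a hypothetical 6 detections in 6 reviews, the interval runs from about 0.61 to 1.0, illustrating how a 100% detection rate from a small sample still carries a wide lower bound.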
Target thrust measurement for applied-field magnetoplasmadynamic thruster
NASA Astrophysics Data System (ADS)
Wang, B.; Yang, W.; Tang, H.; Li, Z.; Kitaeva, A.; Chen, Z.; Cao, J.; Herdrich, G.; Zhang, K.
2018-07-01
In this paper, we present a flat target thrust stand which is designed to measure the thrust of a steady-state applied-field magnetoplasmadynamic thruster (AF-MPDT). In our experiments we varied target-thruster distances and target size to analyze their influence on the target thrust measurement results. The obtained thrust-distance curves increase to a local maximum and then decrease with increasing distance, which means that the plume of the AF-MPDT can still accelerate outside the thruster exit. The peak positions are related to the target sizes: larger targets can make the peak positions further from the thruster and decrease the measurement errors. To further improve the reliability of measurement results, a thermal equilibrium assumption combined with Knudsen’s cosine law is adapted to analyze the error caused by the back stream of plume particles. Under the assumption, the error caused by particle backflow is no more than 3.6% and the largest difference between the measured thrust and the theoretical thrust is 14%. Moreover, it was verified that target thrust measurement can disturb the working of the AF-MPD thruster, and the influence on the thrust measurement result is no more than 1% in our experiment.
ERIC Educational Resources Information Center
Sheehan, Dwayne P.; Lafave, Mark R.; Katz, Larry
2011-01-01
This study was designed to test the intra- and inter-rater reliability of the University of North Carolina's Balance Error Scoring System in 9- and 10-year-old children. Additionally, a modified version of the Balance Error Scoring System was tested to determine if it was more sensitive in this population ("raw scores"). Forty-six…
Ruediger, T M; Allison, S C; Moore, J M; Wainner, R S
2014-09-01
The purposes of this descriptive and exploratory study were to examine electrophysiological measures of ulnar sensory nerve function in disease free adults to determine reliability, determine reference values computed with appropriate statistical methods, and examine predictive ability of anthropometric variables. Antidromic sensory nerve conduction studies of the ulnar nerve using surface electrodes were performed on 100 volunteers. Reference values were computed from optimally transformed data. Reliability was computed from 30 subjects. Multiple linear regression models were constructed from four predictor variables. Reliability was greater than 0.85 for all paired measures. Responses were elicited in all subjects; reference values for sensory nerve action potential (SNAP) amplitude from above elbow stimulation are 3.3 μV and decrement across-elbow less than 46%. No single predictor variable accounted for more than 15% of the variance in the response. Electrophysiologic measures of the ulnar sensory nerve are reliable. Absent SNAP responses are inconsistent with disease free individuals. Reference values recommended in this report are based on appropriate transformations of non-normally distributed data. No strong statistical model of prediction could be derived from the limited set of predictor variables. Reliability analyses combined with relatively low level of measurement error suggest that ulnar sensory reference values may be used with confidence. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
Smile line assessment comparing quantitative measurement and visual estimation.
Van der Geld, Pieter; Oosterveld, Paul; Schols, Jan; Kuijpers-Jagtman, Anne Marie
2011-02-01
Esthetic analysis of dynamic functions such as spontaneous smiling is feasible by using digital videography and computer measurement for lip line height and tooth display. Because quantitative measurements are time-consuming, digital videography and semiquantitative (visual) estimation according to a standard categorization are more practical for regular diagnostics. Our objective in this study was to compare 2 semiquantitative methods with quantitative measurements for reliability and agreement. The faces of 122 male participants were individually registered by using digital videography. Spontaneous and posed smiles were captured. On the records, maxillary lip line heights and tooth display were digitally measured on each tooth and also visually estimated according to 3-grade and 4-grade scales. Two raters were involved. An error analysis was performed. Reliability was established with kappa statistics. Interexaminer and intraexaminer reliability values were high, with median kappa values from 0.79 to 0.88. Agreement of the 3-grade scale estimation with quantitative measurement showed higher median kappa values (0.76) than the 4-grade scale estimation (0.66). Differentiating high and gummy smile lines (4-grade scale) resulted in greater inaccuracies. The estimation of a high, average, or low smile line for each tooth showed high reliability close to quantitative measurements. Smile line analysis can be performed reliably with a 3-grade scale (visual) semiquantitative estimation. For a more comprehensive diagnosis, additional measuring is proposed, especially in patients with disproportional gingival display. Copyright © 2011 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
Chen, Qi; Chen, Quan; Luo, Xiaobing
2014-09-01
In recent years, due to the fast development of high-power light-emitting diodes (LEDs), their lifetime prediction and assessment have become a crucial issue. Although in situ measurement has been widely used for reliability testing in the laser diode community, it has not been applied commonly in the LED community. In this paper, an online testing method for LED life projection under accelerated reliability test was proposed and a prototype was built. The optical parametric data were collected. The systematic error and the measuring uncertainty were calculated to be within 0.2% and within 2%, respectively. With this online testing method, experimental data can be acquired continuously and a sufficient amount of data can be gathered. Thus, the projection fitting accuracy can be improved (r² = 0.954) and testing duration can be shortened.
Within-Session Stability of Short-Term Heart Rate Variability Measurement
2016-01-01
The primary aim of this study was to assess the retest stability of the short-term heart rate variability (HRV) measurement performed within one session and without the use of any intervention. Additionally, a precise investigation of the possible impact of intrinsic biological variation on HRV reliability was also performed. First, a single test-retest HRV measurement was conducted, with the two measurements 20-30 min apart. Second, the HRV measurement was repeated in ten non-interrupted consecutive intervals. The lowest typical error (CV = 21.1%) was found for the square root of the mean squared differences of successive RR intervals (rMSSD) and the highest for the low frequency power (PLF) (CV = 93.9%). The standardized changes in the mean were trivial to small. The correlation analysis revealed the highest level for ln rMSSD (ICC = 0.87), while ln PLF represented the worst case (ICC = 0.59). The reliability indices for ln rMSSD in 10 consecutive intervals improved (CV = 9.9%; trivial standardized changes in the mean; ICC = 0.96). In conclusion, major differences were found in the reliability level between the HRV indices. The rMSSD demonstrated the highest reliability level. No substantial influence of intrinsic biological variation on the HRV reliability was observed. PMID:28149345
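The rMSSD index defined in the abstract above (the square root of the mean squared differences of successive RR intervals) is straightforward to compute. A minimal sketch, assuming RR intervals are supplied in milliseconds as a plain list; the function names and sample values are illustrative and not taken from the study.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def ln_rmssd(rr_ms):
    """Natural log of rMSSD, the transformed index the abstract reports as ln rMSSD."""
    return math.log(rmssd(rr_ms))
```

For the illustrative series [800, 810, 790, 800], the successive differences are 10, -20 and 10 ms, so rMSSD is the square root of 200, about 14.14 ms.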
Peri, Elisabetta; Ambrosini, Emilia; Colombo, Vera Maria; van de Ruit, Mark; Grey, Michael J; Monticone, Marco; Ferriero, Giorgio; Pedrocchi, Alessandra; Ferrigno, Giancarlo; Ferrante, Simona
2017-01-01
The clinical use of Transcranial Magnetic Stimulation (TMS) as a technique to assess corticospinal excitability is limited by the time for data acquisition and the measurement variability. This study aimed at evaluating the reliability of Stimulus-Response (SR) curves acquired with a recently proposed rapid protocol on tibialis anterior muscle of healthy older adults. Twenty-four neurologically-intact adults (age:55-75 years) were recruited for this test-retest study. During each session, six SR curves, 3 at rest and 3 during isometric muscle contractions at 5% of maximum voluntary contraction (MVC), were acquired. Motor Evoked Potentials (MEPs) were normalized to the maximum peripherally evoked response; the coil position and orientation were monitored with an optical tracking system. Intra- and inter-session reliability of motor threshold (MT), area under the curve (AURC), MEPmax, stimulation intensity at which the MEP is mid-way between MEPmax and MEPmin (I50), slope in I50, MEP latency, and silent period (SP) were assessed in terms of Standard Error of Measurement (SEM), relative SEM, Minimum Detectable Change (MDC), and Intraclass Correlation Coefficient (ICC). The relative SEM was ≤10% for MT, I50, latency and SP both at rest and 5%MVC, while it ranged between 11% and 37% for AURC, MEPmax, and slope. MDC values were overall quite large; e.g., MT required a change of 12%MSO at rest and 10%MSO at 5%MVC to be considered a real change. Inter-sessions ICC were >0.6 for all measures but slope at rest and MEPmax and latency at 5%MVC. Measures derived from SR curves acquired in <4 minutes are affected by similar measurement errors to those found with long-lasting protocols, suggesting that the rapid method is at least as reliable as the traditional methods. As specifically designed to include older adults, this study provides normative data for future studies involving older neurological patients (e.g. stroke survivors).
Reliability and validity of two isometric squat tests.
Blazevich, Anthony J; Gill, Nicholas; Newton, Robert U
2002-05-01
The purpose of the present study was first to examine the reliability of isometric squat (IS) and isometric forward hack squat (IFHS) tests to determine if repeated measures on the same subjects yielded reliable results. The second purpose was to examine the relation between isometric and dynamic measures of strength to assess validity. Fourteen male subjects performed maximal IS and IFHS tests on 2 occasions and 1 repetition maximum (1-RM) free-weight squat and forward hack squat (FHS) tests on 1 occasion. The 2 tests were found to be highly reliable (intraclass correlation coefficient [ICC](IS) = 0.97 and ICC(IFHS) = 1.00). There was a strong relation between average IS and 1-RM squat performance, and between IFHS and 1-RM FHS performance (r(squat) = 0.77, r(FHS) = 0.76; p < 0.01), but a weak relation between squat and FHS test performances (r < 0.55). There was also no difference between observed 1-RM values and those predicted by our regression equations. Errors in predicting 1-RM performance were in the order of 8.5% (standard error of the estimate [SEE] = 13.8 kg) and 7.3% (SEE = 19.4 kg) for IS and IFHS respectively. Correlations between isometric and 1-RM tests were not of sufficient size to indicate high validity of the isometric tests. Together the results suggest that IS and IFHS tests could detect small differences in multijoint isometric strength between subjects, or performance changes over time, and that the scores in the isometric tests are well related to 1-RM performance. However, there was a small error when predicting 1-RM performance from isometric performance, and these tests have not been shown to discriminate between small changes in dynamic strength. The weak relation between squat and FHS test performance can be attributed to differences in the movement patterns of the tests.
NASA Astrophysics Data System (ADS)
Shedekar, Vinayak S.; King, Kevin W.; Fausey, Norman R.; Soboyejo, Alfred B. O.; Harmel, R. Daren; Brown, Larry C.
2016-09-01
Three different models of tipping bucket rain gauges (TBRs), viz. HS-TB3 (Hydrological Services Pty Ltd.), ISCO-674 (Isco, Inc.) and TR-525 (Texas Electronics, Inc.), were calibrated in the lab to quantify measurement errors across a range of rainfall intensities (5 mm·h⁻¹ to 250 mm·h⁻¹) and three different volumetric settings. Instantaneous and cumulative values of simulated rainfall were recorded at 1, 2, 5, 10 and 20-min intervals. All three TBR models showed a substantial deviation (α = 0.05) in measurements from actual rainfall depths, with increasing underestimation errors at greater rainfall intensities. Simple linear regression equations were developed for each TBR to correct the TBR readings based on measured intensities (R² > 0.98). Additionally, two dynamic calibration techniques, viz. quadratic model (R² > 0.7) and T vs. 1/Q model (R² > 0.98), were tested and found to be useful in situations when the volumetric settings of TBRs are unknown. The correction models were successfully applied to correct field-collected rainfall data from respective TBR models. The calibration parameters of correction models were found to be highly sensitive to changes in volumetric calibration of TBRs. Overall, the HS-TB3 model (with a better protected tipping bucket mechanism, and consistent measurement errors across a range of rainfall intensities) was found to be the most reliable and consistent for rainfall measurements, followed by the ISCO-674 (with susceptibility to clogging and relatively smaller measurement errors across a range of rainfall intensities) and the TR-525 (with high susceptibility to clogging and frequent changes in volumetric calibration, and highly intensity-dependent measurement errors). The study demonstrated that corrections based on dynamic and volumetric calibration can only help minimize, but not completely eliminate, the measurement errors. 
The findings from this study will be useful for correcting field data from TBRs; and may have major implications to field- and watershed-scale hydrologic studies.
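The linear correction step described in this abstract could be sketched roughly as follows. This is only an illustration: it assumes the reference (actual) intensity is regressed on the TBR-measured intensity by ordinary least squares, and the function names are hypothetical, not taken from the study.

```python
def fit_line(measured, actual):
    """Ordinary least-squares fit of actual = intercept + slope * measured."""
    n = len(measured)
    if n != len(actual) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(measured) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, actual))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct_reading(measured_intensity, slope, intercept):
    """Apply the fitted calibration line to a single gauge reading."""
    return intercept + slope * measured_intensity
```

In practice the fit would be repeated per gauge model and per volumetric setting, since the abstract reports that the calibration parameters are sensitive to the volumetric calibration.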
A cascaded coding scheme for error control and its performance analysis
NASA Technical Reports Server (NTRS)
Lin, Shu; Kasami, Tadao; Fujiwara, Tohru; Takata, Toyoo
1986-01-01
A coding scheme is investigated for error control in data communication systems. The scheme is obtained by cascading two error correcting codes, called the inner and outer codes. The error performance of the scheme is analyzed for a binary symmetric channel with bit error rate ε < 1/2. It is shown that if the inner and outer codes are chosen properly, extremely high reliability can be attained even for a high channel bit error rate. Various specific example schemes with inner codes ranging from high rates to very low rates and Reed-Solomon codes as outer codes are considered, and their error probabilities are evaluated. They all provide extremely high reliability even for very high bit error rates. Several example schemes are being considered by NASA for satellite and spacecraft downlink error control.
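As a rough illustration of the kind of quantity analyzed here, the probability that a single block code is overwhelmed on a binary symmetric channel can be sketched as below. This is not the report's analysis of the cascaded scheme: it assumes a bounded-distance decoder that corrects up to t errors in a block of n bits, and the function name is hypothetical.

```python
from math import comb


def block_error_probability(n: int, t: int, eps: float) -> float:
    """Probability that more than t of n bits are flipped on a BSC.

    eps is the channel bit error rate (0 <= eps < 1/2). For a decoder that
    corrects up to t errors, this bounds the chance the block is not
    decoded correctly.
    """
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must satisfy 0 <= eps < 1/2")
    # Sum the binomial probabilities of 0..t bit errors (the correctable cases).
    p_correctable = sum(
        comb(n, i) * eps**i * (1.0 - eps) ** (n - i) for i in range(t + 1)
    )
    return 1.0 - p_correctable
```

In a cascaded scheme, the output error rate of the inner decoder becomes the effective symbol error rate seen by the outer code, which is one way to see why a well-chosen pair can reach high reliability on a noisy channel.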
Hincapie, Ana L; Slack, Marion; Malone, Daniel C; MacKinnon, Neil J; Warholak, Terri L
2016-01-01
Patients may be the most reliable reporters of some aspects of the health care process; their perspectives should be considered when pursuing changes to improve patient safety. The authors evaluated the association between patients' perceived health care quality and self-reported medical, medication, and laboratory errors in a multinational sample. The analysis was conducted using the 2010 Commonwealth Fund International Health Policy Survey, a multinational consumer survey conducted in 11 countries. Quality of care was measured by a multifaceted construct developed using Rasch techniques. After adjusting for potentially important confounding variables, an increase in respondents' perceptions of care coordination decreased the odds of self-reporting medical errors, medication errors, and laboratory errors (P < .001). As health care stakeholders continue to search for initiatives that improve care experiences and outcomes, this study's results emphasize the importance of guaranteeing integrated care.
Roghani, Taybeh; Khalkhali Zavieh, Minoo; Rahimi, Abbas; Talebian, Saeed; Manshadi, Farideh Dehghan; Akbarzadeh Baghban, Alireza; King, Nicole; Katzman, Wendy
2018-01-25
The purpose of this study was to investigate the intra-rater reliability and validity of a designed load cell setup for the measurement of back extensor muscle force and endurance. The study sample included 19 older women with hyperkyphosis, mean age 67.0 ± 5.0 years, and 14 older women without hyperkyphosis, mean age 63.0 ± 6.0 years. Maximum back extensor force and endurance were measured in a sitting position with a designed load cell setup. Tests were performed by the same examiner on two separate days within a 72-hour interval. The intra-rater reliability of the measurements was analyzed using intraclass correlation coefficient (ICC), standard errors of measurement (SEM), and minimal detectable change (MDC). The validity of the setup was determined using Pearson correlation analysis and independent t-test. Using our designed load cell, the values of ICC indicated very high reliability of force measurement (hyperkyphosis group: 0.96, normal group: 0.97) and high reliability of endurance measurement (hyperkyphosis group: 0.82, normal group: 0.89). For all tests, the values of SEM and MDC were low in both groups. A significant correlation between two documented forces (load cell force and target force) and significant differences in the muscle force and endurance among the two groups were found. The measurements of static back muscle force and endurance are reliable and valid with our designed setup in older women with and without hyperkyphosis.
Reliability and Minimum Detectable Change of the Gait Deviation Index (GDI) in post-stroke patients.
Correa, Katren Pedroso; Devetak, Gisele Francini; Martello, Suzane Ketlyn; de Almeida, Juliana Carla; Pauleto, Ana Carolina; Manffra, Elisangela Ferretti
2017-03-01
The Gait Deviation Index (GDI) is a summary measure that provides a global picture of gait kinematic data. Since the ability to walk is critical for post-stroke patients, the aim of this study was to determine the reliability and Minimum Detectable Change (MDC) of the GDI in this patient population. Twenty post-stroke patients (11 males, 9 females; mean age, 55.2±9.9 years) participated in this study. Patients presented with either right- (n=14) or left-sided (n=6) hemiparesis. Kinematic gait data were collected in two sessions (test and retest) that were 2 to 7 days apart. GDI values in the first and second sessions were, respectively, 59.0±8.1 and 60.2±9.4 for the paretic limb and 53.3±8.3 and 53.4±8.3 for the non-paretic limb. The reliability in each session was determined by the intra-class correlation coefficient (ICC) of three strides and, in the test session, their values were 0.91 and 0.97 for the paretic and non-paretic limbs, respectively. Between-session reliability and MDC were determined using the average GDI of three strides from each session. For the paretic limb, between-session ICC, standard error of measurement (SEM), and MDC were 0.84, 3.4 and 9.4, respectively. The non-paretic lower limb exhibited between-session ICC, SEM, and MDC of 0.89, 2.7 and 7.5, respectively. These MDC values indicate that very large changes in GDI are required to identify gait improvement. Therefore, the clinical usefulness of GDI with stroke patients is questionable. Copyright © 2017 Elsevier B.V. All rights reserved.
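The relationship between the SEM and MDC values reported above can be sketched with the commonly used formulas SEM = SD·√(1 − ICC) and MDC95 = 1.96·√2·SEM. The abstract does not state which formulas were used, so this is an assumption, although the MDC formula does reproduce the reported figures; the function names are illustrative.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level for a test-retest design."""
    return 1.96 * math.sqrt(2.0) * sem


# Reported SEM values from the abstract: 3.4 (paretic) and 2.7 (non-paretic).
print(round(mdc95(3.4), 1))  # 9.4
print(round(mdc95(2.7), 1))  # 7.5
```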
SU-D-209-01: Can Fluoroscopic Air-Kerma Rates Be Reliably Measured with Solid-State Meters?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feng, C; Thai, L; Wagner, L
Purpose: Ionization chambers remain the standard for calibration of air-kerma rate measuring devices. Despite their strong energy-dependent response, solid state radiation detectors are increasingly used, primarily due to their efficiency in making standardized measurements. To test the reliability of these devices in measuring air-kerma rates, we compared ion chamber measurements with solid-state measurements for various mobile fluoroscopes operated at different beam qualities and air-kerma rates. Methods: Six mobile fluoroscopes (GE OEC models 9800 and 9900) were used to generate test beams. Using various field sizes and dose rate controls, copper attenuators and a lead attenuator were placed at the image receptor in varying combinations to generate a range of air-kerma rates. Air-kerma rates at 30 centimeters from the image receptors were measured using two 6-cm³ ion chambers with electrometers (Radcal, models 1015 and 9015) and two with solid state detectors (Unfors Xi and Raysafe X2). No error messages occurred during measurements. However, about two months later, one solid-state device stopped working and was replaced by the manufacturer. Two out of six mobile fluoroscopic units were retested with the replacement unit. Results: Generally, solid state and ionization chambers agreed favorably well, with two exceptions. Before replacement of the detector, the Xi meter when set in the “RF High” mode deviated from ion chamber readings by factors of 2 and 10 with no message indicating error in measurement. When set in the “RF Low” mode, readings were within −4% to +3%. The replacement Xi detector displayed messages alerting the user when settings were not compatible with air-kerma rates. Conclusion: Air-kerma rates can be measured favorably well using solid-state devices, but users must be aware of the possibility that readings can be grossly in error with no discernible indication for the deviation.
Javanshir, Khodabakhsh; Mohseni-Bandpei, Mohammad Ali; Rezasoltani, Asghar; Amiri, Mohsen; Rahgozar, Mehdi
2011-01-01
In this study, the reliability of the longus colli muscle (LCM) size was assessed in a relaxed state by a real time ultrasonography (US) device in a group of healthy subjects and a group of patients with chronic neck pain. Fifteen healthy subjects (19-41 years old) and 10 patients with chronic neck pain (27-44 years old) were recruited for the purpose of this study. LCM size was measured at the level of thyroid cartilage. Two images were taken on the same day with an hour interval to assess the within day reliability and the third image was taken 1 week later to determine between days reliability. Cross sectional area (CSA), anterior posterior dimension (APD), and lateral dimension (LD) were measured each time. The shape ratio was calculated as LD/APD. Intraclass correlation coefficients (ICC) and standard error of measurement (SEM) were computed for data analysis. The ICC of left and right CSA for within day and between days reliability in healthy subjects were (0.90, 0.93) and (0.85, 0.82), respectively. The ICC of left and right CSA for within day and between days reliability in patients with neck pain were (0.86, 0.82) and (0.76, 0.81), respectively. The results indicated that US could be used as a reliable tool to measure the LCM dimensions in healthy subjects and patients with chronic neck pain. Copyright © 2009 Elsevier Ltd. All rights reserved.
Component Analysis of Errors on PERSIANN Precipitation Estimates over Urmia Lake Basin, IRAN
NASA Astrophysics Data System (ADS)
Ghajarnia, N.; Daneshkar Arasteh, P.; Liaghat, A. M.; Araghinejad, S.
2016-12-01
In this study, the PERSIANN daily dataset is evaluated from 2000 to 2011 in 69 pixels over the Urmia Lake basin in northwest Iran. Different analytical approaches and indexes are used to examine PERSIANN precision in detection and estimation of rainfall rate. The residuals are decomposed into Hit, Miss and FA estimation biases, while the continuous decomposition of systematic and random error components is also analyzed seasonally and categorically. A new interpretation of estimation accuracy named "reliability on PERSIANN estimations" is introduced, while the changing behavior of existing categorical/statistical measures and error components is also seasonally analyzed over different rainfall rate categories. This study yields new insights into the nature of PERSIANN errors over the Urmia Lake basin as a semi-arid region in the Middle East, including the following:
- The analyzed contingency table indexes indicate better detection precision during spring and fall.
- A relatively constant level of error is generally observed among different categories. The range of precipitation estimates at different rainfall rate categories is nearly invariant, a sign of the existence of systematic error.
- A low level of reliability is observed on PERSIANN estimations at different categories, mostly associated with a high level of FA error. However, as the rate of precipitation increases, the ability and precision of PERSIANN in rainfall detection also increase.
- The systematic and random error decomposition in this area shows that PERSIANN has more difficulty in modeling the system and pattern of rainfall than bias due to rainfall uncertainties. The level of systematic error also considerably increases in heavier rainfalls.
It is also important to note that PERSIANN error characteristics vary by season because of the conditions and rainfall patterns of that season, which shows the need for a seasonally different approach to calibrating this product. Overall, we believe that the analysis of different error components performed in this study can substantially help further local studies on post-calibration and bias reduction of PERSIANN estimations.
Quantitative measurement of hypertrophic scar: interrater reliability and concurrent validity.
Nedelec, Bernadette; Correa, José A; Rachelska, Grazyna; Armour, Alexis; LaSalle, Léo
2008-01-01
Research into the pathophysiology and treatment of hypertrophic scar (HSc) remains limited by the heterogeneity of scar and the imprecision with which its severity is measured. The objective of this study was to test the interrater reliability and concurrent validity of the Cutometer measurement of elasticity, the Mexameter measurement of erythema and pigmentation, and total thickness measure of the DermaScan C relative to the modified Vancouver Scar Scale (mVSS) in patient-matched normal skin, normal scar, and HSc. Three independent investigators evaluated 128 sites (severe HSc, moderate or mild HSc, donor site, and normal skin) on 32 burn survivors using all of the above measurement tools. The intraclass correlation coefficient, which was used to measure interrater reliability, reflects the inherent amount of error in the measure and is considered acceptable when it is >0.75. Interrater reliability of the totals of the height, pliability, and vascularity subscales of the mVSS fell below the acceptable limit (≈0.50). The individual subscales of the mVSS fell well below the acceptable level (≤0.3). The Cutometer reading of elasticity provided acceptable reliability (>0.89) for each study site with the exception of severe scar. Mexameter and DermaScan C reliability measurements were acceptable for all sites (>0.82). Concurrent validity correlations with the mVSS were significant except for the comparison of the mVSS pliability subscale and the Cutometer maximum deformation measure in severe scar. In conclusion, the Mexameter and DermaScan C measurements of scar color and thickness of all sites, as well as the Cutometer measurement of elasticity in all but the most severe scars, show high interrater reliability. Their significant concurrent validity with the mVSS confirms that these tools are measuring the same traits as the mVSS, and in a more objective way.
Do CAS measurements correlate with EOS 3D alignment measurements in primary TKA?
Meijer, Marrigje F; Boerboom, Alexander L; Bulstra, Sjoerd K; Reininga, Inge H F; Stevens, Martin
2017-09-01
The objective of this study was to compare intraoperative computer-assisted surgery (CAS) alignment measurements during total knee arthroplasty (TKA) with pre- and postoperative coronal alignment measurements using EOS 3D reconstructions. In a prospective study, 56 TKAs using imageless CAS were performed and coronal alignment measurements were recorded twice: before bone cuts were made and after implantation of the prosthesis. Pre- and postoperative coronal alignment measurements were performed using EOS 3D reconstructions. Thanks to the EOS radiostereography system, measurement errors due to malpositioning and deformity during acquisition are eliminated. CAS measurements were compared with EOS 3D reconstructions. Varus/valgus angle (VV), mechanical lateral distal femoral angle (mLDFA) and mechanical medial proximal tibial angle (mMPTA) were measured. Significantly different VV angles were measured pre- and postoperatively with CAS compared to EOS. For preoperative measurements, mLDFA did not differ significantly, but a significantly larger mMPTA in valgus was measured with CAS. Results of this study indicate that differences in alignment measurements between CAS measurements and pre- and postoperative EOS 3D are due mainly to the difference between weight-bearing and non-weight-bearing position and potential errors in validity and reliability of the CAS system. EOS 3D measurements overestimate VV angle in lower limbs with substantial mechanical axis deviation. For lower limbs with minor mechanical axis deviation as well as for mMPTA measurements, CAS measures more valgus than EOS. The results of this study are of clinical relevance, since they raise concerns regarding the validity and reliability of CAS systems in TKA. IIb.
Validity of mail survey data on bagged waterfowl
Atwood, E.L.
1956-01-01
Knowledge of the pattern of occurrence and characteristics of response errors obtained during an investigation of the validity of post-season surveys of hunters was used to advantage to devise a two-step method for removing the response-bias errors from the raw survey data. The method was tested on data with known errors and found to have a high efficiency in reducing the effect of response-bias errors. The development of this method for removing the effect of the response-bias errors, and its application to post-season hunter-take survey data, increased the reliability of the data from below the point of practical management significance up to the approximate reliability limits corresponding to the sampling errors.
An experiment in software reliability
NASA Technical Reports Server (NTRS)
Dunham, J. R.; Pierce, J. L.
1986-01-01
The results of a software reliability experiment conducted in a controlled laboratory setting are reported. The experiment was undertaken to gather data on software failures and is one in a series of experiments being pursued by the Fault Tolerant Systems Branch of NASA Langley Research Center to find a means of credibly performing reliability evaluations of flight control software. The experiment tests a small sample of implementations of radar tracking software having ultra-reliability requirements and uses n-version programming for error detection, and repetitive run modeling for failure and fault rate estimation. The experiment results agree with those of Nagel and Skrivan in that the program error rates suggest an approximate log-linear pattern and the individual faults occurred with significantly different error rates. Additional analysis of the experimental data raises new questions concerning the phenomenon of interacting faults. This phenomenon may provide one explanation for software reliability decay.
Reliability of a single objective measure in assessing sleepiness.
Sunwoo, Bernie Y; Jackson, Nicholas; Maislin, Greg; Gurubhagavatula, Indira; George, Charles F; Pack, Allan I
2012-01-01
To evaluate reliability of single objective tests in assessing sleepiness. Subjects who completed polysomnography underwent a 4-nap multiple sleep latency test (MSLT) the following day. Prior to each nap opportunity on MSLT, subjects performed the psychomotor vigilance test (PVT) and divided attention driving task (DADT). Results of single versus multiple test administrations were compared using the intraclass correlation coefficient (ICC) and adjusted for test administration order effects to explore time of day effects. Measures were explored as continuous and binary (i.e., impaired or not impaired). Community-based sample evaluated at a tertiary, university-based sleep center. 372 adult commercial vehicle operators oversampled for increased obstructive sleep apnea risk. N/A. As continuous measures, ICC were as follows: MSLT 0.45, PVT median response time 0.69, PVT number of lapses 0.51, 10-min DADT tracking error 0.87, 20-min DADT tracking error 0.90. Based on binary outcomes, ICC were: MSLT 0.63, PVT number of lapses 0.85, 10-min DADT 0.95, 20-min DADT 0.96. Statistically significant time of day effects were seen in both the MSLT and PVT but not the DADT. Correlation between ESS and different objective tests was strongest for MSLT, range [-0.270 to -0.195] and persisted across all time points. Single DADT and PVT administrations are reliable measures of sleepiness. A single MSLT administration can reasonably discriminate individuals with MSL < 8 minutes. These results support the use of a single administration of some objective tests of sleepiness when performed under controlled conditions in routine clinical care.
Williams, Valerie J; Piva, Sara R; Irrgang, James J; Crossley, Chad; Fitzgerald, G Kelley
2012-08-01
Secondary analysis, pretreatment-posttreatment observational study. To compare the reliability and responsiveness of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), the Knee Outcome Survey activities of daily living subscale (KOS-ADL), and the Lower Extremity Functional Scale (LEFS) in individuals with knee osteoarthritis (OA). The WOMAC is the current standard in patient-reported measures of function in patients with knee OA. The KOS-ADL and LEFS were designed for potential use in patients with knee OA. If the KOS-ADL and LEFS are to be considered viable alternatives to the WOMAC for measuring patient-reported function in individuals with knee OA, they should have measurement properties comparable to the WOMAC. It would also be important to determine whether either of these instruments may be superior to the WOMAC in terms of reliability or responsiveness in this population. Data from 168 subjects with knee OA, who participated in a rehabilitation program, were used in the analyses. Reliability and responsiveness of each outcome measure were estimated at follow-ups of 2, 6, and 12 months. Reliability was estimated by calculating the intraclass correlation coefficient (ICC2,1) for subjects who were unchanged in status from baseline at each follow-up time, based on a global rating of change score. To examine responsiveness, the standard error of the measurement, minimal detectable change, minimal clinically important difference, and the Guyatt responsiveness index were calculated for each outcome measure at each follow-up time. All 3 outcome measures demonstrated reasonable reliability and responsiveness to change. Reliability and responsiveness tended to decrease somewhat with increasing follow-up time. There were no substantial differences between outcome measures for reliability or any of the 3 measures of responsiveness at any follow-up time. 
The results do not indicate that one outcome measure is more reliable or responsive than another when applied to subjects with knee OA. We believe that all 3 instruments are appropriate outcome measures to examine change in functional status of patients with knee OA.
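For readers unfamiliar with the ICC(2,1) form named in this abstract, a minimal sketch of the standard two-way random-effects, single-measurement formula is given below. It assumes a complete subjects-by-occasions table with no missing values; the function name and data layout are illustrative and not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores: one row per subject, one column per occasion.

    Uses the two-way ANOVA mean squares:
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of occasions (or raters)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-occasions mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the between-occasions term appears in the denominator, a constant shift between baseline and follow-up lowers ICC(2,1) even when subjects keep exactly the same rank order.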
Attia, A; Dhahbi, W; Chaouachi, A; Padulo, J; Wong, D P; Chamari, K
2017-03-01
Common methods to estimate vertical jump height (VJH) are based on the measurements of flight time (FT) or vertical reaction force. This study aimed to assess the measurement errors when estimating the VJH with flight time using photocell devices in comparison with the gold standard jump height measured by a force plate (FP). The second purpose was to determine the intrinsic reliability of the Optojump photoelectric cells in estimating VJH. For this aim, 20 subjects (age: 22.50±1.24 years) performed maximal vertical jumps in three modalities in randomized order: the squat jump (SJ), counter-movement jump (CMJ), and CMJ with arm swing (CMJarm). Each trial was simultaneously recorded by the FP and Optojump devices. High intra-class correlation coefficients (ICCs) for validity (0.98-0.99) and low limits of agreement (less than 1.4 cm) were found; even a systematic difference in jump height was consistently observed between FT and double integration of force methods (-31% to -27%; p<0.001) and a large effect size (Cohen's d >1.2). Intra-session reliability of Optojump was excellent, with ICCs ranging from 0.98 to 0.99, low coefficients of variation (3.98%), and low standard errors of measurement (0.8 cm). It was concluded that there was a high correlation between the two methods to estimate the vertical jump height, but the FT method cannot replace the gold standard, due to the large systematic bias. According to our results, the equations of each of the three jump modalities were presented in order to obtain a better estimation of the jump height.
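The flight-time method referred to above is usually based on the projectile relation h = g·t²/8, which can be sketched as follows. The abstract does not give the exact equation used, so this is an assumption; it also presumes that take-off and landing occur in the same body posture, and the function name is illustrative.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def jump_height_from_flight_time(flight_time_s: float) -> float:
    """Estimate vertical jump height (m) from flight time (s) using h = g * t^2 / 8."""
    if flight_time_s < 0:
        raise ValueError("flight time must be non-negative")
    return G * flight_time_s**2 / 8.0
```

Since height scales with the square of flight time, small timing differences between devices translate into proportionally larger differences in estimated height.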
Attia, A; Chaouachi, A; Padulo, J; Wong, DP; Chamari, K
2016-01-01
Common methods to estimate vertical jump height (VJH) are based on the measurements of flight time (FT) or vertical reaction force. This study aimed to assess the measurement errors when estimating the VJH with flight time using photocell devices in comparison with the gold standard jump height measured by a force plate (FP). The second purpose was to determine the intrinsic reliability of the Optojump photoelectric cells in estimating VJH. For this aim, 20 subjects (age: 22.50±1.24 years) performed maximal vertical jumps in three modalities in randomized order: the squat jump (SJ), counter-movement jump (CMJ), and CMJ with arm swing (CMJarm). Each trial was simultaneously recorded by the FP and Optojump devices. High intra-class correlation coefficients (ICCs) for validity (0.98-0.99) and low limits of agreement (less than 1.4 cm) were found; even a systematic difference in jump height was consistently observed between FT and double integration of force methods (-31% to -27%; p<0.001) and a large effect size (Cohen’s d>1.2). Intra-session reliability of Optojump was excellent, with ICCs ranging from 0.98 to 0.99, low coefficients of variation (3.98%), and low standard errors of measurement (0.8 cm). It was concluded that there was a high correlation between the two methods to estimate the vertical jump height, but the FT method cannot replace the gold standard, due to the large systematic bias. According to our results, the equations of each of the three jump modalities were presented in order to obtain a better estimation of the jump height. PMID:28416900
Weakley, Jonathon J S; Till, Kevin; Darrall-Jones, Joshua; Roe, Gregory A B; Phibbs, Padraic J; Read, Dale B; Jones, Ben L
2017-07-01
Weakley, JJS, Till, K, Darrall-Jones, J, Roe, GAB, Phibbs, PJ, Read, DB, and Jones, BL. The influence of resistance training experience on the between-day reliability of commonly used strength measures in male youth athletes. J Strength Cond Res 31(7): 2005-2010, 2017-The purpose of this study was to determine the between-day reliability of commonly used strength measures in male youth athletes while considering resistance training experience. Data were collected on 25 male athletes over 2 testing sessions, with 72 hours rest between, for the 3 repetition maximum (3RM) front squat, chin-up, and bench press. Subjects were initially categorized by resistance training experience (inexperienced; 6-12 months, experienced; >2 years). The assessment of the between-day reliability (coefficient of variation [CV%]) showed that the front squat (experienced: 2.90%; inexperienced: 1.90%), chin-up (experienced: 1.70%; inexperienced: 1.90%), and bench press (experienced: 4.50%; inexperienced: 2.40%) were all reliable measures of strength in both groups. Comparison between groups for the error of measurement for each exercise showed trivial differences. When both groups were combined, the CV% for the front squat, bench press, and chin-up were 2.50, 1.80, and 3.70%, respectively. This study provides scientists and practitioners with the between-day reliability reference data to determine real and practical changes for strength in male youth athletes with different resistance training experience. Furthermore, this study demonstrates that 3RM front squat, chin-up, and bench press are reliable exercises to quantify strength in male youth athletes.
An analysis of estimation of pulmonary blood flow by the single-breath method
NASA Technical Reports Server (NTRS)
Srinivasan, R.
1986-01-01
The single-breath method represents a simple noninvasive technique for the assessment of capillary blood flow across the lung. However, this method has not gained widespread acceptance, because its accuracy is still being questioned. A rigorous procedure is described for estimating pulmonary blood flow (PBF) using data obtained with the aid of the single-breath method. Attention is given to the minimization of data-processing errors in the presence of measurement errors and to questions regarding a correction for possible loss of CO2 in the lung tissue. It is pointed out that the estimations are based on the exact solution of the underlying differential equations which describe the dynamics of gas exchange in the lung. The reported study demonstrates the feasibility of obtaining highly reliable estimates of PBF from expiratory data in the presence of random measurement errors.
Error Estimation for the Linearized Auto-Localization Algorithm
Guevara, Jorge; Jiménez, Antonio R.; Prieto, Jose Carlos; Seco, Fernando
2012-01-01
The Linearized Auto-Localization (LAL) algorithm estimates the position of beacon nodes in Local Positioning Systems (LPSs), using only the distance measurements to a mobile node whose position is also unknown. The LAL algorithm calculates the inter-beacon distances, used for the estimation of the beacons’ positions, from the linearized trilateration equations. In this paper we propose a method to estimate the propagation of the errors of the inter-beacon distances obtained with the LAL algorithm, based on a first order Taylor approximation of the equations. Since the method depends on such approximation, a confidence parameter τ is defined to measure the reliability of the estimated error. Field evaluations showed that by applying this information to an improved weighted-based auto-localization algorithm (WLAL), the standard deviation of the inter-beacon distances can be improved by more than 30% on average with respect to the original LAL method. PMID:22736965
NASA Astrophysics Data System (ADS)
Prószyński, Witold; Kwaśniak, Mieczysław
2016-12-01
The paper presents the results of investigating the effect of increasing observation correlations on detectability and identifiability of a single gross error, the outlier test sensitivity and also the response-based measures of internal reliability of networks. Treating all the non-diagonal elements of the correlation matrix as variables would give a practically incomputable number of test options, so the simplest representation was used: a matrix with all non-diagonal elements of equal value, termed uniform correlation. By raising the common correlation value incrementally, a sequence of matrix configurations could be obtained corresponding to the increasing level of observation correlations. For each of the measures characterizing the above mentioned features of network reliability, the effect is presented in diagram form as a function of the increasing level of observation correlations. The influence of observation correlations on sensitivity of the w-test for correlated observations (Förstner 1983, Teunissen 2006) is investigated in comparison with the original Baarda's w-test designated for uncorrelated observations, to determine the character of expected sensitivity degradation of the latter when used for correlated observations. The correlation effects obtained for different reliability measures exhibit mutual consistency to a satisfactory extent. As a by-product of the analyses, a simple formula valid for any arbitrary correlation matrix is proposed for transforming Baarda's w-test statistics into the w-test statistics for correlated observations.
RELIABILITY OF ANKLE-FOOT MORPHOLOGY, MOBILITY, STRENGTH, AND MOTOR PERFORMANCE MEASURES.
Fraser, John J; Koldenhoven, Rachel M; Saliba, Susan A; Hertel, Jay
2017-12-01
Assessment of foot posture, morphology, intersegmental mobility, strength and motor control of the ankle-foot complex are commonly used clinically, but measurement properties of many assessments are unclear. To determine test-retest and inter-rater reliability, standard error of measurement, and minimal detectable change of morphology, joint excursion and play, strength, and motor control of the ankle-foot complex. Reliability study. 24 healthy, recreationally-active young adults without history of ankle-foot injury were assessed by two clinicians on two occasions, three to ten days apart. Measurement properties were assessed for foot morphology (foot posture index, total and truncated length, width, arch height), joint excursion (weight-bearing dorsiflexion, rearfoot and hallux goniometry, forefoot inclinometry, 1st metatarsal displacement) and joint play, strength (handheld dynamometry), and motor control rating during intrinsic foot muscle (IFM) exercises. Clinician order was randomized using a Latin Square. The clinicians performed independent examinations and did not confer on the findings for the duration of the study. Test-retest and inter-tester reliability and agreement was assessed using intraclass correlation coefficients (ICC2,k) and weighted kappa (Kw). Test-retest reliability ICC were as follows: morphology: .80 to 1.00, joint excursion: .58 to .97, joint play: -.67 to .84, strength: .67 to .92, IFM motor rating: Kw -.01 to .71. Inter-rater reliability ICC were as follows: morphology: .81 to 1.00, joint excursion: .32 to .97, joint play: -1.06 to 1.00, strength: .53 to .90, and IFM motor rating: Kw .02 to .56. Measures of ankle-foot posture, morphology, joint excursion, and strength demonstrated fair to excellent test-retest and inter-rater reliability. Test-retest reliability for rating of perceived difficulty and motor performance was good to excellent for short-foot, toe-spread-out, and hallux exercises and poor to fair for lesser toe extension. 
Joint play measures had poor to fair reliability overall. The findings of this study should be considered when choosing methods of clinical assessment and outcome measures in practice and research. 3.
Validity and reliability of the Fitbit Zip as a measure of preschool children’s step count
Sharp, Catherine A; Mackintosh, Kelly A; Erjavec, Mihela; Pascoe, Duncan M; Horne, Pauline J
2017-01-01
Objectives Validation of physical activity measurement tools is essential to determine the relationship between physical activity and health in preschool children, but research to date has not focused on this priority. The aims of this study were to ascertain inter-rater reliability of observer step count, and interdevice reliability and validity of Fitbit Zip accelerometer step counts in preschool children. Methods Fifty-six children aged 3–4 years (29 girls) recruited from 10 nurseries in North Wales, UK, wore two Fitbit Zip accelerometers while performing a timed walking task in their childcare settings. Accelerometers were worn in secure pockets inside a custom-made tabard. Video recordings enabled two observers to independently code the number of steps performed in 3 min by each child during the walking task. Intraclass correlations (ICCs), concordance correlation coefficients, Bland-Altman plots and absolute per cent error were calculated to assess the reliability and validity of the consumer-grade device. Results An excellent ICC was found between the two observer codings (ICC=1.00) and the two Fitbit Zips (ICC=0.91). Concordance between the Fitbit Zips and observer counts was also high (r=0.77), with an acceptable absolute per cent error (6%–7%). Bland-Altman analyses identified a bias for Fitbit 1 of 22.8±19.1 steps with limits of agreement between −14.7 and 60.2 steps, and a bias for Fitbit 2 of 25.2±23.2 steps with limits of agreement between −20.2 and 70.5 steps. Conclusions Fitbit Zip accelerometers are a reliable and valid method of recording preschool children’s step count in a childcare setting. PMID:29081984
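The Bland-Altman figures in this abstract can be roughly checked from the summary statistics alone. The sketch below assumes the conventional definition of the 95% limits of agreement (bias ± 1.96 × SD of the paired differences); the abstract does not state the multiplier, and the function names are illustrative only. Small differences from the published limits are expected because the reported bias and SD are rounded.

```python
import statistics


def bland_altman(device, reference, z=1.96):
    """Bias, SD of differences, and limits of agreement for paired counts.

    `device` and `reference` are equal-length sequences (e.g. device step
    counts and observer step counts for the same children).
    """
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, sd, bias - z * sd, bias + z * sd


def limits_of_agreement(bias, sd, z=1.96):
    """Limits of agreement from an already-computed bias and SD."""
    return bias - z * sd, bias + z * sd


# Summary values reported for Fitbit 1 (bias 22.8 steps, SD 19.1 steps).
lower, upper = limits_of_agreement(22.8, 19.1)
print(f"Fitbit 1 limits of agreement: {lower:.1f} to {upper:.1f} steps")
```

Under that assumption the recomputed limits fall within about 0.1 step of the values quoted in the abstract.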
Grigg, Josephine; Haakonssen, Eric; Rathbone, Evelyne; Orr, Robin; Keogh, Justin W L
2017-11-13
The aim of this study was to quantify the validity and intra-tester reliability of a novel method of kinematic measurement. The measurement target was the joint angles of an athlete performing a BMX Supercross (SX) gate start action through the first 1.2 s of movement in situ on a BMX SX ramp using a standard gate start procedure. The method employed GoPro® Hero 4 Silver (GoPro Inc., USA) cameras capturing data at 120 fps 720 p on a 'normal' lens setting. Kinovea 0.8.15 (Kinovea.org, France) was used for analysis. Tracking data was exported and angles computed in Matlab (Mathworks®, USA). The gold standard 3D method for joint angle measurement could not safely be employed in this environment, so a rigid angle was used. Validity was measured to be within 2°. Intra-tester reliability was measured by the same tester performing the analysis twice with an average of 55 days between analyses. Intra-tester reliability was high, with an absolute error <6° and <9 frames (0.075 s) across all angles and time points for key positions, respectively. The methodology is valid within 2° and reliable within 6° for the calculation of joint angles in the first ~1.25 s.
Correcting Measurement Error in Latent Regression Covariates via the MC-SIMEX Method
ERIC Educational Resources Information Center
Rutkowski, Leslie; Zhou, Yan
2015-01-01
Given the importance of large-scale assessments to educational policy conversations, it is critical that subpopulation achievement is estimated reliably and with sufficient precision. Despite this importance, biased subpopulation estimates have been found to occur when variables in the conditioning model side of a latent regression model contain…
Psychometric Properties of Raw and Scale Scores on Mixed-Format Tests
ERIC Educational Resources Information Center
Kolen, Michael J.; Lee, Won-Chan
2011-01-01
This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
ERIC Educational Resources Information Center
Diamond, James J.; McCormick, Janet
1986-01-01
Using item responses from an in-training examination in diagnostic radiology, the application of a strength of association statistic to the general problem of item analysis is illustrated. Criteria for item selection, general issues of reliability, and error of measurement are discussed. (Author/LMO)
Application of the differential decay-curve method to γ-γ fast-timing lifetime measurements
NASA Astrophysics Data System (ADS)
Petkov, P.; Régis, J.-M.; Dewald, A.; Kisyov, S.
2016-10-01
A new procedure for the analysis of delayed-coincidence lifetime experiments focused on the Fast-timing case is proposed following the approach of the Differential decay-curve method. Examples of application of the procedure on experimental data reveal its reliability for lifetimes even in the sub-nanosecond range. The procedure is expected to improve both precision/reliability and treatment of systematic errors and scarce data as well as to provide an option for cross-check with the results obtained by means of other analyzing methods.
Camera-tracking gaming control device for evaluation of active wrist flexion and extension.
Shefer Eini, Dalit; Ratzon, Navah Z; Rizzo, Albert A; Yeh, Shih-Ching; Lange, Belinda; Yaffe, Batia; Daich, Alexander; Weiss, Patrice L; Kizony, Rachel
Cross sectional. Measuring wrist range of motion (ROM) is an essential procedure in hand therapy clinics. To test the reliability and validity of a dynamic ROM assessment, the Camera Wrist Tracker (CWT). Wrist flexion and extension ROM of 15 patients with distal radius fractures and 15 matched controls were assessed with the CWT and with a universal goniometer. One-way model intraclass correlation coefficient analysis indicated high test-retest reliability for extension (ICC = 0.92) and moderate reliability for flexion (ICC = 0.49). Standard error for extension was 2.45° and for flexion was 4.07°. Repeated-measures analysis revealed a significant main effect for group; ROM was greater in the control group (F[1, 28] = 47.35; P < .001). The concurrent validity of the CWT was partially supported. The results indicate that the CWT may provide highly reliable scores for dynamic wrist extension ROM, and moderately reliable scores for flexion, in people recovering from a distal radius fracture. N/A. Copyright © 2016 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Houwink, Annemieke; Geerdink, Yvonne A; Steenbergen, Bert; Geurts, Alexander C H; Aarts, Pauline B M
2013-01-01
To investigate the validity and reliability of the revised Video-Observation Aarts and Aarts module: Determine Developmental Disregard (VOAA-DDD-R). Upper-limb capacity and performance were assessed in children with unilateral spastic cerebral palsy (CP) by measuring overall duration of affected upper-limb use and the frequency of specific behaviours during a task in which bimanual activity was demanded ('stringing beads') and stimulated ('decorating a muffin'). Developmental disregard was defined as the difference in duration of affected upper-limb use between both tasks. Raters were two occupational and one physical therapist who received 3 hours of training. Construct validity was determined by comparing children with CP with typically developing children. Intrarater, interrater, and test-retest reliability were determined using the intraclass correlation coefficient. Standard errors of measurement and smallest detectable differences were also calculated. Twenty-five children with CP (15 females, 10 males; mean age 4 y 9 mo [SD 1 y 7 mo], range 2 y 9 mo-8 y; Manual Ability Classification System levels I-III) scored lower on capacity (p=0.052) and performance (p<0.001), and higher on developmental disregard (p<0.001) than 46 age- and sex-matched typically developing children (23 males; mean age 5 y 3 mo [SD 1 y 5 mo], range 2 y 6 mo-8 y). The intraclass correlation coefficients (0.79-1.00) indicated good reliability. Absolute agreement was high, standard errors of measurement ranged from 4.5 to 6.8%, and smallest detectable differences ranged from 12.5 to 19.0%. The VOAA-DDD-R can be reliably and validly used by occupational and physical therapists to assess upper-limb capacity, performance, and developmental disregard in children (2 y 6 mo-8 y) with CP. © The Authors. Developmental Medicine & Child Neurology © 2012 Mac Keith Press.
Assessment of C-band Polarimetric Radar Rainfall Measurements During Strong Attenuation.
NASA Astrophysics Data System (ADS)
Paredes-Victoria, P. N.; Rico-Ramirez, M. A.; Pedrozo-Acuña, A.
2016-12-01
In modern hydrological modelling and its applications to flood forecasting systems and climate modelling, reliable spatiotemporal rainfall measurements are the keystone. Raingauges are the foundation for collecting rainfall data in hydrology; however, they are prone to errors (e.g. systematic, malfunctioning, and instrumental errors). Moreover, rainfall data from gauges are often used to calibrate and validate weather radar rainfall, which is distributed in space. Therefore, it is important to apply techniques to control the quality of the raingauge data in order to guarantee a high level of confidence in rainfall measurements for radar calibration and numerical weather modelling. Also, the reliability of radar data is often limited because of errors in the radar signal (e.g. clutter, variation of the vertical reflectivity profile, beam blockage, attenuation, etc.), which need to be corrected in order to increase the accuracy of the radar rainfall estimation. This paper presents a method for raingauge-measurement quality-control correction based on inverse distance weighting as a function of correlated climatology (i.e. performed by using the reflectivity from weather radar). A Clutter Mitigation Decision (CMD) algorithm is also applied for clutter filtering, and finally three algorithms based on differential phase measurements are applied for radar signal attenuation correction. The quality-control method shows that correlated climatology is very sensitive in the first 100 kilometres for this area. The results also show that ground clutter only slightly affects the radar measurements owing to the low gradient of the terrain in the area. However, strong radar signal attenuation is often found in this data set due to the heavy storms that take place in this region, and the differential phase measurements are crucial to correct for attenuation at C-band frequencies.
The study area is located in Sabancuy-Campeche, Mexico (latitude 18.97° N, longitude 91.17° W). The radar rainfall measurements are obtained from a C-band polarimetric radar, whereas the raingauge measurements come from stations with 10-min and 24-hr time resolutions.
Bidulescu, Aurelian; Chambless, Lloyd E; Siega-Riz, Anna Maria; Zeisel, Steven H; Heiss, Gerardo
2009-02-20
The repeatability of a risk factor measurement affects the ability to accurately ascertain its association with a specific outcome. Choline is involved in methylation of homocysteine, a putative risk factor for cardiovascular disease, to methionine through a betaine-dependent pathway (one-carbon metabolism). It is unknown whether dietary intake of choline meets the recommended Adequate Intake (AI) proposed for choline (550 mg/day for men and 425 mg/day for women). The Estimated Average Requirement (EAR) remains to be established in population settings. Our objectives were to ascertain the reliability of choline and related nutrients (folate and methionine) intakes assessed with a brief food frequency questionnaire (FFQ) and to estimate dietary intake of choline and betaine in a bi-ethnic population. We estimated the FFQ dietary instrument reliability for the Atherosclerosis Risk in Communities (ARIC) study and the measurement error for choline and related nutrients from a stratified random sample of the ARIC study participants at the second visit, 1990-92 (N = 1,004). In ARIC, a population-based cohort of 15,792 men and women aged 45-64 years (1987-89) recruited at four locales in the U.S., diet was assessed in 15,706 baseline study participants using a version of the Willett 61-item FFQ, expanded to include some ethnic foods. Intraindividual variability for choline, folate and methionine were estimated using mixed models regression. Measurement error was substantial for the nutrients considered. The reliability coefficients were 0.50 for choline (0.50 for choline plus betaine), 0.53 for folate, 0.48 for methionine and 0.43 for total energy intake. In the ARIC population, the median and the 75th percentile of dietary choline intake were 284 mg/day and 367 mg/day, respectively. 94% of men and 89% of women had an intake of choline below that proposed as AI. African Americans had a lower dietary intake of choline in both genders. 
The three-year reliability of reported dietary intake was similar for choline and related nutrients, in the same range as that published in the literature for other micronutrients. Using a brief FFQ to estimate intake, the majority of individuals in the ARIC cohort had an intake of choline below the values proposed as AI.
Bidulescu, Aurelian; Chambless, Lloyd E; Siega-Riz, Anna Maria; Zeisel, Steven H; Heiss, Gerardo
2009-01-01
Background The repeatability of a risk factor measurement affects the ability to accurately ascertain its association with a specific outcome. Choline is involved in methylation of homocysteine, a putative risk factor for cardiovascular disease, to methionine through a betaine-dependent pathway (one-carbon metabolism). It is unknown whether dietary intake of choline meets the recommended Adequate Intake (AI) proposed for choline (550 mg/day for men and 425 mg/day for women). The Estimated Average Requirement (EAR) remains to be established in population settings. Our objectives were to ascertain the reliability of choline and related nutrients (folate and methionine) intakes assessed with a brief food frequency questionnaire (FFQ) and to estimate dietary intake of choline and betaine in a bi-ethnic population. Methods We estimated the FFQ dietary instrument reliability for the Atherosclerosis Risk in Communities (ARIC) study and the measurement error for choline and related nutrients from a stratified random sample of the ARIC study participants at the second visit, 1990–92 (N = 1,004). In ARIC, a population-based cohort of 15,792 men and women aged 45–64 years (1987–89) recruited at four locales in the U.S., diet was assessed in 15,706 baseline study participants using a version of the Willett 61-item FFQ, expanded to include some ethnic foods. Intraindividual variability for choline, folate and methionine were estimated using mixed models regression. Results Measurement error was substantial for the nutrients considered. The reliability coefficients were 0.50 for choline (0.50 for choline plus betaine), 0.53 for folate, 0.48 for methionine and 0.43 for total energy intake. In the ARIC population, the median and the 75th percentile of dietary choline intake were 284 mg/day and 367 mg/day, respectively. 94% of men and 89% of women had an intake of choline below that proposed as AI. African Americans had a lower dietary intake of choline in both genders. 
Conclusion The three-year reliability of reported dietary intake was similar for choline and related nutrients, in the same range as that published in the literature for other micronutrients. Using a brief FFQ to estimate intake, the majority of individuals in the ARIC cohort had an intake of choline below the values proposed as AI. PMID:19232103
Kwon, Heon-Ju; Kim, Kyoung Won; Kim, Bohyun; Kim, So Yeon; Lee, Chul Seung; Lee, Jeongjin; Song, Gi Won; Lee, Sung Gyu
2018-03-01
Computed tomography (CT) hepatic volumetry is currently accepted as the most reliable method for preoperative estimation of graft weight in living donor liver transplantation (LDLT). However, several factors can cause inaccuracies in CT volumetry compared to real graft weight. The purpose of this study was to determine the frequency and degree of resection plane-dependent error in CT volumetry of the right hepatic lobe in LDLT. Forty-six living liver donors underwent CT before donor surgery and on postoperative day 7. Prospective CT volumetry (V_P) was measured via the assumptive hepatectomy plane. Retrospective liver volume (V_R) was measured using the actual plane by comparing preoperative and postoperative CT. Compared with intraoperatively measured weight (W), errors in percentage (%) V_P and V_R were evaluated. Plane-dependent error in V_P was defined as the absolute difference between V_P and V_R. % plane-dependent error was defined as follows: |V_P − V_R| / W × 100. Mean V_P, V_R, and W were 761.9 mL, 755.0 mL, and 696.9 g. Mean and % errors in V_P were 73.3 mL and 10.7%. Mean error and % error in V_R were 64.4 mL and 9.3%. Mean plane-dependent error in V_P was 32.4 mL. Mean % plane-dependent error was 4.7%. Plane-dependent error in V_P exceeded 10% of W in approximately 10% of the subjects in our study. There was approximately 5% plane-dependent error in liver V_P on CT volumetry. Plane-dependent error in V_P exceeded 10% of W in approximately 10% of LDLT donors in our study. This error should be considered, especially when CT volumetry is performed by a less experienced operator who is not well acquainted with the donor hepatectomy plane.
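The percentage plane-dependent error defined in this abstract (absolute difference between the prospective and retrospective volumes, divided by the intraoperative graft weight, times 100) is simple enough to sketch. The function name and the donor values below are hypothetical and are not data from the study.

```python
def percent_plane_dependent_error(v_prospective_ml, v_retrospective_ml, weight_g):
    """Return |V_P - V_R| / W * 100, following the definition in the abstract.

    Volumes are in mL and the graft weight is in g, as reported there.
    """
    if weight_g <= 0:
        raise ValueError("graft weight must be positive")
    return abs(v_prospective_ml - v_retrospective_ml) / weight_g * 100.0


# Hypothetical donor, for illustration only.
example_error = percent_plane_dependent_error(760.0, 730.0, 700.0)
print(f"plane-dependent error: {example_error:.1f}% of graft weight")
```

Note that the abstract reports the mean of per-donor percentages, which need not equal the percentage computed from the mean volumes and mean weight.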
NASA Astrophysics Data System (ADS)
Golobokov, M.; Danilevich, S.
2018-04-01
In order to assess calibration reliability and automate such assessment, procedures for data collection and for a simulation study of the thermal imager calibration procedure have been elaborated. The existing calibration techniques do not always provide high reliability. A new method for analyzing the existing calibration techniques and developing new efficient ones has been suggested and tested. A type of software has been studied that allows generating instrument calibration reports automatically, monitoring their proper configuration, processing measurement results and assessing instrument validity. The use of such software reduces the man-hours spent on finalizing calibration data by a factor of 2 to 5 and eliminates a whole set of typical operator errors.
Absolute vs. relative error characterization of electromagnetic tracking accuracy
NASA Astrophysics Data System (ADS)
Matinfar, Mohammad; Narayanasamy, Ganesh; Gutierrez, Luis; Chan, Raymond; Jain, Ameet
2010-02-01
Electromagnetic (EM) tracking systems are often used for real time navigation of medical tools in an Image Guided Therapy (IGT) system. They are specifically advantageous when the medical device requires tracking within the body of a patient where line of sight constraints prevent the use of conventional optical tracking. EM tracking systems are however very sensitive to electromagnetic field distortions. These distortions, arising from changes in the electromagnetic environment due to the presence of conductive ferromagnetic surgical tools or other medical equipment, limit the accuracy of EM tracking, in some cases potentially rendering tracking data unusable. We present a mapping method for the operating region over which EM tracking sensors are used, allowing for characterization of measurement errors, in turn providing physicians with visual feedback about measurement confidence or reliability of localization estimates. In this instance, we employ a calibration phantom to assess distortion within the operating field of the EM tracker and to display in real time the distribution of measurement errors, as well as the location and extent of the field associated with minimal spatial distortion. The accuracy is assessed relative to successive measurements. Error is computed for a reference point and consecutive measurement errors are displayed relative to the reference in order to characterize the accuracy in near-real-time. In an initial set-up phase, the phantom geometry is calibrated by registering the data from a multitude of EM sensors in a non-ferromagnetic ("clean") EM environment. The registration results in the locations of sensors with respect to each other and defines the geometry of the sensors in the phantom. In a measurement phase, the position and orientation data from all sensors are compared with the known geometry of the sensor spacing, and localization errors (displacement and orientation) are computed. 
Based on error thresholds provided by the operator, the spatial distribution of localization errors is clustered and dynamically displayed as separate confidence zones within the operating region of the EM tracker space.
Estimation of perspective errors in 2D2C-PIV measurements for 3D concentrated vortices
NASA Astrophysics Data System (ADS)
Ma, Bao-Feng; Jiang, Hong-Gang
2018-06-01
Two-dimensional planar PIV (2D2C) is still extensively employed in flow measurement owing to its availability and reliability, although more advanced PIVs have been developed. It has long been recognized that there exist perspective errors in velocity fields when employing the 2D2C PIV to measure three-dimensional (3D) flows, the magnitude of which depends on out-of-plane velocity and geometric layouts of the PIV. For a variety of vortex flows, however, the results are commonly represented by vorticity fields, instead of velocity fields. The present study indicates that the perspective error in vorticity fields relies on gradients of the out-of-plane velocity along a measurement plane, instead of the out-of-plane velocity itself. More importantly, an estimation approach to the perspective error in 3D vortex measurements was proposed based on a theoretical vortex model and an analysis on physical characteristics of the vortices, in which the gradient of out-of-plane velocity is uniquely determined by the ratio of the maximum out-of-plane velocity to maximum swirling velocity of the vortex; meanwhile, the ratio has upper limits for naturally formed vortices. Therefore, if the ratio is imposed with the upper limits, the perspective error will only rely on the geometric layouts of PIV that are known in practical measurements. Using this approach, the upper limits of perspective errors of a concentrated vortex can be estimated for vorticity and other characteristic quantities of the vortex. In addition, the study indicates that the perspective errors in vortex location, vortex strength, and vortex radius can be all zero for axisymmetric vortices if they are calculated by proper methods. The dynamic mode decomposition on an oscillatory vortex indicates that the perspective errors of each DMD mode are also only dependent on the gradient of out-of-plane velocity if the modes are represented by vorticity.
Blind system identification of two-thermocouple sensor based on cross-relation method.
Li, Yanfeng; Zhang, Zhijie; Hao, Xiaojian
2018-03-01
In dynamic temperature measurement, the dynamic characteristics of the sensor affect the accuracy of the measurement results. Thermocouples are widely used for temperature measurement in harsh conditions due to their low cost, robustness, and reliability, but because of thermal inertia, dynamic temperature measurements contain a dynamic error. In order to eliminate the dynamic error, a two-thermocouple sensor was used in this paper to measure dynamic gas temperature in constant-velocity flow environments. Blind system identification of the two-thermocouple sensor based on a cross-relation method was carried out. A particle swarm optimization algorithm was used to estimate the time constants of the two thermocouples and was compared with the grid-based search method. The method was validated on experimental equipment built around a high-temperature furnace, and the input dynamic temperature was reconstructed by using the output data of the thermocouple with the smaller time constant.
Optimizing Hybrid Metrology: Rigorous Implementation of Bayesian and Combined Regression.
Henn, Mark-Alexander; Silver, Richard M; Villarrubia, John S; Zhang, Nien Fan; Zhou, Hui; Barnes, Bryan M; Ming, Bin; Vladár, András E
2015-01-01
Hybrid metrology, e.g., the combination of several measurement techniques to determine critical dimensions, is an increasingly important approach to meet the needs of the semiconductor industry. A proper use of hybrid metrology may yield not only more reliable estimates for the quantitative characterization of 3-D structures but also a more realistic estimation of the corresponding uncertainties. Recent developments at the National Institute of Standards and Technology (NIST) feature the combination of optical critical dimension (OCD) measurements and scanning electron microscope (SEM) results. The hybrid methodology offers the potential to make measurements of essential 3-D attributes that may not be otherwise feasible. However, combining techniques gives rise to essential challenges in error analysis and comparing results from different instrument models, especially the effect of systematic and highly correlated errors in the measurement on the χ² function that is minimized. Both hypothetical examples and measurement data are used to illustrate solutions to these challenges.
Psychometric assessment of a scale to measure bonding workplace social capital
Tsutsumi, Akizumi; Inoue, Akiomi; Odagiri, Yuko
2017-01-01
Objectives Workplace social capital (WSC) has attracted increasing attention as an organizational and psychosocial factor related to worker health. This study aimed to assess the psychometric properties of a newly developed WSC scale for use in work environments, where bonding social capital is important. Methods We assessed the psychometric properties of a newly developed 6-item scale to measure bonding WSC using two data sources. Participants were 1,650 randomly selected workers who completed an online survey. Exploratory factor analyses were conducted. We examined the item–item and item–total correlations, internal consistency, and associations between scale scores and a previous 8-item measure of WSC. We evaluated test–retest reliability by repeating the survey with 900 of the respondents 2 weeks later. The overall scale reliability was quantified by an intraclass coefficient and the standard error of measurement. We evaluated convergent validity by examining the association with several relevant workplace psychosocial factors using a dataset from workers employed by an electrical components company (n = 2,975). Results The scale was unidimensional. The item–item and item–total correlations ranged from 0.52 to 0.78 (p < 0.01) and from 0.79 to 0.89 (p < 0.01), respectively. Internal consistency was good (Cronbach’s α coefficient: 0.93). The correlation with the 8-item scale indicated high criterion validity (r = 0.81) and the scale showed high test–retest reliability (r = 0.74, p < 0.01). The intraclass coefficient and standard error of measurement were 0.74 (95% confidence intervals: 0.71–0.77) and 4.04 (95% confidence intervals: 1.86–6.20), respectively. Correlations with relevant workplace psychosocial factors showed convergent validity. Conclusions The results confirmed that the newly developed WSC scale has adequate psychometric properties. PMID:28662058
Ohno, Shotaro; Takahashi, Kana; Inoue, Aimi; Takada, Koki; Ishihara, Yoshiaki; Tanigawa, Masaru; Hirao, Kazuki
2017-12-01
This study aims to examine the smallest detectable change (SDC) and test-retest reliability of the Center for Epidemiologic Studies Depression Scale (CES-D), General Self-Efficacy Scale (GSES), and 12-item General Health Questionnaire (GHQ-12). We tested 154 young adults at baseline and 2 weeks later. We calculated the intra-class correlation coefficients (ICCs) for test-retest reliability with a two-way random effects model for agreement. We then calculated the standard error of measurement (SEM) for agreement using the ICC formula. The SEM for agreement was used to calculate SDC values at the individual level (SDC_ind) and group level (SDC_group). The study participants included 137 young adults. The ICCs for all self-reported outcome measurement scales exceeded 0.70. The SEM of CES-D was 3.64, leading to an SDC_ind of 10.10 points and SDC_group of 0.86 points. The SEM of GSES was 1.56, leading to an SDC_ind of 4.33 points and SDC_group of 0.37 points. The SEM of GHQ-12 with bimodal scoring was 1.47, leading to an SDC_ind of 4.06 points and SDC_group of 0.35 points. The SEM of GHQ-12 with Likert scoring was 2.44, leading to an SDC_ind of 6.76 points and SDC_group of 0.58 points. To confirm that the change was not a result of measurement error, a score of self-reported outcome measurement scales would need to change by an amount greater than these SDC values. This has important implications for clinicians and epidemiologists when assessing outcomes. © 2017 John Wiley & Sons, Ltd.
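The step from SEM to the smallest detectable change can be illustrated with the formulas commonly used for this purpose: individual-level SDC = 1.96 × √2 × SEM, and group-level SDC = individual-level SDC / √n. The abstract does not spell these formulas out, so they are an assumption here, and the function names are illustrative. The assumption does reproduce the reported values to within rounding of the published SEM.

```python
import math


def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one person: z * sqrt(2) * SEM."""
    return z * math.sqrt(2) * sem


def sdc_group(sem, n, z=1.96):
    """Smallest detectable change for a group of n people."""
    return sdc_individual(sem, z) / math.sqrt(n)


# SEM values as reported in the abstract; n = 137 analysed participants.
reported_sem = {
    "CES-D": 3.64,
    "GSES": 1.56,
    "GHQ-12 (bimodal)": 1.47,
    "GHQ-12 (Likert)": 2.44,
}
for scale, sem in reported_sem.items():
    print(f"{scale}: SDC_ind = {sdc_individual(sem):.2f}, "
          f"SDC_group = {sdc_group(sem, 137):.2f}")
```

For CES-D this gives roughly 10.09 and 0.86 points, against the published 10.10 and 0.86; the small gap is consistent with the SEM having been rounded to two decimals before publication.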
Fault-tolerant clock synchronization validation methodology. [in computer systems
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Palumbo, Daniel L.; Johnson, Sally C.
1987-01-01
A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight-crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating the clock synchronization system of the Software Implemented Fault Tolerance computer. The design proof of the algorithm includes a theorem that defines the maximum skew between any two nonfaulty clocks in the system in terms of specific system parameters. Most of these parameters are deterministic. One crucial parameter is the upper bound on the clock read error, which is stochastic. The probability that this upper bound is exceeded is calculated from data obtained by the measurement of system parameters. This probability is then included in a detailed reliability analysis of the system.
A cascaded coding scheme for error control and its performance analysis
NASA Technical Reports Server (NTRS)
Lin, S.
1986-01-01
A coding scheme for error control in data communication systems is investigated. The scheme is obtained by cascading two error correcting codes, called the inner and the outer codes. The error performance of the scheme is analyzed for a binary symmetric channel with bit error rate ε < 1/2. It is shown that, if the inner and outer codes are chosen properly, extremely high reliability can be attained even for a high channel bit error rate. Various specific example schemes with inner codes ranging from high rates to very low rates and Reed-Solomon codes are considered, and their probabilities are evaluated. They all provide extremely high reliability even for very high bit error rates, say 0.1 to 0.01. Several example schemes are being considered by NASA for satellite and spacecraft down link error control.
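As background for the error-performance analysis mentioned here, the following sketch computes the block error probability of a single t-error-correcting code of length n on a binary symmetric channel under bounded-distance decoding, P = 1 − Σ_{i=0..t} C(n, i) ε^i (1 − ε)^(n − i). This is a textbook building block rather than the cascaded scheme analyzed in the report, and the (7, 1) example parameters are chosen purely for illustration.

```python
import math


def block_error_probability(n, t, eps):
    """Probability that more than t of n bits are flipped on a BSC.

    n   : block length in bits
    t   : number of bit errors the decoder can correct
    eps : channel bit error rate, 0 <= eps <= 1
    """
    if not 0 <= t <= n:
        raise ValueError("t must satisfy 0 <= t <= n")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    # Probability of i errors is binomial; blocks with <= t errors decode.
    p_correctable = sum(
        math.comb(n, i) * eps**i * (1.0 - eps) ** (n - i) for i in range(t + 1)
    )
    return 1.0 - p_correctable


# Illustrative single-error-correcting code of length 7 at eps = 0.01.
p_block = block_error_probability(7, 1, 0.01)
print(f"block error probability: {p_block:.6f}")
```

Cascading an outer code over blocks protected this way lowers the residual error rate further, which is the effect the abstract describes qualitatively.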
Evaluation of a UMLS Auditing Process of Semantic Type Assignments
Gu, Huanying; Hripcsak, George; Chen, Yan; Morrey, C. Paul; Elhanan, Gai; Cimino, James J.; Geller, James; Perl, Yehoshua
2007-01-01
The UMLS is a terminological system that integrates many source terminologies. Each concept in the UMLS is assigned one or more semantic types from the Semantic Network, an upper level ontology for biomedicine. Due to the complexity of the UMLS, errors exist in the semantic type assignments. Finding assignment errors may unearth modeling errors. Even with sophisticated tools, discovering assignment errors requires manual review. In this paper we describe the evaluation of an auditing project of UMLS semantic type assignments. We studied the performance of the auditors who reviewed potential errors. We found that four auditors, interacting according to a multi-step protocol, identified a high rate of errors (one or more errors in 81% of concepts studied) and that results were sufficiently reliable (0.67 to 0.70) for the two most common types of errors. However, reliability was low for each individual auditor, suggesting that review of potential errors is resource-intensive. PMID:18693845
The development and validation of a custom built device for assessing frontal knee joint laxity.
Ismail, Shiek Abdullah; Simic, Milena; Clarke, Jillian L; Lopes, Thiago Jambo Alves; Pappas, Evangelos
2017-12-01
This study reports the development and validation of a quantitative technique of assessing frontal knee joint laxity through a custom built device named KLICP. The objectives of this study were to determine: (i) the intra- and inter-rater reliability and (ii) the validity of the device when compared to real time ultrasound. Twenty-five participants had their frontal knee joint laxity assessed by the KLICP, by manual varus/valgus tests and by ultrasound. Two raters independently assessed laxity manually by three repeated measurements, repeated at least 48h later. Results were validated by comparing them to the medial and lateral joint space opening measured by the ultrasound. Intraclass correlation coefficients and standard error of measurement reliability were calculated. Pearson's correlation coefficients were calculated to determine the correlation between the KLICP and the joint space. Intra-rater reliability (intra-session) for each rater was good on both sessions (0.91-0.98), intra-rater reliability (inter-sessions) was moderate to good (0.62-0.87), and inter-rater reliability (intra-session) was good (0.75-0.80). There is low agreement for intra-rater (inter-session) and for inter-rater (intra-session) reliability. The KLICP measurement has a significant positive fair to moderate correlation to the ultrasound measurement at the left (r: 0.61, p: 0.01) and right (r: 0.48, p: 0.02) knee in the valgus direction and at the left (r: 0.51, p: 0.01) and right (r: 0.39, p: 0.05) knee in the varus direction. There is low agreement between the KLICP and the RTU. Reliability and agreement were good only when measured for intra-rater, within session. Copyright © 2017 Elsevier B.V. All rights reserved.
Chen, Xin-Lin; Zhong, Liang-Huan; Wen, Yi; Liu, Tian-Wen; Li, Xiao-Ying; Hou, Zheng-Kun; Hu, Yue; Mo, Chuan-Wei; Liu, Feng-Bin
2017-09-15
This review aims to critically appraise and compare the measurement properties of inflammatory bowel disease (IBD)-specific health-related quality of life instruments. Medline, EMBASE and ISI Web of Knowledge were searched from their inception to May 2016. IBD-specific instruments for patients with Crohn's disease, ulcerative colitis or IBD were enrolled. The basic characteristics and domains of the instruments were collected. The methodological quality of measurement properties and measurement properties of the instruments were assessed. Fifteen IBD-specific instruments were included, which included twelve instruments for adult IBD patients and three for paediatric IBD patients. All of the instruments were developed in North American and European countries. The following common domains were identified: IBD-related symptoms, physical, emotional and social domain. The methodological quality was satisfactory for content validity; fair in internal consistency, reliability, structural validity, hypotheses testing and criterion validity; and poor in measurement error, cross-cultural validity and responsiveness. For adult IBD patients, the IBDQ-32 and its short version (SIBDQ) had good measurement properties and were the most widely used worldwide. For paediatric IBD patients, the IMPACT-III had good measurement properties and had more translated versions. Most methodological quality should be promoted, especially measurement error, cross-cultural validity and responsiveness. The IBDQ-32 was the most widely used instrument with good reliability and validity, followed by the SIBDQ and IMPACT-III. Further validation studies are necessary to support the use of other instruments.
Aldridge, Kristina; Boyadjiev, Simeon A.; Capone, George T.; DeLeon, Valerie B.; Richtsmeier, Joan T.
2015-01-01
The genetic basis for complex phenotypes is currently of great interest for both clinical investigators and basic scientists. In order to acquire a thorough understanding of the translation from genotype to phenotype, highly precise measures of phenotypic variation are required. New technologies, such as 3D photogrammetry are being implemented in phenotypic studies due to their ability to collect data rapidly and non-invasively. Before these systems can be broadly implemented the error associated with data collected from images acquired using these technologies must be assessed. This study investigates the precision, error, and repeatability associated with anthropometric landmark coordinate data collected from 3D digital photogrammetric images acquired with the 3dMDface System. Precision, error due to the imaging system, error due to digitization of the images, and repeatability are assessed in a sample of children and adults (N=15). Results show that data collected from images with the 3dMDface System are highly repeatable and precise. The average error associated with the placement of landmarks is sub-millimeter; both the error due to digitization and to the imaging system are very low. The few measures showing a higher degree of error include those crossing the labial fissure, which are influenced by even subtle movement of the mandible. These results suggest that 3D anthropometric data collected using the 3dMDface System are highly reliable and therefore useful for evaluation of clinical dysmorphology and surgery, analyses of genotype-phenotype correlations, and inheritance of complex phenotypes. PMID:16158436
Ebara, Takeshi; Azuma, Ryohei; Shoji, Naoto; Matsukawa, Tsuyoshi; Yamada, Yasuyuki; Akiyama, Tomohiro; Kurihara, Takahiro; Yamada, Shota
2017-11-25
Objective measurements using built-in smartphone sensors that can measure physical activity/inactivity in daily working life have the potential to provide a new approach to assessing workers' health effects. The aim of this study was to elucidate the characteristics and reliability of built-in step counting sensors on smartphones for development of an easy-to-use objective measurement tool that can be applied in ergonomics or epidemiological research. To evaluate the reliability of step counting sensors embedded in seven major smartphone models, the 6-minute walk test was conducted and the following analyses of sensor precision and accuracy were performed: 1) relationship between actual step count and step count detected by sensors, 2) reliability between smartphones of the same model, and 3) false detection rates when sitting during office work, while riding the subway, and driving. On five of the seven models, the intraclass correlation coefficient (ICC(3,1)) showed high reliability with a range of 0.956-0.993. The other two models, however, had ranges of 0.443-0.504 and the relative error ratios of the sensor-detected step count to the actual step count were ±48.7%-49.4%. The level of agreement between the same models was ICC(3,1): 0.992-0.998. The false detection rates differed between the sitting conditions. These results suggest the need for appropriate regulation of step counts measured by sensors, through means such as correction or calibration with a predictive model formula, in order to obtain the highly reliable measurement results that are sought in scientific investigation.
Multimodal assessment of visual attention using the Bethesda Eye & Attention Measure (BEAM).
Ettenhofer, Mark L; Hershaw, Jamie N; Barry, David M
2016-01-01
Computerized cognitive tests measuring manual response time (RT) and errors are often used in the assessment of visual attention. Evidence suggests that saccadic RT and errors may also provide valuable information about attention. This study was conducted to examine a novel approach to multimodal assessment of visual attention incorporating concurrent measurements of saccadic eye movements and manual responses. A computerized cognitive task, the Bethesda Eye & Attention Measure (BEAM) v.34, was designed to evaluate key attention networks through concurrent measurement of saccadic and manual RT and inhibition errors. Results from a community sample of n = 54 adults were analyzed to examine effects of BEAM attention cues on manual and saccadic RT and inhibition errors, internal reliability of BEAM metrics, relationships between parallel saccadic and manual metrics, and relationships of BEAM metrics to demographic characteristics. Effects of BEAM attention cues (alerting, orienting, interference, gap, and no-go signals) were consistent with previous literature examining key attention processes. However, corresponding saccadic and manual measurements were weakly related to each other, and only manual measurements were related to estimated verbal intelligence or years of education. This study provides preliminary support for the feasibility of multimodal assessment of visual attention using the BEAM. Results suggest that BEAM saccadic and manual metrics provide divergent measurements. Additional research will be needed to obtain comprehensive normative data, to cross-validate BEAM measurements with other indicators of neural and cognitive function, and to evaluate the utility of these metrics within clinical populations of interest.
NASA Astrophysics Data System (ADS)
Oda, Hirokuni; Xuan, Chuang
2014-10-01
Development of pass-through superconducting rock magnetometers (SRM) has greatly promoted collection of paleomagnetic data from continuous long-core samples. The output of pass-through measurement is smoothed and distorted due to convolution of magnetization with the magnetometer sensor response. Although several studies could restore high-resolution paleomagnetic signal through deconvolution of pass-through measurement, difficulties in accurately measuring the magnetometer sensor response have hindered the application of deconvolution. We acquired reliable sensor response of an SRM at Oregon State University based on repeated measurements of a precisely fabricated magnetic point source. In addition, we present an improved deconvolution algorithm based on Akaike's Bayesian Information Criterion (ABIC) minimization, incorporating new parameters to account for errors in sample measurement position and length. The new algorithm was tested using synthetic data constructed by convolving "true" paleomagnetic signal containing an "excursion" with the sensor response. Realistic noise was added to the synthetic measurement using a Monte Carlo method based on measurement noise distribution acquired from 200 repeated measurements of a u-channel sample. Deconvolution of 1000 synthetic measurements with realistic noise closely resembles the "true" magnetization, and successfully restored fine-scale magnetization variations including the "excursion." Our analyses show that inaccuracy in sample measurement position and length significantly affects deconvolution estimation, and can be resolved using the new deconvolution algorithm. Optimized deconvolution of 20 repeated measurements of a u-channel sample yielded highly consistent deconvolution results and estimates of error in sample measurement position and length, demonstrating the reliability of the new deconvolution algorithm for real pass-through measurements.
Falaggis, Konstantinos; Towers, David P; Towers, Catherine E
2012-09-20
Multiwavelength interferometry (MWI) is a well-established technique in the field of optical metrology. Previously, we have reported a theoretical analysis of the method of excess fractions that describes the mutual dependence of unambiguous measurement range, reliability, and the measurement wavelengths. In this paper, wavelength selection strategies are introduced that are built on the theoretical description and maximize the reliability in the calculated fringe order for a given measurement range, number of wavelengths, and level of phase noise. Practical implementation issues for an MWI interferometer are analyzed theoretically. It is shown that dispersion compensation is best implemented by use of reference measurements around absolute zero in the interferometer. Furthermore, the effects of wavelength uncertainty allow the ultimate performance of an MWI interferometer to be estimated.
Kevern, Mark A.; Beecher, Michael; Rao, Smita
2014-01-01
Context: Athletes who participate in throwing and racket sports consistently demonstrate adaptive changes in glenohumeral-joint internal and external rotation in the dominant arm. Measurements of these motions have demonstrated excellent intrarater and poor interrater reliability. Objective: To determine intrarater reliability, interrater reliability, and standard error of measurement for shoulder internal rotation, external rotation, and total arc of motion using an inclinometer in 3 testing procedures in National Collegiate Athletic Association Division I baseball and softball athletes. Design: Cross-sectional study. Setting: Athletic department. Patients or Other Participants: Thirty-eight players participated in the study. Shoulder internal rotation, external rotation, and total arc of motion were measured by 2 investigators in 3 test positions. The standard supine position was compared with a side-lying test position, as well as a supine test position without examiner overpressure. Results: Excellent intrarater reliability was noted for all 3 test positions and ranges of motion, with intraclass correlation coefficient values ranging from 0.93 to 0.99. Results for interrater reliability were less favorable. Reliability for internal rotation was highest in the side-lying position (0.68) and reliability for external rotation and total arc was highest in the supine-without-overpressure position (0.774 and 0.713, respectively). The supine-with-overpressure position yielded the lowest interrater reliability results in all positions. The side-lying position had the most consistent results, with very little variation among intraclass correlation coefficient values for the various test positions. Conclusions: The results of our study clearly indicate that the side-lying test procedure is of equal or greater value than the traditional supine-with-overpressure method. PMID:25188316
Lohr, Christine; Braumann, Klaus-Michael; Reer, Ruediger; Schroeder, Jan; Schmidt, Tobias
2018-04-20
Tensiomyography™ (TMG) and MyotonPRO® (MMT) are two non-invasive devices for monitoring muscle contractile and mechanical characteristics. This study aimed to evaluate the test-retest reliability of TMG and MMT parameters for measuring (TMG:) muscle displacement (Dm), contraction time (Tc), and velocity (Vc) and (MMT:) frequency (F), stiffness (S), and decrement (D) of the erector spinae muscles (ES) in healthy adults. A particular focus was set on the establishment of reliability measures for the previously barely evaluated secondary TMG parameter Vc. Twenty-four subjects (13 female and 11 male, mean ± SD, 38.0 ± 12.0 years) were measured using TMG and MMT over 2 consecutive days. Absolute and relative reliability was calculated by standard error of measurement (SEM, SEM%), minimum detectable change (MDC, MDC%), coefficient of variation (CV%) and intraclass correlation coefficient (ICC 3,1) with a 95% confidence interval (CI). The ICCs for all variables and test-retest intervals ranged from 0.75 to 0.99 indicating a good to excellent relative reliability for both TMG and MMT, demonstrating the lowest values for TMG Tc and between-day MMT D (ICC < 0.90). Absolute reliability was suitable for all parameters (CV 2-8%) except for Dm (10-12%). Vc proved to be the most reliable and repeatable TMG parameter (ICC > 0.95, CV < 8%). The reliability for TMG Vc could be established successfully. Its further applicability needs to be confirmed in future studies. MMT was found to be more reliable on repeated testing than the two other TMG parameters Dm and Tc.
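The absolute-reliability statistics named above are commonly derived from the between-subject standard deviation and the ICC. The following is a minimal sketch using the widely used textbook formulas (SEM = SD·sqrt(1 − ICC), MDC95 = 1.96·sqrt(2)·SEM); whether the study used exactly these variants is an assumption, and the input values are made up for illustration.

```python
from math import sqrt

def sem(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * sqrt(1.0 - icc)

def mdc95(sem_value):
    """Minimum detectable change at the 95% confidence level."""
    return 1.96 * sqrt(2.0) * sem_value

def as_percent(value, mean):
    """Express SEM or MDC relative to the sample mean (SEM%, MDC%)."""
    return 100.0 * value / mean

if __name__ == "__main__":
    # Hypothetical numbers, not taken from the study.
    s = sem(sd=10.0, icc=0.91)
    print("SEM:", s, "MDC95:", mdc95(s), "SEM%:", as_percent(s, 50.0))
```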
Increasing reliability of Gauss-Kronrod quadrature by Eratosthenes' sieve method
NASA Astrophysics Data System (ADS)
Adam, Gh.; Adam, S.
2001-04-01
The reliability of the local error estimates returned by the Gauss-Kronrod quadrature rules can be raised up to the theoretical 100% rate of success, under error estimate sharpening, provided a number of natural validating conditions are required. The self-validating scheme of the local error estimates, which is easy to implement and adds little supplementary computing effort, strengthens considerably the correctness of the decisions within the automatic adaptive quadrature.
Exercise-Induced Hypoalgesia After Isometric Wall Squat Exercise: A Test-Retest Reliability Study.
Vaegter, Henrik Bjarke; Lyng, Kristian Damgaard; Yttereng, Fredrik Wannebo; Christensen, Mads Holst; Sørensen, Mathias Brandhøj; Graven-Nielsen, Thomas
2018-05-19
Isometric exercises decrease pressure pain sensitivity in exercising and nonexercising muscles known as exercise-induced hypoalgesia (EIH). No studies have assessed the test-retest reliability of EIH after isometric exercise. This study investigated the EIH on pressure pain thresholds (PPTs) after an isometric wall squat exercise. The relative and absolute test-retest reliability of the PPT as a test stimulus and the EIH response in exercising and nonexercising muscles were calculated. In two identical sessions, PPTs of the thigh and shoulder were assessed before and after three minutes of quiet rest and three minutes of wall squat exercise, respectively, in 35 healthy subjects. The relative test-retest reliability of PPT and EIH was determined using analysis of variance models, Pearson's r, and intraclass correlations (ICCs). The absolute test-retest reliability of EIH was determined based on PPT standard error of measurements and Cohen's kappa for agreement between sessions. Squat increased PPTs of exercising and nonexercising muscles by 16.8% ± 16.9% and 6.7% ± 12.9%, respectively (P < 0.001), with no significant differences between sessions. PPTs within and between sessions showed moderately strong correlations (r ≥ 0.74) and excellent (ICC ≥ 0.84) within-session (rest) and between-session test-retest reliability. EIH responses of exercising and nonexercising muscles showed no systematic errors between sessions; however, the relative test-retest reliability was low (ICCs = 0.03-0.43), and agreement in EIH responders and nonresponders between sessions was not significant (κ < 0.13, P > 0.43). A wall squat exercise increased PPTs compared with quiet rest; however, the relative and absolute reliability of the EIH response was poor. Future research is warranted to investigate the reliability of EIH in clinical pain populations.
Is the encoding of Reward Prediction Error reliable during development?
Keren, Hanna; Chen, Gang; Benson, Brenda; Ernst, Monique; Leibenluft, Ellen; Fox, Nathan A; Pine, Daniel S; Stringaris, Argyris
2018-05-16
Reward Prediction Errors (RPEs), defined as the difference between the expected and received outcomes, are integral to reinforcement learning models and play an important role in development and psychopathology. In humans, RPE encoding can be estimated using fMRI recordings, however, a basic measurement property of RPE signals, their test-retest reliability across different time scales, remains an open question. In this paper, we examine the 3-month and 3-year reliability of RPE encoding in youth (mean age at baseline = 10.6 ± 0.3 years), a period of developmental transitions in reward processing. We show that RPE encoding is differentially distributed between the positive values being encoded predominantly in the striatum and negative RPEs primarily encoded in the insula. The encoding of negative RPE values is highly reliable in the right insula, across both the long and the short time intervals. Insula reliability for RPE encoding is the most robust finding, while other regions, such as the striatum, are less consistent. Striatal reliability appeared significant as well once covarying for factors, which were possibly confounding the signal to noise ratio. By contrast, task activation during feedback in the striatum is highly reliable across both time intervals. These results demonstrate the valence-dependent differential encoding of RPE signals between the insula and striatum, and the consistency of RPE signals or lack thereof, during childhood and into adolescence. Characterizing the regions where the RPE signal in BOLD fMRI is a reliable marker is key for estimating reward-processing alterations in longitudinal designs, such as developmental or treatment studies. Copyright © 2018 Elsevier Inc. All rights reserved.
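To make the definition of an RPE concrete, the sketch below uses the usual sign convention (received minus expected, so better-than-expected outcomes are positive) and a simple delta-rule update of the expectation. The learning rate and the function names are illustrative assumptions; the abstract does not specify the reinforcement learning model used in the study.

```python
def rpe(received, expected):
    """Reward prediction error: positive when the outcome beats the expectation."""
    return received - expected

def update_expectation(expected, received, learning_rate=0.1):
    """Delta-rule update: move the expectation a fraction of the way toward the outcome."""
    return expected + learning_rate * rpe(received, expected)

if __name__ == "__main__":
    value = 0.0
    for outcome in [1.0, 1.0, 0.0, 1.0]:  # hypothetical reward sequence
        print("RPE:", rpe(outcome, value))
        value = update_expectation(value, outcome, learning_rate=0.5)
    print("final expectation:", value)
```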
Software Fault Tolerance: A Tutorial
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2000-01-01
Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
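The two multiversion ideas named above can be sketched in a few lines. This is a simplified illustration, not the tutorial's own formulation: the voter assumes hashable outputs and exact-match majority voting, and the recovery block omits the checkpoint and state restoration that a real implementation would perform before trying each alternate. All function names are hypothetical.

```python
from collections import Counter

def n_version_vote(versions, x):
    """Run independently built versions and return the strict-majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            continue  # a crashed version simply casts no vote
    if outputs:
        value, count = Counter(outputs).most_common(1)[0]
        if 2 * count > len(versions):
            return value
    raise RuntimeError("no majority among versions")

def recovery_block(alternates, acceptance_test, x):
    """Try alternates in order; return the first result passing the acceptance test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")

if __name__ == "__main__":
    squares = [lambda v: v * v, lambda v: v ** 2, lambda v: v + 1]  # third is faulty
    print(n_version_vote(squares, 3))
    print(recovery_block([lambda v: -1, abs], lambda v, r: r >= 0, -5))
```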
Zimowski, Michele; Moye, Jack; Dugoni, Bernard; Heim Viox, Melissa; Cohen, Hildie; Winfrey, Krishna
2017-02-01
The current study assessed whether home-based data collection by trained data collectors can produce high-quality physical measurement data in young children. The study assessed the quality of intra-examiner measurements of blood pressure, pulse rate and anthropometric dimensions using intra-examiner reliability and intra-examiner technical error of measurement (TEM). Non-clinical, primarily private homes of National Children's Study participants in twenty-two study locations across the USA. Children in four age groups: 5-7 months (n 91), 11-16 months (n 393), 23-28 months (n 1410) and 35-40 months (n 800). Absolute TEM ranged in value from 0·09 to 16·21, varying widely by age group and measure, as expected. Relative TEM spanned from 0·27 to 13·71 across age groups and physical measures. Reliabilities for anthropometric measurements by age group and measure ranged from 0·46 to >0·99 with most exceeding 0·90, suggesting that the large majority of anthropometric measures can be collected in a home-based setting on young children by trained data collectors. Reliabilities for blood pressure and pulse rate measurements by age group ranged from 0·21 to 0·74, implying these are less reliably measured with young children when taken in the data collection context described here. Reliability estimates >0·95 for weight, length, height, and thigh, waist and head circumference, and >0·90 for triceps and subscapular skinfolds, indicate that these measures can be collected in the field by trained data collectors without compromising data quality. These estimates can be used for interim evaluations of data collector training and measurement protocols.
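For readers unfamiliar with TEM, the sketch below uses the standard anthropometric formula for two repeated measurements per subject, TEM = sqrt(Σd² / 2n), with relative TEM expressed as a percentage of the grand mean. That the study computed TEM exactly this way is an assumption, and the sample values are invented.

```python
from math import sqrt

def absolute_tem(first, second):
    """Technical error of measurement for paired repeat measurements."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty measurement lists")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return sqrt(squared_diffs / (2 * len(first)))

def relative_tem(first, second):
    """TEM as a percentage of the grand mean of all measurements."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100.0 * absolute_tem(first, second) / grand_mean

if __name__ == "__main__":
    # Hypothetical repeat length measurements (cm) by one examiner.
    trial_1 = [10.0, 20.0]
    trial_2 = [12.0, 18.0]
    print(absolute_tem(trial_1, trial_2), relative_tem(trial_1, trial_2))
```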
van de Water, A T M; Benjamin, D R
2016-02-01
Systematic literature review. Diastasis of the rectus abdominis muscle (DRAM) has been linked with low back pain, abdominal and pelvic dysfunction. Measurement is used to either screen or to monitor DRAM width. Determining which methods are suitable for screening and monitoring DRAM is of clinical value. To identify the best methods to screen for DRAM presence and monitor DRAM width. AMED, Embase, Medline, PubMed and CINAHL databases were searched for measurement property studies of DRAM measurement methods. Population characteristics, measurement methods/procedures and measurement information were extracted from included studies. Quality of all studies was evaluated using 'quality rating criteria'. When possible, reliability generalisation was conducted to provide combined reliability estimations. Thirteen studies evaluated measurement properties of the 'finger width'-method, tape measure, calipers, ultrasound, CT and MRI. Ultrasound was most evaluated. Methodological quality of these studies varied widely. Pearson's correlations of r = 0.66-0.79 were found between calipers and ultrasound measurements. Calipers and ultrasound had Intraclass Correlation Coefficients (ICC) of 0.78-0.97 for test-retest, inter- and intra-rater reliability. The 'finger width'-method had weighted Kappa's of 0.73-0.77 for test-retest reliability, but moderate agreement (63%; weighted Kappa = 0.53) between raters. Comparing calipers and ultrasound, low measurement error was found (above the umbilicus), and the methods had good agreement (83%; weighted Kappa = 0.66) for discriminative purposes. The available information supports ultrasound and calipers as adequate methods to assess DRAM. For other methods limited measurement information of low to moderate quality is available and further evaluation of their measurement properties is required. Copyright © 2015 Elsevier Ltd. All rights reserved.
Modification Site Localization in Peptides.
Chalkley, Robert J
2016-01-01
There are a large number of search engines designed to take mass spectrometry fragmentation spectra and match them to peptides from proteins in a database. These peptides could be unmodified, but they could also bear modifications that were added biologically or during sample preparation. As a measure of reliability for the peptide identification, software normally calculates how likely a given quality of match could have been achieved at random, most commonly through the use of target-decoy database searching (Elias and Gygi, Nat Methods 4(3): 207-214, 2007). Matching the correct peptide but with the wrong modification localization is not a random match, so results with this error will normally still be assessed as reliable identifications by the search engine. Hence, an extra step is required to determine site localization reliability, and the software approaches to measure this are the subject of this part of the chapter.
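The target-decoy idea mentioned above can be illustrated with a very small sketch: among matches scoring at or above a threshold, the number of decoy hits serves as an estimate of the number of random target hits. The simple decoys/targets ratio used here is one common estimator and is an assumption of this example, as are the function name and the scores. As the abstract notes, such an estimate says nothing about whether a modification was placed on the correct site.

```python
def decoy_fdr(psms, threshold):
    """Estimate the false discovery rate among matches scoring >= threshold.

    psms is an iterable of (score, is_decoy) pairs. Returns None when no
    target match passes the threshold, since the ratio is then undefined.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return None
    return decoys / targets

if __name__ == "__main__":
    # Hypothetical peptide-spectrum matches: (search-engine score, hit was a decoy).
    matches = [(50, False), (45, False), (40, True), (35, False), (30, False), (20, True)]
    print(decoy_fdr(matches, 30))
    print(decoy_fdr(matches, 42))
```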
White-Heisel, Regina; Canfield, James P; Young-Hughes, Sadie
Perceiving imminent safe patient handling and movement (SPH&M) dangers may reduce musculoskeletal (MSK) injuries for nurses in the workplace. The purpose of this study is to develop and validate the 17-item Safe Patient Handling Perception Scale (SPHPS) as an evaluation instrument assessing perceptual risk of MSK injury based on SPH&M knowledge, practice, and resource accessibility in the workplace. Data were collected from a convenience sample (N = 117) of nursing employees at a Veteran Affairs Medical Center. Factor analysis identified three factors: knowledge, practice, and accessibility. The SPHPS demonstrated high levels of reliability, supported by acceptable alpha scores (SPHM knowledge [α = .866], SPHM practices [α = .901], and access to SPHM resources [α = .855]), in addition to the relatively low standard error of measurement scores (SEM). The study outcomes suggest that the SPHPS is a valid and reliable tool that can measure participants' perceived risk factors for MSK injuries.
Flosadottir, Vala; Roos, Ewa M.; Ageberg, Eva
2017-01-01
Background: The Activity Rating Scale (ARS) for disorders of the knee evaluates the level of activity by the frequency of participation in 4 separate activities with high demands on knee function, with a score ranging from 0 (none) to 16 (pivoting activities 4 times/wk). Purpose: To translate and cross-culturally adapt the ARS into Swedish and to assess measurement properties of the Swedish version of the ARS. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: The COSMIN guidelines were followed. Participants (N = 100 [55 women]; mean age, 27 years) who were undergoing rehabilitation for a knee injury completed the ARS twice for test-retest reliability. The Knee injury and Osteoarthritis Outcome Score (KOOS), Tegner Activity Scale (TAS), and modernized Saltin-Grimby Physical Activity Level Scale (SGPALS) were administered at baseline to validate the ARS. Construct validity and responsiveness of the ARS were evaluated by testing predefined hypotheses regarding correlations between the ARS, KOOS, TAS, and SGPALS. The Cronbach alpha, intraclass correlation coefficients, absolute reliability, standard error of measurement, smallest detectable change, and Spearman rank-order correlation coefficients were calculated. Results: The ARS showed good internal consistency (α ≈ 0.96), good test-retest reliability (intraclass correlation coefficient >0.9), and no systematic bias between measurements. The standard error of measurement was less than 2 points, and the smallest detectable change was less than 1 point at the group level and less than 5 points at the individual level. More than 75% of the hypotheses were confirmed, indicating good construct validity and good responsiveness of the ARS. Conclusion: The Swedish version of the ARS is valid, reliable, and responsive for evaluating the level of activity based on the frequency of participation in high-demand knee sports activities in young adults with a knee injury. PMID:28979920
An approach to develop an algorithm to detect the climbing height in radial-axial ring rolling
NASA Astrophysics Data System (ADS)
Husmann, Simon; Hohmann, Magnus; Kuhlenkötter, Bernd
2017-10-01
Radial-axial ring rolling is the most widely used forming process to produce seamless rings, which are applied in miscellaneous industries like the energy sector, aerospace technology or the automotive industry. Due to the simultaneous forming in two opposite rolling gaps and the fact that ring rolling is a mass forming process, different errors can occur during the rolling process. Ring climbing is one of the most frequently occurring process errors, leading to a distortion of the ring's cross section and a deformation of the ring's geometry. The conventional sensors of a radial-axial rolling machine cannot detect this error. Therefore, it is a common strategy to roll a slightly bigger ring, so that randomly occurring process errors can be reduced afterwards by removing the additional material. The LPS installed an image processing system at the radial rolling gap of their ring rolling machine to enable the recognition and measurement of climbing rings and thereby reduce the additional material. This paper presents the algorithm which enables the image processing system to detect the error of a climbing ring and ensures comparable, reliable results for the measurement of the climbing height of the rings.
Tailoring a Human Reliability Analysis to Your Industry Needs
NASA Technical Reports Server (NTRS)
DeMott, D. L.
2016-01-01
Accidents caused by human error that result in catastrophic consequences include airline industry mishaps, medical malpractice, medication mistakes, aerospace failures, major oil spills, transportation mishaps, power production failures, and manufacturing facility incidents. Human Reliability Assessment (HRA) is used to analyze the inherent risk of human behavior or actions introducing errors into the operation of a system or process. These assessments can be used to identify where errors are most likely to arise and the potential risks involved if they do occur. Using the basic concepts of HRA, an evolving group of methodologies is used to meet various industry needs. Determining which methodology or combination of techniques will provide a quality human reliability assessment is a key element in developing effective strategies for understanding and dealing with risks caused by human errors. There are a number of concerns and difficulties in "tailoring" an HRA for different industries. Although a variety of HRA methodologies are available to analyze human error events, determining the most appropriate tools to provide the most useful results can depend on industry-specific cultures and requirements. Methodology selection may be based on a variety of factors that include: 1) how people act and react in different industries, 2) expectations based on industry standards, 3) factors that influence how the human errors could occur, such as tasks, tools, environment, workplace, support, training, and procedure, 4) type and availability of data, 5) how the industry views risk and reliability, and 6) types of emergencies, contingencies, and routine tasks. Other considerations for methodology selection should be based on what information is needed from the assessment.
If the principal concern is determination of the primary risk factors contributing to the potential human error, a more detailed analysis method may be employed versus a requirement to provide a numerical value as part of a probabilistic risk assessment. Industries involved with humans operating large equipment or transport systems (e.g., railroads or airlines) would have more need to address the man-machine interface than medical workers administering medications. Human error occurs in every industry; in most cases the consequences are relatively benign and occasionally beneficial. In cases where the results can have disastrous consequences, the use of Human Reliability techniques to identify and classify the risk of human errors allows a company more opportunities to mitigate or eliminate these types of risks and prevent costly tragedies.
Evaluating segmentation error without ground truth.
Kohlberger, Timo; Singh, Vivek; Alvino, Chris; Bahlmann, Claus; Grady, Leo
2012-01-01
The automatic delineation of the boundaries of organs and other anatomical structures is a key component of many medical image processing systems. In this paper we present a generic learning approach based on a novel space of segmentation features, which can be trained to predict the overlap error and Dice coefficient of an arbitrary organ segmentation without knowing the ground truth delineation. We show the regressor to be much stronger a predictor of these error metrics than the responses of probabilistic boosting classifiers trained on the segmentation boundary. The presented approach not only allows us to build reliable confidence measures and fidelity checks, but also to rank several segmentation hypotheses against each other during online usage of the segmentation algorithm in clinical practice.
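For orientation, the two error metrics that the regressor is trained to predict can be written down directly when a ground-truth mask is available. A minimal sketch, assuming that "overlap error" means one minus the Jaccard index (intersection over union) and representing each segmentation as a set of voxel coordinates. The function names and the toy masks are hypothetical, and this shows only the target metrics, not the authors' learning approach.

```python
def dice_coefficient(seg: set, truth: set) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    if not seg and not truth:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(seg & truth) / (len(seg) + len(truth))


def overlap_error(seg: set, truth: set) -> float:
    """Overlap error taken here as 1 - |A ∩ B| / |A ∪ B| (an assumption)."""
    union = seg | truth
    if not union:
        return 0.0
    return 1.0 - len(seg & truth) / len(union)


# Toy 2D masks given as sets of (row, col) pixels; illustrative only.
seg = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice_coefficient(seg, truth))
print(overlap_error(seg, truth))
```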
Bisi-Balogun, Adebisi; Rector, Michael
2017-09-01
We sought to develop a standardized protocol for ultrasound (US) measurements of plantar fascia (PF) width and cross-sectional area (CSA), which may serve as additional outcome variables during US examinations of both healthy asymptomatic PF and plantar fasciopathy, and to determine its interrater and intrarater reliability. Ten healthy individuals (20 feet) were enrolled. Participants were assessed twice by two raters each to determine intrarater and interrater reliability. For each foot, three transverse scans of the central bundle of the PF were taken at its insertion at the medial calcaneal tubercle, identified in real time on the plantar surface of the foot, using a fine wire technique. Reliability was determined using intraclass correlation coefficients (ICC), standard errors of measurement (SEM), and limits of agreement (LOA) expressed as percentages of the mean. Reliability of PF width and CSA measurements was determined using PF width and CSA measurements from one sonogram measured once and the mean of three measurements from three sonograms each measured once. Ultrasound measurements of PF width and CSA showed a mean of 18.6 ± 2.0 mm and 69.20 ± 13.6 mm², respectively. Intrarater reliability for both raters showed an ICC > 0.84 for width and ICC > 0.92 for CSA as well as a SEM% and LOA% < 10% for both width and CSA. Interrater reliability showed an ICC of 0.82 for width and 0.87 for CSA as well as a SEM% and LOA% < 10% for width and a SEM% < 10% and LOA% < 20% for CSA. Relative and absolute reliability within and between raters were higher when using the mean of three sonograms compared to one sonogram. Using this novel technique, PF CSA and width may be determined reliably using measurements from one sonogram or the mean of three sonograms.
Measurement of PF CSA and width in addition to already established thickness and echogenicity measurements provides additional information on structural properties of the PF for clinicians and researchers in healthy and pathologic PF.
Melchiorri, Giovanni; Viero, Valerio; Triossi, Tamara; Padua, Elvira; Bonifazi, Marco
2017-11-01
This study investigated the applicability of a sport-specific test, the Shuttle Swim Test (SST), in young water polo players to measure RSA. The aims were to assess the reliability and to measure the responsiveness of the SST in young water polo athletes, and to provide age-related values of the SST. Three hundred thirty-three elite athletes (18.3±5.1 years) were involved in the study. Of these, 99 were young people under 13 (13.1±0.5 years) who also underwent measurements for reliability and responsiveness of the SST. The following six measures were used to assess anthropometric characteristics of the sample: height, weight, chest circumference, hip circumference, waist circumference, and arm span. Two performance measures were performed on dry land: push up and chin up. Reliability and responsiveness were measured by comparing the average speed of two trials: SST1 was 1.48±0.13 m·s⁻¹ and SST2 was 1.47±0.12 m·s⁻¹. The SST showed good reliability in younger athletes (r=0.96). The Minimal Detectable Change is 0.06 m·s⁻¹ (6 seconds of the total time), which corresponds to 3.6% of the average value measured, confirming the good responsiveness of the test. Coaches and researchers can use this value in the interpretation of SST results: changes below this value could be related to measurement error. The various age-related values reported may help technicians to better interpret the performance of their athletes during competition.
Reliability and validity of ten consumer activity trackers.
Kooiman, Thea J M; Dontje, Manon L; Sprenger, Siska R; Krijnen, Wim P; van der Schans, Cees P; de Groot, Martijn
2015-01-01
Activity trackers can potentially stimulate users to increase their physical activity behavior. The aim of this study was to examine the reliability and validity of ten consumer activity trackers for measuring step count in both laboratory and free-living conditions. Healthy adult volunteers (n = 33) walked twice on a treadmill (4.8 km/h) for 30 min while wearing ten different activity trackers (i.e. Lumoback, Fitbit Flex, Jawbone Up, Nike+ Fuelband SE, Misfit Shine, Withings Pulse, Fitbit Zip, Omron HJ-203, Yamax Digiwalker SW-200 and Moves mobile application). In free-living conditions, 56 volunteers wore the same activity trackers for one working day. Test-retest reliability was analyzed with the Intraclass Correlation Coefficient (ICC). Validity was evaluated by comparing each tracker with the gold standard (Optogait system for laboratory and ActivPAL for free-living conditions), using paired samples t-tests, mean absolute percentage errors, correlations and Bland-Altman plots. Test-retest analysis revealed high reliability for most trackers except for the Omron (ICC .14), Moves app (ICC .37) and Nike+ Fuelband (ICC .53). The mean absolute percentage errors of the trackers in laboratory and free-living conditions, respectively, were: Lumoback (-0.2, -0.4), Fitbit Flex (-5.7, 3.7), Jawbone Up (-1.0, 1.4), Nike+ Fuelband (-18, -24), Misfit Shine (0.2, 1.1), Withings Pulse (-0.5, -7.9), Fitbit Zip (-0.3, 1.2), Omron (2.5, -0.4), Digiwalker (-1.2, -5.9), and Moves app (9.6, -37.6). Bland-Altman plots demonstrated that the limits of agreement varied from 46 steps (Fitbit Zip) to 2422 steps (Nike+ Fuelband) in the laboratory condition, and 866 steps (Fitbit Zip) to 5150 steps (Moves app) in the free-living condition. The reliability and validity of most trackers for measuring step count are good. The Fitbit Zip is the most valid, whereas the reliability and validity of the Nike+ Fuelband are low.
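The validity statistics named in this abstract can be sketched in a few lines. The abstract lists signed values under "mean absolute percentage errors", which suggests that a signed percentage error may have been reported; since that is not stated, the sketch below shows both variants alongside Bland-Altman limits of agreement (bias ± 1.96 × SD of the differences). All function names and step counts are hypothetical.

```python
from statistics import mean, stdev


def mean_absolute_percentage_error(tracker, reference):
    """Mean of |tracker - reference| / reference, in percent (never negative)."""
    return 100.0 * mean(abs(t - r) / r for t, r in zip(tracker, reference))


def mean_signed_percentage_error(tracker, reference):
    """Mean of (tracker - reference) / reference, in percent (sign shows direction)."""
    return 100.0 * mean((t - r) / r for t, r in zip(tracker, reference))


def bland_altman_limits(tracker, reference, z=1.96):
    """Return (bias, lower limit, upper limit) of agreement for paired counts."""
    diffs = [t - r for t, r in zip(tracker, reference)]
    bias = mean(diffs)
    half_width = z * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - half_width, bias + half_width


# Hypothetical step counts for four walks; not data from the study.
reference = [3000, 3100, 2950, 3050]
tracker = [2990, 3120, 2940, 3070]

bias, lower, upper = bland_altman_limits(tracker, reference)
print(f"MAPE   = {mean_absolute_percentage_error(tracker, reference):.2f} %")
print(f"Signed = {mean_signed_percentage_error(tracker, reference):.2f} %")
print(f"Bias   = {bias:.1f} steps, limits = [{lower:.1f}, {upper:.1f}]")
```

A tracker can have a signed error near zero and still show wide limits of agreement, so the two statistics answer different questions.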
Raper, Damian P; Witchalls, Jeremy; Philips, Elissa J; Knight, Emma; Drew, Michael K; Waddington, Gordon
2018-01-01
The use of microsensor technologies to conduct research and implement interventions in sports and exercise medicine has increased recently. The objective of this paper was to determine the validity and reliability of the ViPerform as a measure of load compared to vertical ground reaction force (GRF) as measured by force plates. Absolute reliability assessment, with concurrent validity. 10 professional triathletes ran 10 trials over force plates with the ViPerform mounted on the mid portion of the medial tibia. Calculated vertical ground reaction force data from the ViPerform was matched to the same stride on the force plate. Bland-Altman (BA) plot of comparative measure of agreement was used to assess the relationship between the calculated load from the accelerometer and the force plates. Reliability was calculated by intra-class correlation coefficients (ICC) with 95% confidence intervals. BA plot indicates minimal agreement between the measures derived from the force plate and ViPerform, with variation at an individual participant plot level. Reliability was excellent (ICC=0.877; 95% CI=0.825-0.917) in calculating the same vertical GRF in a repeated trial. Standard error of measure (SEM) equalled 99.83 units (95% CI=82.10-119.09), which, in turn, gave a minimum detectable change (MDC) value of 276.72 units (95% CI=227.32-330.07). The ViPerform does not calculate absolute values of vertical GRF similar to those measured by a force plate. It does provide a valid and reliable calculation of an athlete's lower limb load at constant velocity. Copyright © 2017 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
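The abstract does not say how the minimum detectable change was derived from the SEM. As a check, the reported figures are consistent with the widely used 95% relation, taken here as an assumption:

```latex
\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}
                  = 1.96 \times 1.4142 \times 99.83
                  \approx 276.7 \ \text{units}
```

This matches the reported MDC of 276.72 units to within rounding, so a difference between two trials smaller than that value cannot be distinguished from measurement error at the 95% level under this assumption.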
Error trends in SASS winds as functions of atmospheric stability and sea surface temperature
NASA Technical Reports Server (NTRS)
Liu, W. T.
1983-01-01
Wind speed measurements obtained with the scatterometer instrument aboard the Seasat satellite are compared with equivalent neutral wind measurements obtained from ship reports in the western N. Atlantic and eastern N. Pacific, where the concentration of ship reports is high and the ranges of atmospheric stability and sea surface temperature are large. It is found that at low wind speeds the difference between satellite measurements and surface reports depends on sea surface temperature. At wind speeds higher than 8 m/s the dependence was greatly reduced. The removal of systematic errors due to fluctuations in atmospheric stability reduced the r.m.s. difference from 1.7 m/s to 0.8 m/s. It is suggested that further clarification of the effects of fluctuations in atmospheric stability on Seasat wind speed measurements should increase their reliability in the future.
Helmerhorst, Hendrik J F; Brage, Søren; Warren, Janet; Besson, Herve; Ekelund, Ulf
2012-08-31
Physical inactivity is one of the four leading risk factors for global mortality. Accurate measurement of physical activity (PA) and in particular by physical activity questionnaires (PAQs) remains a challenge. The aim of this paper is to provide an updated systematic review of the reliability and validity characteristics of existing and more recently developed PAQs and to quantitatively compare the performance between existing and newly developed PAQs. A literature search of electronic databases was performed for studies assessing reliability and validity data of PAQs using an objective criterion measurement of PA between January 1997 and December 2011. Articles meeting the inclusion criteria were screened and data were extracted to provide a systematic overview of measurement properties. Due to differences in reported outcomes and criterion methods a quantitative meta-analysis was not possible. In total, 31 studies testing 34 newly developed PAQs, and 65 studies examining 96 existing PAQs were included. Very few PAQs showed good results on both reliability and validity. Median reliability correlation coefficients were 0.62–0.71 for existing, and 0.74–0.76 for new PAQs. Median validity coefficients ranged from 0.30–0.39 for existing, and from 0.25–0.41 for new PAQs. Although the majority of PAQs appear to have acceptable reliability, the validity is moderate at best. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity. Future PAQ studies should include measures of absolute validity and the error structure of the instrument. PMID:22938557
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weston, Louise Marie
2007-09-01
A recent report on criticality accidents in nuclear facilities indicates that human error played a major role in a significant number of incidents with serious consequences and that some of these human errors may be related to the emotional state of the individual. A pre-shift test to detect a deleterious emotional state could reduce the occurrence of such errors in critical operations. The effectiveness of pre-shift testing is a challenge because of the need to gather predictive data in a relatively short test period and the potential occurrence of learning effects due to a requirement for frequent testing. This report reviews the different types of reliability and validity methods and testing and statistical analysis procedures to validate measures of emotional state. The ultimate value of a validation study depends upon the percentage of human errors in critical operations that are due to the emotional state of the individual. A review of the literature to identify the most promising predictors of emotional state for this application is highly recommended.
NASA Technical Reports Server (NTRS)
Strangman, Gary; Franceschini, Maria Angela; Boas, David A.; Sutton, J. P. (Principal Investigator)
2003-01-01
Near-infrared spectroscopy (NIRS) can be used to noninvasively measure changes in the concentrations of oxy- and deoxyhemoglobin in tissue. We have previously shown that while global changes can be reliably measured, focal changes can produce erroneous estimates of concentration changes (NeuroImage 13 (2001), 76). Here, we describe four separate sources for systematic error in the calculation of focal hemoglobin changes from NIRS data and use experimental methods and Monte Carlo simulations to examine the importance and mitigation methods of each. The sources of error are: (1) the absolute magnitudes and relative differences in pathlength factors as a function of wavelength, (2) the location and spatial extent of the absorption change with respect to the optical probe, (3) possible differences in the spatial distribution of hemoglobin species, and (4) the potential for simultaneous monitoring of multiple regions of activation. We found wavelength selection and optode placement to be important variables in minimizing such errors, and our findings indicate that appropriate experimental procedures could reduce each of these errors to a small fraction (<10%) of the observed concentration changes.
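Error source (1) can be illustrated with a two-wavelength calculation. The abstract does not name the conversion used; the sketch below assumes the modified Beer-Lambert law, in which the optical density change at each wavelength equals (ε_HbO·ΔHbO + ε_HbR·ΔHbR) × source-detector separation × pathlength factor. The extinction coefficients, pathlength factors, and function names are placeholders chosen for illustration, not tabulated or published values.

```python
def simulate_delta_od(d_hbo, d_hbr, eps, dpf, separation_cm):
    """Forward model: optical density change at each of two wavelengths."""
    return tuple(
        (e_hbo * d_hbo + e_hbr * d_hbr) * separation_cm * factor
        for (e_hbo, e_hbr), factor in zip(eps, dpf)
    )


def solve_mbll(delta_od, eps, dpf, separation_cm):
    """Invert the 2x2 system for (dHbO, dHbR) using Cramer's rule."""
    l1 = separation_cm * dpf[0]  # effective pathlength at wavelength 1
    l2 = separation_cm * dpf[1]  # effective pathlength at wavelength 2
    a11, a12 = eps[0][0] * l1, eps[0][1] * l1
    a21, a22 = eps[1][0] * l2, eps[1][1] * l2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("wavelengths do not separate the two species")
    d_hbo = (delta_od[0] * a22 - a12 * delta_od[1]) / det
    d_hbr = (a11 * delta_od[1] - a21 * delta_od[0]) / det
    return d_hbo, d_hbr


# Placeholder coefficients (arbitrary units): rows are wavelengths,
# columns are (HbO, HbR). Not real extinction coefficients.
eps = ((1.0, 3.0), (2.5, 1.8))
true_dpf = (6.5, 5.5)      # hypothetical wavelength-dependent factors
assumed_dpf = (6.0, 6.0)   # analysis that ignores the wavelength dependence

# Simulate a pure oxyhemoglobin change, then analyze it two ways.
od = simulate_delta_od(1.0, 0.0, eps, true_dpf, separation_cm=3.0)
exact = solve_mbll(od, eps, true_dpf, separation_cm=3.0)
biased = solve_mbll(od, eps, assumed_dpf, separation_cm=3.0)
print("correct factors:", exact)
print("wrong factors:  ", biased)
```

With the wrong relative pathlength factors, part of the simulated oxyhemoglobin change shows up as a spurious deoxyhemoglobin change, which is the kind of cross-talk that a wavelength-dependent pathlength error can introduce.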