reliable test methods: Topics by Science.gov

Sample records for reliable test methods

A Comparison of the Approaches of Generalizability Theory and Item Response Theory in Estimating the Reliability of Test Scores for Testlet-Composed Tests

ERIC Educational Resources Information Center

Lee, Guemin; Park, In-Yong

2012-01-01

Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
Integrating Formal Methods and Testing 2002

NASA Technical Reports Server (NTRS)

Cukic, Bojan

2002-01-01

Traditionally, qualitative program verification methodologies and program testing are studied in separate research communities. Neither of them alone is powerful and practical enough to provide sufficient confidence in ultra-high reliability assessment when used exclusively. Significant advances can be made by accounting not only for formal verification and program testing, but also for the impact of many other standard V&V techniques, in a unified software reliability assessment framework. The first year of this research resulted in the statistical framework that, given the assumptions on the success of the qualitative V&V and QA procedures, significantly reduces the amount of testing needed to confidently assess reliability at so-called high and ultra-high levels (10^-4 or higher). The coming years shall address the methodologies to realistically estimate the impacts of various V&V techniques on system reliability and include the impact of operational risk on reliability assessment. A) Combine formal correctness verification, process and product metrics, and other standard qualitative software assurance methods with statistical testing with the aim of gaining higher confidence in software reliability assessment for high-assurance applications. B) Quantify the impact of these methods on software reliability. C) Demonstrate that accounting for the effectiveness of these methods reduces the number of tests needed to attain a certain confidence level. D) Quantify and justify the reliability estimate for systems developed using various methods.
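For a sense of scale, the sketch below computes the classical zero-failure test count: the smallest number of independent, failure-free tests that supports a given failure-probability bound at a given confidence. This is only the textbook baseline that such a framework would aim to improve on, not the framework described in the abstract. The function name and the example values are illustrative assumptions.

```python
import math


def zero_failure_tests(max_failure_prob: float, confidence: float) -> int:
    """Smallest n with (1 - p)^n <= 1 - C.

    n failure-free, independent tests then support the claim
    'per-test failure probability <= p' at confidence level C.
    """
    if not (0.0 < max_failure_prob < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_prob))


# Hypothetical target: failure probability of at most 1e-4 at 99% confidence.
n_tests = zero_failure_tests(1e-4, 0.99)  # tens of thousands of tests
```

The count grows roughly in proportion to 1/p, which is why testing alone becomes impractical at ultra-high reliability levels.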
Estimating Measures of Pass-Fail Reliability from Parallel Half-Tests.

ERIC Educational Resources Information Center

Woodruff, David J.; Sawyer, Richard L.

Two methods for estimating measures of pass-fail reliability are derived, by which both theta and kappa may be estimated from a single test administration. The methods require only a single test administration and are computationally simple. Both are based on the Spearman-Brown formula for estimating stepped-up reliability. The non-distributional…
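The Spearman-Brown step that both methods build on can be sketched as follows. The abstract is truncated and does not give the estimators for theta and kappa, so this shows only the stepped-up reliability formula itself. The function name and the example correlation are illustrative assumptions.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Predicted reliability of a test lengthened by `factor`.

    With factor=2 this steps up the correlation between two parallel
    half-tests to the reliability of the full-length test.
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Hypothetical half-test correlation of 0.60.
full_length = spearman_brown(0.6)  # 1.2 / 1.6 = 0.75
```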
Basic Concepts in Classical Test Theory: Tests Aren't Reliable, the Nature of Alpha, and Reliability Generalization as a Meta-analytic Method.

ERIC Educational Resources Information Center

Helms, LuAnn Sherbeck

This paper discusses the fact that reliability is about scores and not tests and how reliability limits effect sizes. The paper also explores the classical reliability coefficients of stability, equivalence, and internal consistency. Stability is concerned with how stable test scores will be over time, while equivalence addresses the relationship…
Wafer level reliability testing: An idea whose time has come

NASA Technical Reports Server (NTRS)

Trapp, O. D.

1987-01-01

Wafer level reliability testing has been nurtured in the DARPA supported workshops, held each autumn since 1982. The seeds planted in 1982 have produced an active crop of very large scale integration manufacturers applying wafer level reliability test methods. Computer Aided Reliability (CAR) is a new seed being nurtured. Users are now being awakened by the huge economic value of the wafer reliability testing technology.
Method matters: Understanding diagnostic reliability in DSM-IV and DSM-5.

PubMed

Chmielewski, Michael; Clark, Lee Anna; Bagby, R Michael; Watson, David

2015-08-01

Diagnostic reliability is essential for the science and practice of psychology, in part because reliability is necessary for validity. Recently, the DSM-5 field trials documented lower diagnostic reliability than past field trials and the general research literature, resulting in substantial criticism of the DSM-5 diagnostic criteria. Rather than indicating specific problems with DSM-5, however, the field trials may have revealed long-standing diagnostic issues that have been hidden due to a reliance on audio/video recordings for estimating reliability. We estimated the reliability of DSM-IV diagnoses using both the standard audio-recording method and the test-retest method used in the DSM-5 field trials, in which different clinicians conduct separate interviews. Psychiatric patients (N = 339) were diagnosed using the SCID-I/P; 218 were diagnosed a second time by an independent interviewer. Diagnostic reliability using the audio-recording method (N = 49) was "good" to "excellent" (M κ = .80) and comparable to the DSM-IV field trials estimates. Reliability using the test-retest method (N = 218) was "poor" to "fair" (M κ = .47) and similar to DSM-5 field-trials' estimates. Despite low test-retest diagnostic reliability, self-reported symptoms were highly stable. Moreover, there was no association between change in self-report and change in diagnostic status. These results demonstrate the influence of method on estimates of diagnostic reliability. (c) 2015 APA, all rights reserved.
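The κ values above are chance-corrected agreement coefficients. A minimal sketch of Cohen's kappa for two raters follows, assuming each rater assigns one categorical diagnosis per patient. The ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equally long, non-empty rating lists")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical present (1) / absent (0) diagnoses from two interviewers.
first = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
second = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(first, second)  # observed 0.8, expected 0.5
```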
A Latent Class Approach to Estimating Test-Score Reliability

ERIC Educational Resources Information Center

van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas

2011-01-01

This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
Standard setting: comparison of two methods.

PubMed

George, Sanju; Haque, M Sayeed; Oyebode, Femi

2006-09-14

The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice examination (MCQ). Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. The pass rate with the norm-reference method was 85% (66/78) and that by the Angoff method was 100% (78 out of 78). The percentage agreement between Angoff method and norm-reference was 78% (95% CI 69% - 87%). The modified Angoff method had an inter-rater reliability of 0.81-0.82 and a test-retest reliability of 0.59-0.74. There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
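The norm-reference rule (mean minus 1 SD) and the interval around the agreement figure can be sketched as below. The scores are hypothetical, the use of the sample standard deviation is an assumption, and the abstract does not say how its confidence interval was computed. A simple Wald interval for a proportion of 0.78 with n = 78 does reproduce the reported 69% to 87% after rounding.

```python
import math
import statistics


def norm_reference_cut(scores):
    """Pass mark set at the cohort mean minus one (sample) standard deviation."""
    return statistics.mean(scores) - statistics.stdev(scores)


def pass_rate(scores, cut):
    """Fraction of candidates scoring at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)


def wald_interval(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


scores = [50, 60, 70, 80, 90]        # hypothetical raw MCQ scores
cut = norm_reference_cut(scores)     # about 54.19
rate = pass_rate(scores, cut)        # 4 of 5 candidates pass
low, high = wald_interval(0.78, 78)  # agreement figure from the abstract
```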
Tracking reliability for space cabin-borne equipment in development by Crow model.

PubMed

Chen, J D; Jiao, S J; Sun, H L

2001-12-01

Objective. To study and track the reliability growth of manned spaceflight cabin-borne equipment in the course of its development. Method. A new technique of reliability growth estimation and prediction, which is composed of the Crow model and test data conversion (TDC) method was used. Result. The estimation and prediction value of the reliability growth conformed to its expectations. Conclusion. The method could dynamically estimate and predict the reliability of the equipment by making full use of various test information in the course of its development. It offered not only a possibility of tracking the equipment reliability growth, but also the reference for quality control in manned spaceflight cabin-borne equipment design and development process.
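The Crow (AMSAA) model treats failures during development as a power-law process with expected cumulative failures N(t) = λ·t^β, where β < 1 indicates reliability growth. The sketch below gives the standard maximum-likelihood estimates for a time-truncated test. The test data conversion (TDC) step is not described in the abstract and is not modelled here, and the failure times are hypothetical.

```python
import math


def crow_amsaa_mle(failure_times, total_time):
    """MLE of (beta, lam) for a time-truncated power-law process.

    beta = n / sum(ln(T / t_i)),  lam = n / T**beta
    """
    n = len(failure_times)
    if n == 0 or any(not 0.0 < t <= total_time for t in failure_times):
        raise ValueError("need failure times in (0, total_time]")
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    return beta, lam


def instantaneous_mtbf(beta, lam, t):
    """Reciprocal of the failure intensity lam * beta * t**(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))


# Hypothetical cumulative failure times (hours) in a 100-hour test.
beta, lam = crow_amsaa_mle([1.0, 10.0], 100.0)
mtbf_now = instantaneous_mtbf(beta, lam, 100.0)
```

By construction the fitted model reproduces the observed failure count at the end of the test, that is, λ·T^β equals n.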
Overview of RICOR's reliability theoretical analysis, accelerated life demonstration test results and verification by field data

NASA Astrophysics Data System (ADS)

Vainshtein, Igor; Baruch, Shlomi; Regev, Itai; Segal, Victor; Filis, Avishai; Riabzev, Sergey

2018-05-01

The growing demand for EO applications that work around the clock, 24 hours a day and 7 days a week, such as border surveillance systems, emphasizes the need for a highly reliable cryocooler with increased operational availability and optimized Integrated Logistic Support (ILS) for the system. To meet this need, RICOR developed linear and rotary cryocoolers that successfully achieved this goal. Cryocooler MTTF was analyzed by theoretical reliability evaluation methods, demonstrated by normal and accelerated life tests at the cryocooler level, and finally verified by field data analysis derived from cryocoolers operating at the system level. The paper reviews theoretical reliability analysis methods and analyzes reliability test results derived from standard and accelerated life demonstration tests performed at RICOR's advanced reliability laboratory. To summarize the work process, reliability verification data are presented as feedback from fielded systems.
A systematic review of statistical methods used to test for reliability of medical instruments measuring continuous variables.

PubMed

Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor

2013-06-01

Reliability measures precision or the extent to which test results can be replicated. This is the first ever systematic review to identify statistical methods used to measure reliability of equipment measuring continuous variables. This study also aims to highlight the inappropriate statistical methods used in reliability analysis and their implications for medical practice. In 2010, five electronic databases were searched between 2007 and 2009 to look for reliability studies. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria. The Intra-class Correlation Coefficient (ICC) is the most popular method, with 25 (60%) studies having used this method, followed by comparing means (8 or 19%). Out of 25 studies using the ICC, only 7 (28%) reported the confidence intervals and types of ICC used. Most studies (71%) also tested the agreement of instruments. This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue, and be able to correctly perform analysis in reliability studies.
Improving the quality of discrete-choice experiments in health: how can we assess validity and reliability?

PubMed

Janssen, Ellen M; Marshall, Deborah A; Hauber, A Brett; Bridges, John F P

2017-12-01

The recent endorsement of discrete-choice experiments (DCEs) and other stated-preference methods by regulatory and health technology assessment (HTA) agencies has placed a greater focus on demonstrating the validity and reliability of preference results. Areas covered: We present a practical overview of tests of validity and reliability that have been applied in the health DCE literature and explore other study qualities of DCEs. From the published literature, we identify a variety of methods to assess the validity and reliability of DCEs. We conceptualize these methods to create a conceptual model with four domains: measurement validity, measurement reliability, choice validity, and choice reliability. Each domain consists of three categories that can be assessed using one to four procedures (for a total of 24 tests). We present how these tests have been applied in the literature and direct readers to applications of these tests in the health DCE literature. Based on a stakeholder engagement exercise, we consider the importance of study characteristics beyond traditional concepts of validity and reliability. Expert commentary: We discuss study design considerations to assess the validity and reliability of a DCE, consider limitations to the current application of tests, and discuss future work to consider the quality of DCEs in healthcare.
Feasibility and Reliability of Physical Fitness Tests in Older Adults with Intellectual Disability: A Pilot Study

ERIC Educational Resources Information Center

Hilgenkamp, Thessa I. M.; van Wijck, Ruud; Evenhuis, Heleen M.

2012-01-01

Background: Physical fitness is relevant for wellbeing and health, but knowledge on the feasibility and reliability of instruments to measure physical fitness for older adults with intellectual disability is lacking. Methods: Feasibility and test-retest reliability of a physical fitness test battery (Box and Block Test, Response Time Test, walking…
Interhemispheric Inhibition Measurement Reliability in Stroke: A Pilot Study

PubMed Central

Cassidy, Jessica M.; Chu, Haitao; Chen, Mo; Kimberley, Teresa J.; Carey, James R.

2016-01-01

Objective: Reliable transcranial magnetic stimulation (TMS) measures for probing corticomotor excitability are important when assessing the physiological effects of non-invasive brain stimulation. The primary objective of this study was to examine test-retest reliability of an interhemispheric inhibition (IHI) index measurement in stroke. Materials and Methods: Ten subjects with chronic stroke (≥ 6 months) completed two IHI testing sessions per week for three weeks (six testing sessions total). A single investigator measured IHI in the contra-to-ipsilesional primary motor cortex direction and in the opposite direction using bilateral paired-pulse TMS. Weekly sessions were separated by 24 hours with a 1-week washout period separating testing weeks. To determine if motor-evoked potential (MEP) quantification method affected measurement reliability, IHI indices computed from both MEP amplitude and area responses were found. Reliability was assessed with two-way, mixed intraclass correlation coefficients (ICC(3,k)). Standard error of measurement and minimal detectable difference statistics were also determined. Results: With the exception of the initial testing week, IHI indices measured in the contra-to-ipsilesional hemisphere direction demonstrated moderate to excellent reliability (ICC = 0.725 – 0.913). Ipsi-to-contralesional IHI indices depicted poor or invalid reliability estimates throughout the three-week testing duration (ICC = −1.153 – 0.105). The overlap of ICC 95% confidence intervals suggested that IHI indices using MEP amplitude vs. area measures did not differ with respect to reliability. Conclusions: IHI indices demonstrated varying magnitudes of reliability irrespective of MEP quantification method. Several strategies for improving IHI index measurement reliability are discussed. PMID:27333364
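The two-way mixed, consistency-type coefficients ICC(3,1) and ICC(3,k) can be computed from the mean squares of a subjects-by-sessions table, as sketched below. The data are hypothetical and the function name is an assumption. Because ICC(3,k) equals 1 minus MS_error / MS_subjects, the estimate has no lower bound, which is consistent with negative values appearing when error variance exceeds between-subject variance.

```python
def icc3(data):
    """Return (ICC(3,1), ICC(3,k)) for data[i][j] = subject i, session j."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Between-subjects sum of squares and residual (interaction) sum of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_err = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    average = (ms_rows - ms_err) / ms_rows
    return single, average


# Hypothetical scores: three subjects, two sessions.
icc_single, icc_average = icc3([[2, 3], [4, 4], [6, 8]])
```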
[A reliability growth assessment method and its application in the development of equipment in space cabin].

PubMed

Chen, J D; Sun, H L

1999-04-01

Objective. To assess and predict the reliability of equipment dynamically by making full use of the various test information available during product development. Method. A new reliability growth assessment method based on the Army Materiel Systems Analysis Activity (AMSAA) model was developed. The method is composed of the AMSAA model and test data conversion technology. Result. The assessment and prediction results for a space-borne equipment item conformed to expectations. Conclusion. It is suggested that this method should be further researched and popularized.
Test-re-test reliability and inter-rater reliability of a digital pelvic inclinometer in young, healthy males and females.

PubMed

Beardsley, Chris; Egerton, Tim; Skinner, Brendon

2016-01-01

Objective. The purpose of this study was to investigate the reliability of a digital pelvic inclinometer (DPI) for measuring sagittal plane pelvic tilt in 18 young, healthy males and females. Method. The inter-rater reliability and test-re-test reliabilities of the DPI for measuring pelvic tilt in standing on both the right and left sides of the pelvis were measured by two raters carrying out two rating sessions of the same subjects, three weeks apart. Results. For measuring pelvic tilt, inter-rater reliability was designated as good on both sides (ICC = 0.81-0.88), test-re-test reliability within a single rating session was designated as good on both sides (ICC = 0.88-0.95), and test-re-test reliability between two rating sessions was designated as moderate on the left side (ICC = 0.65) and good on the right side (ICC = 0.85). Conclusion. Inter-rater reliability and test-re-test reliability within a single rating session of the DPI in measuring pelvic tilt were both good, while test-re-test reliability between rating sessions was moderate-to-good. Caution is required regarding the interpretation of the test-re-test reliability within a single rating session, as the raters were not blinded. Further research is required to establish validity.
One-year test-retest reliability of intrinsic connectivity network fMRI in older adults

PubMed Central

Guo, Cong C.; Kurth, Florian; Zhou, Juan; Mayer, Emeran A.; Eickhoff, Simon B; Kramer, Joel H.; Seeley, William W.

2014-01-01

“Resting-state” or task-free fMRI can assess intrinsic connectivity network (ICN) integrity in health and disease, suggesting a potential for use of these methods as disease-monitoring biomarkers. Numerous analytical options are available, including model-driven ROI-based correlation analysis and model-free, independent component analysis (ICA). High test-retest reliability will be a necessary feature of a successful ICN biomarker, yet available reliability data remains limited. Here, we examined ICN fMRI test-retest reliability in 24 healthy older subjects scanned roughly one year apart. We focused on the salience network, a disease-relevant ICN not previously subjected to reliability analysis. Most ICN analytical methods proved reliable (intraclass coefficients > 0.4) and could be further improved by wavelet analysis. Seed-based ROI correlation analysis showed high map-wise reliability, whereas graph theoretical measures and temporal concatenation group ICA produced the most reliable individual unit-wise outcomes. Including global signal regression in ROI-based correlation analyses reduced reliability. Our study provides a direct comparison between the most commonly used ICN fMRI methods and potential guidelines for measuring intrinsic connectivity in aging control and patient populations over time. PMID:22446491
Research on Horizontal Accuracy Method of High Spatial Resolution Remotely Sensed Orthophoto Image

NASA Astrophysics Data System (ADS)

Xu, Y. M.; Zhang, J. X.; Yu, F.; Dong, S.

2018-04-01

At present, in the inspection and acceptance of high spatial resolution remotely sensed orthophoto images, horizontal accuracy detection tests and evaluates the accuracy of images, and it is mostly based on a set of testing points with the same accuracy and reliability. However, it is difficult to obtain a set of testing points with the same accuracy and reliability in areas where field measurement is difficult and high-accuracy reference data are insufficient, so it is difficult to test and evaluate the horizontal accuracy of the orthophoto image. The uncertainty of the horizontal accuracy has become a bottleneck for the application of satellite-borne high-resolution remote sensing images and for the expansion of their scope of service. Therefore, this paper proposes a new method to test the horizontal accuracy of orthophoto images. This method uses testing points with different accuracy and reliability, drawn from high-accuracy reference data and field measurement. The new method solves the horizontal accuracy detection of orthophoto images in difficult areas and provides a basis for delivering reliable orthophoto images to users.
Estimating the Effect of Changes in Criterion Score Reliability on the Power of the "F" Test of Equality of Means

ERIC Educational Resources Information Center

Feldt, Leonard S.

2011-01-01

This article presents a simple, computer-assisted method of determining the extent to which increases in reliability increase the power of the "F" test of equality of means. The method uses a derived formula that relates the changes in the reliability coefficient to changes in the noncentrality of the relevant "F" distribution. A readily available…
Reliability and Validity of Information about Student Achievement: Comparing Large-Scale and Classroom Testing Contexts

ERIC Educational Resources Information Center

Cizek, Gregory J.

2009-01-01

Reliability and validity are two characteristics that must be considered whenever information about student achievement is collected. However, those characteristics--and the methods for evaluating them--differ in large-scale testing and classroom testing contexts. This article presents the distinctions between reliability and validity in the two…

Note: An online testing method for lifetime projection of high power light-emitting diode under accelerated reliability test.

PubMed

Chen, Qi; Chen, Quan; Luo, Xiaobing

2014-09-01

In recent years, due to the fast development of high power light-emitting diodes (LEDs), their lifetime prediction and assessment have become a crucial issue. Although in situ measurement has been widely used for reliability testing in the laser diode community, it has not been applied commonly in the LED community. In this paper, an online testing method for LED life projection under accelerated reliability test was proposed and the prototype was built. The optical parametric data were collected. The systematic error and the measuring uncertainty were calculated to be within 0.2% and within 2%, respectively. With this online testing method, experimental data can be acquired continuously and a sufficient amount of data can be gathered. Thus, the projection fitting accuracy can be improved (r^2 = 0.954) and testing duration can be shortened.
Reliability of resting-state microstate features in electroencephalography.

PubMed

Khanna, Arjun; Pascual-Leone, Alvaro; Farzan, Faranak

2014-01-01

Electroencephalographic (EEG) microstate analysis is a method of identifying quasi-stable functional brain states ("microstates") that are altered in a number of neuropsychiatric disorders, suggesting their potential use as biomarkers of neurophysiological health and disease. However, use of EEG microstates as neurophysiological biomarkers requires assessment of the test-retest reliability of microstate analysis. We analyzed resting-state, eyes-closed, 30-channel EEG from 10 healthy subjects over 3 sessions spaced approximately 48 hours apart. We identified four microstate classes and calculated the average duration, frequency, and coverage fraction of these microstates. Using Cronbach's α and the standard error of measurement (SEM) as indicators of reliability, we examined: (1) the test-retest reliability of microstate features using a variety of different approaches; (2) the consistency between TAAHC and k-means clustering algorithms; and (3) whether microstate analysis can be reliably conducted with 19 and 8 electrodes. The approach of identifying a single set of "global" microstate maps showed the highest reliability (mean Cronbach's α > 0.8, SEM ≈ 10% of mean values) compared to microstates derived by each session or each recording. There was notably low reliability in features calculated from maps extracted individually for each recording, suggesting that the analysis is most reliable when maps are held constant. Features were highly consistent across clustering methods (Cronbach's α > 0.9). All features had high test-retest reliability with 19 and 8 electrodes. High test-retest reliability and cross-method consistency of microstate features suggests their potential as biomarkers for assessment of the brain's neurophysiological health.
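Cronbach's α, used above with recording sessions in the role of test items, can be sketched as follows. The scores are hypothetical and the layout (one list per session, one entry per subject) is an assumption made for illustration.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; items[j] holds every subject's score on item j.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items (or sessions)")
    totals = [sum(subject) for subject in zip(*items)]
    item_variance = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_variance / statistics.variance(totals))


# Hypothetical feature values: three sessions, four subjects each.
sessions = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
alpha = cronbach_alpha(sessions)  # 51/52, about 0.98
```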
Interrater Reliability in Large-Scale Assessments--Can Teachers Score National Tests Reliably without External Controls?

ERIC Educational Resources Information Center

Pantzare, Anna Lind

2015-01-01

In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examines if teachers' ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality…
Assessment of a condition-specific quality-of-life measure for patients with developmentally absent teeth: validity and reliability testing.

PubMed

Akram, A J; Ireland, A J; Postlethwaite, K C; Sandy, J R; Jerreat, A S

2013-11-01

This article describes the process of validity and reliability testing of a condition-specific quality-of-life measure for patients with hypodontia presenting for orthodontic treatment. The development of the instrument is described in a previous article. Royal Devon and Exeter NHS Foundation Trust & Musgrove Park Hospital, Taunton. The child perception questionnaire was used as a standard against which to test criterion validity. The Bland and Altman method was used to check agreement between the two questionnaires. Construct validity was tested using principal component analysis on the four sections of the questionnaire. Test-retest reliability was tested using intraclass correlation coefficient and Bland and Altman method. Cronbach's alpha was used to test internal consistency reliability. Overall the questionnaire showed good reliability, criterion and construct validity. This together with previous evidence of good face and content validity suggests that the instrument may prove useful in clinical practice and further research. This study has demonstrated that the newly developed condition-specific quality-of-life questionnaire is both valid and reliable for use in young patients with hypodontia. © 2013 John Wiley & Sons A/S. Published by Blackwell Publishing Ltd.
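The Bland and Altman method mentioned above summarizes agreement as the mean difference (bias) between two sets of paired measurements plus limits of agreement at ±1.96 SD of the differences. A minimal sketch with hypothetical questionnaire scores:

```python
import statistics


def bland_altman(first, second, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    spread = z * statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - spread, bias + spread


# Hypothetical total scores from two administrations or two questionnaires.
bias, lower, upper = bland_altman([10, 12, 14, 16], [9, 13, 13, 17])
```

Narrow limits centered near zero indicate that the two measurements can be used interchangeably. How narrow is acceptable is a clinical judgment rather than a statistical one.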
Reliability of the Fox-walk test in patients with rheumatoid arthritis.

PubMed

Verberkt, Cornelia Antonia; Fridén, Cecilia; Grooten, Wilhelmus Johannes Andreas; Opava, Christina H

2012-01-01

The Fox-walk test is a new method used to estimate aerobic capacity outside a clinical environment, which may be useful in the implementation of daily health-enhancing physical activity. The aim of our study was to investigate the reliability of the test in people with rheumatoid arthritis (RA). Fifteen participants performed the Fox-walk test three times with weekly intervals. The intraclass correlation coefficient (ICC), the standard error of measurement (SEM) and the smallest detectable change (SDC) were used to estimate the reliability. General health perception, lower limb pain and fatigue were measured to determine their potential influence on the reliability. There were no systematic differences between the three test occasions (p = 0.190) and the reliability was almost perfect (ICC = 0.982). None of the covariates influenced the reliability. The SEM was 0.999 ml/kg/min or 3.4% and the SDC was 2.769 ml/kg/min or 9.4%. These findings demonstrate that the Fox-walk test is reliable in people with RA and enables differentiation between people with RA and monitoring progress. The validity of the test among people with RA is still to be determined. • The Fox-walk test is a new method to estimate aerobic capacity and could be performed walking or running. • The test is self administered without expensive equipment and is available in 150 public places in Sweden and several other European countries. • The Fox-walk test is a reliable test for use among people with rheumatoid arthritis monitoring the progress of their physical activity.
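The abstract does not state how the SDC was derived from the SEM. The common convention SDC = 1.96 × √2 × SEM is assumed in the sketch below, and it reproduces the reported figure from the reported SEM.

```python
import math


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC at the 95% level; sqrt(2) accounts for error in both measurements."""
    return z * math.sqrt(2.0) * sem


# SEM reported in the abstract, in ml/kg/min.
sdc = smallest_detectable_change(0.999)  # rounds to 2.769
```

A change between two test occasions smaller than this value cannot be distinguished from measurement error at the chosen confidence level.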
Test Reliability at the Individual Level

PubMed Central

Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly

2016-01-01

Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
Establishing Inter- and Intrarater Reliability for High-Stakes Testing Using Simulation.

PubMed

Kardong-Edgren, Suzan; Oermann, Marilyn H; Rizzolo, Mary Anne; Odom-Maryon, Tamara

This article reports one method to develop a standardized training method to establish the inter- and intrarater reliability of a group of raters for high-stakes testing. Simulation is used increasingly for high-stakes testing, but without research into the development of inter- and intrarater reliability for raters. Eleven raters were trained using a standardized methodology. Raters scored 28 student videos over a six-week period. Raters then rescored all videos over a two-day period to establish both intra- and interrater reliability. One rater demonstrated poor intrarater reliability; a second rater failed all students. Kappa statistics improved from the moderate to substantial agreement range with the exclusion of the two outlier raters' scores. There may be faculty who, for different reasons, should not be included in high-stakes testing evaluations. All faculty are content experts, but not all are expert evaluators.
Selection and Reporting of Statistical Methods to Assess Reliability of a Diagnostic Test: Conformity to Recommended Methods in a Peer-Reviewed Journal

PubMed Central

Park, Ji Eun; Han, Kyunghwa; Sung, Yu Sub; Chung, Mi Sun; Koo, Hyun Jung; Yoon, Hee Mang; Choi, Young Jun; Lee, Seung Soo; Kim, Kyung Won; Shin, Youngbin; An, Suah; Cho, Hyo-Min

2017-01-01

Objective: To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. Materials and Methods: Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA), and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. Results: Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies and regarding the model and assumptions of intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. Reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. Conclusion: Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology is necessary. PMID:29089821
Longitudinal Reliability of Self-Reported Age at Menarche in Adolescent Girls: Variability across Time and Setting

ERIC Educational Resources Information Center

Dorn, Lorah D.; Sontag-Padilla, Lisa M.; Pabst, Stephanie; Tissot, Abbigail; Susman, Elizabeth J.

2013-01-01

Age at menarche is critical in research and clinical settings, yet there is a dearth of studies examining its reliability in adolescents. We examined age at menarche during adolescence, specifically, (a) average method reliability across 3 years, (b) test-retest reliability between time points and methods, (c) intraindividual variability of…
Reliability of temporal summation and diffuse noxious inhibitory control

PubMed Central

Cathcart, Stuart; Winefield, Anthony H; Rolan, Paul; Lushington, Kurt

2009-01-01

BACKGROUND: The test-retest reliability of temporal summation (TS) and diffuse noxious inhibitory control (DNIC) has not been reported to date. Establishing such reliability would support the possibility of future experimental studies examining factors affecting TS and DNIC. Similarly, the use of manual algometry to induce TS, or an occlusion cuff to induce DNIC of TS to mechanical stimuli, has not been reported to date. Such devices may offer a simpler method than current techniques for inducing TS and DNIC, affording assessment at more anatomical locations and in more varied research settings. METHOD: The present study assessed the test-retest reliability of TS and DNIC using the above techniques. Sex differences on these measures were also investigated. RESULTS: Repeated measures ANOVA indicated successful induction of TS and DNIC, with no significant differences across test-retest occasions. Sex effects were not significant for any measure or interaction. Intraclass correlations indicated high test-retest reliability for all measures; however, there was large interindividual variation between test and retest measurements. CONCLUSION: The present results indicate acceptable within-session test-retest reliability of TS and DNIC. The results support the possibility of future experimental studies examining factors affecting TS and DNIC. PMID:20011713
Reliability and Validity of the Chinese (Mandarin) Tinnitus Handicap Inventory

PubMed Central

Meng, Zhaoli; Zheng, Yun; Wang, Kai; Kong, Xiudan; Tao, Yong; Xu, Ke; Liu, Guanjian

2012-01-01

Objectives The Tinnitus Handicap Inventory (THI) is a commonly used self-reporting tinnitus questionnaire. We undertook this study to determine the reliability and validity of the Chinese-Mandarin version of the Tinnitus Handicap Inventory (THI-CM) for measuring tinnitus-related handicaps. Methods We tested the test-retest reliability, internal reliability, and construct validity of the THI-CM. Two-hundred patients seeking treatment for primary or secondary tinnitus in Southwest China were asked to complete THI-CM prior to clinical evaluation. Patients were evaluated by a clinician using standard methods, and 40 patients were asked to complete THI-CM a second time 14±3 days after the initial interview. Results The test-retest reliability of THI-CM was high (Pearson correlation, 0.98), as was the internal reliability (Cronbach's α, 0.93). Factor analysis indicated that THI-CM has a unifactorial structure. Conclusion The THI-CM version is reliable. The total score in THI-CM can be used to measure tinnitus-related handicaps in Mandarin-speaking populations. PMID:22468196
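The internal-reliability figure quoted above is Cronbach's α. As a rough illustration only (not the authors' analysis), the standard formula can be sketched as follows; the function name and the toy scores are hypothetical, and sample variances are assumed.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from per-item score lists.

    `items` is a list of k lists; each inner list holds one item's scores,
    ordered by respondent. Assumes k >= 2 and a non-zero total-score variance.
    """
    k = len(items)
    n_respondents = len(items[0])
    # Total score per respondent, summed across all items.
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical toy data: three items that rise together across four respondents.
toy_items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)
```

With perfectly parallel items such as these, α reaches its upper limit of 1; real questionnaire data would fall below that.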
Performance Assessment of Internal Quality Control (IQC) Products in Blood Transfusion Compatibility Testing in China

PubMed Central

Li, Jing-Jing; Gao, Qi; Liu, Zhi-Dong; Kang, Qiong-Hua; Hou, Yi-Jun; Zhang, Luo-Chuan; Hu, Xiao-Mei; Li, Jie; Zhang, Juan

2015-01-01

Internal quality control (IQC) is a critical component of laboratory quality management, and IQC products can determine the reliability of testing results. In China, given the fact that most blood transfusion compatibility laboratories do not employ IQC products or do so minimally, there is a lack of uniform and standardized IQC methods. To explore the reliability of IQC products and methods, we studied 697 results from IQC samples in our laboratory from 2012 to 2014. The results showed that the sensitivity and specificity of the IQCs in anti-B testing were 100% and 99.7%, respectively. The sensitivity and specificity of the IQCs in forward blood typing, anti-A testing, irregular antibody screening, and cross-matching were all 100%. The reliability analysis indicated that 97% of anti-B testing results were at a 99% confidence level, and 99.9% of forward blood typing, anti-A testing, irregular antibody screening, and cross-matching results were at a 99% confidence level. Therefore, our IQC products and methods are highly sensitive, specific, and reliable. Our study paves the way for the establishment of a uniform and standardized IQC method for pre-transfusion compatibility testing in China and other parts of the world. PMID:26488582
Why the Major Field Test in Business Does Not Report Subscores: Reliability and Construct Validity Evidence. Research Report. ETS RR-12-11

ERIC Educational Resources Information Center

Ling, Guangming

2012-01-01

To assess the value of individual students' subscores on the Major Field Test in Business (MFT Business), I examined the test's internal structure with factor analysis and structural equation model methods, and analyzed the subscore reliabilities using the augmented scores method. Analyses of the internal structure suggested that the MFT Business…
Measuring Quadriceps strength in adults with severe or moderate intellectual and visual disabilities: Feasibility and reliability.

PubMed

Dijkhuizen, Annemarie; Douma, Rob K; Krijnen, Wim P; van der Schans, Cees P; Waninge, Aly

2018-05-30

A feasible and reliable instrument to measure strength in persons with severe intellectual and visual disabilities (SIVD) is lacking. The aim of our study was to determine feasibility, learning period and reliability of three strength tests. Twenty-nine participants with SIVD performed the Minimum Sit-to-Stand Height test (MSST), the Leg Extension test (LE) and the 30 seconds Chair-Stand test (30sCS), once per week for 5 weeks. Feasibility was determined by the percentage of successful measurements; learning effect by using paired t test between two consecutive measurements; test-retest reliability by intraclass correlation coefficient and Limits of Agreement; and correlations by Pearson correlations. A sufficient feasibility and learning period of the tests was shown. The methods had sufficient test-retest reliability and moderate-to-sufficient correlations. The MSST, the LE, and the 30sCS are feasible tests for measuring muscle strength in persons with SIVD, having sufficient test-retest reliability. © 2018 John Wiley & Sons Ltd.
Statistical Tests of Reliability of NDE

NASA Technical Reports Server (NTRS)

Baaklini, George Y.; Klima, Stanley J.; Roth, Don J.; Kiser, James D.

1987-01-01

Capabilities of advanced material-testing techniques analyzed. Collection of four reports illustrates statistical method for characterizing flaw-detecting capabilities of sophisticated nondestructive evaluation (NDE). Method used to determine reliability of several state-of-the-art NDE techniques for detecting failure-causing flaws in advanced ceramic materials considered for use in automobiles, airplanes, and space vehicles.
Reliability of fitness tests using methods and time periods common in sport and occupational management.

PubMed

Burnstein, Bryan D; Steele, Russell J; Shrier, Ian

2011-01-01

Fitness testing is used frequently in many areas of physical activity, but the reliability of these measurements under real-world, practical conditions is unknown. To evaluate the reliability of specific fitness tests using the methods and time periods used in the context of real-world sport and occupational management. Cohort study. Eighteen different Cirque du Soleil shows. Cirque du Soleil physical performers who completed 4 consecutive tests (6-month intervals) and were free of injury or illness at each session (n = 238 of 701 physical performers). Performers completed 6 fitness tests on each assessment date: dynamic balance, Harvard step test, handgrip, vertical jump, pull-ups, and 60-second jump test. We calculated the intraclass coefficient (ICC) and limits of agreement between baseline and each time point and the ICC over all 4 time points combined. Reliability was acceptable (ICC > 0.6) over an 18-month time period for all pairwise comparisons and all time points together for the handgrip, vertical jump, and pull-up assessments. The Harvard step test and 60-second jump test had poor reliability (ICC < 0.6) between baseline and other time points. When we excluded the baseline data and calculated the ICC for 6-month, 12-month, and 18-month time points, both the Harvard step test and 60-second jump test demonstrated acceptable reliability. Dynamic balance was unreliable in all contexts. Limit-of-agreement analysis demonstrated considerable intraindividual variability for some tests and a learning effect by administrators on others. Five of the 6 tests in this battery had acceptable reliability over an 18-month time frame, but the values for certain individuals may vary considerably from time to time for some tests. Specific tests may require a learning period for administrators.
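The limits of agreement mentioned above are commonly computed in the Bland-Altman form: the mean difference between two sessions plus or minus 1.96 standard deviations of the differences. The sketch below assumes that form; the function name and the sample scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(first, second, z=1.96):
    """Return (lower, bias, upper) for paired measurements.

    `first` and `second` hold the same individuals' scores on two occasions.
    The 1.96 multiplier assumes roughly normal differences (95% limits).
    """
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)            # systematic shift between the two occasions
    spread = z * stdev(diffs)     # sample SD of the paired differences
    return bias - spread, bias, bias + spread


# Hypothetical scores for four performers at baseline and at retest.
lower, bias, upper = limits_of_agreement([10, 12, 14, 16], [11, 12, 16, 17])
```

Wide limits relative to the size of a meaningful change would indicate the kind of intraindividual variability the abstract describes, even when the ICC looks acceptable.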
Identifying and classifying hyperostosis frontalis interna via computerized tomography.

PubMed

May, Hila; Peled, Nathan; Dar, Gali; Hay, Ori; Abbas, Janan; Masharawi, Youssef; Hershkovitz, Israel

2010-12-01

The aim of this study was to recognize the radiological characteristics of hyperostosis frontalis interna (HFI) and to establish a valid and reliable method for its identification and classification. A reliability test was carried out on 27 individuals who had undergone a head computerized tomography (CT) scan. Intra-observer reliability was obtained by examining the images three times, by the same researcher, with a 2-week interval between each sample ranking. The inter-observer test was performed by three independent researchers. A validity test was carried out using two methods for identifying and classifying HFI: 46 cadaver skullcaps were ranked twice via computerized tomography scans and then by direct observation. Reliability and validity were calculated using Kappa test (SPSS 15.0). Reliability tests of ranking HFI via CT scans demonstrated good results (K > 0.7). As for validity, a very good consensus was obtained between the CT and direct observation, when moderate and advanced types of HFI were present (K = 0.82). The suggested classification method for HFI, using CT, demonstrated a sensitivity of 84%, specificity of 90.5%, and positive predictive value of 91.3%. In conclusion, volume rendering is a reliable and valid tool for identifying HFI. The suggested three-scale classification is most suitable for radiological diagnosis of the phenomena. Considering the increasing awareness of HFI as an early indicator of a developing malady, this study may assist radiologists in identifying and classifying the phenomena.
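Sensitivity, specificity, and positive predictive value, as reported above, all derive from a 2x2 table of test result against reference standard. A minimal sketch follows; the counts are hypothetical placeholders, not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Basic accuracy measures from 2x2 counts.

    tp/fn: reference-positive cases the test called positive/negative.
    tn/fp: reference-negative cases the test called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),           # share of positive calls that are correct
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=8, fp=1, fn=2, tn=9)
```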
A Comparison of Two Methods of Determining Interrater Reliability

ERIC Educational Resources Information Center

Fleming, Judith A.; Taylor, Janeen McCracken; Carran, Deborah

2004-01-01

This article offers an alternative methodology for practitioners and researchers to use in establishing interrater reliability for testing purposes. The majority of studies on interrater reliability use a traditional methodology whereby two raters are compared using a Pearson product-moment correlation. This traditional method of estimating…
Reliability of Test Scores in Nonparametric Item Response Theory.

ERIC Educational Resources Information Center

Sijtsma, Klaas; Molenaar, Ivo W.

1987-01-01

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four "classical" lower bounds to reliability. (Author/JAZ)
Software Reliability 2002

NASA Technical Reports Server (NTRS)

Wallace, Dolores R.

2003-01-01

In FY01 we learned that hardware reliability models need substantial changes to account for differences in software, thus making software reliability measurements more effective, accurate, and easier to apply. These reliability models are generally based on familiar distributions or parametric methods. An obvious question is "What new statistical and probability models can be developed using non-parametric and distribution-free methods instead of the traditional parametric method?" Two approaches to software reliability engineering appear somewhat promising. The first study, begun in FY01, is based in hardware reliability, a very well established science that has many aspects that can be applied to software. This research effort has investigated mathematical aspects of hardware reliability and has identified those applicable to software. Currently the research effort is applying and testing these approaches to software reliability measurement. These parametric models require much project data that may be difficult to apply and interpret. Projects at GSFC are often complex in both technology and schedules. Assessing and estimating reliability of the final system is extremely difficult when various subsystems are tested and completed long before others. Parametric and distribution-free techniques may offer a new and accurate way of modeling failure time and other project data to provide earlier and more accurate estimates of system reliability.

Comparing reliabilities of strip and conventional patch testing.

PubMed

Dickel, Heinrich; Geier, Johannes; Kreft, Burkhard; Pfützner, Wolfgang; Kuss, Oliver

2017-06-01

The standardized protocol for performing the strip patch test has proven to be valid, but evidence on its reliability is still missing. To estimate the parallel-test reliability of the strip patch test as compared with the conventional patch test. In this multicentre, prospective, randomized, investigator-blinded reliability study, 132 subjects were enrolled. Simultaneous duplicate strip and conventional patch tests were performed with the Finn Chambers® on Scanpor® tape test system and the patch test preparations nickel sulfate 5% pet., potassium dichromate 0.5% pet., and lanolin alcohol 30% pet. Reliability was estimated by the use of Cohen's kappa coefficient. Parallel-test reliability values of the three standard patch test preparations turned out to be acceptable, with slight advantages for the strip patch test. The differences in reliability were 9% (95%CI: -8% to 26%) for nickel sulfate and 23% (95%CI: -16% to 63%) for potassium dichromate, both favouring the strip patch test. The standardized strip patch test method for the detection of allergic contact sensitization in patients with suspected allergic contact dermatitis is reliable. Its application in routine clinical practice can be recommended, especially if the conventional patch test result is presumably false negative. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
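Cohen's kappa, used above to estimate parallel-test reliability, compares observed agreement between two readings with the agreement expected by chance. The following is a sketch of the unweighted coefficient for two sets of categorical results; the function name and the duplicate readings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels.

    Assumes chance agreement is below 1 (the readings are not all one category).
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Chance agreement: product of the two marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical duplicate patch-test readings on four subjects.
kappa = cohen_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"])
```

Here three of four readings agree (75%), chance agreement is 50%, and kappa is therefore 0.5.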
Efficiency tests of samplers for microbiological aerosols, a review

NASA Technical Reports Server (NTRS)

Henningson, E.; Faengmark, I.

1984-01-01

To obtain comparable results from studies using a variety of samplers of microbiological aerosols with different collection performances for various particle sizes, methods reported in the literature were surveyed, evaluated, and tabulated for testing the efficiency of the samplers. It is concluded that these samplers were not thoroughly tested, using reliable methods. Tests were conducted in static air chambers and in various outdoor and work environments. Results are not reliable as it is difficult to achieve stable and reproducible conditions in these test systems. Testing in a wind tunnel is recommended.
Methodology to improve design of accelerated life tests in civil engineering projects.

PubMed

Lin, Jing; Yuan, Yongbo; Zhou, Jilai; Gao, Jie

2014-01-01

For reliability testing an Energy Expansion Tree (EET) and a companion Energy Function Model (EFM) are proposed and described in this paper. Different from conventional approaches, the EET provides a more comprehensive and objective way to systematically identify external energy factors affecting reliability. The EFM introduces energy loss into a traditional Function Model to identify internal energy sources affecting reliability. The combination creates a sound way to enumerate the energies to which a system may be exposed during its lifetime. We input these energies into planning an accelerated life test, a Multi Environment Over Stress Test. The test objective is to discover weak links and interactions among the system and the energies to which it is exposed, and design them out. As an example, the methods are applied to the pipe in subsea pipeline. However, they can be widely used in other civil engineering industries as well. The proposed method is compared with current methods.
The reliability of WorkWell Systems Functional Capacity Evaluation: a systematic review

PubMed Central

2014-01-01

Background Functional capacity evaluation (FCE) determines a person’s ability to perform work-related tasks and is a major component of the rehabilitation process. The WorkWell Systems (WWS) FCE (formerly known as Isernhagen Work Systems FCE) is currently the most commonly used FCE tool in German rehabilitation centres. Our systematic review investigated the inter-rater, intra-rater and test-retest reliability of the WWS FCE. Methods We performed a systematic literature search of studies on the reliability of the WWS FCE and extracted item-specific measures of inter-rater, intra-rater and test-retest reliability from the identified studies. Intraclass correlation coefficients ≥ 0.75, percentages of agreement ≥ 80%, and kappa coefficients ≥ 0.60 were categorised as acceptable, otherwise they were considered non-acceptable. The extracted values were summarised for the five performance categories of the WWS FCE, and the results were classified as either consistent or inconsistent. Results From 11 identified studies, 150 item-specific reliability measures were extracted. 89% of the extracted inter-rater reliability measures, all of the intra-rater reliability measures and 96% of the test-retest reliability measures of the weight handling and strength tests had an acceptable level of reliability, compared to only 67% of the test-retest reliability measures of the posture/mobility tests and 56% of the test-retest reliability measures of the locomotion tests. Both of the extracted test-retest reliability measures of the balance test were acceptable. Conclusions Weight handling and strength tests were found to have consistently acceptable reliability. Further research is needed to explore the reliability of the other tests as inconsistent findings or a lack of data prevented definitive conclusions. PMID:24674029
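The acceptability rule stated in the Methods above (ICC ≥ 0.75, agreement ≥ 80%, kappa ≥ 0.60) can be written as a small lookup. This is only an illustrative sketch; the dictionary keys, the function name, and the sample values are hypothetical.

```python
# Cut-offs as stated in the review's Methods; the key names are illustrative.
THRESHOLDS = {
    "icc": 0.75,                # intraclass correlation coefficient
    "percent_agreement": 80.0,  # percentage of agreement, on a 0-100 scale
    "kappa": 0.60,              # kappa coefficient
}


def is_acceptable(measure, value):
    """Return True when a reliability value meets the cut-off for its measure."""
    return value >= THRESHOLDS[measure]


# Hypothetical extracted values, not data from the review.
extracted = [("icc", 0.81), ("kappa", 0.55), ("percent_agreement", 80.0)]
share_acceptable = sum(is_acceptable(m, v) for m, v in extracted) / len(extracted)
```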
FY12 End of Year Report for NEPP DDR2 Reliability

NASA Technical Reports Server (NTRS)

Guertin, Steven M.

2013-01-01

This document reports the status of the NASA Electronic Parts and Packaging (NEPP) Double Data Rate 2 (DDR2) Reliability effort for FY2012. The task expanded the focus of evaluating reliability effects targeted for device examination. FY11 work highlighted the need to test many more parts and to examine more operating conditions, in order to provide useful recommendations for NASA users of these devices. This year's efforts focused on development of test capabilities, particularly focusing on those that can be used to determine overall lot quality and identify outlier devices, and test methods that can be employed on components for flight use. Flight acceptance of components potentially includes considerable time for up-screening (though this time may not currently be used for much reliability testing). Manufacturers are much more knowledgeable about the relevant reliability mechanisms for each of their devices. We are not in a position to know what the appropriate reliability tests are for any given device, so although reliability testing could be focused for a given device, we are forced to perform a large campaign of reliability tests to identify devices with degraded reliability. With the available up-screening time for NASA parts, it is possible to run many device performance studies. This includes verification of basic datasheet characteristics. Furthermore, it is possible to perform significant pattern sensitivity studies. By doing these studies we can establish higher reliability of flight components. In order to develop these approaches, it is necessary to develop test capability that can identify reliability outliers. To do this we must test many devices to ensure outliers are in the sample, and we must develop characterization capability to measure many different parameters. For FY12 we increased capability for reliability characterization and sample size. 
We increased sample size this year by moving from loose devices to dual inline memory modules (DIMMs) with an approximate reduction of 20 to 50 times in terms of per device under test (DUT) cost. By increasing sample size we have improved our ability to characterize devices that may be considered reliability outliers. This report provides an update on the effort to improve DDR2 testing capability. Although focused on DDR2, the methods being used can be extended to DDR and DDR3 with relative ease.
Reliability evaluation of microgrid considering incentive-based demand response

NASA Astrophysics Data System (ADS)

Huang, Ting-Cheng; Zhang, Yong-Jun

2017-07-01

Incentive-based demand response (IBDR) can guide customers to adjust their electricity consumption behaviour and curtail load actively. Meanwhile, distributed generation (DG) and energy storage systems (ESS) can provide time for the implementation of IBDR. The paper focuses on the reliability evaluation of microgrids considering IBDR. Firstly, the mechanism of IBDR and its impact on power supply reliability are analysed. Secondly, the IBDR dispatch model considering customers' comprehensive assessment and the customer response model are developed. Thirdly, the reliability evaluation method considering IBDR based on Monte Carlo simulation is proposed. Finally, the validity of the above models and method is studied through numerical tests on a modified RBTS Bus6 test system. Simulation results demonstrated that IBDR can improve the reliability of the microgrid.
Developing a reliable signal wire attachment method for rail.

DOT National Transportation Integrated Search

2014-11-01

The goal of this project was to develop a better attachment method for rail signal wires to improve the reliability of signaling systems. EWI conducted basic research into the failure mode of current attachment methods and developed and tested a ne...
Resting-state test-retest reliability of a priori defined canonical networks over different preprocessing steps.

PubMed

Varikuti, Deepthi P; Hoffstaedter, Felix; Genon, Sarah; Schwender, Holger; Reid, Andrew T; Eickhoff, Simon B

2017-04-01

Resting-state functional connectivity analysis has become a widely used method for the investigation of human brain connectivity and pathology. The measurement of neuronal activity by functional MRI, however, is impeded by various nuisance signals that reduce the stability of functional connectivity. Several methods exist to address this predicament, but little consensus has yet been reached on the most appropriate approach. Given the crucial importance of reliability for the development of clinical applications, we here investigated the effect of various confound removal approaches on the test-retest reliability of functional-connectivity estimates in two previously defined functional brain networks. Our results showed that gray matter masking improved the reliability of connectivity estimates, whereas denoising based on principal components analysis reduced it. We additionally observed that refraining from using any correction for global signals provided the best test-retest reliability, but failed to reproduce anti-correlations between what have been previously described as antagonistic networks. This suggests that improved reliability can come at the expense of potentially poorer biological validity. Consistent with this, we observed that reliability was proportional to the retained variance, which presumably included structured noise, such as reliable nuisance signals (for instance, noise induced by cardiac processes). We conclude that compromises are necessary between maximizing test-retest reliability and removing variance that may be attributable to non-neuronal sources.
Resting-state test-retest reliability of a priori defined canonical networks over different preprocessing steps

PubMed Central

Varikuti, Deepthi P.; Hoffstaedter, Felix; Genon, Sarah; Schwender, Holger; Reid, Andrew T.; Eickhoff, Simon B.

2016-01-01

Resting-state functional connectivity analysis has become a widely used method for the investigation of human brain connectivity and pathology. The measurement of neuronal activity by functional MRI, however, is impeded by various nuisance signals that reduce the stability of functional connectivity. Several methods exist to address this predicament, but little consensus has yet been reached on the most appropriate approach. Given the crucial importance of reliability for the development of clinical applications, we here investigated the effect of various confound removal approaches on the test-retest reliability of functional-connectivity estimates in two previously defined functional brain networks. Our results showed that grey matter masking improved the reliability of connectivity estimates, whereas de-noising based on principal components analysis reduced it. We additionally observed that refraining from using any correction for global signals provided the best test-retest reliability, but failed to reproduce anti-correlations between what have been previously described as antagonistic networks. This suggests that improved reliability can come at the expense of potentially poorer biological validity. Consistent with this, we observed that reliability was proportional to the retained variance, which presumably included structured noise, such as reliable nuisance signals (for instance, noise induced by cardiac processes). We conclude that compromises are necessary between maximizing test-retest reliability and removing variance that may be attributable to non-neuronal sources. PMID:27550015
Techniques for control of long-term reliability of complex integrated circuits. I - Reliability assurance by test vehicle qualification.

NASA Technical Reports Server (NTRS)

Van Vonno, N. W.

1972-01-01

Development of an alternate approach to the conventional methods of reliability assurance for large-scale integrated circuits. The product treated is a large-scale T squared L array designed for space applications. The concept used is that of qualification of product by evaluation of the basic processing used in fabricating the product, providing an insight into its potential reliability. Test vehicles are described which enable evaluation of device characteristics, surface condition, and various parameters of the two-level metallization system used. Evaluation of these test vehicles is performed on a lot qualification basis, with the lot consisting of one wafer. Assembled test vehicles are evaluated by high temperature stress at 300 C for short time durations. Stressing at these temperatures provides a rapid method of evaluation and permits a go/no go decision to be made on the wafer lot in a timely fashion.
A comparison of manual anthropometric measurements with Kinect-based scanned measurements in terms of precision and reliability.

PubMed

Bragança, Sara; Arezes, Pedro; Carvalho, Miguel; Ashdown, Susan P; Castellucci, Ignacio; Leão, Celina

2018-01-01

Collecting anthropometric data for real-life applications demands a high degree of precision and reliability. It is important to test new equipment that will be used for data collection. OBJECTIVE: Compare two anthropometric data gathering techniques - manual methods and a Kinect-based 3D body scanner - to understand which of them gives more precise and reliable results. The data was collected using a measuring tape and a Kinect-based 3D body scanner. It was evaluated in terms of precision by considering the regular and relative Technical Error of Measurement and in terms of reliability by using the Intraclass Correlation Coefficient, Reliability Coefficient, Standard Error of Measurement and Coefficient of Variation. The results obtained showed that both methods presented better results for reliability than for precision. Both methods showed relatively good results for these two variables; however, manual methods had better results for some body measurements. Despite being considered sufficiently precise and reliable for certain applications (e.g. apparel industry), the 3D scanner tested showed, for almost every anthropometric measurement, a different result than the manual technique. Many companies design their products based on data obtained from 3D scanners; hence, understanding the precision and reliability of the equipment used is essential to obtain feasible results.
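The regular and relative Technical Error of Measurement (TEM) named above are usually computed, for two repeated measurements per subject, as the square root of the summed squared differences divided by 2n, and as that value expressed as a percentage of the grand mean. The sketch assumes this two-trial form; the function name and the measurements are hypothetical.

```python
from math import sqrt


def technical_error(trial_1, trial_2):
    """Return (TEM, relative TEM in %) for two repeated measurements per subject."""
    n = len(trial_1)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(trial_1, trial_2))
    tem = sqrt(sum_sq_diff / (2 * n))
    # Grand mean over both trials, used to scale TEM to a percentage.
    grand_mean = (sum(trial_1) + sum(trial_2)) / (2 * n)
    return tem, 100 * tem / grand_mean


# Hypothetical repeated measurements (cm) for three subjects.
tem, relative_tem = technical_error([10, 20, 30], [12, 20, 28])
```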
Bayesian methods in reliability

NASA Astrophysics Data System (ADS)

Sander, P.; Badoux, R.

1991-11-01

The present proceedings from a course on Bayesian methods in reliability encompasses Bayesian statistical methods and their computational implementation, models for analyzing censored data from nonrepairable systems, the traits of repairable systems and growth models, the use of expert judgment, and a review of the problem of forecasting software reliability. Specific issues addressed include the use of Bayesian methods to estimate the leak rate of a gas pipeline, approximate analyses under great prior uncertainty, reliability estimation techniques, and a nonhomogeneous Poisson process. Also addressed are the calibration sets and seed variables of expert judgment systems for risk assessment, experimental illustrations of the use of expert judgment for reliability testing, and analyses of the predictive quality of software-reliability growth models such as the Weibull order statistics.
Experiments in fault tolerant software reliability

NASA Technical Reports Server (NTRS)

Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.

1987-01-01

The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.
Test-Retest Reliability of the Salutogenic Wellness Promotion Scale (SWPS)

ERIC Educational Resources Information Center

Anderson, L. M.; Moore, J. B.; Hayden, B. M.; Becker, C. M.

2014-01-01

Objective: This study examined the temporal stability (i.e. test-retest reliability) of the Salutogenic Wellness Promotion Scale (SWPS) using intraclass correlation coefficients (ICC). Current intraclass results were also compared to previously published interclass correlations to support the use of the intraclass method for test-retest…
Designing the Nuclear Energy Attitude Scale.

ERIC Educational Resources Information Center

Calhoun, Lawrence; And Others

1988-01-01

Presents a refined method for designing a valid and reliable Likert-type scale to test attitudes toward the generation of electricity from nuclear energy. Discusses various tests of validity that were used on the nuclear energy scale. Reports results of administration and concludes that the test is both reliable and valid. (CW)
Empirical methods for assessing meaningful neuropsychological change following epilepsy surgery.

PubMed

Sawrie, S M; Chelune, G J; Naugle, R I; Lüders, H O

1996-11-01

Traditional methods for assessing the neurocognitive effects of epilepsy surgery are confounded by practice effects, test-retest reliability issues, and regression to the mean. This study employs 2 methods for assessing individual change that allow direct comparison of changes across both individuals and test measures. Fifty-one medically intractable epilepsy patients completed a comprehensive neuropsychological battery twice, approximately 8 months apart, prior to any invasive monitoring or surgical intervention. First, a Reliable Change (RC) index score was computed for each test score to take into account the reliability of that measure, and a cutoff score was empirically derived to establish the limits of statistically reliable change. These indices were subsequently adjusted for expected practice effects. The second approach used a regression technique to establish "change norms" along a common metric that models both expected practice effects and regression to the mean. The RC index scores provide the clinician with a statistical means of determining whether a patient's retest performance is "significantly" changed from baseline. The regression norms for change allow the clinician to evaluate the magnitude of a given patient's change on 1 or more variables along a common metric that takes into account the reliability and stability of each test measure. Case data illustrate how these methods provide an empirically grounded means for evaluating neurocognitive outcomes following medical interventions such as epilepsy surgery.
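As a hedged illustration of the Reliable Change index described above, the sketch below uses the common Jacobson-Truax form, in which the retest difference is divided by the standard error of the difference. That specific formula, the function name, and the `practice_effect` parameter are assumptions for illustration; the abstract does not state the exact computation or the empirically derived cutoff the authors used.

```python
import math


def reliable_change_index(baseline, retest, sd_baseline, r_xx, practice_effect=0.0):
    """Jacobson-Truax style RC index (an assumed form, not taken from the abstract).

    baseline, retest : the individual's two scores
    sd_baseline      : standard deviation of the measure at baseline
    r_xx             : test-retest reliability of the measure
    practice_effect  : expected mean gain on retest, subtracted before scaling
    """
    sem = sd_baseline * math.sqrt(1.0 - r_xx)   # standard error of measurement
    se_diff = math.sqrt(2.0) * sem              # standard error of a difference score
    return (retest - baseline - practice_effect) / se_diff


# Hypothetical scores: a 10-point gain on a scale with SD 15 and r = 0.84.
rc_raw = reliable_change_index(100, 110, 15, 0.84)
rc_adjusted = reliable_change_index(100, 110, 15, 0.84, practice_effect=3)
print(round(rc_raw, 2), round(rc_adjusted, 2))
```

Under this form, an absolute RC above a chosen z-cutoff (1.645 or 1.96 are typical choices) is read as statistically reliable change; adjusting for practice lowers the index for a patient who merely gained what retesting alone would predict.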
Evaluation of Reliability Coefficients for Two-Level Models via Latent Variable Analysis

ERIC Educational Resources Information Center

Raykov, Tenko; Penev, Spiridon

2010-01-01

A latent variable analysis procedure for evaluation of reliability coefficients for 2-level models is outlined. The method provides point and interval estimates of group means' reliability, overall reliability of means, and conditional reliability. In addition, the approach can be used to test simple hypotheses about these parameters. The…
Reliability based design including future tests and multiagent approaches

NASA Astrophysics Data System (ADS)

Villanueva, Diane

The initial stages of reliability-based design optimization involve the formulation of objective functions and constraints, and building a model to estimate the reliability of the design with quantified uncertainties. However, even experienced hands often overlook important objective functions and constraints that affect the design. In addition, uncertainty reduction measures, such as tests and redesign, are often not considered in reliability calculations during the initial stages. This research considers two areas that concern the design of engineering systems: 1) the trade-off of the effect of a test and post-test redesign on reliability and cost and 2) the search for multiple candidate designs as insurance against unforeseen faults in some designs. In this research, a methodology was developed to estimate the effect of a single future test and post-test redesign on reliability and cost. The methodology uses assumed distributions of computational and experimental errors with re-design rules to simulate alternative future test and redesign outcomes to form a probabilistic estimate of the reliability and cost for a given design. Further, it was explored how modeling a future test and redesign provides a company an opportunity to balance development costs versus performance by simultaneously designing the design and the post-test redesign rules during the initial design stage. The second area of this research considers the use of dynamic local surrogates, or surrogate-based agents, to locate multiple candidate designs. Surrogate-based global optimization algorithms often require search in multiple candidate regions of design space, expending most of the computation needed to define multiple alternate designs. Thus, focusing on solely locating the best design may be wasteful. We extended adaptive sampling surrogate techniques to locate multiple optima by building local surrogates in sub-regions of the design space to identify optima. 
The efficiency of this method was studied, and the method was compared to other surrogate-based optimization methods that aim to locate the global optimum using two two-dimensional test functions, a six-dimensional test function, and a five-dimensional engineering example.
Predicting Job Performance for the Visually Impaired: Validity of the Fine Finger Dexterity Work Task.

ERIC Educational Resources Information Center

Giesen, J. Martin; And Others

The study was designed to determine the reliability and criterion validity of a psychomotor performance test (the Fine Finger Dexterity Work Task Unit) with 40 partially or totally blind adults. Reliability was established by using the test-retest method. A supervisory rating was developed and the reliability established by using the split-half…
Effects of Analytical and Holistic Scoring Patterns on Scorer Reliability in Biology Essay Tests

ERIC Educational Resources Information Center

Ebuoh, Casmir N.

2018-01-01

Literature revealed that the patterns/methods of scoring essay tests had been criticized for not being reliable and this unreliability is more likely to be more in internal examinations than in the external examinations. The purpose of this study is to find out the effects of analytical and holistic scoring patterns on scorer reliability in…

Reliability of Fitness Tests Using Methods and Time Periods Common in Sport and Occupational Management

PubMed Central

Burnstein, Bryan D.; Steele, Russell J.; Shrier, Ian

2011-01-01

Context: Fitness testing is used frequently in many areas of physical activity, but the reliability of these measurements under real-world, practical conditions is unknown. Objective: To evaluate the reliability of specific fitness tests using the methods and time periods used in the context of real-world sport and occupational management. Design: Cohort study. Setting: Eighteen different Cirque du Soleil shows. Patients or Other Participants: Cirque du Soleil physical performers who completed 4 consecutive tests (6-month intervals) and were free of injury or illness at each session (n = 238 of 701 physical performers). Intervention(s): Performers completed 6 fitness tests on each assessment date: dynamic balance, Harvard step test, handgrip, vertical jump, pull-ups, and 60-second jump test. Main Outcome Measure(s): We calculated the intraclass coefficient (ICC) and limits of agreement between baseline and each time point and the ICC over all 4 time points combined. Results: Reliability was acceptable (ICC > 0.6) over an 18-month time period for all pairwise comparisons and all time points together for the handgrip, vertical jump, and pull-up assessments. The Harvard step test and 60-second jump test had poor reliability (ICC < 0.6) between baseline and other time points. When we excluded the baseline data and calculated the ICC for 6-month, 12-month, and 18-month time points, both the Harvard step test and 60-second jump test demonstrated acceptable reliability. Dynamic balance was unreliable in all contexts. Limit-of-agreement analysis demonstrated considerable intraindividual variability for some tests and a learning effect by administrators on others. Conclusions: Five of the 6 tests in this battery had acceptable reliability over an 18-month time frame, but the values for certain individuals may vary considerably from time to time for some tests. Specific tests may require a learning period for administrators. PMID:22488138
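The two statistics named in this abstract, the ICC and limits of agreement, can be sketched as follows. The abstract does not say which ICC form was used, so the two-way random, single-measure ICC(2,1) of Shrout and Fleiss is an assumption here, as are the function names; the limits of agreement follow the usual Bland-Altman definition (mean difference plus or minus 1.96 standard deviations of the differences).

```python
import statistics


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of rows, one row per subject, one column per session or rater.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                  # between-subjects mean square
    msc = ss_cols / (k - 1)                  # between-sessions mean square
    mse = ss_error / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def limits_of_agreement(first, second):
    """Bland-Altman bias and 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias - spread, bias, bias + spread


# Hypothetical data: 6 subjects measured on 4 occasions.
demo = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(demo), 2))
```

ICC(2,1) penalizes systematic shifts between sessions, which is one reason a learning effect at baseline can depress it; a consistency-type ICC would ignore such shifts.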
Software reliability perspectives

NASA Technical Reports Server (NTRS)

Wilson, Larry; Shen, Wenhui

1987-01-01

Software that is used in life-critical functions must be known to be highly reliable before installation. This requires a strong testing program to estimate the reliability, since neither formal methods, software engineering, nor fault-tolerant methods can guarantee perfection. Prior to the final testing, software goes through a debugging period, and many models have been developed to try to estimate reliability from the debugging data. However, the existing models are poorly validated and often give poor performance. This paper emphasizes the fact that part of their failures can be attributed to the random nature of the debugging data given to these models as input, and it poses the problem of correcting this defect as an area of future research.
Intertester and intratester reliability of a movement control test battery for patients with knee osteoarthritis and controls

PubMed Central

Kaukinen, P.T.; Arokoski, J.P.; Huber, E.O.; Luomajoki, H.A.

2017-01-01

Objectives: To develop a test battery of movement control (MC) tests and assess its intertester and intratester reliability. Methods: 29 subjects with knee OA with mean age of 64.7 (SD 8.7) years and 12 controls without either knee pain or previous diagnosis of OA (mean age 36.6 (SD 16.2) years) were included. Two experienced physiotherapists rated the filmed test performance of six MC tests blinded to the patients and to each other on 3-point scale as correct, incorrect or failed. Weighted kappa coefficient (wK) with 95% confidence interval (95%CI) and the percentage of agreement were calculated for each test. Results: One-leg stance, one-leg squat 30 degrees and step down tests showed moderate to excellent inter- and intratester reliability with wK ranging between 0.43-0.85 for intertester and 0.51-0.80 for intratester reliability. The reliability of the 90 degrees squat test, small squat and step up tests was poor (wK ranging between 0.09-0.50). Conclusions: One-leg stance test, one-leg squat 30 degrees and step down test are reliable in the subjects with knee OA and controls. Further studies are needed to evaluate the discriminative validity of the reliable tests. PMID:28860422
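A weighted kappa for an ordered 3-point scale such as the one above (correct, incorrect, failed) can be sketched as below. The abstract does not say whether linear or quadratic weights were used, so both are offered and the linear default is an assumption; the function name and the sample ratings are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered categorical scale.

    categories: the scale levels in order, e.g. ["correct", "incorrect", "failed"].
    weights:    "linear" or "quadratic" disagreement weights (assumed options).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted observed disagreement versus the disagreement expected by chance.
    num = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - num / den


# Hypothetical ratings of six filmed performances (0 = correct, 1 = incorrect, 2 = failed).
a = [0, 0, 1, 1, 2, 2]
b = [0, 1, 1, 1, 2, 0]
print(round(weighted_kappa(a, b, [0, 1, 2]), 2))
```

Weighting treats a correct/failed disagreement as worse than a correct/incorrect one, which unweighted kappa cannot express.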
Reliability and Validity of the Footprint Assessment Method Using Photoshop CS5 Software.

PubMed

Gutiérrez-Vilahú, Lourdes; Massó-Ortigosa, Núria; Costa-Tutusaus, Lluís; Guerra-Balic, Myriam

2015-05-01

Several sophisticated methods of footprint analysis currently exist. However, it is sometimes useful to apply standard measurement methods of recognized evidence with an easy and quick application. We sought to assess the reliability and validity of a new method of footprint assessment in a healthy population using Photoshop CS5 software (Adobe Systems Inc, San Jose, California). Forty-two footprints, corresponding to 21 healthy individuals (11 men with a mean ± SD age of 20.45 ± 2.16 years and 10 women with a mean ± SD age of 20.00 ± 1.70 years) were analyzed. Footprints were recorded in static bipedal standing position using optical podography and digital photography. Three trials for each participant were performed. The Hernández-Corvo, Chippaux-Smirak, and Staheli indices and the Clarke angle were calculated by manual method and by computerized method using Photoshop CS5 software. Test-retest was used to determine reliability. Validity was obtained by intraclass correlation coefficient (ICC). The reliability test for all of the indices showed high values (ICC, 0.98-0.99). Moreover, the validity test clearly showed no difference between techniques (ICC, 0.99-1). The reliability and validity of a method to measure, assess, and record the podometric indices using Photoshop CS5 software has been demonstrated. This provides a quick and accurate tool useful for the digital recording of morphostatic foot study parameters and their control.
Interobserver Reliability of the Total Body Score System for Quantifying Human Decomposition.

PubMed

Dabbs, Gretchen R; Connor, Melissa; Bytheway, Joan A

2016-03-01

Several authors have tested the accuracy of the Total Body Score (TBS) method for quantifying decomposition, but none have examined the reliability of the method as a scoring system by testing interobserver error rates. Sixteen participants used the TBS system to score 59 observation packets including photographs and written descriptions of 13 human cadavers in different stages of decomposition (postmortem interval: 2-186 days). Data analysis used a two-way random model intraclass correlation in SPSS (v. 17.0). The TBS method showed "almost perfect" agreement between observers, with average absolute correlation coefficients of 0.990 and average consistency correlation coefficients of 0.991. While the TBS method may have sources of error, scoring reliability is not one of them. Individual component scores were examined, and the influences of education and experience levels were investigated. Overall, the trunk component scores were the least concordant. Suggestions are made to improve the reliability of the TBS method. © 2016 American Academy of Forensic Sciences.
Polish Adult Reading Test (PART) - construction of Polish test for estimating the level of premorbid intelligence in schizophrenia.

PubMed

Karakuła-Juchnowicz, Hanna; Stecka, Mariola

2017-08-29

In view of the unavailability in Poland of standardized methods to measure PIQ, the aim of the work was to develop a Polish test to assess the premorbid level of intelligence, PART (Polish Adult Reading Test), and to measure its psychometric properties, such as validity and reliability, as well as to standardize it in a group of schizophrenia patients. The principles of PART construction were based on the idea of the National Adult Reading Test by Hazel Nelson, which is popular worldwide. The research comprised a group of 122 subjects (65 schizophrenia patients and 57 healthy people), aged 18-60 years, matched for age and gender. PART appears to be a method with high internal consistency, high test-retest and inter-rater reliability, and acceptable diagnostic and prognostic validity. The standardized procedures of PART have been investigated and described. Considering the psychometric values of PART and the short time needed to administer it, the test may be a useful diagnostic instrument in the assessment of premorbid level of intelligence in a group of schizophrenic patients.
Methodology to Improve Design of Accelerated Life Tests in Civil Engineering Projects

PubMed Central

Lin, Jing; Yuan, Yongbo; Zhou, Jilai; Gao, Jie

2014-01-01

For reliability testing, an Energy Expansion Tree (EET) and a companion Energy Function Model (EFM) are proposed and described in this paper. Unlike conventional approaches, the EET provides a more comprehensive and objective way to systematically identify external energy factors affecting reliability. The EFM introduces energy loss into a traditional Function Model to identify internal energy sources affecting reliability. The combination creates a sound way to enumerate the energies to which a system may be exposed during its lifetime. We input these energies into planning an accelerated life test, a Multi Environment Over Stress Test. The test objective is to discover weak links and interactions among the system and the energies to which it is exposed, and design them out. As an example, the methods are applied to a pipe in a subsea pipeline. However, they can be widely used in other civil engineering industries as well. The proposed method is compared with current methods. PMID:25111800
Reliability of the Berg Balance Scale as a Clinical Measure of Balance in Community-Dwelling Older Adults with Mild to Moderate Alzheimer Disease: A Pilot Study.

PubMed

Muir-Hunter, Susan W; Graham, Laura; Montero Odasso, Manuel

2015-08-01

To measure test-retest and interrater reliability of the Berg Balance Scale (BBS) in community-dwelling adults with mild to moderate Alzheimer disease (AD). Method: A sample of 15 adults (mean age 80.20 [SD 5.03] years) with AD performed three balance tests: the BBS, timed up-and-go test (TUG), and Functional Reach Test (FRT). Both relative reliability, using the intra-class correlation coefficient (ICC), and absolute reliability, using standard error of measurement (SEM) and minimal detectable change (MDC95) values, were calculated; Bland-Altman plots were constructed to evaluate inter-tester agreement. The test-retest interval was 1 week. Results: For the BBS, relative reliability values were 0.95 (95% CI, 0.85-0.98) for test-retest reliability and 0.72 (95% CI, 0.31-0.91) for interrater reliability; SEM was 6.01 points and MDC95 was 16.66 points; and interrater agreement was 16.62 points. The BBS performed better in test-retest reliability than the TUG and FRT, tests with established reliability in AD. Between 33% and 50% of participants required cueing beyond standardized instructions because they were unable to remember test instructions. Conclusions: The BBS achieved relative reliability values that support its clinical utility, but MDC95 and agreement values indicate the scale has performance limitations in AD. Further research to optimize balance assessment for people with AD is required.
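The relationship between the two absolute-reliability figures reported above can be checked with the conventional formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. These are the standard textbook definitions, assumed here rather than quoted from the paper; the function names are illustrative.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the sample standard deviation and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem):
    """MDC95: smallest change exceeding measurement error with 95% confidence."""
    return 1.96 * math.sqrt(2.0) * sem


# Using the SEM of 6.01 points reported in the abstract.
print(round(minimal_detectable_change_95(6.01), 2))
```

With the reported SEM of 6.01 points this gives about 16.66 points, matching the MDC95 in the abstract. On a scale whose maximum is 56 points, a change that large is a demanding threshold, which is consistent with the authors' conclusion about performance limitations.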
Feasibility and Reliability of Two Different Walking Tests in People with Severe Intellectual and Sensory Disabilities

ERIC Educational Resources Information Center

Waninge, A.; Evenhuis, I. J.; van Wijck, R.; van der Schans, C. P.

2011-01-01

Background: The purpose of this study is to describe feasibility and test-retest reliability of the six-minute walking distance test (6MWD) and an adapted shuttle run test (aSRT) in persons with severe intellectual and sensory (multiple) disabilities. Materials and Methods: Forty-seven persons with severe multiple disabilities, with Gross Motor…
Reliability and Responsiveness of the Movement Assessment Battery for Children--Second Edition Test in Children with Developmental Coordination Disorder

ERIC Educational Resources Information Center

Wuang, Yee-Pay; Su, Jui-Hsing; Su, Chwen-Yng

2012-01-01

Aim: To examine the internal consistency, test-retest reliability, and responsiveness of the Movement Assessment Battery for Children--Second Edition (MABC-2) Test for children with developmental coordination disorder (DCD). Method: One hundred and forty-four Taiwanese children with DCD aged 6 to 12 years (87 males, 57 females) were tested on…
Content validity and reliability of test of gross motor development in Chilean children

PubMed Central

Cano-Cappellacci, Marcelo; Leyton, Fernanda Aleitte; Carreño, Joshua Durán

2016-01-01

OBJECTIVE: To validate a Spanish version of the Test of Gross Motor Development (TGMD-2) for the Chilean population. METHODS: Descriptive, transversal, non-experimental validity and reliability study. Four translators, three experts and 92 Chilean children, aged five to 10 years, students from a primary school in Santiago, Chile, participated. In 2013, the Committee of Experts carried out translation, back-translation and revision processes to determine the translinguistic equivalence and content validity of the test, using the content validity index. In addition, a pilot implementation was carried out to determine test reliability in Spanish, using the intraclass correlation coefficient and the Bland-Altman method. We evaluated whether the results presented significant differences when the bat was replaced with a racket, using a t-test. RESULTS: We obtained a content validity index higher than 0.80 for language clarity and relevance of the TGMD-2 for children. There were significant differences in the object control subtest when comparing the results with bat and racket. The intraclass correlation coefficient for inter-rater, intra-rater and test-retest reliability was greater than 0.80 in all cases. CONCLUSIONS: The TGMD-2 has appropriate content validity to be applied in the Chilean population. The reliability of this test is within the appropriate parameters and its use could be recommended in this population after the establishment of normative data, setting a further precedent for the validation in other Latin American countries. PMID:26815160
Review on pen-and-paper-based observational methods for assessing ergonomic risk factors of computer work.

PubMed

Rahman, Mohd Nasrull Abdol; Mohamad, Siti Shafika

2017-01-01

Computer work is associated with musculoskeletal disorders (MSDs). Several methods have been developed to assess computer work risk factors related to MSDs. This review aims to give an overview of current techniques available for pen-and-paper-based observational methods in assessing ergonomic risk factors of computer work. We searched an electronic database for materials from 1992 until 2015. The selected methods focused on computer work, pen-and-paper observational methods, office risk factors and musculoskeletal disorders. This review was developed to assess the risk factors, reliability and validity of pen-and-paper observational methods associated with computer work. Two evaluators independently carried out this review. Seven observational methods used to assess exposure to office risk factors for work-related musculoskeletal disorders were identified. The risk factors involved in current pen-and-paper-based observational tools were postures, office components, force and repetition. Of the seven methods, only five had been tested for reliability. They were proven to be reliable and were rated as moderate to good. For validity, only four of the seven methods were tested, and the results were moderate. Many observational tools already exist, but no single tool appears to cover all of the risk factors, including working posture, office component, force, repetition and office environment, at office workstations and in computer work. Although the most important factor in developing a tool is proper validation of exposure assessment techniques, not all of the existing observational methods were tested for reliability and validity. Furthermore, this review could provide researchers with ways to improve the pen-and-paper-based observational method for assessing ergonomic risk factors of computer work.
The role of test-retest reliability in measuring individual and group differences in executive functioning.

PubMed

Paap, Kenneth R; Sawi, Oliver

2016-12-01

Studies testing for individual or group differences in executive functioning can be compromised by unknown test-retest reliability. Test-retest reliabilities across an interval of about one week were obtained from performance in the antisaccade, flanker, Simon, and color-shape switching tasks. There is a general trade-off between the greater reliability of single mean RT measures, and the greater process purity of measures based on contrasts between mean RTs in two conditions. The individual differences in RT model recently developed by Miller and Ulrich was used to evaluate the trade-off. Test-retest reliability was statistically significant for 11 of the 12 measures, but was of moderate size, at best, for the difference scores. The test-retest reliabilities for the Simon and flanker interference scores were lower than those for switching costs. Standard practice evaluates the reliability of executive-functioning measures using split-half methods based on data obtained in a single day. Our test-retest measures of reliability are lower, especially for difference scores. These reliability measures must also take into account possible day effects that classical test theory assumes do not occur. Measures based on single mean RTs tend to have acceptable levels of reliability and convergent validity, but are "impure" measures of specific executive functions. The individual differences in RT model shows that the impurity problem is worse than typically assumed. However, the "purer" measures based on difference scores have low convergent validity that is partly caused by deficiencies in test-retest reliability. Copyright © 2016 Elsevier B.V. All rights reserved.
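The trade-off described above, in which difference scores are "purer" but less reliable than single mean RTs, follows from a standard classical-test-theory result. The sketch below uses the textbook formula for the reliability of a difference between two equally variable measures; it is offered as background and is not the Miller and Ulrich model the abstract refers to. The function name and the sample values are hypothetical.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of D = X - Y when X and Y have equal variances.

    r_xx, r_yy : reliabilities of the two component measures
    r_xy       : correlation between the two component measures
    """
    return (0.5 * (r_xx + r_yy) - r_xy) / (1.0 - r_xy)


# Hypothetical case: two mean RTs each with reliability 0.90 that correlate 0.80.
print(round(difference_score_reliability(0.90, 0.90, 0.80), 2))
```

In this hypothetical case the difference score has a reliability of only about 0.50, even though each component is measured with reliability 0.90. The more strongly the two conditions correlate, as congruent and incongruent RTs typically do, the less reliable their contrast becomes.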
Effects of test method and participant musical training on preference ratings of stimuli with different reverberation times.

PubMed

Lawless, Martin S; Vigeant, Michelle C

2017-10-01

Selecting an appropriate listening test design for concert hall research depends on several factors, including listening test method and participant critical-listening experience. Although expert listeners afford more reliable data, their perceptions may not be broadly representative. The present paper contains two studies that examined the validity and reliability of the data obtained from two listening test methods, a successive and a comparative method, and two types of participants, musicians and non-musicians. Participants rated their overall preference of auralizations generated from eight concert hall conditions with a range of reverberation times (0.0-7.2 s). Study 1, with 34 participants, assessed the two methods. The comparative method yielded similar results and reliability as the successive method. Additionally, the comparative method was rated as less difficult and more preferable. For study 2, an additional 37 participants rated the stimuli using the comparative method only. An analysis of variance of the responses from both studies revealed that musicians are better than non-musicians at discerning their preferences across stimuli. This result was confirmed with a k-means clustering analysis on the entire dataset that revealed five preference groups. Four groups exhibited clear preferences to the stimuli, while the fifth group, predominantly comprising non-musicians, demonstrated no clear preference.
Reliability evaluation methodology for NASA applications

NASA Technical Reports Server (NTRS)

Taneja, Vidya S.

1992-01-01

Liquid rocket engine technology has been characterized by the development of complex systems containing a large number of subsystems, components, and parts. The trend toward even larger and more complex systems is continuing. Liquid rocket engineers have been focusing mainly on performance-driven designs to increase the payload delivery of a launch vehicle for a given mission. In other words, although the failure of a single inexpensive part or component may cause the failure of the system, reliability in general has not been considered as one of the system parameters like cost or performance. Until now, quantification of reliability has not been a consideration during system design and development in the liquid rocket industry. Engineers and managers have long been aware that the reliability of the system increases during development, but no serious attempts have been made to quantify reliability. As a result, a method to quantify reliability during design and development is needed. This includes application of probabilistic models which utilize both engineering analysis and test data. Classical methods require the use of operating data for reliability demonstration. In contrast, the method described in this paper is based on similarity, analysis, and testing combined with Bayesian statistical analysis.
Measuring reliable change in cognition using the Edinburgh Cognitive and Behavioural ALS Screen (ECAS).

PubMed

Crockford, Christopher; Newton, Judith; Lonergan, Katie; Madden, Caoifa; Mays, Iain; O'Sullivan, Meabhdh; Costello, Emmet; Pinto-Grau, Marta; Vajda, Alice; Heverin, Mark; Pender, Niall; Al-Chalabi, Ammar; Hardiman, Orla; Abrahams, Sharon

2018-02-01

Cognitive impairment affects approximately 50% of people with amyotrophic lateral sclerosis (ALS). Research has indicated that impairment may worsen with disease progression. The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was designed to measure neuropsychological functioning in ALS, with its alternate forms (ECAS-A, B, and C) allowing for serial assessment over time. The aim of the present study was to establish reliable change scores for the alternate forms of the ECAS, and to explore practice effects and test-retest reliability of the ECAS's alternate forms. Eighty healthy participants were recruited, with 57 completing two and 51 completing three assessments. Participants were administered alternate versions of the ECAS serially (A-B-C) at four-month intervals. Intra-class correlation analysis was employed to explore test-retest reliability, while analysis of variance was used to examine the presence of practice effects. Reliable change indices (RCI) and regression-based methods were utilized to establish change scores for the ECAS alternate forms. Test-retest reliability was excellent for ALS Specific, ALS Non-Specific, and ECAS Total scores of the combined ECAS A, B, and C (all > .90). No significant practice effects were observed over the three testing sessions. RCI and regression-based methods produced similar change scores. The alternate forms of the ECAS possess excellent test-retest reliability in a healthy control sample, with no significant practice effects. The use of conservative RCI scores is recommended. Therefore, a change of ≥8, ≥4, and ≥9 for ALS Specific, ALS Non-Specific, and ECAS Total score is required for reliable change.
Development of a diagnostic test set to assess agreement in breast pathology: practical application of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS).

PubMed

Oster, Natalia V; Carney, Patricia A; Allison, Kimberly H; Weaver, Donald L; Reisch, Lisa M; Longton, Gary; Onega, Tracy; Pepe, Margaret; Geller, Berta M; Nelson, Heidi D; Ross, Tyler R; Tosteson, Aanna N A; Elmore, Joann G

2013-02-05

Diagnostic test sets are a valuable research tool that contributes importantly to the validity and reliability of studies that assess agreement in breast pathology. In order to fully understand the strengths and weaknesses of any agreement and reliability study, however, the methods should be fully reported. In this paper we provide a step-by-step description of the methods used to create four complex test sets for a study of diagnostic agreement among pathologists interpreting breast biopsy specimens. We use the newly developed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) as a basis to report these methods. Breast tissue biopsies were selected from the National Cancer Institute-funded Breast Cancer Surveillance Consortium sites. We used a random sampling stratified according to woman's age (40-49 vs. ≥50), parenchymal breast density (low vs. high) and interpretation of the original pathologist. A 3-member panel of expert breast pathologists first independently interpreted each case using five primary diagnostic categories (non-proliferative changes, proliferative changes without atypia, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma). When the experts did not unanimously agree on a case diagnosis a modified Delphi method was used to determine the reference standard consensus diagnosis. The final test cases were stratified and randomly assigned into one of four unique test sets. We found GRRAS recommendations to be very useful in reporting diagnostic test set development and recommend inclusion of two additional criteria: 1) characterizing the study population and 2) describing the methods for reference diagnosis, when applicable.
The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Finelli, George B.

1991-01-01

This paper affirms that the quantification of life-critical software reliability is infeasible using statistical methods, whether applied to standard software or fault-tolerant software. The classical methods of estimating reliability are shown to lead to exorbitant amounts of testing when applied to life-critical software. Reliability growth models are examined and also shown to be incapable of overcoming the need for excessive amounts of testing. The key assumption of software fault tolerance, that separately programmed versions fail independently, is shown to be problematic. This assumption cannot be justified by experimentation in the ultrareliability region, and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multiversion software experiments support this affirmation.
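The scale of testing implied by this argument can be illustrated with a standard zero-failure demonstration calculation. The sketch assumes a constant failure rate (exponential model) and failure-free testing, in which case the required test time is T = -ln(1 - C) / lambda for confidence C. The paper's own derivation may differ, and the target rate used below is a hypothetical example, not a figure from the abstract.

```python
import math


def zero_failure_test_hours(max_failure_rate_per_hour, confidence=0.95):
    """Failure-free test hours needed to claim rate <= max_failure_rate_per_hour.

    Assumes a constant failure rate, so P(no failure in T hours) = exp(-rate * T).
    """
    return -math.log(1.0 - confidence) / max_failure_rate_per_hour


hours = zero_failure_test_hours(1e-9, confidence=0.95)
print(f"{hours:.3g} hours, about {hours / 8766:.3g} years on a single test unit")
```

For a hypothetical target of one failure per billion hours, this works out to roughly three billion failure-free test hours. Running many copies in parallel divides the calendar time but not the total exposure required.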
Assessment of NDE reliability data

NASA Technical Reports Server (NTRS)

Yee, B. G. W.; Couchman, J. C.; Chang, F. H.; Packman, D. F.

1975-01-01

Twenty sets of relevant nondestructive test (NDT) reliability data were identified, collected, compiled, and categorized. A criterion for the selection of data for statistical analysis considerations was formulated, and a model to grade the quality and validity of the data sets was developed. Data input formats, which record the pertinent parameters of the defect/specimen and inspection procedures, were formulated for each NDE method. A comprehensive computer program was written and debugged to calculate the probability of flaw detection at several confidence limits by the binomial distribution. This program also selects the desired data sets for pooling and tests the statistical pooling criteria before calculating the composite detection reliability. An example of the calculated reliability of crack detection in bolt holes by an automatic eddy current method is presented.
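The abstract does not say which binomial procedure the program used. As one plausible reading, the sketch below computes the exact one-sided (Clopper-Pearson) lower confidence bound on the probability of detection; the function names are hypothetical.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def pod_lower_bound(detections, trials, confidence):
    """One-sided lower confidence bound on the probability of detection.

    Finds the p at which observing at least `detections` hits in `trials`
    inspections has probability (1 - confidence), by bisection.
    """
    if detections == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        # The tail probability increases with p, so move toward the root.
        if binom_tail(trials, detections, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo

# Example: 29 flaws inspected, all 29 detected, 95% confidence.
print(round(pod_lower_bound(29, 29, 0.95), 4))
```

When every flaw is detected the bound reduces to (1 - confidence) raised to the power 1/trials, which gives a quick check on the bisection.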
Estimating the Reliability of a Test Battery Composite or a Test Score Based on Weighted Item Scoring

ERIC Educational Resources Information Center

Feldt, Leonard S.

2004-01-01

In some settings, the validity of a battery composite or a test score is enhanced by weighting some parts or items more heavily than others in the total score. This article describes methods of estimating the total score reliability coefficient when differential weights are used with items or parts.
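The abstract does not give the estimators the article describes. For orientation only, the sketch below uses the classical test theory formula for a weighted composite with uncorrelated errors, in which the composite reliability is one minus the weighted error variance divided by the composite variance. The function name and argument layout are assumptions.

```python
def composite_reliability(weights, covariance, reliabilities):
    """Reliability of a weighted composite under classical test theory.

    weights: weight for each part; covariance: observed-score covariance
    matrix of the parts (list of lists); reliabilities: reliability of each
    part. Measurement errors are assumed to be uncorrelated across parts.
    """
    k = len(weights)
    composite_var = sum(
        weights[i] * weights[j] * covariance[i][j]
        for i in range(k) for j in range(k)
    )
    error_var = sum(
        weights[i] ** 2 * covariance[i][i] * (1.0 - reliabilities[i])
        for i in range(k)
    )
    return 1.0 - error_var / composite_var
```

For a single part the expression reduces to that part's own reliability regardless of the weight, which is a convenient sanity check.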

TCOPPE school environmental audit tool: assessing safety and walkability of school environments.

PubMed

Lee, Chanam; Kim, Hyung Jin; Dowdy, Diane M; Hoelscher, Deanna M; Ory, Marcia G

2013-09-01

Several environmental audit instruments have been developed for assessing streets, parks and trails, but none for schools. This paper introduces a school audit tool that includes 3 subcomponents: 1) street audit, 2) school site audit, and 3) map audit. It presents the conceptual basis and the development process of this instrument, and the methods and results of the reliability assessments. Reliability tests were conducted by 2 trained auditors on 12 study schools (high-low income and urban-suburban-rural settings). Kappa statistics (categorical, factual items) and ICC (Likert-scale, perceptual items) were used to assess a) interrater, b) test-retest, and c) peak vs. off-peak hour reliability tests. For the interrater reliability test, the average Kappa was 0.839 and the ICC was 0.602. For the test-retest reliability, the average Kappa was 0.903 and the ICC was 0.774. The peak-off peak reliability was 0.801. Rural schools showed the most consistent results in the peak-off peak and test-retest assessments. For interrater tests, urban schools showed the highest ICC, and rural schools showed the highest Kappa. Most items achieved moderate to high levels of reliabilities in all study schools. With proper training, this audit can be used to assess school environments reliably for research, outreach, and policy-support purposes.
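For the categorical items, the Kappa statistic compares observed agreement between two auditors with the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa follows; the abstract does not say whether a weighted variant was used, and the function name and sample ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical audit item ("is a sidewalk present?") scored at four sites.
auditor_1 = ["yes", "yes", "no", "no"]
auditor_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(auditor_1, auditor_2))
```

The statistic is undefined when chance agreement equals 1, for example when both raters give the same single category to every item, so a production version would need to handle that case.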
An alternative to the balance error scoring system: using a low-cost balance board to improve the validity/reliability of sports-related concussion balance testing.

PubMed

Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J

2014-05-01

Recent guidelines advocate that sports medicine professionals use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. The study combined criterion validity testing relative to a gold standard with 7-day test-retest reliability testing. It took place in a university biomechanics laboratory with thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores was determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Reliability and Validity of the Footprint Assessment Method Using Photoshop CS5 Software in Young People with Down Syndrome.

PubMed

Gutiérrez-Vilahú, Lourdes; Massó-Ortigosa, Núria; Rey-Abella, Ferran; Costa-Tutusaus, Lluís; Guerra-Balic, Myriam

2016-05-01

People with Down syndrome present skeletal abnormalities in their feet that can be analyzed by commonly used gold standard indices (the Hernández-Corvo index, the Chippaux-Smirak index, the Staheli arch index, and the Clarke angle) based on footprint measurements. The use of Photoshop CS5 software (Adobe Systems Software Ireland Ltd, Dublin, Ireland) to measure footprints has been validated in the general population. The present study aimed to assess the reliability and validity of this footprint assessment technique in the population with Down syndrome. Using optical podography and photography, 44 footprints from 22 patients with Down syndrome (11 men [mean ± SD age, 23.82 ± 3.12 years] and 11 women [mean ± SD age, 24.82 ± 6.81 years]) were recorded in a static bipedal standing position. A blinded observer performed the measurements using a validated manual method three times during the 4-month study, with 2 months between measurements. Test-retest was used to check the reliability of the Photoshop CS5 software measurements. Validity and reliability were obtained by intraclass correlation coefficient (ICC). The reliability test for all of the indices showed very good values for the Photoshop CS5 method (ICC, 0.982-0.995). Validity testing also found no differences between the techniques (ICC, 0.988-0.999). The Photoshop CS5 software method is reliable and valid for the study of footprints in young people with Down syndrome.
Long-term Mechanical Circulatory Support System reliability recommendation by the National Clinical Trial Initiative subcommittee.

PubMed

Lee, James

2009-01-01

The Long-Term Mechanical Circulatory Support (MCS) System Reliability Recommendation was published in the American Society for Artificial Internal Organs (ASAIO) Journal and the Annals of Thoracic Surgery in 1998. At that time, it was stated that the document would be periodically reviewed to assess its timeliness and appropriateness within 5 years. Given the wealth of clinical experience in MCS systems, a new recommendation has been drafted by consensus of a group of representatives from the medical community, academia, industry, and government. The new recommendation describes a reliability test methodology and provides detailed reliability recommendations. In addition, the new recommendation provides additional information and clinical data in appendices that are intended to assist the reliability test engineer in the development of a reliability test that is expected to give improved predictions of clinical reliability compared with past test methods. The appendices are available for download at the ASAIO journal web site at www.asaiojournal.com.
Aerospace reliability applied to biomedicine.

NASA Technical Reports Server (NTRS)

Lalli, V. R.; Vargo, D. J.

1972-01-01

An analysis is presented that indicates that the reliability and quality assurance methodology selected by NASA to minimize failures in aerospace equipment can be applied directly to biomedical devices to improve hospital equipment reliability. The Space Electric Rocket Test project is used as an example of NASA application of reliability and quality assurance (R&QA) methods. By analogy a comparison is made to show how these same methods can be used in the development of transducers, instrumentation, and complex systems for use in medicine.
Design Evaluation of High Reliability Lithium Batteries

NASA Technical Reports Server (NTRS)

Buchman, R. C.; Helgeson, W. D.; Istephanous, N. S.

1985-01-01

Within one year, a lithium battery design can be qualified for device use through the application of accelerated discharge testing, calorimetry measurements, real time tests and other supplemental testing. Materials and corrosion testing verify that the battery components remain functional during expected battery life. By combining these various methods, a high reliability lithium battery can be manufactured for applications which require zero defect battery performance.
Reliability of the Quality of Upper Extremity Skills Test for Children with Cerebral Palsy Aged 2 to 12 Years

ERIC Educational Resources Information Center

Thorley, Megan; Lannin, Natasha; Cusick, Anne; Novak, Iona; Boyd, Roslyn

2012-01-01

Aim: To investigate reliability of the Quality of Upper Extremity Skills Test (QUEST) scores for children with cerebral palsy (CP) aged 2-12 years. Method: Thirty-one QUESTs from 24 children with CP were rated once by two raters and twice by one rater. Internal consistency of total scores, inter- and intra-rater reliability findings for total,…
Scale for positive aspects of caregiving experience: development, reliability, and factor structure.

PubMed

Kate, N; Grover, S; Kulhara, P; Nehra, R

2012-06-01

OBJECTIVE. To develop an instrument (Scale for Positive Aspects of Caregiving Experience [SPACE]) that evaluates positive caregiving experience and assess its psychometric properties. METHODS. Available scales which assess some aspects of positive caregiving experience were reviewed and a 50-item questionnaire with a 5-point rating was constructed. In all, 203 primary caregivers of patients with severe mental disorders were asked to complete the questionnaire. Internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity were evaluated. Principal component factor analysis was run to assess the factorial validity of the scale. RESULTS. The scale developed as part of the study was found to have good internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity. Principal component factor analysis yielded a 4-factor structure, which also had good test-retest reliability and cross-language reliability. There was a strong correlation between the 4 factors obtained. CONCLUSION. The SPACE developed as part of this study has good psychometric properties.
Reliability Issues and Solutions in Flexible Electronics Under Mechanical Fatigue

NASA Astrophysics Data System (ADS)

Yi, Seol-Min; Choi, In-Suk; Kim, Byoung-Joon; Joo, Young-Chang

2018-07-01

Flexible devices are of significant interest due to their potential expansion of the application of smart devices into various fields, such as energy harvesting, biological applications and consumer electronics. Due to the mechanically dynamic operations of flexible electronics, their mechanical reliability must be thoroughly investigated to understand their failure mechanisms and lifetimes. Reliability issues caused by bending fatigue, one of the typical operational limitations of flexible electronics, have been studied using various test methodologies; however, electromechanical evaluations, which are essential to assess the reliability of electronic devices for flexible applications, had not been investigated because the testing method was not established. By employing the in situ bending fatigue test, we have studied the failure mechanism for various conditions and parameters, such as bending strain, fatigue area, film thickness, and lateral dimensions. Moreover, various methods for improving the bending reliability have been developed based on the failure mechanism. Nanostructures such as holes, pores, wires and composites of nanoparticles and nanotubes have been suggested for better reliability. Flexible devices were also investigated to find the potential failures initiated by complex structures under bending fatigue strain. In this review, the recent advances in test methodology, mechanism studies, and practical applications are introduced. Additionally, perspectives including the future advance to stretchable electronics are discussed based on the current achievements in research.
The Validity and Reliability Test of the Indonesian Version of Gastroesophageal Reflux Disease Quality of Life (GERD-QOL) Questionnaire.

PubMed

Siahaan, Laura A; Syam, Ari F; Simadibrata, Marcellus; Setiati, Siti

2017-01-01

The aim was to obtain a valid and reliable GERD-QOL questionnaire for use in Indonesia. At the initial stage, the GERD-QOL questionnaire was translated into Indonesian, and the translated questionnaire was then translated back into the original language (back-to-back translation). The results were evaluated by the research team, and an Indonesian version of the GERD-QOL questionnaire was developed. Ninety-one patients who had been clinically diagnosed with GERD based on the Montreal criteria were interviewed using the Indonesian version of the GERD-QOL questionnaire and the SF-36 questionnaire. Validity was evaluated using construct validity and external validity, and reliability was tested using internal consistency and test-retest methods. The Indonesian version of the GERD-QOL questionnaire had good internal consistency reliability, with a Cronbach's alpha of 0.687-0.842, and good test-retest reliability, with an intraclass correlation coefficient of 0.756-0.936 (p<0.05). The questionnaire was also shown to have good validity, with a high correlation to each question of the SF-36 (p<0.05). The Indonesian version of the GERD-QOL questionnaire has been proven valid and reliable for evaluating the quality of life of GERD patients.
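Internal consistency of the kind reported above is usually summarized with Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The sketch below is a generic illustration with made-up responses, not the study's data; the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one row per respondent, each row holding that respondent's
    scores on the k items of the scale.
    """
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up responses: 4 respondents, 3 items on a 5-point scale.
responses = [
    [1, 2, 2],
    [2, 2, 3],
    [4, 3, 4],
    [5, 5, 4],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha needs at least two items and a total score that varies across respondents; otherwise the expression is undefined.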
The precision and torque production of common hip adductor squeeze tests used in elite football.

PubMed

Light, N; Thorborg, K

2016-11-01

Decreased hip adductor strength is a known risk factor for groin injury in footballers, with clinicians testing adductor strength in various positions and using different protocols. Understanding how reliable and how much torque different adductor squeeze tests produce will facilitate choosing the most appropriate method for future testing. In this study, the reliability and torque production of three common adductor squeeze tests were investigated. Test-retest reliability and cross-sectional comparison. Twenty elite level footballers (16-33 years) without previous or current groin pain were recruited. Relative and absolute test-retest reliability, and torque production of three adductor squeeze tests (long-lever in abduction, short-lever in adduction and short-lever in abduction/external rotation) were investigated. Each participant performed a series of isometric strength tests measured by hand-held dynamometry in each position, on two test days separated by two weeks. No systematic variation was seen for any of the tests when using the mean of three measures (ICC=0.84-0.97, MDC%=6.6-19.5). The smallest variation was observed when taking the mean of three repetitions in the long-lever position (ICC=0.97, MDC%=6.6). The long-lever test also yielded the highest mean torque values, which were 69% and 11% higher than the short-lever in adduction test and short-lever in abduction/external rotation test respectively (p<0.001). All three tests described in this study are reliable methods of measuring adductor squeeze strength. However, the test performed in the long-lever position seems the most promising as it displays high test-retest precision and the highest adductor torque production. Copyright © 2015 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
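The abstract reports MDC% without giving the formula. A commonly used definition, assumed here, derives the standard error of measurement (SEM) from the between-subject standard deviation and the ICC, scales it to a 95% minimal detectable change, and expresses that as a percentage of the mean. The function names and the example numbers are illustrative.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), a common definition (assumed here)."""
    return sd * math.sqrt(1.0 - icc)

def mdc95_percent(sd, icc, mean):
    """95% minimal detectable change as a percentage of the mean score."""
    sem = standard_error_of_measurement(sd, icc)
    mdc = 1.96 * math.sqrt(2.0) * sem  # sqrt(2): two measurements are compared
    return 100.0 * mdc / mean

# Illustrative values only: SD 10, ICC 0.84, mean 100 (arbitrary units).
print(round(mdc95_percent(sd=10.0, icc=0.84, mean=100.0), 1))
```

Under this definition a higher ICC shrinks the SEM and therefore the change needed before a difference between two test days can be distinguished from measurement noise.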
Laser notching ceramics for reliable fracture toughness testing

DOE PAGES

Barth, Holly D.; Elmer, John W.; Freeman, Dennis C.; ...

2015-09-19

A new method for notching ceramics was developed using a picosecond laser for fracture toughness testing of alumina samples. The test geometry incorporated a single-edge-V-notch that was notched using picosecond laser micromachining. This method has been used in the past for cutting ceramics, and is known to remove material with little to no thermal effect on the surrounding material matrix. This study showed that laser-assisted-machining for fracture toughness testing of ceramics was reliable, quick, and cost effective. In order to assess the laser notched single-edge-V-notch beam method, fracture toughness results were compared to results from other more traditional methods, specifically surface-crack in flexure and the chevron notch bend tests. Lastly, the results showed that picosecond laser notching produced precise notches in post-failure measurements, and that the measured fracture toughness results showed improved consistency compared to traditional fracture toughness methods.
Historical Development of Asphalt Content Determination by the Ignition Method

DOT National Transportation Integrated Search

1996-01-01

This study was conducted to develop a reliable, detailed test procedure for determining asphalt cement (AC) content by the ignition method. The goal was to minimize the overall test time as well as technician time, and to produce a test method with a...
MEASURING SPORT-SPECIFIC PHYSICAL ABILITIES IN MALE GYMNASTS: THE MEN'S GYMNASTICS FUNCTIONAL MEASUREMENT TOOL

PubMed Central

Kenyon, Lisa K.; Elliott, James M; Cheng, M. Samuel

2016-01-01

Purpose/Background Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. Methods A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts’ USA-Gymnastics competitive level to calculate the coefficient of determination (r²). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. Results The relationship between total MGFMT scores and subjects’ current USA-Gymnastics competitive level was found to be good (r² = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). Conclusions The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of Evidence Level 3 PMID:27999723
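The sketch below assumes that "Model 1" refers to the one-way random-effects, single-measurement ICC, often written ICC(1,1); the abstract does not spell this out. It computes the coefficient from the between-subject and within-subject mean squares of a one-way ANOVA. The function name and sample scores are hypothetical.

```python
def icc_one_way(scores):
    """One-way random-effects, single-measurement ICC, i.e. ICC(1,1).

    scores: one row per subject, each row holding that subject's k repeated
    measurements (for example, test and retest).
    """
    n, k = len(scores), len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Made-up test and retest totals for four gymnasts.
print(round(icc_one_way([[52, 54], [61, 60], [70, 73], [45, 44]]), 3))
```

Other ICC forms (two-way models, averaged measurements) use different mean squares and can give noticeably different values on the same data, so the model should always be reported alongside the coefficient.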
The influence of various test plans on mission reliability. [for Shuttle Spacelab payloads

NASA Technical Reports Server (NTRS)

Stahle, C. V.; Gongloff, H. R.; Young, J. P.; Keegan, W. B.

1977-01-01

Methods have been developed for the evaluation of cost effective vibroacoustic test plans for Shuttle Spacelab payloads. The shock and vibration environments of components have been statistically represented, and statistical decision theory has been used to evaluate the cost effectiveness of five basic test plans with structural test options for two of the plans. Component, subassembly, and payload testing have been performed for each plan along with calculations of optimum test levels and expected costs. The tests have been ranked according to both minimizing expected project costs and vibroacoustic reliability. It was found that optimum costs may vary up to $6 million with the lowest plan eliminating component testing and maintaining flight vibration reliability via subassembly tests at high acoustic levels.
A systematic review of publications assessing reliability and validity of the Behavioral Risk Factor Surveillance System (BRFSS), 2004–2011

PubMed Central

2013-01-01

Background In recent years response rates on telephone surveys have been declining. Rates for the behavioral risk factor surveillance system (BRFSS) have also declined, prompting the use of new methods of weighting and the inclusion of cell phone sampling frames. A number of scholars and researchers have conducted studies of the reliability and validity of the BRFSS estimates in the context of these changes. As the BRFSS makes changes in its methods of sampling and weighting, a review of reliability and validity studies of the BRFSS is needed. Methods In order to assess the reliability and validity of prevalence estimates taken from the BRFSS, scholarship published from 2004–2011 dealing with tests of reliability and validity of BRFSS measures was compiled and presented by topics of health risk behavior. Assessments of the quality of each publication were undertaken using a categorical rubric. Higher rankings were achieved by authors who conducted reliability tests using repeated test/retest measures, or who conducted tests using multiple samples. A similar rubric was used to rank validity assessments. Validity tests which compared the BRFSS to physical measures were ranked higher than those comparing the BRFSS to other self-reported data. Literature which undertook more sophisticated statistical comparisons was also ranked higher. Results Overall findings indicated that BRFSS prevalence rates were comparable to other national surveys which rely on self-reports, although specific differences are noted for some categories of response. BRFSS prevalence rates were less similar to surveys which utilize physical measures in addition to self-reported data. There is very little research on reliability and validity for some health topics, but a great deal of information supporting the validity of the BRFSS data for others. 
Conclusions Limitations of the examination of the BRFSS were due to question differences among surveys used as comparisons, as well as mode of data collection differences. As the BRFSS moves to incorporating cell phone data and changing weighting methods, a review of reliability and validity research indicated that past BRFSS landline only data were reliable and valid as measured against other surveys. New analyses and comparisons of BRFSS data which include the new methodologies and cell phone data will be needed to ascertain the impact of these changes on estimates in the future. PMID:23522349
Reliability and Validity of the TIMPSI for Infants With Spinal Muscular Atrophy Type I

PubMed Central

Krosschell, Kristin J.; Maczulski, Jo Anne; Scott, Charles; King, Wendy; Hartman, Jill T.; Case, Laura E.; Viazzo-Trussell, Donata; Wood, Janine; Roman, Carolyn A.; Hecker, Eva; Meffert, Marianne; Léveillé, Maude; Kienitz, Krista; Swoboda, Kathryn J.

2014-01-01

Purpose This study examined the reliability and validity of the Test of Infant Motor Performance Screening Items (TIMPSI) in infants with type I spinal muscular atrophy (SMA). Methods After training, 12 evaluators scored 4 videos of infants with type I SMA to assess interrater reliability. Intrarater and test-retest reliability was further assessed for 9 evaluators during a SMA type I clinical trial, with 9 evaluators testing a total of 38 infants twice. Relatedness of the TIMPSI score to ability to reach and ventilatory support was also examined. Results Excellent interrater video score reliability was noted (intraclass correlation coefficient, 0.97–0.98). Intrarater reliability was excellent (intraclass correlation coefficient, 0.91–0.98) and test-retest reliability ranged from r = 0.82 to r = 0.95. The TIMPSI score was related to the ability to reach (P ≤ .05). Conclusion The TIMPSI can reliably be used to assess motor function in infants with type I SMA. In addition, the TIMPSI scores are related to the ability to reach, an important functional skill in children with type I SMA. PMID:23542189
Short-term test-retest-reliability of conditioned pain modulation using the cold-heat-pain method in healthy subjects and its correlation to parameters of standardized quantitative sensory testing.

PubMed

Gehling, Julia; Mainka, Tina; Vollert, Jan; Pogatzki-Zahn, Esther M; Maier, Christoph; Enax-Krumova, Elena K

2016-08-05

Conditioned Pain Modulation (CPM) is often used to assess human descending pain inhibition. Nine different studies on the test-retest-reliability of different CPM paradigms have been published, but none of them has investigated the commonly used heat-cold-pain method. The results vary widely and therefore, reliability measures cannot be extrapolated from one CPM paradigm to another. The aim of the present study was to analyse the test-retest-reliability of the common heat-cold-pain method and its correlation to pain thresholds. We tested the short-term test-retest-reliability within 40 ± 19.9 h using a cold-water immersion (10 °C, left hand) as conditioning stimulus (CS) and heat pain (43-49 °C, pain intensity 60 ± 5 on the 101-point numeric rating scale, right forearm) as test stimulus (TS) in 25 healthy right-handed subjects (12 females, 31.6 ± 14.1 years). The TS was applied 30 s before (TSbefore), during (TSduring) and after (TSafter) the 60 s CS. The difference between the pain ratings for TSbefore and TSduring represents the early CPM-effect, and the difference between TSbefore and TSafter the late CPM-effect. Quantitative sensory testing (QST, DFNS protocol) was performed on both sessions before the CPM assessment. Statistical analysis used paired t-tests, the intraclass correlation coefficient (ICC), standard error of measurement (SEM), smallest real difference (SRD), Pearson's correlation, and Bland-Altman analysis, with a significance level of p < 0.05 and Bonferroni correction for multiple comparisons when necessary. Pain ratings during CPM correlated significantly (ICC: 0.411…0.962) between both days, though ratings for TSafter were lower on day 2 (p < 0.005). The early (day 1: 16.7 ± 11.7; day 2: 19.5 ± 11.9; ICC: 0.618, SRD: 20.2) and late (day 1: 1.7 ± 9.2; day 2: 7.6 ± 11.5; ICC: 0.178, SRD: 27.0) CPM effects did not differ significantly between both days. Neither the early nor the late CPM-effect correlated with the pain thresholds.
The short-term test-retest-reliability of the early CPM-effect using the heat-cold-pain method in healthy subjects achieved satisfying results in terms of the ICC. The SRD of the early CPM effect showed that an individual change of > 20 NRS can be attributed to a real change rather than chance. The late CPM-effect was weaker and not reliable.
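Bland-Altman analysis, one of the methods listed above, summarizes agreement between two sessions by the mean difference (bias) and the 95% limits of agreement. The sketch below shows the usual calculation on made-up ratings; the function name and the data are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def bland_altman_limits(first_session, second_session):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are taken as second minus first; the limits are the bias
    plus or minus 1.96 standard deviations of the differences.
    """
    diffs = [b - a for a, b in zip(first_session, second_session)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Made-up early CPM effects (NRS points) for five subjects on two days.
day_1 = [12.0, 20.0, 5.0, 30.0, 16.0]
day_2 = [15.0, 18.0, 9.0, 33.0, 21.0]
print(bland_altman_limits(day_1, day_2))
```

The limits describe how far a single subject's retest value may plausibly fall from the first value, which complements the ICC because it is expressed in the units of the measurement itself.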
Establishing survey validity and reliability for American Indians through "think aloud" and test-retest methods.

PubMed

Hauge, Cindy Horst; Jacobs-Knight, Jacque; Jensen, Jamie L; Burgess, Katherine M; Puumala, Susan E; Wilton, Georgiana; Hanson, Jessica D

2015-06-01

The purpose of this study was to use a mixed-methods approach to determine the validity and reliability of measurements used within an alcohol-exposed pregnancy prevention program for American Indian women. To develop validity, content experts provided input into the survey measures, and a "think aloud" methodology was conducted with 23 American Indian women. After revising the measurements based on this input, a test-retest was conducted with 79 American Indian women who were randomized to complete either the original measurements or the new, modified measurements. The test-retest revealed that some of the questions performed better for the modified version, whereas others appeared to be more reliable for the original version. The mixed-methods approach was a useful methodology for gathering feedback on survey measurements from American Indian participants and in indicating specific survey questions that needed to be modified for this population. © The Author(s) 2015.

Concurrent validity of different functional and neuroproteomic pain assessment methods in the rat osteoarthritis monosodium iodoacetate (MIA) model.

PubMed

Otis, Colombe; Gervais, Julie; Guillot, Martin; Gervais, Julie-Anne; Gauvin, Dominique; Péthel, Catherine; Authier, Simon; Dansereau, Marc-André; Sarret, Philippe; Martel-Pelletier, Johanne; Pelletier, Jean-Pierre; Beaudry, Francis; Troncy, Eric

2016-06-23

Lack of validity in osteoarthritis pain models and assessment methods is suspected. Our goal was to 1) assess the repeatability and reproducibility of measurement and the influence of environment, and acclimatization, to different pain assessment outcomes in normal rats, and 2) test the concurrent validity of the most reliable methods in relation to the expression of different spinal neuropeptides in a chemical model of osteoarthritic pain. Repeatability and inter-rater reliability of reflexive nociceptive mechanical thresholds, spontaneous static weight-bearing, treadmill, rotarod, and operant place escape/avoidance paradigm (PEAP) were assessed by the intraclass correlation coefficient (ICC). The most reliable acclimatization protocol was determined by comparing coefficients of variation. In a pilot comparative study, the sensitivity and responsiveness to treatment of the most reliable methods were tested in the monosodium iodoacetate (MIA) model over 21 days. Two MIA (2 mg) groups (including one lidocaine treatment group) and one sham group (0.9 % saline) received an intra-articular (50 μL) injection. No effect of environment (observer, inverted circadian cycle, or exercise) was observed; all tested methods except mechanical sensitivity (ICC <0.3), offered good repeatability (ICC ≥0.7). The most reliable acclimatization protocol included five assessments over two weeks. MIA-related osteoarthritic change in pain was demonstrated with static weight-bearing, punctate tactile allodynia evaluation, treadmill exercise and operant PEAP, the latter being the most responsive to analgesic intra-articular lidocaine. Substance P and calcitonin gene-related peptide were higher in MIA groups compared to naive (adjusted P (adj-P) = 0.016) or sham-treated (adj-P = 0.029) rats. 
Repeated post-MIA lidocaine injection resulted in 34 times lower downregulation for spinal substance P compared to MIA alone (adj-P = 0.029), with a concomitant increase of 17 % in time spent on the PEAP dark side (indicative of increased comfort). This study of normal rats and rats with pain established the most reliable and sensitive pain assessment methods and an optimized acclimatization protocol. Operant PEAP testing was more responsive to lidocaine analgesia than other tests used, while neuropeptide spinal concentration is an objective quantification method attractive to support and validate different centralized pain functional assessment methods.
The intra- and inter-rater reliability of five clinical muscle performance tests in patients with and without neck pain

PubMed Central

2013-01-01

Background This study investigates the reliability of muscle performance tests using cost- and time-effective methods similar to those used in clinical practice. When conducting reliability studies, great effort goes into standardising test procedures to facilitate a stable outcome. Therefore, several test trials are often performed. However, when muscle performance tests are applied in the clinical setting, clinicians often only conduct a muscle performance test once as repeated testing may produce fatigue and pain, thus variation in test results. We aimed to investigate whether cervical muscle performance tests, which have shown promising psychometric properties, would remain reliable when examined under conditions similar to those of daily clinical practice. Methods The intra-rater (between-day) and inter-rater (within-day) reliability was assessed for five cervical muscle performance tests in patients with (n = 33) and without neck pain (n = 30). The five tests were joint position error, the cranio-cervical flexion test, the neck flexor muscle endurance test performed in supine and in a 45°-upright position and a new neck extensor test. Results Intra-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.48-0.82), the cranio-cervical flexion test (ICC ≥ 0.69), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.68) and in a 45°-upright position (ICC ≥ 0.41) with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement (ICC = 0.14-0.41). Likewise, inter-rater reliability ranged from moderate to almost perfect agreement for joint position error (ICC ≥ 0.51-0.75), the cranio-cervical flexion test (ICC ≥ 0.85), the neck flexor muscle endurance test performed in supine (ICC ≥ 0.70) and in a 45°-upright position (ICC ≥ 0.56). However, only slight to fair agreement was found for the neck extensor test (ICC = 0.19-0.25). 
Conclusions Intra- and inter-rater reliability ranged from moderate to almost perfect agreement with the exception of a new test (neck extensor test), which ranged from slight to moderate agreement. The significant variability observed suggests that tests like the neck extensor test and the neck flexor muscle endurance test performed in a 45°-upright position are too unstable to be used when evaluating neck muscle performance. PMID:24299621
Test Assembly Implications for Providing Reliable and Valid Subscores

ERIC Educational Resources Information Center

Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J.

2017-01-01

This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Testing the feasibility of eliciting preferences for health states from adolescents using direct methods.

PubMed

Crump, R Trafford; Lau, Ryan; Cox, Elizabeth; Currie, Gillian; Panepinto, Julie

2018-06-22

Measuring adolescents' preferences for health states can play an important role in evaluating the delivery of pediatric healthcare. However, formal evaluation of the common direct preference elicitation methods for health states has not been done with adolescents. Therefore, the purpose of this study is to test how these methods perform in terms of their feasibility, reliability, and validity for measuring health state preferences in adolescents. This study used a web-based survey of adolescents, 18 years of age or younger, living in the United States. The survey included four health states, each comprised of six attributes. Preferences for these health states were elicited using the visual analogue scale, time trade-off, and standard gamble. The feasibility, test-retest reliability, and construct validity of each of these preference elicitation methods were tested and compared. A total of 144 participants were included in this study. Using a web-based survey format to elicit preferences for health states from adolescents was feasible. A majority of participants completed all three elicitation methods, ranked those methods as being easy, with very few requiring assistance from someone else. However, all three elicitation methods demonstrated weak test-retest reliability, with Kendall's tau-a values ranging from 0.204 to 0.402. Similarly, all three methods demonstrated poor construct validity, with 9-50% of all rankings aligning with our expectations. There were no significant differences across age groups. Using a web-based survey format to elicit preferences for health states from adolescents is feasible. However, the reliability and construct validity of the methods used to elicit these preferences when using this survey format are poor. Further research into the effects of a web-based survey approach to eliciting preferences for health states from adolescents is needed before health services researchers or pediatric clinicians widely employ these methods.
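As an illustration of the test-retest statistic reported above, Kendall's tau-a for two sets of rankings can be sketched as follows. This is a minimal sketch of the textbook definition (concordant minus discordant pairs, divided by all pairs, with no tie correction); the function name and inputs are assumptions for illustration and are not taken from the study.

```python
from itertools import combinations


def kendall_tau_a(first, second):
    """Kendall's tau-a between two equally long sequences of scores or ranks.

    tau-a = (concordant - discordant) / (n * (n - 1) / 2); tied pairs count
    as neither concordant nor discordant and are not corrected for.
    """
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (first[i] - first[j]) * (second[i] - second[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

With only four health states per respondent there are just six pairs, so a single swapped pair already moves the coefficient by one third.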
The Reliability and Validity of the Computerized Double Inclinometer in Measuring Lumbar Mobility

PubMed Central

MacDermid, Joy Christine; Arumugam, Vanitha; Vincent, Joshua Israel; Carroll, Krista L

2014-01-01

Study Design: Repeated measures reliability/validity study. Objectives: To determine the concurrent validity, test-retest, inter-rater and intra-rater reliability of lumbar flexion and extension measurements using the Tracker M.E. computerized dual inclinometer (CDI) in comparison to the modified-modified Schober (MMS). Summary of Background: Numerous studies have evaluated the reliability and validity of the various methods of measuring spinal motion, but the results are inconsistent. Differences in equipment and techniques make it difficult to correlate results. Methods: Twenty subjects with back pain and twenty without back pain were selected through convenience sampling. Two examiners measured sagittal plane lumbar range of motion for each subject. Two separate tests with the CDI and one test with the MMS were conducted. Each test consisted of three trials. Instrument and examiner order was randomly assigned. Intra-class correlations (ICCs 2, 2 and 2, 2) and Pearson correlation coefficients (r) were used to calculate reliability and concurrent validity respectively. Results: Intra-trial reliability was high to very high for both the CDI (ICCs 0.85 - 0.96) and MMS (ICCs 0.84 - 0.98). However, the reliability was poor to moderate when the CDI unit had to be repositioned either by the same rater (ICCs 0.16 - 0.59) or a different rater (ICCs 0.45 - 0.52). Inter-rater reliability for the MMS was moderate to high (ICCs 0.75 - 0.82), which bettered the moderate correlation obtained for the CDI (ICCs 0.45 - 0.52). Correlations between the CDI and MMS were poor for flexion (0.32; p<0.05) and poor to moderate (-0.42 - -0.51; p<0.05) for extension measurements. Conclusion: When using the CDI, an average of subsequent tests is required to obtain moderate reliability. The MMS was more reliable than the CDI. The MMS and the CDI measure lumbar movement on different metrics that are not highly related to each other. PMID:25352928
Reliability of movement control tests in the lumbar spine

PubMed Central

Luomajoki, Hannu; Kool, Jan; de Bruin, Eling D; Airaksinen, Olavi

2007-01-01

Background Movement control dysfunction [MCD] reduces active control of movements. Patients with MCD might form an important subgroup among patients with non specific low back pain. The diagnosis is based on the observation of active movements. Although widely used clinically, only a few studies have been performed to determine the test reliability. The aim of this study was to determine the inter- and intra-observer reliability of movement control dysfunction tests of the lumbar spine. Methods We videoed patients performing a standardized test battery consisting of 10 active movement tests for motor control in 27 patients with non specific low back pain and 13 patients with other diagnoses but without back pain. Four physiotherapists independently rated test performances as correct or incorrect per observation, blinded to all other patient information and to each other. The study was conducted in a private physiotherapy outpatient practice in Reinach, Switzerland. Kappa coefficients, percentage agreements and confidence intervals for inter- and intra-rater results were calculated. Results The kappa values for inter-tester reliability ranged between 0.24 – 0.71. Six tests out of ten showed a substantial reliability [k > 0.6]. Intra-tester reliability was between 0.51 – 0.96, all tests but one showed substantial reliability [k > 0.6]. Conclusion Physiotherapists were able to reliably rate most of the tests in this series of motor control tasks as being performed correctly or not, by viewing films of patients with and without back pain performing the task. PMID:17850669
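The kappa coefficients and percentage agreements mentioned above can be illustrated for one pair of raters. The following is a minimal sketch of Cohen's kappa for two raters scoring the same performances as correct or incorrect; the study used four physiotherapists, so treating the raters pairwise is an assumption made here, and the function names are hypothetical.

```python
import math


def percent_agreement(rater_a, rater_b):
    """Share of observations on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return math.nan  # both raters used a single identical label; undefined
    return (p_o - p_e) / (1 - p_e)
```

Kappa can be much lower than raw agreement when almost all performances fall in one category, which is one reason the abstract reports both figures.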
Influences on the Test-Retest Reliability of Functional Connectivity MRI and its Relationship with Behavioral Utility.

PubMed

Noble, Stephanie; Spann, Marisa N; Tokoglu, Fuyuze; Shen, Xilin; Constable, R Todd; Scheinost, Dustin

2017-11-01

Best practices are currently being developed for the acquisition and processing of resting-state magnetic resonance imaging data used to estimate brain functional organization-or "functional connectivity." Standards have been proposed based on test-retest reliability, but open questions remain. These include how amount of data per subject influences whole-brain reliability, the influence of increasing runs versus sessions, the spatial distribution of reliability, the reliability of multivariate methods, and, crucially, how reliability maps onto prediction of behavior. We collected a dataset of 12 extensively sampled individuals (144 min data each across 2 identically configured scanners) to assess test-retest reliability of whole-brain connectivity within the generalizability theory framework. We used Human Connectome Project data to replicate these analyses and relate reliability to behavioral prediction. Overall, the historical 5-min scan produced poor reliability averaged across connections. Increasing the number of sessions was more beneficial than increasing runs. Reliability was lowest for subcortical connections and highest for within-network cortical connections. Multivariate reliability was greater than univariate. Finally, reliability could not be used to improve prediction; these findings are among the first to underscore this distinction for functional connectivity. A comprehensive understanding of test-retest reliability, including its limitations, supports the development of best practices in the field. © The Author 2017. Published by Oxford University Press.
Evaluating test-retest reliability in patient-reported outcome measures for older people: A systematic review.

PubMed

Park, Myung Sook; Kang, Kyung Ja; Jang, Sun Joo; Lee, Joo Yun; Chang, Sun Ju

2018-03-01

This study aimed to evaluate the components of test-retest reliability including time interval, sample size, and statistical methods used in patient-reported outcome measures in older people and to provide suggestions on the methodology for calculating test-retest reliability for patient-reported outcomes in older people. This was a systematic literature review. MEDLINE, Embase, CINAHL, and PsycINFO were searched from January 1, 2000 to August 10, 2017 by an information specialist. This systematic review was guided by both the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist and the guideline for systematic review published by the National Evidence-based Healthcare Collaborating Agency in Korea. The methodological quality was assessed by the Consensus-based Standards for the selection of health Measurement Instruments checklist box B. Ninety-five out of 12,641 studies were selected for the analysis. The median time interval for test-retest reliability was 14 days, and the ratio of sample size for test-retest reliability to the number of items in each measure ranged from 1:1 to 1:4. The most frequently used statistical method for continuous scores was the intraclass correlation coefficient (ICC). Among the 63 studies that used ICCs, 21 studies presented models for ICC calculations and 30 studies reported 95% confidence intervals of the ICCs. Additional analyses using 17 studies that reported a strong ICC (>0.09) showed that the mean time interval was 12.88 days and the mean ratio of the number of items to sample size was 1:5.37. When researchers plan to assess the test-retest reliability of patient-reported outcome measures for older people, they need to consider an adequate time interval of approximately 13 days and a sample size of about 5 times the number of items.
Particularly, statistical methods should not only be selected based on the types of scores of the patient-reported outcome measures, but should also be described clearly in the studies that report the results of test-retest reliability. Copyright © 2017 Elsevier Ltd. All rights reserved.
Methodology for the development of normative data for Spanish-speaking pediatric populations.

PubMed

Rivera, D; Arango-Lasprilla, J C

2017-01-01

To describe the methodology utilized to calculate reliability and the generation of norms for 10 neuropsychological tests for children in Spanish-speaking countries. The study sample consisted of over 4,373 healthy children from nine countries in Latin America (Chile, Cuba, Ecuador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Puerto Rico) and Spain. Inclusion criteria for all countries were an age between 6 and 17 years, an Intelligence Quotient of ≥80 on the Test of Non-Verbal Intelligence (TONI-2), and a score of <19 on the Children's Depression Inventory. Participants completed 10 neuropsychological tests. Reliability and norms were calculated for all tests. Test-retest analysis showed excellent or good reliability on all tests (r's>0.55; p's<0.001) except M-WCST perseverative errors, whose coefficient magnitude was fair. All scores were normed using multiple linear regressions and standard deviations of residual values. Age, age², sex, and mean level of parental education (MLPE) were included as predictors in the models by country. The non-significant variables (p > 0.05) were removed and the analyses were run again. This is the largest normative study of Spanish-speaking children and adolescents in the world. For the generation of normative data, the method based on linear regression models and the standard deviation of residual values was used. This method allows determination of the specific variables that predict test scores, helps identify and control for collinearity of predictive variables, and generates continuous and more reliable norms than those of traditional methods.
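The regression-based norming described above can be sketched as follows: fit a linear model of the test score on age, age squared, sex, and MLPE, then express a child's observed score as a z-score using the standard deviation of the residuals. This is a minimal sketch under stated assumptions: the function names are hypothetical, sex and MLPE are assumed to be numerically coded, and the removal of non-significant predictors is omitted.

```python
import numpy as np


def fit_norm_model(age, sex, mlpe, score):
    """Least-squares fit of score ~ 1 + age + age^2 + sex + MLPE.

    Returns the coefficient vector and the residual standard deviation.
    """
    age = np.asarray(age, dtype=float)
    score = np.asarray(score, dtype=float)
    design = np.column_stack([np.ones_like(age), age, age ** 2, sex, mlpe])
    beta, *_ = np.linalg.lstsq(design, score, rcond=None)
    residuals = score - design @ beta
    # Residual SD with degrees of freedom n - number of coefficients.
    dof = len(score) - design.shape[1]
    sd_resid = np.sqrt((residuals ** 2).sum() / dof)
    return beta, sd_resid


def normed_z(beta, sd_resid, age, sex, mlpe, observed):
    """z-score of an observed score relative to the model's prediction."""
    predicted = beta @ np.array([1.0, age, age ** 2, sex, mlpe])
    return (observed - predicted) / sd_resid
```

Because the predictors enter the model continuously, the resulting norm changes smoothly with age instead of jumping at the boundaries of age bands.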
Automated Psychological Testing: Method of Administration, Need for Approval, and Measures of Anxiety.

ERIC Educational Resources Information Center

Davis, Caroline; Cowles, Michael

1989-01-01

Computerized and paper-and-pencil versions of four standard personality inventories administered to 147 undergraduates were compared for: (1) test-retest reliability; (2) scores; (3) trait anxiety; (4) interaction between method and social desirability; and (5) preferences concerning method of testing. Doubts concerning the efficacy of…
A Statistical Testing Approach for Quantifying Software Reliability; Application to an Example System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chu, Tsong-Lun; Varuttamaseni, Athi; Baek, Joo-Seok

The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).
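The abstract does not state the statistical model used, so the following is only a generic illustration of how statistical testing relates test count to a reliability claim. It assumes independent test cases drawn from the operational profile and zero observed failures, in which case the number of tests n must satisfy (1 - p)^n <= 1 - C for a failure-probability bound p and confidence C. The function name is hypothetical.

```python
import math


def failure_free_tests_needed(p_bound, confidence):
    """Smallest n with (1 - p_bound) ** n <= 1 - confidence.

    Assumes independent, operationally representative tests that all pass.
    """
    if not (0.0 < p_bound < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_bound and confidence must lie strictly in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_bound))


# Example: bound of 1e-3 failures per demand at 95% confidence.
n_tests = failure_free_tests_needed(1e-3, 0.95)
```

The required count grows roughly in inverse proportion to the bound, which is why demonstrating very low failure probabilities by testing alone becomes expensive.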
A simple video-based timing system for on-ice team testing in ice hockey: a technical report.

PubMed

Larson, David P; Noonan, Benjamin C

2014-09-01

The purpose of this study was to describe and evaluate a newly developed on-ice timing system for team evaluation in the sport of ice hockey. We hypothesized that this new, simple, inexpensive, timing system would prove to be highly accurate and reliable. Six adult subjects (age 30.4 ± 6.2 years) performed on ice tests of acceleration and conditioning. The performance times of the subjects were recorded using a handheld stopwatch, photocell, and high-speed (240 frames per second) video. These results were then compared to allow for accuracy calculations of the stopwatch and video as compared with filtered photocell timing that was used as the "gold standard." Accuracy was evaluated using maximal differences, typical error/coefficient of variation (CV), and intraclass correlation coefficients (ICCs) between the timing methods. The reliability of the video method was evaluated using the same variables in a test-retest analysis both within and between evaluators. The video timing method proved to be both highly accurate (ICC: 0.96-0.99 and CV: 0.1-0.6% as compared with the photocell method) and reliable (ICC and CV within and between evaluators: 0.99 and 0.08%, respectively). This video-based timing method provides a very rapid means of collecting a high volume of very accurate and reliable on-ice measures of skating speed and conditioning, and can easily be adapted to other testing surfaces and parameters.
National audit of continence care: laying the foundation.

PubMed

Mian, Sarah; Wagg, Adrian; Irwin, Penny; Lowe, Derek; Potter, Jonathan; Pearson, Michael

2005-12-01

National audit provides a basis for establishing performance against national standards, benchmarking against other service providers and improving standards of care. For effective audit, clinical indicators are required that are valid, feasible to apply and reliable. This study describes the methods used to develop clinical indicators of continence care in preparation for a national audit. To describe the methods used to develop and test clinical indicators of continence care with regard to validity, feasibility and reliability. A multidisciplinary working group developed clinical indicators that measured the structure, process and outcome of care as well as case-mix variables. Literature searching, consensus workshops and a Delphi process were used to develop the indicators. The indicators were tested in 15 secondary care sites, 15 primary care sites and 15 long-term care settings. The process of development produced indicators that received a high degree of consensus within the Delphi process. Testing of the indicators demonstrated an internal reliability of 0.7 and an external reliability of 0.6. Data collection required significant investment in terms of staff time and training. The method used produced indicators that achieved a high degree of acceptance from health care professionals. The reliability of data collection was high for this audit and was similar to the level seen in other successful national audits. Data collection for the indicators was feasible; however, issues of time and staffing were identified as limitations to such data collection. The study has described a systematic method for developing clinical indicators for national audit. The indicators proved robust and reliable in primary and secondary care as well as long-term care settings.
Reliability of measuring hip abductor strength following total knee arthroplasty using a hand-held dynamometer.

PubMed

Schache, Margaret B; McClelland, Jodie A; Webster, Kate E

2016-01-01

To investigate the test-retest reliability of measuring hip abductor strength in patients with total knee arthroplasty (TKA) using a hand-held dynamometer (HHD) with two different types of resistance: belt and manual resistance. Test-retest reliability of 30 subjects (17 female, 13 male, 71.9 ± 7.4 years old), 9.2 ± 2.7 days post TKA was measured using belt and therapist resistance. Retest reliability was calculated with intra-class coefficients (ICC3,1) and 95% confidence intervals (CI) for both the group average and the individual scores. A paired t-test assessed whether a difference existed between the belt and therapist methods of resistance. ICCs were 0.82 and 0.80 for the belt and therapist resisted methods, respectively. Hip abductor strength increases of 8 N (14%) for belt resisted and 14 N (17%) for therapist resisted measurements of the group average exceeded the 95% CI and may represent real change. For individuals, hip abductor strength increases of 33 N (72%) (belt resisted) and 57 N (79%) (therapist resisted) could be interpreted as real change. Hip abductor strength can be reliably measured using HHD in the clinical setting with the described protocol. Belt resistance demonstrated slightly higher test-retest reliability. Reliable measurement of hip abductor muscle strength in patients with TKA is important to ensure deficiencies are addressed in rehabilitation programs and function is maximized. Hip abductor strength can be reliably measured with a hand-held dynamometer in the clinical setting using manual or belt resistance.
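The ICC(3,1) coefficient named above can be computed from a subjects-by-sessions table of measurements. The following is a minimal sketch of the standard two-way mixed, consistency, single-measurement form, (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error); the function name and input layout are assumptions, and the confidence intervals reported in the abstract are not reproduced here.

```python
import numpy as np


def icc_3_1(scores):
    """ICC(3,1) for an array of shape (n_subjects, k_sessions)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Two-way ANOVA sums of squares without replication.
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_subjects = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_sessions = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_sessions
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
```

Because this form measures consistency, a uniform shift between sessions (for example, every patient being 10 N stronger at retest) does not lower the coefficient, which is why group-level change is assessed separately.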
Reliability and Validity of the Computerized Revised Token Test: Comparison of Reading and Listening Versions in Persons with and without Aphasia

ERIC Educational Resources Information Center

McNeil, Malcolm R.; Pratt, Sheila R.; Szuminsky, Neil; Sung, Jee Eun; Fossett, Tepanta R. D.; Fassbinder, Wiltrud; Lim, Kyoung Yuel

2015-01-01

Purpose: This study assessed the reliability and validity of intermodality associations and differences in persons with aphasia (PWA) and healthy controls (HC) on a computerized listening and 3 reading versions of the Revised Token Test (RTT; McNeil & Prescott, 1978). Method: Thirty PWA and 30 HC completed the test versions, including a…
The Six-Minute Walk Test for Adults with Intellectual Disability: A Study of Validity and Reliability

ERIC Educational Resources Information Center

Nasuti, Gabriella; Stuart-Hill, Lynneth; Temple, Viviene A.

2013-01-01

Background: The Six-Minute Walk Test (6MWT) has been used with clinical and healthy populations to assess functional capacity and cardiovascular fitness. The aim of this study was to determine the test-retest reliability of a modified-6MWT as well as concurrent validity of walk distance with peak oxygen uptake (VO[subscript 2] peak). Method:…
Reliability and Validity Testing of the Physical Resilience Measure

ERIC Educational Resources Information Center

Resnick, Barbara; Galik, Elizabeth; Dorsey, Susan; Scheve, Ann; Gutkin, Susan

2011-01-01

Objective: The purpose of this study was to test reliability and validity of the Physical Resilience Scale. Methods: A single-group repeated measure design was used and 130 older adults from three different housing sites participated. Participants completed the Physical Resilience Scale, Hardy-Gill Resilience Scale, 14-item Resilience Scale,…
Reliability and Availability Evaluation Program Manual.

DTIC Science & Technology

1982-11-01

research and development. The manual’s purpose was to provide a practical method for making reliability measurements, measurements directly related to... Research , Development, Test and Evaluation. RMA Reliability, Maintainability and Availability. R&R Repair and Refurbishment, Repair and Replacement, etc...length. phenomena such as mechanical wear and A number of researchers in the reliability chemical deterioration. Maintenance should field 14-pages 402
Instruments for Water Quality Monitoring

ERIC Educational Resources Information Center

Ballinger, Dwight G.

1972-01-01

Presents information regarding available instruments for industries and agencies who must monitor numerous aquatic parameters. Charts denote examples of parameters sampled, testing methods, range and accuracy of test methods, cost analysis, and reliability of instruments. (BL)
Reliability and Validity of the Evidence-Based Practice Confidence (EPIC) Scale

ERIC Educational Resources Information Center

Salbach, Nancy M.; Jaglal, Susan B.; Williams, Jack I.

2013-01-01

Introduction: The reliability, minimal detectable change (MDC), and construct validity of the evidence-based practice confidence (EPIC) scale were evaluated among physical therapists (PTs) in clinical practice. Methods: A longitudinal mail survey was conducted. Internal consistency and test-retest reliability were estimated using Cronbach's alpha…

Evaluation of Validity and Reliability for Hierarchical Scales Using Latent Variable Modeling

ERIC Educational Resources Information Center

Raykov, Tenko; Marcoulides, George A.

2012-01-01

A latent variable modeling method is outlined, which accomplishes estimation of criterion validity and reliability for a multicomponent measuring instrument with hierarchical structure. The approach provides point and interval estimates for the scale criterion validity and reliability coefficients, and can also be used for testing composite or…
[Comprehensive weighted recognition method for hydrological abrupt change: With the runoff series of Jiajiu hydrological station in Lancang River as an example].

PubMed

Gu, Hai Ting; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

2018-04-01

Abrupt change is an important manifestation of hydrological processes that vary dramatically in the context of global climate change, and recognizing it accurately is important for understanding changes in hydrological processes and for practical hydrological and water resources work. Traditional methods are not reliable at either end of a sample series, and their results are often inconsistent. To solve this problem, we proposed a comprehensive weighted recognition method for hydrological abrupt change that weights the results of 12 commonly used change-point tests after comparing their performance. The reliability of the method was verified by Monte Carlo statistical tests. The results showed that the efficiency of the 12 methods was influenced by factors including the coefficient of variation (Cv), the deviation coefficient (Cs) before the change point, the mean value difference coefficient, the Cv difference coefficient and the Cs difference coefficient, but had no significant relationship with the mean value of the sequence. Based on the performance of each method, the weight of each test method was assigned following the results of the statistical tests. The sliding rank sum test method and the sliding run test method had the highest weights, whereas the RS test method had the lowest weight. In this way, the change point with the largest comprehensive weight could be selected as the final result when the results of the different methods were inconsistent. This method was used to analyze the daily maximum sequences of Jiajiu station in the lower reaches of the Lancang River (1-day, 3-day, 5-day, 7-day and 1-month). The results showed that each sequence had an obvious jump in 2004, which was in agreement with the physical causes of hydrological process change and water conservancy construction. The rationality and reliability of the proposed method were verified.
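The selection step described above, choosing the change point with the largest comprehensive weight when methods disagree, can be sketched as a weighted vote. This is a minimal sketch only: the method names and weight values below are hypothetical placeholders, not the weights derived in the study, and the individual change-point tests themselves are not implemented.

```python
from collections import defaultdict


def weighted_change_point(detections, weights):
    """Return the candidate change point with the largest summed weight.

    detections: mapping of method name -> detected change point (or None
                if that method found no change point).
    weights:    mapping of method name -> weight assigned to that method.
    """
    totals = defaultdict(float)
    for method, point in detections.items():
        if point is not None:
            totals[point] += weights[method]
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical example: three methods, two of which agree on 2004.
example_detections = {"sliding_rank_sum": 2004, "sliding_run": 2004, "rs": 1998}
example_weights = {"sliding_rank_sum": 0.4, "sliding_run": 0.4, "rs": 0.2}
selected = weighted_change_point(example_detections, example_weights)
```

A weighted vote lets one well-performing method outweigh several weak ones, which a simple majority count would not allow.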
Research on Novel Algorithms for Smart Grid Reliability Assessment and Economic Dispatch

NASA Astrophysics Data System (ADS)

Luo, Wenjin

In this dissertation, several studies of electric power system reliability and economy assessment methods are presented. To be more precise, several algorithms for evaluating power system reliability and economy are studied. Furthermore, two novel algorithms are applied to this field and their simulation results are compared with conventional results. As electrical power systems develop towards extra-high voltage, long-distance transmission, large capacity and regional networking, new technical equipment has been introduced and the electricity market system has gradually been established, and the consequences of power cuts have become more and more serious. The electrical power system needs the highest possible reliability because of its complexity and security requirements. In this dissertation the Boolean logic Driven Markov Process (BDMP) method is studied and applied to evaluate power system reliability. This approach has several benefits. It allows complex dynamic models to be defined while remaining as readable as conventional methods. This method has been applied to evaluate the IEEE reliability test system. The simulation results obtained are close to the IEEE experimental data, which means that it could be used for future study of system reliability. Besides being reliable, a modern power system is expected to be more economical. This dissertation presents a novel evolutionary algorithm named the quantum evolutionary membrane algorithm (QEPS), which combines the concept and theory of the quantum-inspired evolutionary algorithm and membrane computation, to solve the economic dispatch problem in a renewable power system with on-land and offshore wind farms. A case derived from real data is used for simulation tests. Another conventional evolutionary algorithm is also used to solve the same problem for comparison.
The experimental results show that the proposed method is quick and accurate to obtain the optimal solution which is the minimum cost for electricity supplied by wind farm system.
A Method for Measuring the Hardness of the Surface Layer on Hot Forging Dies Using a Nanoindenter

NASA Astrophysics Data System (ADS)

Mencin, P.; van Tyne, C. J.; Levy, B. S.

2009-11-01

The properties and characteristics of the surface layer of forging dies are critical for understanding and controlling wear. However, the surface layer is very thin, and appropriate property measurements are difficult to obtain. The objective of the present study is to determine if nanoindenter testing provides a reliable method, which could be used to measure the surface hardness in forging die steels. To test the reliability of nanoindenter testing, nanoindenter values for two quenched and tempered steels (FX and H13) are compared to microhardness and macrohardness values. These steels were heat treated for various times to produce specimens with different values of hardness. The heat-treated specimens were tested using three different instruments—a Rockwell hardness tester for macrohardness, a Vickers hardness tester for microhardness, and a nanoindenter tester for fine scale evaluation of hardness. The results of this study indicate that nanoindenter values obtained using a Nanoindenter XP Machine with a Berkovich indenter reliably correlate with Rockwell C macrohardness values, and with Vickers HV microhardness values. Consequently, nanoindenter testing can provide reliable results for analyzing the surface layer of hot forging dies.
Moment Method and Pixel-by-Pixel Method: Complementary Mode Identification I. Testing FG Vir-like pulsation modes

NASA Astrophysics Data System (ADS)

Zima, W.; Kolenberg, K.; Briquet, M.; Breger, M.

2004-06-01

We have carried out a Hare-and-Hound test to determine the reliability of the Moment Method (Briquet & Aerts 2003) and the Pixel-by-Pixel Method (Mantegazza 2000) for the identification of pulsation modes in Delta Scuti stars. For this purpose we calculated synthetic line profiles, exhibiting six pulsation modes of low degree and with input parameters initially unknown to us. The aim was to test and increase the quality of the mode identification by applying both methods independently and by using a combined technique. Our results show that, whereas the azimuthal order m and its sign can be fixed by both methods, the degree l is not determined unambiguously. Both identification methods show a better reliability if multiple modes are fitted simultaneously. In particular, the inclination angle is better determined. We have to emphasize that the outcome of this test is only meaningful for stars having pulsational velocities below 0.2 vsini. This is the first part of a series of articles, in which we will test these spectroscopic identification methods.
Reliability and validity of the closed kinetic chain upper extremity stability test.

PubMed

Lee, Dong-Rour; Kim, Laurentius Jongsoon

2015-04-01

[Purpose] The purpose of this study was to examine the reliability and validity of the Closed Kinetic Chain Upper Extremity Stability (CKCUES) test. [Subjects and Methods] A sample of 40 subjects (20 males, 20 females) with and without pain in the upper limbs was recruited. The subjects were tested twice, three days apart to assess the reliability of the CKCUES test. The CKCUES test was performed four times, and the average was calculated using the data of the last 3 tests. In order to test the validity of the CKCUES test, peak torque of internal/external shoulder rotation was measured using an isokinetic dynamometer, and maximum grip strength was measured using a hand dynamometer, and their Pearson correlation coefficients with the average values of the CKCUES test were calculated. [Results] The reliability of the CKCUES test was very high (ICC=0.97). The correlations between the CKCUES test and maximum grip strength (r=0.78-0.79), and the peak torque of internal/external shoulder rotation (r=0.87-0.94) were high indicating its validity. [Conclusion] The reliability and validity of the CKCUES test were high. The CKCUES test is expected to be used for clinical tests on upper limb stability at low price.
HARBO, a simple computer-aided observation method for recording work postures.

PubMed

Wiktorin, C; Mortimer, M; Ekenvall, L; Kilbom, A; Hjelm, E W

1995-12-01

The aim of the study was to present an observation method focusing on the positions of the hands relative to the body and to evaluate whether this simple observation technique gives a reliable estimate of the total time spent in each of five work postures during one workday. In the first part of the study the interobserver reliability of the observation method was tested with eight blue-collar workers. In the second part the observed time spent with work above the shoulder level was tested in relation to an upper-arm position analyzer, and observed time spent in work below knuckle level was tested in relation to a trunk flexion analyzer, both with 72 blue-collar workers. The interobserver reliability for full-day registrations was high. The intraclass correlation coefficients ranged from 0.99 to 1.00. The observed duration of work with hands above shoulder level correlated well with the measured duration of pronounced arm elevation (> 75 degrees). The product moment correlation coefficient was 0.97. The observed duration of work with hands below knuckle level correlated well with the measured duration of pronounced trunk flexion angles (> 40 degrees). The product moment correlation coefficient was 0.98. The present observation method, designed to make postural observations continuously for several hours, is easy to learn and seems reliable.
The effect of density gradients on hydrometers

NASA Astrophysics Data System (ADS)

Heinonen, Martti; Sillanpää, Sampo

2003-05-01

Hydrometers are simple but effective instruments for measuring the density of liquids. In this work, we studied the effect of non-uniform density of liquid on a hydrometer reading. The effect induced by vertical temperature gradients was investigated theoretically and experimentally. A method for compensating for the effect mathematically was developed and tested with experimental data obtained with the MIKES hydrometer calibration system. In the tests, the method was found reliable. However, the reliability depends on the available information on the hydrometer dimensions and density gradients.
Use of Very Weak Radiation Sources to Determine Aircraft Runway Position

NASA Technical Reports Server (NTRS)

Drinkwater, Fred J., III; Kibort, Bernard R.

1965-01-01

Various methods of providing runway information in the cockpit during the take-off and landing roll have been proposed. The most reliable method has been to use runway distance markers when visible. Flight tests were used to evaluate the feasibility of using weak radioactive sources to trigger a runway distance counter in the cockpit. The results of these tests indicate that a weak radioactive source would provide a reliable signal by which this indicator could be operated.
Reliability of a functional test battery evaluating functionality, proprioception, and strength in recreational athletes with functional ankle instability.

PubMed

Sekir, U; Yildiz, Y; Hazneci, B; Ors, F; Saka, T; Aydin, T

2008-12-01

In contrast to the single evaluation methods used in the past, the combination of multiple tests allows one to obtain a global assessment of the ankle joint. The aim of this study was to determine the reliability of the different tests in a functional test battery. Twenty-four male recreational athletes with unilateral functional ankle instability (FAI) were recruited for this study. One component of the test battery included five different functional ability tests. These tests included a single limb hopping course, single-legged and triple-legged hop for distance, and six and cross six meter hop for time. The ankle joint position sense and one leg standing test were used for evaluation of proprioception and sensorimotor control. The isokinetic strengths of the ankle invertor and evertor muscles were evaluated at a velocity of 120 degrees/s. The reliability of the test battery was assessed by calculating the intraclass correlation coefficient (ICC). Each subject was tested two times, with an interval of 3-5 days between the test sessions. The ICCs for ankle functional and proprioceptive ability showed high reliability (ICCs ranging from 0.94 to 0.98). Additionally, isokinetic ankle joint inversion and eversion strength measurements represented good to high reliability (ICCs between 0.82 and 0.98). The functional test battery investigated in this study proved to be a reliable tool for the assessment of athletes with functional ankle instability. Therefore, clinicians may obtain reliable information from the functional test battery during the assessment of ankle joint performance in patients with functional ankle instability.
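The test-retest ICC reported in the record above can be illustrated with a short sketch. The abstract does not state which ICC form was used, so the code assumes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement). The function name and the sample data are hypothetical and are not taken from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one row per subject, one column per session (or rater).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical hop-test times (s) for five athletes measured in two sessions.
sessions = [[2.1, 2.2], [2.8, 2.7], [3.4, 3.5], [2.5, 2.5], [3.0, 3.1]]
print(round(icc_2_1(sessions), 3))
```

Other ICC forms (for example the consistency form, which ignores systematic session differences) can give noticeably different values on the same data, so the form should always be reported with the coefficient.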
Quantitative metal magnetic memory reliability modeling for welded joints

NASA Astrophysics Data System (ADS)

Xing, Haiyan; Dang, Yongbin; Wang, Ben; Leng, Jiancheng

2016-03-01

Metal magnetic memory (MMM) testing has been widely used to inspect welded joints. However, load levels, the environmental magnetic field, and measurement noise make MMM data dispersive and complicate quantitative evaluation. To promote the development of quantitative MMM reliability assessment, a new MMM model is presented for welded joints. Welded specimens of Q235 steel are tested along longitudinal and horizontal lines with a TSC-2M-8 instrument in tensile fatigue experiments. X-ray testing is carried out synchronously to verify the MMM results. It is found that MMM testing can detect a hidden crack earlier than X-ray testing. Moreover, the MMM gradient vector sum K_vs is sensitive to the degree of damage, especially at early and hidden damage stages. Considering the dispersion of MMM data, the statistical law of K_vs is investigated, which shows that K_vs obeys a Gaussian distribution. K_vs is therefore a suitable MMM parameter for establishing a reliability model of welded joints. Finally, an original quantitative MMM reliability model is presented based on improved stress-strength interference theory. It is shown that the reliability degree R gradually decreases as the residual life ratio T decreases, and that the maximal error between the predicted reliability degree R_1 and the verification reliability degree R_2 is 9.15%. The presented method provides a novel tool for reliability testing and evaluation of welded joints in practical engineering.
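For orientation, the classical stress-strength interference calculation for two independent Gaussian variables can be sketched as follows. This is the textbook form only: the abstract refers to an "improved" theory applied to an MMM gradient parameter, whose details are not given here, so the function name and the numbers below are illustrative assumptions.

```python
import math
from statistics import NormalDist


def ssi_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """Classical stress-strength interference for independent Gaussians.

    Reliability is P(strength > stress) = Phi((mu_S - mu_L) / sqrt(sd_S^2 + sd_L^2)).
    """
    z = (mu_strength - mu_stress) / math.hypot(sd_strength, sd_stress)
    return NormalDist().cdf(z)


# Hypothetical values in arbitrary units; not taken from the study.
print(round(ssi_reliability(500.0, 40.0, 400.0, 30.0), 4))
```

When the two means coincide the function returns 0.5, which is a quick sanity check on the sign convention.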
Intra-rater reliability and agreement of various methods of measurement to assess dorsiflexion in the Weight Bearing Dorsiflexion Lunge Test (WBLT) among female athletes.

PubMed

Langarika-Rocafort, Argia; Emparanza, José Ignacio; Aramendi, José F; Castellano, Julen; Calleja-González, Julio

2017-01-01

To examine the intra-observer reliability and agreement between five methods of measurement for dorsiflexion during Weight Bearing Dorsiflexion Lunge Test and to assess the degree of agreement between three methods in female athletes. Repeated measurements study design. Volleyball club. Twenty-five volleyball players. Dorsiflexion was evaluated using five methods: heel-wall distance, first toe-wall distance, inclinometer at tibia, inclinometer at Achilles tendon and the dorsiflexion angle obtained by a simple trigonometric function. For the statistical analysis, agreement was studied using the Bland-Altman method, the Standard Error of Measurement and the Minimum Detectable Change. Reliability analysis was performed using the Intraclass Correlation Coefficient. Measurement methods using the inclinometer had more than 6° of measurement error. The angle calculated by trigonometric function had 3.28° error. The inclinometer-based methods had ICC values < 0.90. Distance-based methods and trigonometric angle measurement had ICC values > 0.90. Concerning the agreement between methods, bias ranged from 1.93° to 14.42° and random error from 4.24° to 7.96°. To assess DF angle in WBLT, the angle calculated by a trigonometric function is the most repeatable method. The methods of measurement cannot be used interchangeably. Copyright © 2016 Elsevier Ltd. All rights reserved.
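The Standard Error of Measurement and Minimum Detectable Change named in the record above are often derived from the ICC and the between-subject standard deviation. The sketch below uses the common formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. Whether the study used exactly these formulas is an assumption, and the input values are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from between-subject SD and ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimum detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical dorsiflexion data: SD of 5 degrees, ICC of 0.91.
sem = sem_from_icc(5.0, 0.91)
print(round(sem, 2), round(mdc95(sem), 2))
```

The factor sqrt(2) reflects that a change score combines the measurement error of two sessions.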
Overview of Non-Volatile Testing and Screening Methods

NASA Technical Reports Server (NTRS)

Irom, Farokh

2001-01-01

Testing methods for memories and non-volatile memories have become increasingly sophisticated as they become denser and more complex. High frequency and faster rewrite times as well as smaller feature sizes have led to many testing challenges. This paper outlines several testing issues posed by novel memories and approaches to testing for radiation and reliability effects. We discuss methods for measurements of Total Ionizing Dose (TID).
Method of Testing and Predicting Failures of Electronic Mechanical Systems

NASA Technical Reports Server (NTRS)

Iverson, David L.; Patterson-Hine, Frances A.

1996-01-01

A method employing a knowledge base of human expertise comprising a reliability model analysis implemented for diagnostic routines is disclosed. The reliability analysis comprises digraph models that determine target events created by hardware failures, human actions, and other factors affecting the system operation. The reliability analysis contains a wealth of human expertise information that is used to build automatic diagnostic routines and which provides a knowledge base that can be used to solve other artificial intelligence problems.
Reliability and Engineering of Thin-Film Photovoltaic Modules. Research forum proceedings

NASA Technical Reports Server (NTRS)

Ross, R. G., Jr. (Editor); Royal, E. L. (Editor)

1985-01-01

A Research Forum on Reliability and Engineering of Thin Film Photovoltaic Modules, under sponsorship of the Jet Propulsion Laboratory's Flat Plate Solar Array (FSA) Project and the U.S. Department of Energy, was held in Washington, D.C., on March 20, 1985. Reliability attribute investigations of amorphous silicon cells, submodules, and modules were the subjects addressed by most of the Forum presentations. Included among the reliability research investigations reported were: Arrhenius-modeled accelerated stress tests on a-Si cells, electrochemical corrosion, light induced effects and their potential effects on stability and reliability measurement methods, laser scribing considerations, and determination of degradation rates and mechanisms from both laboratory and outdoor exposure tests.
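Arrhenius-modeled accelerated stress testing, mentioned in the record above, rests on a temperature acceleration factor. The sketch below shows the standard form AF = exp((Ea / k) * (1/T_use - 1/T_stress)) with temperatures in kelvin. The activation energy and temperatures are hypothetical placeholders, not values from the proceedings.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a stress temperature relative to a use temperature.

    ea_ev: activation energy in eV (assumed known for the failure mechanism).
    t_use_c, t_stress_c: temperatures in degrees Celsius.
    """
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))


# Hypothetical mechanism with Ea = 0.7 eV, stressed at 85 C, used at 25 C.
print(round(arrhenius_acceleration(0.7, 25.0, 85.0), 1))
```

The result is highly sensitive to the assumed activation energy, which is why it is usually estimated from tests at several stress temperatures.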
Reliability approach to rotating-component design. [fatigue life and stress concentration

NASA Technical Reports Server (NTRS)

Kececioglu, D. B.; Lalli, V. R.

1975-01-01

A probabilistic methodology for designing rotating mechanical components using reliability to relate stress to strength is explained. The experimental test machines and data obtained for steel to verify this methodology are described. A sample mechanical rotating component design problem is solved by comparing a deterministic design method with the new design-by-reliability approach. The new method shows that a smaller size and weight can be obtained for specified rotating shaft life and reliability, and uses the statistical distortion-energy theory with statistical fatigue diagrams for optimum shaft design. Statistical methods are presented for (1) determining strength distributions for steel experimentally, (2) determining a failure theory for stress variations in a rotating shaft subjected to reversed bending and steady torque, and (3) relating strength to stress by reliability.
Vacuum decay container/closure integrity testing technology. Part 2. Comparison to dye ingress tests.

PubMed

Wolf, Heinz; Stauffer, Tony; Chen, Shu-Chen Y; Lee, Yoojin; Forster, Ronald; Ludzinski, Miron; Kamat, Madhav; Mulhall, Brian; Guazzo, Dana Morton

2009-01-01

Part 1 of this series demonstrated that a container closure integrity test performed according to ASTM F2338-09 Standard Test Method for Nondestructive Detection of Leaks in Packages by Vacuum Decay Method using a VeriPac 325/LV vacuum decay leak tester by Packaging Technologies & Inspection, LLC (PTI) is capable of detecting leaks ≥ 5.0 µm (nominal diameter) in rigid, nonporous package systems, such as prefilled glass syringes. The current study compared USP, Ph.Eur. and ISO dye ingress integrity test methods to PTI's vacuum decay technology for the detection of these same 5-, 10-, and 15-µm laser-drilled hole defects in 1-mL glass prefilled syringes. The study was performed at three test sites using several inspectors and a variety of inspection conditions. No standard dye ingress method was found to reliably identify all holed syringes. Modifications to these standard dye tests' challenge conditions increased the potential for dye ingress, and adjustments to the visual inspection environment improved dye ingress detection. However, the risk of false positive test results with dye ingress tests remained. In contrast, the nondestructive vacuum decay leak test method reliably identified syringes with holes ≥ 5.0 µm.
Prognostics-based qualification of high-power white LEDs using Lévy process approach

NASA Astrophysics Data System (ADS)

Yung, Kam-Chuen; Sun, Bo; Jiang, Xiaopeng

2017-01-01

Due to their versatility in a variety of applications and the growing market demand, high-power white light-emitting diodes (LEDs) have attracted considerable attention. Reliability qualification testing is an essential part of the product development process to ensure the reliability of a new LED product before its release. However, the widely used IES-TM-21 method does not provide comprehensive reliability information. For more accurate and effective qualification, this paper presents a novel method based on prognostics techniques. Prognostics is an engineering technology predicting the future reliability or determining the remaining useful lifetime (RUL) of a product by assessing the extent of deviation or degradation from its expected normal operating conditions. A Lévy subordinator of a mixed Gamma and compound Poisson process is used to describe the actual degradation process of LEDs characterized by random sporadic small jumps of degradation degree, and the reliability function is derived for qualification with different distribution forms of jump sizes. The IES LM-80 test results reported by different LED vendors are used to develop and validate the qualification methodology. This study will be helpful for LED manufacturers to reduce the total test time and cost required to qualify the reliability of an LED product.
Reliability of ultrasound thickness measurement of the abdominal muscles during clinical isometric endurance tests.

PubMed

ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda

2015-07-01

The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscles activity when supine lying and during two isometric endurance tests in subjects with and without Low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements at supine lying and the two isometric endurance tests were assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscles activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
Test-retest reliability of sensor-based sit-to-stand measures in young and older adults.

PubMed

Regterschot, G Ruben H; Zhang, Wei; Baldus, Heribert; Stevens, Martin; Zijlstra, Wiebren

2014-01-01

This study investigated test-retest reliability of sensor-based sit-to-stand (STS) peak power and other STS measures in young and older adults. In addition, test-retest reliability of the sensor method was compared to test-retest reliability of the Timed Up and Go Test (TUGT) and Five-Times-Sit-to-Stand Test (FTSST) in older adults. Ten healthy young female adults (20-23 years) and 31 older adults (21 females; 73-94 years) participated in two assessment sessions separated by 3-8 days. Vertical peak power was assessed during three (young adults) and five (older adults) normal and fast STS trials with a hybrid motion sensor worn on the hip. Older adults also performed the FTSST and TUGT. The average sensor-based STS peak power of the normal STS trials and the average sensor-based STS peak power of the fast STS trials showed excellent test-retest reliability in young adults (intra-class correlation (ICC)≥0.90; zero in 95% confidence interval of mean difference between test and retest (95%CI of D); standard error of measurement (SEM)≤6.7% of mean peak power) and older adults (ICC≥0.91; zero in 95%CI of D; SEM≤9.9%). Test-retest reliability of sensor-based STS peak power and TUGT (ICC=0.98; zero in 95%CI of D; SEM=8.5%) was comparable in older adults, test-retest reliability of the FTSST was lower (ICC=0.73; zero outside 95%CI of D; SEM=14.4%). Sensor-based STS peak power demonstrated excellent test-retest reliability and may therefore be useful for clinical assessment of functional status and fall risk. Copyright © 2014 Elsevier B.V. All rights reserved.

Further Examination of the Reliability of the Modified Rathus Assertiveness Schedule.

ERIC Educational Resources Information Center

Del Greco, Linda; And Others

1986-01-01

Examined the reliability of the 30-item Modified Rathus Assertiveness Schedule (MRAS) using the test-retest method over a three-week period. The MRAS yielded correlations of .74 using the Pearson product and Spearman Brown correlation coefficient. Correlations for males yielded .77 and .72. For females correlations for both tests were .72.…
Space station software reliability analysis based on failures observed during testing at the multisystem integration facility

NASA Technical Reports Server (NTRS)

Tamayo, Tak Chai

1987-01-01

Quality of software is not only vital to the successful operation of the space station, it is also an important factor in establishing testing requirements, the time needed for software verification and integration, and launching schedules for the space station. Defense of management decisions can be greatly strengthened by combining engineering judgments with statistical analysis. Unlike hardware, software has no wearout and its redundancies are costly, which makes traditional statistical analysis unsuitable for evaluating software reliability. A statistical model was developed to represent the number as well as the types of failures that occur during software testing and verification. From this model, quantitative measures of software reliability based on failure history during testing are derived. Criteria to terminate testing based on reliability objectives and methods to estimate the expected number of fixes required are also presented.
An exploratory study into the effect of time-restricted internet access on face-validity, construct validity and reliability of postgraduate knowledge progress testing

PubMed Central

2013-01-01

Background Yearly formative knowledge testing (also known as progress testing) was shown to have a limited construct-validity and reliability in postgraduate medical education. One way to improve construct-validity and reliability is to improve the authenticity of a test. As easily accessible internet has become inseparably linked to daily clinical practice, we hypothesized that allowing internet access for a limited amount of time during the progress test would improve the perception of authenticity (face-validity) of the test, which would in turn improve the construct-validity and reliability of postgraduate progress testing. Methods Postgraduate trainees taking the yearly knowledge progress test were asked to participate in a study where they could access the internet for 30 minutes at the end of a traditional pen and paper test. Before and after the test they were asked to complete a short questionnaire regarding the face-validity of the test. Results Mean test scores increased significantly for all training years. Trainees indicated that the face-validity of the test improved with internet access and that they would like to continue to have internet access during future testing. Internet access did not improve the construct-validity or reliability of the test. Conclusion Improving the face-validity of postgraduate progress testing, by adding the possibility to search the internet for a limited amount of time, positively influences test performance and face-validity. However, it did not change the reliability or the construct-validity of the test. PMID:24195696
Reliability and Validity of a New Method for Isometric Back Extensor Strength Evaluation Using A Hand-Held Dynamometer.

PubMed

Park, Hee-Won; Baek, Sora; Kim, Hong Young; Park, Jung-Gyoo; Kang, Eun Kyoung

2017-10-01

To investigate the reliability and validity of a new method for isometric back extensor strength measurement using a portable dynamometer. A chair equipped with a small portable dynamometer was designed (Power Track II Commander Muscle Tester). A total of 15 men (mean age, 34.8±7.5 years) and 15 women (mean age, 33.1±5.5 years) with no current back problems or previous history of back surgery were recruited. Subjects were asked to push the back of the chair while seated, and their isometric back extensor strength was measured by the portable dynamometer. Test-retest reliability was assessed with intraclass correlation coefficient (ICC). For the validity assessment, isometric back extensor strength of all subjects was measured by a widely used physical performance evaluation instrument, BTE PrimusRS system. The limit of agreement (LoA) from the Bland-Altman plot was evaluated between two methods. The test-retest reliability was excellent (ICC=0.82; 95% confidence interval, 0.65-0.91). The Bland-Altman plots demonstrated acceptable agreement between the two methods: the lower 95% LoA was -63.1 N and the upper 95% LoA was 61.1 N. This study shows that isometric back extensor strength measurement using a portable dynamometer has good reliability and validity.
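The Bland-Altman limits of agreement used in the record above can be computed as in the sketch below, which assumes the usual definition: bias is the mean of the paired differences, and the limits are bias ± 1.96 times the sample standard deviation of the differences. The paired readings are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical strength readings (N) from two instruments on the same subjects.
portable = [410.0, 455.0, 380.0, 520.0, 470.0]
reference = [400.0, 470.0, 395.0, 505.0, 480.0]
bias, lower, upper = bland_altman(portable, reference)
print(round(bias, 1), round(lower, 1), round(upper, 1))
```

Under this definition the bias is the midpoint of the two limits, so the reported limits of -63.1 N and 61.1 N would correspond to a bias of about -1.0 N.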
Reliability of Craniofacial Superimposition Using Three-Dimension Skull Model.

PubMed

Gaudio, Daniel; Olivieri, Lara; De Angelis, Danilo; Poppa, Pasquale; Galassi, Andrea; Cattaneo, Cristina

2016-01-01

Craniofacial superimposition is a technique potentially useful for the identification of unidentified human remains if a photo of the missing person is available. We have tested the reliability of the 2D-3D computer-aided nonautomatic superimposition techniques. Three-dimension laser scans of five skulls and ten photographs were overlaid with an imaging software. The resulting superimpositions were evaluated using three methods: craniofacial landmarks, morphological features, and a combination of the two. A 3D model of each skull without its mandible was tested for superimposition; we also evaluated whether separating skulls by sex would increase correct identifications. Results show that the landmark method employing the entire skull is the more reliable one (5/5 correct identifications, 40% false positives [FP]), regardless of sex. However, the persistence of a high percentage of FP in all the methods evaluated indicates that these methods are unreliable for positive identification although the landmark-only method could be useful for exclusion. © 2015 American Academy of Forensic Sciences.
KRAS mutation testing in colorectal cancer: comparison of the results obtained using 3 different methods for the analysis of codons G12 and G13.

PubMed

Bihl, Michel P; Hoeller, Sylvia; Andreozzi, Maria Carla; Foerster, Anja; Rufle, Alexander; Tornillo, Luigi; Terracciano, Luigi

2012-03-01

Targeting the epidermal growth factor receptor (EGFR) is a new therapeutic option for patients with metastatic colorectal or lung carcinoma. However, the efficiency of the therapy depends strongly on the KRAS mutation status of the given tumour. Reliable and secure KRAS mutation testing is therefore crucial. Here we investigated 100 colorectal carcinoma samples with known KRAS mutation status (62 mutated cases and 38 wild type cases) in a comparative manner with three different KRAS mutation testing techniques (Pyrosequencing, Dideoxysequencing and INFINITI) in order to test their reliability and sensitivity. For the large majority of samples (96/100, 96%), the KRAS mutation status obtained by all three methods was the same. Only two cases with clear discrepancies were observed. One case was reported as wild type by the INFINITI method while the two other methods detected a G13C mutation. In the second case the mutation could be detected by the Pyrosequencing and INFINITI methods (15% and 15%), while no signal for mutation could be observed with the Dideoxysequencing method. Two additional unclear results were due to a G12V mutation detected with the INFINITI method that fell below the cut-off when repeated and was not detectable by the other two methods, and to very weak signals in a G12V-mutated case with the Dideoxy- and Pyrosequencing methods compared to the INFINITI method. In summary, all three methods are reliable and robust in detecting KRAS mutations. INFINITI, however, seems to be slightly more sensitive than Dideoxy- and Pyrosequencing.
Reliability of Measurement of Glenohumeral Internal Rotation, External Rotation, and Total Arc of Motion in 3 Test Positions

PubMed Central

Kevern, Mark A.; Beecher, Michael; Rao, Smita

2014-01-01

Context: Athletes who participate in throwing and racket sports consistently demonstrate adaptive changes in glenohumeral-joint internal and external rotation in the dominant arm. Measurements of these motions have demonstrated excellent intrarater and poor interrater reliability. Objective: To determine intrarater reliability, interrater reliability, and standard error of measurement for shoulder internal rotation, external rotation, and total arc of motion using an inclinometer in 3 testing procedures in National Collegiate Athletic Association Division I baseball and softball athletes. Design: Cross-sectional study. Setting: Athletic department. Patients or Other Participants: Thirty-eight players participated in the study. Shoulder internal rotation, external rotation, and total arc of motion were measured by 2 investigators in 3 test positions. The standard supine position was compared with a side-lying test position, as well as a supine test position without examiner overpressure. Results: Excellent intrarater reliability was noted for all 3 test positions and ranges of motion, with intraclass correlation coefficient values ranging from 0.93 to 0.99. Results for interrater reliability were less favorable. Reliability for internal rotation was highest in the side-lying position (0.68) and reliability for external rotation and total arc was highest in the supine-without-overpressure position (0.774 and 0.713, respectively). The supine-with-overpressure position yielded the lowest interrater reliability results in all positions. The side-lying position had the most consistent results, with very little variation among intraclass correlation coefficient values for the various test positions. Conclusions: The results of our study clearly indicate that the side-lying test procedure is of equal or greater value than the traditional supine-with-overpressure method. PMID:25188316
Covariate-free and Covariate-dependent Reliability.

PubMed

Bentler, Peter M

2016-12-01

Classical test theory reliability coefficients are said to be population specific. Reliability generalization, a meta-analysis method, is the main procedure for evaluating the stability of reliability coefficients across populations. A new approach is developed to evaluate the degree of invariance of reliability coefficients to population characteristics. Factor or common variance of a reliability measure is partitioned into parts that are, and are not, influenced by control variables, resulting in a partition of reliability into a covariate-dependent and a covariate-free part. The approach can be implemented in a single sample and can be applied to a variety of reliability coefficients.
A Psychometric Study of the Bayley Scales of Infant and Toddler Development in Persian Language Children.

PubMed

Azari, Nadia; Soleimani, Farin; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud

2017-01-01

The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1-42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. The method was descriptive-analytic. Translation, back-translation, and cultural adaptation were performed. Content and face validity of the translated scale were determined by experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers in Tehran during 2013-2014 for developmental assessment in the cognitive, communicative (receptive and expressive), and motor (fine and gross) domains. Reliability of the scale was calculated by three methods: internal consistency using Cronbach's alpha coefficient, test-retest, and interrater methods. Construct validity was assessed using factor analysis and comparison of mean scores. Cultural and linguistic changes were made to items in all domains, especially the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 for the test-retest method and ≥ 0.993 for the interrater method. Construct validity of the test was confirmed by factor analysis. Moreover, mean scores were compared across age groups, and the statistically significant differences observed between them confirm the validity of the test. The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian language children.
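The internal-consistency statistic named in this abstract, Cronbach's alpha, can be sketched as follows. This is a minimal illustration of the standard formula, not the authors' analysis code; the function name `cronbach_alpha` and the toy scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of respondents, each a list of k item scores
    (hypothetical layout; at least 2 items and 2 respondents).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sum of the sample variances of each item (column).
    item_var_sum = sum(variance(col) for col in zip(*scores))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: three respondents, three items that move together perfectly.
parallel_items = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
alpha = cronbach_alpha(parallel_items)
```

Items that rise and fall together push alpha toward 1, while unrelated items push it toward 0; the abstract's threshold of 0.74 sits between these extremes.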
Reliability and Repetition Effect of the Center of Pressure and Kinematics Parameters That Characterize Trunk Postural Control During Unstable Sitting Test.

PubMed

Barbado, David; Moreside, Janice; Vera-Garcia, Francisco J

2017-03-01

Although unstable seat methodology has been used to assess trunk postural control, the reliability of the variables that characterize it remains unclear. To analyze reliability and learning effect of center of pressure (COP) and kinematic parameters that characterize trunk postural control performance in unstable seating. The relationships between kinematic and COP parameters also were explored. Test-retest reliability design. Biomechanics laboratory setting. Twenty-three healthy male subjects. Participants volunteered to perform 3 sessions at 1-week intervals, each consisting of five 70-second balancing trials. A force platform and a motion capture system were used to measure COP and pelvis, thorax, and spine displacements. Reliability was assessed through standard error of measurement (SEM) and intraclass correlation coefficients (ICC(2,1)) using 3 methods: (1) comparing the last trial score of each day; (2) comparing the best trial score of each day; and (3) calculating the average of the three last trial scores of each day. Standard deviation and mean velocity were calculated to assess balance performance. Although analyses of variance showed some differences in balance performance between days, these differences were not significant between days 2 and 3. Best result and average methods showed the greatest reliability. Mean velocity of the COP showed high reliability (0.71 < ICC < 0.86; 10.3 < SEM < 13.0), whereas standard deviation only showed a low to moderate reliability (0.37 < ICC < 0.61; 14.5 < SEM < 23.0). Regarding the kinematic variables, only pelvis displacement mean velocity achieved a high reliability using the average method (0.62 < ICC < 0.83; 18.8 < SEM < 23.1). Correlations between COP and kinematics were high only for mean velocity (0.45
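The two reliability statistics named in the record above, the two-way random, single-measure intraclass correlation and the standard error of measurement, can be sketched as below. This assumes the usual mean-squares form of that ICC and the common convention SEM = SD × sqrt(1 − ICC); the abstract does not say which SEM formula or units the authors used. The function names and the small table are illustrative and are not data from the study.

```python
from math import sqrt


def icc_2_1(data):
    """Two-way random, absolute-agreement, single-measure ICC.

    data: list of n subjects, each a list of k scores (one per session).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    # Sums of squares for subjects, sessions, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_from_icc(sd, icc):
    """SEM under the assumed convention SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


# Illustrative table: 6 subjects (rows) by 4 sessions (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
```

Because this ICC form penalizes systematic shifts between sessions, a learning effect across days lowers it even when subjects keep the same rank order.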
The Healthcare Complaints Analysis Tool: development and reliability testing of a method for service monitoring and organisational learning

PubMed Central

Gillespie, Alex; Reader, Tom W

2016-01-01

Background Letters of complaint written by patients and their advocates reporting poor healthcare experiences represent an under-used data source. The lack of a method for extracting reliable data from these heterogeneous letters hinders their use for monitoring and learning. To address this gap, we report on the development and reliability testing of the Healthcare Complaints Analysis Tool (HCAT). Methods HCAT was developed from a taxonomy of healthcare complaints reported in a previously published systematic review. It introduces the novel idea that complaints should be analysed in terms of severity. Recruiting three groups of educated lay participants (n=58, n=58, n=55), we refined the taxonomy through three iterations of discriminant content validity testing. We then supplemented this refined taxonomy with explicit coding procedures for seven problem categories (each with four levels of severity), stage of care and harm. These combined elements were further refined through iterative coding of a UK national sample of healthcare complaints (n= 25, n=80, n=137, n=839). To assess reliability and accuracy for the resultant tool, 14 educated lay participants coded a referent sample of 125 healthcare complaints. Results The seven HCAT problem categories (quality, safety, environment, institutional processes, listening, communication, and respect and patient rights) were found to be conceptually distinct. On average, raters identified 1.94 problems (SD=0.26) per complaint letter. Coders exhibited substantial reliability in identifying problems at four levels of severity; moderate and substantial reliability in identifying stages of care (except for ‘discharge/transfer’ that was only fairly reliable) and substantial reliability in identifying overall harm. Conclusions HCAT is not only the first reliable tool for coding complaints, it is the first tool to measure the severity of complaints. It facilitates service monitoring and organisational learning and it enables future research examining whether healthcare complaints are a leading indicator of poor service outcomes. HCAT is freely available to download and use. PMID:26740496
Reliability and Validity of a New Method for Isometric Back Extensor Strength Evaluation Using A Hand-Held Dynamometer

PubMed Central

2017-01-01

Objective To investigate the reliability and validity of a new method for isometric back extensor strength measurement using a portable dynamometer. Methods A chair equipped with a small portable dynamometer was designed (Power Track II Commander Muscle Tester). A total of 15 men (mean age, 34.8±7.5 years) and 15 women (mean age, 33.1±5.5 years) with no current back problems or previous history of back surgery were recruited. Subjects were asked to push the back of the chair while seated, and their isometric back extensor strength was measured by the portable dynamometer. Test-retest reliability was assessed with intraclass correlation coefficient (ICC). For the validity assessment, isometric back extensor strength of all subjects was measured by a widely used physical performance evaluation instrument, BTE PrimusRS system. The limit of agreement (LoA) from the Bland-Altman plot was evaluated between two methods. Results The test-retest reliability was excellent (ICC=0.82; 95% confidence interval, 0.65–0.91). The Bland-Altman plots demonstrated acceptable agreement between the two methods: the lower 95% LoA was −63.1 N and the upper 95% LoA was 61.1 N. Conclusion This study shows that isometric back extensor strength measurement using a portable dynamometer has good reliability and validity. PMID:29201818
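The Bland-Altman limits of agreement reported above can be sketched as follows, assuming the conventional definition (mean difference ± 1.96 standard deviations of the paired differences), which the abstract does not spell out. Under that assumption, the reported limits of −63.1 N and 61.1 N would correspond to a mean difference of about −1.0 N. The function name and the paired readings are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (lower, bias, upper) for paired measurements.

    Assumes the conventional 95% limits: bias +/- 1.96 * SD of differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)      # mean difference between the two methods
    spread = stdev(diffs)   # sample SD of the paired differences
    return bias - 1.96 * spread, bias, bias + 1.96 * spread


# Hypothetical paired strength readings in newtons (not study data).
dynamometer = [310.0, 295.0, 402.0, 350.0]
reference = [309.0, 296.0, 401.0, 351.0]
lower, bias, upper = limits_of_agreement(dynamometer, reference)
```

Whether limits of a given width count as acceptable agreement is a clinical judgment about tolerable error, not something the calculation itself decides.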
Impact of mounting methods in computerized axiography on assessment of condylar inclination.

PubMed

Schierz, Oliver; Wagner, Philipp; Rauch, Angelika; Reissmann, Daniel R

2017-08-30

Valid and reliable recording is a key requirement for accurately simulating individual jaw movements. Horizontal condylar inclination (HCI) and Bennett's angle were measured using a digital jaw tracker (Cadiax® Compact 2) in 27 young adults. Three mounting methods (paraocclusal tray adapter, periocclusal tray adapter, and tray adapter with mandibular clamp) were tested. The mean values of the HCI differed by up to 10° between the mounting methods; however, the values for Bennett's angle did not differ substantially. While the intersession reliability of the Bennett's angle assessment did not depend on the mounting method, the reliability of the HCI assessment was only fair to good for the paraocclusal mounting method but poor for both periocclusal mounting methods. For attaching the tracing bow of jaw trackers to the mandible, a paraocclusal tray adapter should be applied, to achieve the most reliable results.
Assessment of deep tissue hyperalgesia in the groin - a method comparison of electrical vs. pressure stimulation.

PubMed

Aasvang, E K; Werner, M U; Kehlet, H

2014-09-01

Deep pain complaints are more frequent than cutaneous in post-surgical patients, and a prevalent finding in quantitative sensory testing studies. However, the preferred assessment method - pressure algometry - is indirect and tissue unspecific, hindering advances in treatment and preventive strategies. Thus, there is a need for development of methods with direct stimulation of suspected hyperalgesic tissues to identify the peripheral origin of nociceptive input. We compared the reliability of an ultrasound-guided needle stimulation protocol of electrical detection and pain thresholds to pressure algometry, by performing identical test-retest sequences 10 days apart, in deep tissues in the groin region. Electrical stimulation was performed by five up-and-down staircase series of single impulses of 0.04 ms duration, starting from 0 mA in increments of 0.2 mA until a threshold was reached and descending until sensation was lost. Method reliability was assessed by Bland-Altman plots, descriptive statistics, coefficients of variance and intraclass correlation coefficients. The electrical stimulation method was comparable to pressure algometry regarding 10 days test-retest repeatability, but with superior same-day reliability for electrical stimulation (P < 0.05). Between-subject variance rather than within-subject variance was the main source for test variation. There were no systematic differences in electrical thresholds across tissues and locations (P > 0.05). The presented tissue-specific direct deep tissue electrical stimulation technique has equal or superior reliability compared with the indirect tissue-unspecific stimulation by pressure algometry. This method may facilitate advances in mechanism based preventive and treatment strategies in acute and chronic post-surgical pain states. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
Methodology for Developing a New EFNEP Food and Physical Activity Behaviors Questionnaire.

PubMed

Murray, Erin K; Auld, Garry; Baker, Susan S; Barale, Karen; Franck, Karen; Khan, Tarana; Palmer-Keenan, Debra; Walsh, Jennifer

2017-10-01

Research methods are described for developing a food and physical activity behaviors questionnaire for the Expanded Food and Nutrition Education Program (EFNEP), a US Department of Agriculture nutrition education program serving low-income families. Mixed-methods observational study. The questionnaire will include 5 domains: (1) diet quality, (2) physical activity, (3) food safety, (4) food security, and (5) food resource management. A 5-stage process will be used to assess the questionnaire's test-retest reliability and content, face, and construct validity. Research teams across the US will coordinate questionnaire development and testing nationally. Convenience samples of low-income EFNEP, or EFNEP-eligible, adult participants across the US. A 5-stage process: (1) prioritize domain concepts to evaluate (2) question generation and content analysis panel, (3) question pretesting using cognitive interviews, (4) test-retest reliability assessment, and (5) construct validity testing. A nationally tested valid and reliable food and physical activity behaviors questionnaire for low-income adults to evaluate EFNEP's effectiveness. Cognitive interviews will be summarized to identify themes and dominant trends. Paired t tests (P ≤ .05) and Spearman and intra-class correlation coefficients (r > .5) will be conducted to assess reliability. Construct validity will be assessed using Wilcoxon t test (P ≤ .05), Spearman correlations, and Bland-Altman plots. Copyright © 2017 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Normative Data for an Instrumental Assessment of the Upper-Limb Functionality

PubMed Central

Caimmi, Marco; Guanziroli, Eleonora; Malosio, Matteo; Pedrocchi, Nicola; Vicentini, Federico; Molinari Tosatti, Lorenzo; Molteni, Franco

2015-01-01

Upper-limb movement analysis is important to monitor objectively rehabilitation interventions, contributing to improving the overall treatments outcomes. Simple, fast, easy-to-use, and applicable methods are required to allow routinely functional evaluation of patients with different pathologies and clinical conditions. This paper describes the Reaching and Hand-to-Mouth Evaluation Method, a fast procedure to assess the upper-limb motor control and functional ability, providing a set of normative data from 42 healthy subjects of different ages, evaluated for both the dominant and the nondominant limb motor performance. Sixteen of them were reevaluated after two weeks to perform test-retest reliability analysis. Data were clustered into three subgroups of different ages to test the method sensitivity to motor control differences. Experimental data show notable test-retest reliability in all tasks. Data from older and younger subjects show significant differences in the measures related to the ability for coordination thus showing the high sensitivity of the method to motor control differences. The presented method, provided with control data from healthy subjects, appears to be a suitable and reliable tool for the upper-limb functional assessment in the clinical environment. PMID:26539500
The Noninvasive Measurement of X-Ray Tube Potential.

NASA Astrophysics Data System (ADS)

Ranallo, Frank Nunzio

In this thesis I briefly describe the design of clinical x-ray imaging systems and also the various methods of measuring x-ray tube potential, both invasive and noninvasive. I also discuss the meaning and usage of the quantities tube potential (kV) and peak tube potential (kVp) with reference to x-ray systems used in medical imaging. I propose that there exist several quantities which describe different important aspects of the tube potential as a function of time. These quantities are measurable and can be well defined. I have developed a list of definitions of these quantities along with suggested names and symbols. I describe the development and physical principles of a superior noninvasive method of tube potential measurement along with the instrumentation used to implement this method. This thesis research resulted in the development of several commercial kVp test devices (or "kVp Meters") for which the actual measurement procedure is simple, rapid, and reliable compared to other methods, invasive or noninvasive. These kVp test devices provide measurements with a high level of accuracy and reliability over a wide range of test conditions. They provide results which are more reliable and clinically meaningful than many other, more primary and invasive methods. The errors inherent in these new kVp test devices were investigated and methods to minimize them are discussed.
Test-retest reliability of cognitive EEG

NASA Technical Reports Server (NTRS)

McEvoy, L. K.; Smith, M. E.; Gevins, A.

2000-01-01

OBJECTIVE: Task-related EEG is sensitive to changes in cognitive state produced by increased task difficulty and by transient impairment. If task-related EEG has high test-retest reliability, it could be used as part of a clinical test to assess changes in cognitive function. The aim of this study was to determine the reliability of the EEG recorded during the performance of a working memory (WM) task and a psychomotor vigilance task (PVT). METHODS: EEG was recorded while subjects rested quietly and while they performed the tasks. Within session (test-retest interval of approximately 1 h) and between session (test-retest interval of approximately 7 days) reliability was calculated for four EEG components: frontal midline theta at Fz, posterior theta at Pz, and slow and fast alpha at Pz. RESULTS: Task-related EEG was highly reliable within and between sessions (r > 0.9 for all components in WM task, and r > 0.8 for all components in the PVT). Resting EEG also showed high reliability, although the magnitude of the correlation was somewhat smaller than that of the task-related EEG (r > 0.7 for all 4 components). CONCLUSIONS: These results suggest that under appropriate conditions, task-related EEG has sufficient retest reliability for use in assessing clinical changes in cognitive status.
Inter-arch digital model vs. manual cast measurements: Accuracy and reliability.

PubMed

Kiviahde, Heikki; Bukovac, Lea; Jussila, Päivi; Pesonen, Paula; Sipilä, Kirsi; Raustia, Aune; Pirttiniemi, Pertti

2017-06-28

The purpose of this study was to evaluate the accuracy and reliability of inter-arch measurements using digital dental models and conventional dental casts. Thirty sets of dental casts with permanent dentition were examined. Manual measurements were done with a digital caliper directly on the dental casts, and digital measurements were made on 3D models by two independent examiners. Intra-class correlation coefficients (ICC), a paired sample t-test or Wilcoxon signed-rank test, and Bland-Altman plots were used to evaluate intra- and inter-examiner error and to determine the accuracy and reliability of the measurements. The ICC values were generally good for manual and excellent for digital measurements. The Bland-Altman plots of all the measurements showed good agreement between the manual and digital methods and excellent inter-examiner agreement using the digital method. Inter-arch occlusal measurements on digital models are accurate and reliable and are superior to manual measurements.

Automatically generated acceptance test: A software reliability experiment

NASA Technical Reports Server (NTRS)

Protzel, Peter W.

1988-01-01

This study presents results of a software reliability experiment investigating the feasibility of a new error detection method. The method can be used as an acceptance test and is solely based on empirical data about the behavior of internal states of a program. The experimental design uses the existing environment of a multi-version experiment previously conducted at the NASA Langley Research Center, in which the launch interceptor problem is used as a model. This allows the controlled experimental investigation of versions with well-known single and multiple faults, and the availability of an oracle permits the determination of the error detection performance of the test. Fault interaction phenomena are observed that have an amplifying effect on the number of error occurrences. Preliminary results indicate that all faults examined so far are detected by the acceptance test. This shows promise for further investigations, and for the employment of this test method on other applications.
Test-Retest Reliability of the Preschool Age Psychiatric Assessment (PAPA)

ERIC Educational Resources Information Center

Egger, Helen Link; Erkanli, Alaattin; Keeler, Gordon; Potts, Edward; Walter, Barbara Keith; Angold, Adrian

2006-01-01

Objective: To examine the test-retest reliability of a new interviewer-based psychiatric diagnostic measure (the Preschool Age Psychiatric Assessment) for use with parents of preschoolers 2 to 5 years old. Method: A total of 1,073 parents of children attending a large pediatric clinic completed the Child Behavior Checklist 1 1/2-5. For 18 months,…
Psychometric Properties of the Adolescent Health Concern Inventory: The Persian Version

PubMed Central

Baheiraei, Azam; Ahmadi, Fazlollah; Foroushani, Abbas Rahimi; Ghofranipour, Fazlollah; Weiler, Robert M

2013-01-01

Objective It is important to consider the health concerns of adolescents before developing and implementing public health promotion or health education curriculum programs aimed at ameliorating priority health problems experienced by adolescents. The aim of this study was to test the psychometric properties of the original Adolescent Health Concern Inventory (AHCI) for use with an Iranian population. Methods This was a methodological study in which 50 adolescents with age range of 14-18 years were selected using convenience sampling. The translation and cultural adaptation process of The AHCI followed recognized and established guidelines. The face and content validity was established by analyzing feedback solicited from teenagers and professionals with expertise in health, sociology and psychology. Reliability was examined using test-retest and Cronbach's alpha for internal consistency reliability. Kappa and McNemar tests were used to examine test-retest reliability for each item. Results Minor cultural differences were identified and resolved during the translation process and determining the validity of the checklist. Results from Kappa and McNemar tests indicate a high degree of test-retest reliability. Internal consistency reliability as measured by Cronbach's alpha for the subscales were between 0.68 and 0.87 with total instrument reliability of 0.96 indicating considerable overall reliability. Conclusion The Persian version of the AHCI appears valid and reliable. Hence, it can be used for filling a gap in identifying the adolescents’ health concerns in the research and community settings and school health education programs in Iran to design appropriate interventions. PMID:23682249
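The item-level test-retest statistics named above, Cohen's kappa and the McNemar test, can be sketched as below for a single item answered on two occasions. This is a minimal illustration with hypothetical names and data; the abstract does not state whether the authors used the exact (binomial) or the chi-square form of McNemar's test, so the exact form here is an assumption.

```python
from collections import Counter
from math import comb


def cohens_kappa(test, retest):
    """Chance-corrected agreement between two administrations of one item."""
    n = len(test)
    observed = sum(a == b for a, b in zip(test, retest)) / n
    c1, c2 = Counter(test), Counter(retest)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    # Undefined (division by zero) if every answer is the same category.
    return (observed - expected) / (1 - expected)


def mcnemar_exact_p(test, retest, yes=1):
    """Two-sided exact McNemar p-value for a yes/no item."""
    b = sum(a == yes and r != yes for a, r in zip(test, retest))
    c = sum(a != yes and r == yes for a, r in zip(test, retest))
    n = b + c  # only discordant pairs carry information
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical answers (1 = concerned, 0 = not) at test and retest.
first = [1, 1, 0, 0, 1, 0]
second = [1, 1, 0, 0, 1, 0]
kappa = cohens_kappa(first, second)
p_value = mcnemar_exact_p(first, second)
```

The two statistics answer different questions: kappa measures how consistently individuals repeat their answer, while McNemar's test checks whether the overall proportion of "yes" answers shifted between occasions.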
Validity and Reliability of the 8-Item Work Limitations Questionnaire.

PubMed

Walker, Timothy J; Tullar, Jessica M; Diamond, Pamela M; Kohl, Harold W; Amick, Benjamin C

2017-12-01

Purpose To evaluate factorial validity, scale reliability, test-retest reliability, convergent validity, and discriminant validity of the 8-item Work Limitations Questionnaire (WLQ) among employees from a public university system. Methods A secondary analysis using de-identified data from employees who completed an annual Health Assessment between the years 2009-2015 tested research aims. Confirmatory factor analysis (CFA) (n = 10,165) tested the latent structure of the 8-item WLQ. Scale reliability was determined using a CFA-based approach while test-retest reliability was determined using the intraclass correlation coefficient. Convergent/discriminant validity was tested by evaluating relations between the 8-item WLQ with health/performance variables for convergent validity (health-related work performance, number of chronic conditions, and general health) and demographic variables for discriminant validity (gender and institution type). Results A 1-factor model with three correlated residuals demonstrated excellent model fit (CFI = 0.99, TLI = 0.99, RMSEA = 0.03, and SRMR = 0.01). The scale reliability was acceptable (0.69, 95% CI 0.68-0.70) and the test-retest reliability was very good (ICC = 0.78). Low-to-moderate associations were observed between the 8-item WLQ and the health/performance variables while weak associations were observed between the demographic variables. Conclusions The 8-item WLQ demonstrated sufficient reliability and validity among employees from a public university system. Results suggest the 8-item WLQ is a usable alternative for studies when the more comprehensive 25-item WLQ is not available.
Data mining-based coefficient of influence factors optimization of test paper reliability

NASA Astrophysics Data System (ADS)

Xu, Peiyao; Jiang, Huiping; Wei, Jieyao

2018-05-01

Testing is a significant part of the teaching process. It demonstrates the final outcome of school teaching through teachers' teaching level and students' scores. The analysis of a test paper is a complex operation characterized by non-linear relations among the length of the paper, time duration, and degree of difficulty. It is therefore difficult with general methods to optimize the coefficients of the influence factors under different conditions in order to obtain test papers with clearly higher reliability [1]. With data mining techniques such as Support Vector Regression (SVR) and Genetic Algorithm (GA), we can model the test paper analysis and optimize the coefficients of the influence factors for higher reliability. The test results show that the combination of SVR and GA yields an effective improvement in reliability. The optimized coefficients of the influence factors are practical in actual application, and the whole optimization procedure can offer a model basis for test paper analysis.
Reference values for the muscle power sprint test in 6- to 12-year-old children.

PubMed

Douma-van Riet, Danielle; Verschuren, Olaf; Jelsma, Dorothee; Kruitwagen, Cas; Smits-Engelsman, Bouwien; Takken, Tim

2012-01-01

The aims of this study were (1) to develop centile reference values for anaerobic performance of Dutch children tested using the Muscle Power Sprint Test (MPST) and (2) to examine the test-retest reliability of the MPST. Children who were developing typically (178 boys and 201 girls) and aged 6 to 12 years (mean = 8.9 years) were recruited. The MPST was administered to 379 children, and test-retest reliability was examined in 47 children. MPST scores were transformed into centile curves, which were created using generalized additive models for location, scale, and shape. Height-related reference curves were created for both genders. Excellent (intraclass correlation coefficient = 0.98) test-retest reliability was demonstrated. The reference values for the MPST of children who are developing typically and aged 6 to 12 years can serve as a clinical standard in pediatric physical therapy practice. The MPST is a reliable and practical method for determining anaerobic performance in children.
Improving the Test-Retest Reliability of Resting State fMRI by Removing the Impact of Sleep.

PubMed

Wang, Jiahui; Han, Junwei; Nguyen, Vinh T; Guo, Lei; Guo, Christine C

2017-01-01

Resting state functional magnetic resonance imaging (rs-fMRI) provides a powerful tool to examine large-scale neural networks in the human brain and their disturbances in neuropsychiatric disorders. Thanks to its low demand and high tolerance, resting state paradigms can be easily acquired from clinical population. However, due to the unconstrained nature, resting state paradigm is associated with excessive head movement and proneness to sleep. Consequently, the test-retest reliability of rs-fMRI measures is moderate at best, falling short of widespread use in the clinic. Here, we characterized the effect of sleep on the test-retest reliability of rs-fMRI. Using measures of heart rate variability (HRV) derived from simultaneous electrocardiogram (ECG) recording, we identified portions of fMRI data when subjects were more alert or sleepy, and examined their effects on the test-retest reliability of functional connectivity measures. When volumes of sleep were excluded, the reliability of rs-fMRI is significantly improved, and the improvement appears to be general across brain networks. The amount of improvement is robust with the removal of as much as 60% volumes of sleepiness. Therefore, test-retest reliability of rs-fMRI is affected by sleep and could be improved by excluding volumes of sleepiness as indexed by HRV. Our results suggest a novel and practical method to improve test-retest reliability of rs-fMRI measures.
The test-retest reliability of the latent construct of executive function depends on whether tasks are represented as formative or reflective indicators.

PubMed

Willoughby, Michael T; Kuhn, Laura J; Blair, Clancy B; Samek, Anya; List, John A

2017-10-01

This study investigates the test-retest reliability of a battery of executive function (EF) tasks with a specific interest in testing whether the method that is used to create a battery-wide score would result in differences in the apparent test-retest reliability of children's performance. A total of 188 4-year-olds completed a battery of computerized EF tasks twice across a period of approximately two weeks. Two different approaches were used to create a score that indexed children's overall performance on the battery, i.e., (1) the mean score of all completed tasks and (2) a factor score estimate which used confirmatory factor analysis (CFA). Pearson and intra-class correlations were used to investigate the test-retest reliability of individual EF tasks, as well as an overall battery score. Consistent with previous studies, the test-retest reliability of individual tasks was modest (rs ≈ .60). The test-retest reliability of the overall battery scores differed depending on the scoring approach (r_mean = .72; r_factor_score = .99). It is concluded that children's performance on individual EF tasks exhibits modest levels of test-retest reliability. This underscores the importance of administering multiple tasks and aggregating performance across these tasks in order to improve precision of measurement. However, the specific strategy that is used has a large impact on the apparent test-retest reliability of the overall score. These results replicate our earlier findings and provide additional cautionary evidence against the routine use of factor analytic approaches for representing individual performance across a battery of EF tasks.
Validity and reliability of bioelectrical impedance analysis and skinfold thickness in predicting body fat in military personnel.

PubMed

Aandstad, Anders; Holtberget, Kristian; Hageberg, Rune; Holme, Ingar; Anderssen, Sigmund A

2014-02-01

Previous studies show that body composition is related to injury risk and physical performance in soldiers. Thus, valid methods for measuring body composition in military personnel are needed. The frequently used body mass index method is not a valid measure of body composition in soldiers, but reliability and validity of alternative field methods are less investigated in military personnel. Thus, we carried out test and retest of skinfold (SKF), single frequency bioelectrical impedance analysis (SF-BIA), and multifrequency bioelectrical impedance analysis measurements in 65 male and female soldiers. Several validated equations were used to predict percent body fat from these methods. Dual-energy X-ray absorptiometry was also measured, and acted as the criterion method. Results showed that SF-BIA was the most reliable method in both genders. In women, SF-BIA was also the most valid method, whereas SKF or a combination of SKF and SF-BIA produced the highest validity in men. Reliability and validity varied substantially among the equations examined. The best methods and equations produced test-retest 95% limits of agreement below ±1% points, whereas the corresponding validity figures were ±3.5% points. Each investigator and practitioner must consider whether such measurement errors are acceptable for its specific use. Reprint & Copyright © 2014 Association of Military Surgeons of the U.S.
Validity and Reliability Study of the Korean Tinetti Mobility Test for Parkinson's Disease.

PubMed

Park, Jinse; Koh, Seong-Beom; Kim, Hee Jin; Oh, Eungseok; Kim, Joong-Seok; Yun, Ji Young; Kwon, Do-Young; Kim, Younsoo; Kim, Ji Seon; Kwon, Kyum-Yil; Park, Jeong-Ho; Youn, Jinyoung; Jang, Wooyoung

2018-01-01

Postural instability and gait disturbance are the cardinal symptoms associated with falling among patients with Parkinson's disease (PD). The Tinetti mobility test (TMT) is a well-established measurement tool used to predict falls among elderly people. However, the TMT has not been established or widely used among PD patients in Korea. The purpose of this study was to evaluate the reliability and validity of the Korean version of the TMT for PD patients. Twenty-four patients diagnosed with PD were enrolled in this study. For the interrater reliability test, thirteen clinicians scored the TMT after watching a video clip. We also used the test-retest method to determine intrarater reliability. For concurrent validation, the unified Parkinson's disease rating scale, Hoehn and Yahr staging, Berg Balance Scale, Timed-Up and Go test, 10-m walk test, and gait analysis by three-dimensional motion capture were also used. We analyzed receiver operating characteristic curve to predict falling. The interrater reliability and intrarater reliability of the Korean Tinetti balance scale were 0.97 and 0.98, respectively. The interrater reliability and intra-rater reliability of the Korean Tinetti gait scale were 0.94 and 0.96, respectively. The Korean TMT scores were significantly correlated with the other clinical scales and three-dimensional motion capture. The cutoff values for predicting falling were 14 points (balance subscale) and 10 points (gait subscale). We found that the Korean version of the TMT showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with PD.
Effective core potential calculations on small molecules containing transition metal atoms

NASA Astrophysics Data System (ADS)

Gropen, O.; Wahlgren, U.; Pettersson, L.

1982-04-01

A series of test calculations on diatomic oxides and hydrides of Sc, Ti, Cr, Ni and Zn have been carried out in order to test the reliability of some pseudopotential methods. Several different forms of some pseudopotential operators were used. Only the highest valence orbitals of each atomic symmetry were explicitly included in the calculations. The results indicate that there are problems associated with all the investigated operators particularly for the lighter transition elements. It is suggested that more reliable results may be obtained with pseudopotential methods using smaller cores.
A Psychometric Study of the Bayley Scales of Infant and Toddler Development in Persian Language Children

PubMed Central

AZARI, Nadia; SOLEIMANI, Farin; VAMEGHI, Roshanak; SAJEDI, Firoozeh; SHAHSHAHANI, Soheila; KARIMI, Hossein; KRASKIAN, Adis; SHAHROKHI, Amin; TEYMOURI, Robab; GHARIB, Masoud

2017-01-01

Objective The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1–42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. Materials & Methods The method was descriptive-analytic. Translation, back-translation, and cultural adaptation were carried out. Content and face validity of the translated scale were determined by experts’ opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive & expressive) and motor (fine & gross) domains. Reliability of the scale was calculated by three methods: internal consistency using Cronbach’s alpha coefficient, test-retest, and interrater methods. Construct validity was calculated using factor analysis and comparison of the mean scores. Results Cultural and linguistic changes were made in items of all domains, especially on the communication subscale. Content and face validity of the test were approved by experts’ opinions. Cronbach’s alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 for the test-retest method and ≥ 0.993 for the inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, the mean scores for the different age groups were compared, and statistically significant differences were observed between the mean scores of different age groups, which supports the validity of the test. Conclusion The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian language children. PMID:28277556
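For readers unfamiliar with the internal-consistency statistic named in the abstract above, the standard Cronbach's alpha formula can be sketched as below. The function name and the item scores are hypothetical and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list per item, each holding one score per respondent.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical scores: three items answered by five respondents.
items = [
    [2.0, 3.0, 4.0, 4.0, 5.0],
    [1.0, 3.0, 3.0, 4.0, 5.0],
    [2.0, 2.0, 4.0, 5.0, 5.0],
]
alpha = cronbach_alpha(items)
```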
Method-independent, Computationally Frugal Convergence Testing for Sensitivity Analysis Techniques

NASA Astrophysics Data System (ADS)

Mai, Juliane; Tolson, Bryan

2017-04-01

The increasing complexity and runtime of environmental models lead to the current situation that the calibration of all model parameters or the estimation of all of their uncertainty is often computationally infeasible. Hence, techniques to determine the sensitivity of model parameters are used to identify the most important parameters or model processes. All subsequent model calibrations or uncertainty estimation procedures then focus only on these subsets of parameters and are hence less computationally demanding. While the examination of the convergence of calibration and uncertainty methods is state-of-the-art, the convergence of the sensitivity methods is usually not checked. If anything, bootstrapping of the sensitivity results is used to determine the reliability of the estimated indexes. Bootstrapping, however, can itself become computationally expensive in the case of large model outputs and a high number of bootstraps. We therefore present a Model Variable Augmentation (MVA) approach to check the convergence of sensitivity indexes without performing any additional model run. This technique is method- and model-independent. It can be applied either during the sensitivity analysis (SA) or afterwards. The latter case enables the checking of already processed sensitivity indexes. To demonstrate the method independence of the convergence testing method, we applied it to three widely used global SA methods: the screening method known as the Morris method or Elementary Effects (Morris 1991, Campolongo et al., 2000), the variance-based Sobol' method (Sobol' 1993, Saltelli et al. 2010) and a derivative-based method known as the Parameter Importance index (Goehler et al. 2013). The new convergence testing method is first scrutinized using 12 analytical benchmark functions (Cuntz & Mai et al. 2015) where the true indexes of the aforementioned three methods are known. This proof of principle shows that the method reliably determines the uncertainty of the SA results when different budgets are used for the SA. Subsequently, we focus on the model independence by testing the frugal method using the hydrologic model mHM (www.ufz.de/mhm) with about 50 model parameters. The results show that the new frugal method is able to test the convergence and therefore the reliability of SA results in an efficient way. The appealing feature of this new technique is that it requires no further model evaluation and therefore enables checking of already processed (and published) sensitivity results. This is one step towards reliable and transferable, published sensitivity results.
Reliability and validity of the Korean version of the community balance and mobility scale in patients with hemiplegia after stroke

PubMed Central

Lee, Kyoung-bo; Lee, Paul; Yoo, Sang-won; Kim, Young-dong

2016-01-01

[Purpose] The aim of this study was to translate and adapt the Community Balance and Mobility Scale (CB&M) into Korean (K-CB&M) and to verify the reliability and validity of scores obtained with Korean patients. [Subjects and Methods] A total of 16 subjects were recruited from St. Vincent’s Hospital in South Korea. At each testing session, subjects completed the K-CB&M, Berg balance scale (BBS), timed up and go test (TUG), and functional reaching test. All tests were administered by a physical therapist, and subjects completed the tests in an identical standardized order during all testing sessions. [Results] The inter- and intra-rater reliability coefficients were high for most subscores, while moderate inter-rater reliability was observed for the items “walking and looking” and “walk, look, and carry”, and moderate intra-rater reliability was observed for “forward to backward walking”. There was a positive correlation between the K-CB&M and BBS and a negative correlation between the K-CB&M and TUG in the convergent validity assessments. [Conclusion] The reliability and validity of the K-CB&M was high, suggesting that clinical practitioners treating Korean patients with hemiplegia can use this material for assessing static and dynamic balance. PMID:27630420
Measurement methods to assess diastasis of the rectus abdominis muscle (DRAM): A systematic review of their measurement properties and meta-analytic reliability generalisation.

PubMed

van de Water, A T M; Benjamin, D R

2016-02-01

Systematic literature review. Diastasis of the rectus abdominis muscle (DRAM) has been linked with low back pain, abdominal and pelvic dysfunction. Measurement is used to either screen or to monitor DRAM width. Determining which methods are suitable for screening and monitoring DRAM is of clinical value. To identify the best methods to screen for DRAM presence and monitor DRAM width. AMED, Embase, Medline, PubMed and CINAHL databases were searched for measurement property studies of DRAM measurement methods. Population characteristics, measurement methods/procedures and measurement information were extracted from included studies. Quality of all studies was evaluated using 'quality rating criteria'. When possible, reliability generalisation was conducted to provide combined reliability estimations. Thirteen studies evaluated measurement properties of the 'finger width'-method, tape measure, calipers, ultrasound, CT and MRI. Ultrasound was most evaluated. Methodological quality of these studies varied widely. Pearson's correlations of r = 0.66-0.79 were found between calipers and ultrasound measurements. Calipers and ultrasound had Intraclass Correlation Coefficients (ICC) of 0.78-0.97 for test-retest, inter- and intra-rater reliability. The 'finger width'-method had weighted kappas of 0.73-0.77 for test-retest reliability, but moderate agreement (63%; weighted kappa = 0.53) between raters. Comparing calipers and ultrasound, low measurement error was found (above the umbilicus), and the methods had good agreement (83%; weighted kappa = 0.66) for discriminative purposes. The available information supports ultrasound and calipers as adequate methods to assess DRAM. For other methods, limited measurement information of low to moderate quality is available and further evaluation of their measurement properties is required. Copyright © 2015 Elsevier Ltd. All rights reserved.
Clinical usefulness of the pendulum test using a NK table to measure the spasticity of patients with brain lesions.

PubMed

Kim, Yong-Wook

2013-10-01

[Purpose] The purpose of the present study was to investigate the clinical usefulness (reliability and validity) of the pendulum test using a Noland-Kuckhoff (NK) table with an attached electrogoniometer to measure the spasticity of patients with brain lesions. [Subjects] The subjects were 31 patients with stroke or traumatic brain injury. [Methods] The intraclass correlation coefficient (ICC) was used to verify the test-retest reliability of spasticity measures obtained using the pendulum test. Pearson's product correlation coefficient was used to examine the validity of the pendulum test using the amplitude of the patellar tendon reflex (PTR) test, an objective and quantitative measure of spasticity. [Results] The test-retest reliability was high, reflecting a significant correlation between the test and the retest (ICCs = 0.95-0.97). A significant negative correlation was found between the amplitude of the PTR test and the four variables measured in the pendulum test (r = -0.77- -0.85). [Conclusion] The pendulum test using a NK table is an objective measure of spasticity and can be used in the clinical setting in place of more expensive and complicated equipment. Further studies are needed to investigate the therapeutic effect of this method on spasticity.
A comparison of reliability and conventional estimation of safe fatigue life and safe inspection intervals

NASA Technical Reports Server (NTRS)

Hooke, F. H.

1972-01-01

Both the conventional and reliability analyses for determining safe fatigue life are predicated on a population having a specified (usually log normal) distribution of life to collapse under a fatigue test load. Under a random service load spectrum, random occurrences of load larger than the fatigue test load may confront and cause collapse of structures which are weakened, though not yet to the fatigue test load. These collapses are included in reliability but excluded in conventional analysis. The theory of risk determination by each method is given, and several reasonably typical examples have been worked out, in which it transpires that if one excludes collapse through exceedance of the uncracked strength, the reliability and conventional analyses gave virtually identical probabilities of failure or survival.
Reliability of perceived neighbourhood conditions and the effects of measurement error on self-rated health across urban and rural neighbourhoods.

PubMed

Pruitt, Sandi L; Jeffe, Donna B; Yan, Yan; Schootman, Mario

2012-04-01

Limited psychometric research has examined the reliability of self-reported measures of neighbourhood conditions, the effect of measurement error on associations between neighbourhood conditions and health, and potential differences in the reliabilities between neighbourhood strata (urban vs rural and low vs high poverty). We assessed overall and stratified reliability of self-reported perceived neighbourhood conditions using five scales (social and physical disorder, social control, social cohesion, fear) and four single items (multidimensional neighbouring). We also assessed measurement error-corrected associations of these conditions with self-rated health. Using random-digit dialling, 367 women without breast cancer (matched controls from a larger study) were interviewed twice, 2-3 weeks apart. Test-retest (intraclass correlation coefficients (ICC)/weighted κ) and internal consistency reliability (Cronbach's α) were assessed. Differences in reliability across neighbourhood strata were tested using bootstrap methods. Regression calibration corrected estimates for measurement error. All measures demonstrated satisfactory internal consistency (α ≥ 0.70) and either moderate (ICC/κ=0.41-0.60) or substantial (ICC/κ=0.61-0.80) test-retest reliability in the full sample. Internal consistency did not differ by neighbourhood strata. Test-retest reliability was significantly lower among rural (vs urban) residents for two scales (social control, physical disorder) and two multidimensional neighbouring items; test-retest reliability was higher for physical disorder and lower for one multidimensional neighbouring item among the high (vs low) poverty strata. After measurement error correction, the magnitude of associations between neighbourhood conditions and self-rated health were larger, particularly in the rural population. Research is needed to develop and test reliable measures of perceived neighbourhood conditions relevant to the health of rural populations.
Screening fungicides for use in fish culture: Evaluation of the agar plug transfer, cellophane transfer, and agar dilution methods

USGS Publications Warehouse

Bailey, Tom A.

1983-01-01

The reliability, reproducibility, and usefulness of three screening methods -- the cellophane transfer, the agar plug transfer, and the agar dilution -- to screen aquatic fungicides were evaluated. Achlya flagellata and Saprolegnia hypogyna were exposed to 1, 10, and 100 mg/L of malachite green to test each method. The cellophane transfer and agar plug transfer techniques had similar reliability and reproducibility in rating fungicidal activity, and were both superior to the agar dilution technique. The agar plug transfer and agar dilution techniques adequately projected in vivo activity of malachite green, but the cellophane transfer technique overestimated its activity. Overall, the agar plug transfer technique most accurately rated the activity of malachite green and was the easiest test to perform. It therefore appears to be the method of choice for testing aquatic fungicides.
Lifetime prediction and reliability estimation methodology for Stirling-type pulse tube refrigerators by gaseous contamination accelerated degradation testing

NASA Astrophysics Data System (ADS)

Wan, Fubin; Tan, Yuanyuan; Jiang, Zhenhua; Chen, Xun; Wu, Yinong; Zhao, Peng

2017-12-01

Lifetime and reliability are the two performance parameters of premium importance for modern space Stirling-type pulse tube refrigerators (SPTRs), which are required to operate in excess of 10 years. Demonstration of these parameters provides a significant challenge. This paper proposes a lifetime prediction and reliability estimation method that utilizes accelerated degradation testing (ADT) for SPTRs related to gaseous contamination failure. The method was experimentally validated via three groups of gaseous contamination ADT. First, the performance degradation model based on the mechanism of contamination failure and the material outgassing characteristics of SPTRs was established. Next, a preliminary test was performed to determine whether the mechanism of contamination failure of the SPTRs during ADT is consistent with normal life testing. Subsequently, the experimental program of ADT was designed for SPTRs. Then, three groups of gaseous contamination ADT were performed at elevated ambient temperatures of 40 °C, 50 °C, and 60 °C, respectively, and the estimated lifetimes of the SPTRs under normal conditions were obtained through an acceleration model (the Arrhenius model). The results show good fitting of the degradation model with the experimental data. Finally, we obtained the reliability estimate of the SPTRs using the Weibull distribution. The proposed methodology makes it possible to estimate, in less than one year, the reliability of SPTRs designed to operate for more than 10 years.
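The two models named in the abstract above have standard textbook forms, sketched below. The activation energy, the 20 °C use temperature, and the Weibull parameters are assumptions chosen only for illustration; the abstract reports none of them, and the authors' degradation model is not reproduced here.

```python
from math import exp

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a stress temperature relative to a use temperature."""
    t_use = t_use_c + 273.15        # convert to kelvin
    t_stress = t_stress_c + 273.15
    return exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t_use - 1 / t_stress))

def weibull_reliability(t, scale, shape):
    """Probability of surviving beyond time t under a two-parameter Weibull."""
    return exp(-((t / scale) ** shape))

# Hypothetical values: Ea = 0.5 eV and a 20 degC use condition.
factors = {t: arrhenius_acceleration(0.5, 20.0, t) for t in (40.0, 50.0, 60.0)}
# Hypothetical Weibull fit: scale 15 years, shape 2; reliability at 10 years.
r_10_years = weibull_reliability(10.0, 15.0, 2.0)
```

Under these assumptions a lifetime observed at a stress temperature is multiplied by the corresponding factor to extrapolate to the use condition.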

Study samples are too small to produce sufficiently precise reliability coefficients.

PubMed

Charter, Richard A

2003-04-01

In a survey of journal articles, test manuals, and test critique books, the author found that a mean sample size (N) of 260 participants had been used for reliability studies on 742 tests. The distribution was skewed because the median sample size for the total sample was only 90. The median sample sizes for the internal consistency, retest, and interjudge reliabilities were 182, 64, and 36, respectively. The author presented sample size statistics for the various internal consistency methods and types of tests. In general, the author found that the sample sizes that were used in the internal consistency studies were too small to produce sufficiently precise reliability coefficients, which in turn could cause imprecise estimates of examinee true-score confidence intervals. The results also suggest that larger sample sizes have been used in the last decade compared with those that were used in earlier decades.
Method-independent, Computationally Frugal Convergence Testing for Sensitivity Analysis Techniques

NASA Astrophysics Data System (ADS)

Mai, J.; Tolson, B.

2017-12-01

The increasing complexity and runtime of environmental models lead to the current situation that the calibration of all model parameters or the estimation of all of their uncertainty is often computationally infeasible. Hence, techniques to determine the sensitivity of model parameters are used to identify the most important parameters. All subsequent model calibrations or uncertainty estimation procedures then focus only on these subsets of parameters and are hence less computationally demanding. While the examination of the convergence of calibration and uncertainty methods is state-of-the-art, the convergence of the sensitivity methods is usually not checked. If anything, bootstrapping of the sensitivity results is used to determine the reliability of the estimated indexes. Bootstrapping, however, can itself become computationally expensive in the case of large model outputs and a high number of bootstraps. We therefore present a Model Variable Augmentation (MVA) approach to check the convergence of sensitivity indexes without performing any additional model run. This technique is method- and model-independent. It can be applied either during the sensitivity analysis (SA) or afterwards. The latter case enables the checking of already processed sensitivity indexes. To demonstrate the method independence of the convergence testing method, we applied it to two widely used global SA methods: the screening method known as the Morris method or Elementary Effects (Morris 1991) and the variance-based Sobol' method (Sobol' 1993). The new convergence testing method is first scrutinized using 12 analytical benchmark functions (Cuntz & Mai et al. 2015) where the true indexes of the aforementioned methods are known. This proof of principle shows that the method reliably determines the uncertainty of the SA results when different budgets are used for the SA. The results show that the new frugal method is able to test the convergence and therefore the reliability of SA results in an efficient way. The appealing feature of this new technique is that it requires no further model evaluation and therefore enables checking of already processed sensitivity results. This is one step towards reliable and transferable, published sensitivity results.
Reliability Evaluation Method with Weibull Distribution for Temporary Overvoltages of Substation Equipment

NASA Astrophysics Data System (ADS)

Okabe, Shigemitsu; Tsuboi, Toshihiro; Takami, Jun

Power-frequency withstand voltage tests for electric power equipment are specified in JEC by evaluating lifetime reliability with a Weibull distribution function. The evaluation method remains controversial with respect to how multiple faults should be considered, and several alternative methods have been proposed. The present paper first discusses the physical meaning of the various evaluation methods and then examines their effects on the power-frequency withstand voltage tests. Further, an appropriate method is investigated for an oil-filled transformer and a gas-insulated switchgear, taking into account the dielectric breakdown or partial discharge mechanism under various insulating material and structure conditions. The tentative conclusion is that the conventional method is the most pertinent under present conditions.
Intrarater and interrater reliability of the Anteromedial Reach Test in healthy participants

PubMed Central

Bent, Nicholas P; Rushton, Alison B; Wright, Chris C; Petherick, Emma-Jane; Batt, Mark E

2014-01-01

Background The Anteromedial Reach Test is a performance-based outcome measure for evaluating dynamic knee stability in patients with anterior cruciate ligament injury. No previously published study has adequately evaluated intrarater or interrater reliability of the Anteromedial Reach Test, so the purpose of this study was to assess these measurement properties in healthy participants prior to their investigation in patients with anterior cruciate ligament injury. Methods Two raters (A and B) tested 39 healthy university staff and students (20 men, 19 women). For the intrarater reliability investigation, rater A tested participants on three separate test occasions (days 1, 2, and 3) at the same time of day. For the interrater reliability investigation, raters A and B independently tested participants on the same test occasion (day 3). Results There was no significant systematic bias between test occasions or raters. Values of the intraclass correlation coefficient (2,1) were 0.96 for intrarater reliability of both the dominant leg and nondominant leg and 0.97 (dominant leg) and 0.98 (nondominant leg) for interrater reliability. Values for the standard error of measurement were 1.46 (dominant leg) and 1.62 (nondominant leg) for the intrarater investigation, and 1.26 (dominant leg) and 1.04 (nondominant leg) for the interrater investigation. At the 90% confidence level, the minimum detectable change was 3.8% and the error in an individual’s score at a given point in time was ±2.7%. Conclusion The Anteromedial Reach Test demonstrated excellent intrarater and interrater reliability in healthy participants. This provides a basis for future investigation of the measurement properties of the Anteromedial Reach Test in patients with anterior cruciate ligament injury. PMID:24648776
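The abstract above does not spell out how its error figures were derived. A minimal sketch using the conventional formulas is given below, assuming SEM = SD × sqrt(1 − ICC), error in a single score = z × SEM, and minimum detectable change = z × sqrt(2) × SEM, with z = 1.645 for 90% confidence. The function names are hypothetical.

```python
from math import sqrt

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution

def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and the ICC."""
    return sd * sqrt(1 - icc)

def score_error(sem, z=Z_90):
    """Half-width of the confidence band around a single observed score."""
    return z * sem

def minimum_detectable_change(sem, z=Z_90):
    """Smallest change between two scores that exceeds measurement error."""
    return z * sqrt(2) * sem

# Using the reported nondominant-leg intrarater SEM of 1.62:
print(round(score_error(1.62), 1))                # 2.7
print(round(minimum_detectable_change(1.62), 1))  # 3.8
```

With the SEM of 1.62 these formulas return the ±2.7% and 3.8% figures quoted in the abstract, which suggests, but does not confirm, that the authors used this calculation.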
Failure analysis of electronic parts: Laboratory methods. [for destructive and nondestructive testing

NASA Technical Reports Server (NTRS)

Anstead, R. J. (Editor); Goldberg, E. (Editor)

1975-01-01

Failure analysis test methods are presented for use in analyzing candidate electronic parts and in improving future design reliability. Each test is classified as nondestructive, semidestructive, or destructive. The effects upon applicable part types (i.e. integrated circuit, transistor) are discussed. Methodology is given for performing the following: immersion tests, radiographic tests, dewpoint tests, gas ambient analysis, cross sectioning, and ultraviolet examination.
Inter- and intra-observer reliability of clinical movement-control tests for marines

PubMed Central

2012-01-01

Background Musculoskeletal disorders particularly in the back and lower extremities are common among marines. Here, movement-control tests are considered clinically useful for screening and follow-up evaluation. However, few studies have addressed the reliability of clinical tests, and no such published data exists for marines. The present aim was therefore to determine the inter- and intra-observer reliability of clinically convenient tests emphasizing movement control of the back and hip among marines. A secondary aim was to investigate the sensitivity and specificity of these clinical tests for discriminating musculoskeletal pain disorders in this group of military personnel. Methods This inter- and intra-observer reliability study used a test-retest approach with six standardized clinical tests focusing on movement control for back and hip. Thirty-three marines (age 28.7 yrs, SD 5.9) on active duty volunteered and were recruited. They followed an in-vivo observation test procedure that covered both low- and high-load (threshold) tasks relevant for marines on operational duty. Two independent observers simultaneously rated performance as “correct” or “incorrect” following a standardized assessment protocol. Re-testing followed 7–10 days thereafter. Reliability was analysed using kappa (κ) coefficients, while discriminative power of the best-fitting tests for back- and lower-extremity pain was assessed using a multiple-variable regression model. Results Inter-observer reliability for the six tests was moderate to almost perfect with κ-coefficients ranging between 0.56-0.95. Three tests reached almost perfect inter-observer reliability with mean κ-coefficients > 0.81. However, intra-observer reliability was fair-to-moderate with mean κ-coefficients between 0.22-0.58. Three tests achieved moderate intra-observer reliability with κ-coefficients > 0.41. 
Combinations of one low- and one high-threshold test best discriminated prior back pain, but results were inconsistent for lower-extremity pain. Conclusions Our results suggest that clinical tests of movement control of back and hip are reliable for use in screening protocols using several observers with marines. However, test-retest reproducibility was less accurate, which should be considered in follow-up evaluations. The results also indicate that combinations of low- and high-threshold tests have discriminative validity for prior back pain, but were inconclusive for lower-extremity pain. PMID:23273285
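The kappa coefficient used in the abstract above corrects raw agreement for the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa for two observers follows; the function name and the ratings (1 = "correct", 0 = "incorrect") are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater_a)
    # Proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(label) / n) * (rater_b.count(label) / n)
                   for label in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten marines by two observers.
observer_1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohens_kappa(observer_1, observer_2)
```

Here the observers agree on 8 of 10 subjects and chance agreement is 0.52, so kappa is about 0.58.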
Study on the system-level test method of digital metering in smart substation

NASA Astrophysics Data System (ADS)

Zhang, Xiang; Yang, Min; Hu, Juan; Li, Fuchao; Luo, Ruixi; Li, Jinsong; Ai, Bing

2017-03-01

Current test methods for digital metering systems in smart substations test and evaluate the performance of a single device. These methods can only guarantee the accuracy and reliability of the measurement results of a digital metering device in a single run; they do not fully reflect performance when the devices together constitute a complete system. This paper describes the shortcomings of the existing test methods. A system-level test method for digital metering in smart substations is proposed, and its feasibility is demonstrated by an actual test.
[Reliability and validity of warning signs checklist for screening psychological, behavioral and developmental problems of children].

PubMed

Huang, X N; Zhang, Y; Feng, W W; Wang, H S; Cao, B; Zhang, B; Yang, Y F; Wang, H M; Zheng, Y; Jin, X M; Jia, M X; Zou, X B; Zhao, C X; Robert, J; Jing, Jin

2017-06-02

Objective: To evaluate the reliability and validity of the warning signs checklist developed by the National Health and Family Planning Commission of the People's Republic of China (NHFPC), so as to determine the screening effectiveness of warning signs for developmental problems of early childhood. Method: A stratified random sampling method was used to assess the reliability and validity of the warning signs checklist, and 2,110 children aged 0 to 6 years (1,513 low-risk subjects and 597 high-risk subjects) were recruited from 11 provinces of China. The reliability evaluation for the warning signs included the test-retest reliability and interrater reliability. With the use of the Age and Stage Questionnaire (ASQ) and the Gesell Development Diagnosis Scale (GESELL) as the criterion scales, criterion validity was assessed by determining the correlation and consistency between the screening results of warning signs and the criterion scales. Result: In terms of the warning signs, the screening positive rates at different ages ranged from 10.8% (21/141) to 26.2% (51/137). The median (interquartile) testing time for each subject was 1 (0.6) minute. Both the test-retest reliability and interrater reliability of warning signs reached 0.7 or above, indicating that the stability was good. In terms of validity assessment, there was remarkable consistency between ASQ and warning signs, with a Kappa value of 0.63. With the use of GESELL as criterion, it was determined that the sensitivity of warning signs in children with suspected developmental delay was 82.2%, and the specificity was 77.7%. The overall Youden index was 0.6. Conclusion: The reliability and validity of the warning signs checklist for screening early childhood developmental problems have met the basic requirements of psychological screening scales, with the characteristics of short testing time and easy operation. Thus, this warning signs checklist can be used for screening psychological and behavioral problems of early childhood, especially in community settings.
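The Youden index quoted in the abstract above follows directly from the reported sensitivity and specificity, as the short sketch below shows. The function name is hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1

# Reported values with GESELL as the criterion: 82.2% and 77.7%.
print(round(youden_index(0.822, 0.777), 2))  # 0.6
```

The unrounded value is 0.599, consistent with the reported overall index of 0.6.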
Reliability of the Cardiff Test of basic life support and automated external defibrillation version 3.1.

PubMed

Whitfield, Richard H; Newcombe, Robert G; Woollard, Malcolm

2003-12-01

The introduction of the European Resuscitation Guidelines (2000) for cardiopulmonary resuscitation (CPR) and automated external defibrillation (AED) prompted the development of an up-to-date and reliable method of assessing the quality of performance of CPR in combination with the use of an AED. The Cardiff Test of basic life support (BLS) and AED version 3.1 was developed to meet this need and uses standardised checklists to retrospectively evaluate performance from analyses of video recordings and data drawn from a laptop computer attached to a training manikin. This paper reports the inter- and intra-observer reliability of this test. Data used to assess reliability were obtained from an investigation of CPR and AED skill acquisition in a lay responder AED training programme. Six observers were recruited to evaluate performance in 33 data sets, repeating their evaluation after a minimum interval of 3 weeks. More than 70% of the 42 variables considered in this study had a kappa score of 0.70 or above for inter-observer reliability or were drawn from computer data and therefore not subject to evaluator variability. 85% of the 42 variables had kappa scores for intra-observer reliability of 0.70 or above or were drawn from computer data. The standard deviations for inter- and intra-observer measures of time to first shock were 11.6 and 7.7 s, respectively. The inter- and intra-observer reliability for the majority of the variables in the Cardiff Test of BLS and AED version 3.1 is satisfactory. However, reliability is less acceptable with respect to shaking when checking for responsiveness, initial check/clearing of the airway, checks for signs of circulation, time to first shock and performance of interventions in the correct sequence. Further research is required to determine if modifications to the method of assessing these variables can increase reliability.
NDE: An effective approach to improved reliability and safety. A technology survey. [nondestructive testing of aircraft structures]

NASA Technical Reports Server (NTRS)

Carpenter, J. L., Jr.; Stuhrke, W. F.

1976-01-01

Technical abstracts are presented for about 100 significant documents relating to nondestructive testing of aircraft structures or related structural testing and the reliability of the more commonly used evaluation methods. Particular attention is directed toward acoustic emission; liquid penetrant; magnetic particle; ultrasonics; eddy current; and radiography. The introduction of the report includes an overview of the state-of-the-art represented in the documents that have been abstracted.
Statistical modeling of software reliability

NASA Technical Reports Server (NTRS)

Miller, Douglas R.

1992-01-01

This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
Diagnosis of cystic fibrosis with chloride meter (Sherwood M926S chloride analyzer®) and sweat test analysis system (CFΔ collection system®) compared to the Gibson Cooke method.

PubMed

Emiralioğlu, Nagehan; Özçelik, Uğur; Yalçın, Ebru; Doğru, Deniz; Kiper, Nural

2016-01-01

Sweat test with Gibson Cooke (GC) method is the diagnostic gold standard for cystic fibrosis (CF). Recently, alternative methods have been introduced to simplify both the collection and analysis of sweat samples. Our aim was to compare sweat chloride values obtained by the GC method with those from other sweat test methods in patients diagnosed with CF and in patients in whom CF had been ruled out. We wanted to determine whether the other sweat test methods could reliably identify patients with CF and differentiate them from healthy subjects. Chloride concentration was measured with the GC method, a chloride meter, and a sweat test analysis system; conductivity was also determined with the sweat test analysis system. Forty-eight patients with CF and 82 patients without CF underwent the sweat test. In the CF group, median sweat chloride values were 98.9 mEq/L with the GC method, 101 mmol/L with the chloride meter, and 87.8 mmol/L with the sweat test analysis system. In the non-CF group, median sweat chloride values were 16.8 mEq/L with the GC method, 10.5 mmol/L with the chloride meter, and 15.6 mmol/L with the sweat test analysis system. Median conductivity was 107.3 mmol/L in the CF group and 32.1 mmol/L in the non-CF group. There was a strong, statistically significant positive correlation between the GC method and the other sweat test methods (r=0.85) in all subjects. Sweat chloride concentration and conductivity measured by the other sweat test methods correlate highly with the GC method. We think that the other sweat test equipment can be used as reliably as the classic GC method to diagnose or exclude CF.
Time-Tagged Risk/Reliability Assessment Program for Development and Operation of Space System

NASA Astrophysics Data System (ADS)

Kubota, Yuki; Takegahara, Haruki; Aoyagi, Junichiro

We have investigated a new method of risk/reliability assessment for the development and operation of space systems. The risk of a spacecraft is difficult to evaluate because of its long operating time, the lack of maintenance, and the difficulty of testing under ground conditions. Conventional methods include FMECA, FTA, and ETA, among others. These are not sufficient to assess anomalies chronologically, and they make it difficult to share information during R&D. A new method of risk and reliability assessment, T-TRAP (Time-tagged Risk/Reliability Assessment Program), is proposed as a management tool for the development and operation of space systems. T-TRAP consists of time-resolved Fault Tree and Criticality Analyses; when an anomaly occurs in the system, it helps the responsible personnel quickly identify the cause of failure and decide on corrective actions. This paper describes the T-TRAP method and its availability.
EMG normalization method based on grade 3 of manual muscle testing: Within- and between-day reliability of normalization tasks and application to gait analysis.

PubMed

Tabard-Fougère, Anne; Rose-Dulcina, Kevin; Pittet, Vincent; Dayer, Romain; Vuillerme, Nicolas; Armand, Stéphane

2018-02-01

Electromyography (EMG) is an important parameter in Clinical Gait Analysis (CGA), and is generally interpreted with timing of activation. EMG amplitude comparisons between individuals, muscles or days need normalization. There is no consensus on existing methods. The gold standard, maximum voluntary isometric contraction (MVIC), is not adapted to pathological populations because patients are often unable to perform an MVIC. The normalization method inspired by the isometric grade 3 of manual muscle testing (isoMMT3), which is the ability of a muscle to maintain a position against gravity, could be an interesting alternative. The aim of this study was to evaluate the within- and between-day reliability of the isoMMT3 EMG normalizing method during gait compared with the conventional MVIC method. Lower limb muscle EMG (gluteus medius, rectus femoris, tibialis anterior, semitendinosus) was recorded bilaterally in nine healthy participants (five males, aged 29.7 ± 6.2 years, BMI 22.7 ± 3.3 kg·m⁻²), giving a total of 18 independent legs. Three repeated measurements of the isoMMT3 and MVIC exercises were performed with an EMG recording. EMG amplitude of the muscles during gait was normalized by these two methods. This protocol was repeated one week later. Within- and between-day reliability of normalization tasks were similar for isoMMT3 and MVIC methods. Within- and between-day reliability of gait EMG normalized by isoMMT3 was higher than with MVIC normalization. These results indicate that EMG normalization using isoMMT3 is a reliable method with no special equipment needed and will support CGA interpretation. The next step will be to evaluate this method in pathological populations. Copyright © 2017 Elsevier B.V. All rights reserved.
Development of a clinical static and dynamic standing balance measurement tool appropriate for use in adolescents.

PubMed

Emery, Carolyn A; Cassidy, J David; Klassen, Terry P; Rosychuk, Rhonda J; Rowe, Brian B

2005-06-01

There is a need in sports medicine for a static and dynamic standing balance measure to quantify balance ability in adolescents. The purposes of this study were to determine the test-retest reliability of timed static (eyes open) and dynamic (eyes open and eyes closed) unipedal balance measurements and to examine factors associated with balance. Adolescents (n=123) were randomly selected from 10 Calgary high schools. This study used a repeated-measures design. One rater measured unipedal standing balance, including timed eyes-closed static (ECS), eyes-open dynamic (EOD), and eyes-closed dynamic (ECD) balance at baseline and 1 week later. Dynamic balance was measured on a foam surface. Reliability was examined using both intraclass correlation coefficients (ICCs) and Bland and Altman statistical techniques. Multiple linear regressions were used to examine other potentially influencing factors. Based on ICCs, test-retest reliability was adequate for ECS, EOD, and ECD balance (ICC=.69, .59, and .46, respectively). The results of Bland and Altman methods, however, suggest that caution is required in interpreting reliability based on ICCs alone. Although both ECS balance and ECD balance appear to demonstrate adequate test-retest reliability by ICC, Bland and Altman methods of agreement demonstrate sufficient reliability for ECD balance only. Thirty percent of the subjects reached the 180-second maximum on EOD balance, suggesting that this test is not appropriate for use in this population. Balance ability (ECS and ECD) was better in adolescents with no past history of lower-extremity injury. Timed ECD balance is an appropriate and reliable clinical measurement for use in adolescents and is influenced by previous injury.
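The Bland and Altman approach mentioned above describes agreement through the mean test-retest difference (bias) and its 95% limits of agreement, which is why it can disagree with an ICC-only reading. The following is a minimal sketch of that calculation; the balance times are hypothetical values in seconds, not data from the study.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) of agreement for paired scores."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    # 95% limits of agreement: bias +/- 1.96 sample standard deviations.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical unipedal balance times (seconds) at baseline and one week later.
baseline = [10.0, 22.0, 15.0, 30.0, 8.0]
week_later = [12.0, 20.0, 19.0, 29.0, 10.0]
bias, lower, upper = bland_altman(baseline, week_later)
print(f"bias = {bias:.2f} s, limits of agreement = [{lower:.2f}, {upper:.2f}] s")
```

Wide limits relative to the size of a clinically meaningful change indicate poor agreement even when the ICC looks acceptable.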
Validity and Reliability of Published Comprehensive Theory of Mind Tests for Normal Preschool Children: A Systematic Review

PubMed Central

Ziatabar Ahmadi, Seyyede Zohreh; Jalaie, Shohreh; Ashayeri, Hassan

2015-01-01

Objective: Theory of mind (ToM) or mindreading is an aspect of social cognition that evaluates mental states and beliefs of oneself and others. Validity and reliability are very important criteria when evaluating standard tests; and without them, these tests are not usable. The aim of this study was to systematically review the validity and reliability of published English comprehensive ToM tests developed for normal preschool children. Method: We searched MEDLINE (PubMed interface), Web of Science, Science direct, PsycINFO, and also evidence base Medicine (The Cochrane Library) databases from 1990 to June 2015. Search strategy was Latin transcription of ‘Theory of Mind’ AND test AND children. Also, we manually studied the reference lists of all final searched articles and carried out a search of their references. Inclusion criteria were as follows: Valid and reliable diagnostic ToM tests published from 1990 to June 2015 for normal preschool children; and exclusion criteria were as follows: the studies that only used ToM tests and single tasks (false belief tasks) for ToM assessment and/or had no description about structure, validity or reliability of their tests. Methodological quality of the selected articles was assessed using the Critical Appraisal Skills Programme (CASP). Result: In primary searching, we found 1237 articles in total databases. After removing duplicates and applying all inclusion and exclusion criteria, we selected 11 tests for this systematic review. Conclusion: There were a few valid, reliable and comprehensive ToM tests for normal preschool children. However, we had limitations concerning the included articles. The defined ToM tests were different in populations, tasks, mode of presentations, scoring, mode of responses, times and other variables. Also, they had various validities and reliabilities. 
Therefore, it is recommended that the researchers and clinicians select the ToM tests according to their psychometric characteristics, validity and reliability. PMID:27006666
The Reliability of Pharyngeal High Resolution Manometry with Impedance for Derivation of Measures of Swallowing Function in Healthy Volunteers

PubMed Central

Omari, Taher I.; Savilampi, Johanna; Kokkinn, Karmen; Schar, Mistyka; Lamvik, Kristin; Doeltgen, Sebastian; Cock, Charles

2016-01-01

Purpose. We evaluated the intra- and interrater agreement and test-retest reliability of analyst derivation of swallow function variables based on repeated high resolution manometry with impedance measurements. Methods. Five subjects swallowed 10 × 10 mL saline on two occasions one week apart producing a database of 100 swallows. Swallows were repeat-analysed by six observers using software. Swallow variables were indicative of contractility, intrabolus pressure, and flow timing. Results. The average intraclass correlation coefficients (ICC) for intra- and interrater comparisons of all variable means showed substantial to excellent agreement (intrarater ICC 0.85–1.00; mean interrater ICC 0.77–1.00). Test-retest results were less reliable. ICC for test-retest comparisons ranged from slight to excellent depending on the class of variable. Contractility variables differed most in terms of test-retest reliability. Amongst contractility variables, UES basal pressure showed excellent test-retest agreement (mean ICC 0.94), measures of UES postrelaxation contractile pressure showed moderate to substantial test-retest agreement (mean Interrater ICC 0.47–0.67), and test-retest agreement of pharyngeal contractile pressure ranged from slight to substantial (mean Interrater ICC 0.15–0.61). Conclusions. Test-retest reliability of HRIM measures depends on the class of variable. Measures of bolus distension pressure and flow timing appear to be more test-retest reliable than measures of contractility. PMID:27190520
THE DYNAMIC LEAP AND BALANCE TEST (DLBT): A TEST-RETEST RELIABILITY STUDY

PubMed Central

Newman, Thomas M.; Smith, Brent I.; John Miller, Sayers

2017-01-01

Background There is a need for new clinical assessment tools to test dynamic balance during typical functional movements. Common methods for assessing dynamic balance, such as the Star Excursion Balance Test, which requires controlled movement of body segments over an unchanged base of support, may not be an adequate measure for testing typical functional movements that involve controlled movement of body segments along with a change in base of support. Purpose/hypothesis The purpose of this study was to determine the reliability of the Dynamic Leap and Balance Test (DLBT) by assessing its test-retest reliability. It was hypothesized that there would be no statistically significant differences between testing days in time taken to complete the test. Study Design Reliability study Methods Thirty healthy college aged individuals participated in this study. Participants performed a series of leaps in a prescribed sequence, unique to the DLBT test. Time required by the participants to complete the 20-leap task was the dependent variable. Subjects leaped back and forth from peripheral to central targets alternating weight bearing from one leg to the other. Participants landed on the central target with the tested limb and were required to stabilize for two seconds before leaping to the next target. Stability was based upon qualitative measures similar to Balance Error Scoring System. Each assessment comprised three trials and was performed on two days with a separation of at least six days. Results Two-way mixed ANOVA was used to analyze the differences in time to complete the sequence between the three trial averages of the two testing sessions. The Intraclass Correlation Coefficient, ICC(3,1), was used to establish between session test-retest reliability of the test trial averages. Significance was set a priori at p ≤ 0.05. No significant differences (p > 0.05) were detected between the two testing sessions. 
The ICC was 0.93 with a 95% confidence interval from 0.84 to 0.96. Conclusion This test is a cost-effective, easy to administer and clinically relevant novel measure for assessing dynamic balance that has excellent test-retest reliability. Clinical relevance As a new measure of dynamic balance, the DLBT has the potential to be a cost-effective, challenging and functional tool for clinicians. Level of Evidence 2b PMID:28900556
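The two-way mixed, single-measure consistency form of the intraclass correlation coefficient can be computed from the mean squares of a subjects-by-sessions table as (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error), where k is the number of sessions. The sketch below follows that textbook formula; the completion times are hypothetical and the function name is illustrative, not taken from the study.

```python
def icc_3_1(scores):
    """ICC(3,1) from a table with one row per subject and one column per session."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Partition the total sum of squares into subjects, sessions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical completion times (seconds) for five participants on two days.
times = [[40.1, 41.0], [35.2, 34.8], [50.3, 49.1], [44.0, 45.2], [38.5, 38.0]]
print(round(icc_3_1(times), 3))
```

Because this form measures consistency rather than absolute agreement, a constant shift between sessions does not lower the coefficient.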
Assessment of abdominal muscle function using the Biodex System-4. Validity and reliability in healthy volunteers and patients with giant ventral hernia.

PubMed

Gunnarsson, U; Johansson, M; Strigård, K

2011-08-01

The decrease in recurrence rates in ventral hernia surgery has led to a redirection of focus towards other important patient-related endpoints. One such endpoint is abdominal wall function. The aim of the present study was to evaluate the reliability and external validity of abdominal wall strength measurement using the Biodex System-4 with a back abdomen unit. Ten healthy volunteers and ten patients with ventral hernias exceeding 10 cm were recruited. Test-retest reliability, both with and without girdle, was evaluated by comparison of measurements at two test occasions 1 week apart. Reliability was calculated by the intraclass correlation coefficient (ICC) method. Validity was evaluated by correlation with the well-established International Physical Activity Questionnaire (IPAQ) and a self-assessment of abdominal wall strength. One person in the healthy group was excluded after the first test due to neck problems following minor trauma. The reliability was excellent (>0.75), with ICC values between 0.92 and 0.97 for the different modalities tested. No differences were seen between testing with and without a girdle. Validity was also excellent both when calculated as correlation to self-assessment of abdominal wall strength, and to IPAQ, giving Kendall tau values of 0.51 and 0.47, respectively, and corresponding P values of 0.002 and 0.004. Measurement of abdominal muscle function using the Biodex System-4 is a reliable and valid method to assess this important patient-related endpoint. Further investigations will be made to explore the potential of this technique in the evaluation of the results of ventral hernia surgery, and to compare muscle function after different abdominal wall reconstruction techniques.
Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire

PubMed Central

Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra

2018-01-01

Background: Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. Aims: To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Study Design: Methodological and cross sectional study. Methods: A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-half reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct validity. Results: The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. Conclusion: The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain. 
PMID:29843496
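Internal consistency figures such as Cronbach's α and a between-halves correlation can be reproduced from an item-response table. The sketch below uses the standard formulas for α and for the Spearman-Brown step-up of a half-test correlation; the responses are hypothetical, and applying Spearman-Brown to the reported 0.89 is an illustration only, not a step the abstract says the authors performed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate of full-test reliability."""
    return 2.0 * r_half / (1.0 + r_half)


# Hypothetical responses of five patients to four items.
responses = [[3, 4, 3, 4], [1, 2, 1, 1], [5, 5, 4, 5], [2, 2, 3, 2], [4, 3, 4, 4]]
print(round(cronbach_alpha(responses), 3))
print(round(spearman_brown(0.89), 3))
```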

SMART empirical approaches for predicting field performance of PV modules from results of reliability tests

NASA Astrophysics Data System (ADS)

Hardikar, Kedar Y.; Liu, Bill J. J.; Bheemreddy, Venkata

2016-09-01

Gaining an understanding of degradation mechanisms and their characterization are critical in developing relevant accelerated tests to ensure PV module performance warranty over a typical lifetime of 25 years. As newer technologies are adapted for PV, including new PV cell technologies, new packaging materials, and newer product designs, the availability of field data over extended periods of time for product performance assessment cannot be expected within the typical timeframe for business decisions. In this work, to enable product design decisions and product performance assessment for PV modules utilizing newer technologies, Simulation and Mechanism based Accelerated Reliability Testing (SMART) methodology and empirical approaches to predict field performance from accelerated test results are presented. The method is demonstrated for field life assessment of flexible PV modules based on degradation mechanisms observed in two accelerated tests, namely, Damp Heat and Thermal Cycling. The method is based on design of accelerated testing scheme with the intent to develop relevant acceleration factor models. The acceleration factor model is validated by extensive reliability testing under different conditions going beyond the established certification standards. Once the acceleration factor model is validated for the test matrix a modeling scheme is developed to predict field performance from results of accelerated testing for particular failure modes of interest. Further refinement of the model can continue as more field data becomes available. While the demonstration of the method in this work is for thin film flexible PV modules, the framework and methodology can be adapted to other PV products.
Reliability of Task-Based fMRI for Preoperative Planning: A Test-Retest Study in Brain Tumor Patients and Healthy Controls

PubMed Central

Morrison, Melanie A.; Churchill, Nathan W.; Cusimano, Michael D.; Schweizer, Tom A.; Das, Sunit; Graham, Simon J.

2016-01-01

Background Functional magnetic resonance imaging (fMRI) continues to develop as a clinical tool for patients with brain cancer, offering data that may directly influence surgical decisions. Unfortunately, routine integration of preoperative fMRI has been limited by concerns about reliability. Many pertinent studies have been undertaken involving healthy controls, but work involving brain tumor patients has been limited. To develop fMRI fully as a clinical tool, it will be critical to examine these reliability issues among patients with brain tumors. The present work is the first to extensively characterize differences in activation map quality between brain tumor patients and healthy controls, including the effects of tumor grade and the chosen behavioral testing paradigm on reliability outcomes. Method Test-retest data were collected for a group of low-grade (n = 6) and high-grade glioma (n = 6) patients, and for matched healthy controls (n = 12), who performed motor and language tasks during a single fMRI session. Reliability was characterized by the spatial overlap and displacement of brain activity clusters, BOLD signal stability, and the laterality index. Significance testing was performed to assess differences in reliability between the patients and controls, and low-grade and high-grade patients; as well as between different fMRI testing paradigms. Results There were few significant differences in fMRI reliability measures between patients and controls. Reliability was significantly lower when comparing high-grade tumor patients to controls, or to low-grade tumor patients. The motor task produced more reliable activation patterns than the language tasks, as did the rhyming task in comparison to the phonemic fluency task. Conclusion In low-grade glioma patients, fMRI data are as reliable as healthy control subjects. For high-grade glioma patients, further investigation is required to determine the underlying causes of reduced reliability. 
To maximize reliability outcomes, testing paradigms should be carefully selected to generate robust activation patterns. PMID:26894279
Methodology and technical requirements of the galectin-3 test for the preoperative characterization of thyroid nodules.

PubMed

Bartolazzi, Armando; Bellotti, Carlo; Sciacchitano, Salvatore

2012-01-01

In the last decade, the β-galactosyl binding protein galectin-3 has been the object of extensive molecular, structural, and functional studies aimed to clarify its biological role in cancer. Multicenter studies also contributed to discover the potential clinical value of galectin-3 expression analysis in distinguishing, preoperatively, benign from malignant thyroid nodules. As a consequence galectin-3 is receiving significant attention as tumor marker for thyroid cancer diagnosis, but some conflicting results mostly owing to methodological problems have been published. The possibility to apply preoperatively a reliable galectin-3 test method on fine needle aspiration biopsy (FNA)-derived thyroid cells represents an important achievement. When correctly applied, the method reduces consistently the gray area of thyroid FNA cytology, contributing to avoid unnecessary thyroid surgery. Although the efficacy and reliability of the galectin-3 test method have been extensively proved in several studies, its translation in the clinical setting requires well-standardized reagents and procedures. After a decade of experimental work on galectin-3-related basic and translational research projects, the major methodological problems that may potentially impair the diagnostic performance of galectin-3 immunotargeting are highlighted and discussed in detail. A standardized protocol for a reliable galectin-3 expression analysis is finally provided. The aim of this contribution is to improve the clinical management of patients with thyroid nodules, promoting the preoperative use of a reliable galectin-3 test method as ancillary technique to conventional thyroid FNA cytology. The final goal is to decrease unnecessary thyroid surgery and its related social costs.
Using Penelope to assess the correctness of NASA Ada software: A demonstration of formal methods as a counterpart to testing

NASA Technical Reports Server (NTRS)

Eichenlaub, Carl T.; Harper, C. Douglas; Hird, Geoffrey

1993-01-01

Life-critical applications warrant a higher level of software reliability than has yet been achieved. Since it is not certain that traditional methods alone can provide the required ultra reliability, new methods should be examined as supplements or replacements. This paper describes a mathematical counterpart to the traditional process of empirical testing. ORA's Penelope verification system is demonstrated as a tool for evaluating the correctness of Ada software. Grady Booch's Ada calendar utility package, obtained through NASA, was specified in the Larch/Ada language. Formal verification in the Penelope environment established that many of the package's subprograms met their specifications. In other subprograms, failed attempts at verification revealed several errors that had escaped detection by testing.
Test-retest reliability of lower limb isokinetic endurance in COPD: A comparison of angular velocities

PubMed Central

Ribeiro, Fernanda; Lépine, Pierre-Alexis; Garceau-Bolduc, Corine; Coats, Valérie; Allard, Étienne; Maltais, François; Saey, Didier

2015-01-01

Background The purpose of this study was to determine and compare the test-retest reliability of quadriceps isokinetic endurance testing at two knee angular velocities in patients with chronic obstructive pulmonary disease (COPD). Methods After one familiarization session, 14 patients with moderate to severe COPD (mean age 65±4 years; forced expiratory volume in 1 second (FEV1) 55%±18% predicted) performed two quadriceps isokinetic endurance tests on two separate occasions within a 5–7-day interval. Quadriceps isokinetic endurance tests consisted of 30 maximal knee extensions at angular velocities of 90° and 180° per second, performed in random order. Test-retest reliability was assessed for peak torque, muscle endurance, work slope, work fatigue index, and changes in FEV1 for dyspnea and leg fatigue from rest to the end of the test. The intraclass correlation coefficient, minimal detectable change, and limits of agreement were calculated. Results High test-retest reliability was identified for peak torque and muscle total work at both velocities. Work fatigue index was considered reliable at 90° per second but not at 180° per second. A lower reliability was identified for dyspnea and leg fatigue scores at both angular velocities. Conclusion Despite a limited sample size, our findings support the use of a 30-maximal repetition isokinetic muscle testing procedure at angular velocities of 90° and 180° per second in patients with moderate to severe COPD. Endurance measurement (total isokinetic work) at 90° per second was highly reliable, with a minimal detectable change at the 95% confidence level of 10%. Peak torque and fatigue index could also be assessed reliably at 90° per second. Evaluation of dyspnea and leg fatigue using the modified Borg scale of perceived exertion was poorly reliable and its clinical usefulness is questionable. These results should be useful in the design and interpretation of future interventions aimed at improving muscle endurance in COPD. 
PMID:26124656
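A minimal detectable change at the 95% confidence level is commonly derived from the standard error of measurement, SEM = SD × sqrt(1 − ICC), as MDC95 = 1.96 × sqrt(2) × SEM. The abstract does not state which formula the authors used, so the sketch below assumes this common definition, and its input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sd, icc):
    """Smallest change exceeding measurement error with 95% confidence."""
    # sqrt(2) accounts for error in both the test and the retest measurement.
    return 1.96 * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical total-work values: SD of 300 J and an ICC of 0.95.
mdc = minimal_detectable_change_95(300.0, 0.95)
print(round(mdc, 1))
```

Dividing the result by the group mean expresses it as a percentage, which is the form in which the abstract reports its 10% figure.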
Effect of a patient training video on visual field test reliability

PubMed Central

Sherafat, H; Spry, P G D; Waldock, A; Sparrow, J M; Diamond, J P

2003-01-01

Aims: To evaluate the effect of a visual field test educational video on the reliability of the first automated visual field test of new patients. Methods: A prospective, randomised, controlled trial of an educational video on visual field test reliability of patients referred to the hospital eye service for suspected glaucoma was undertaken. Patients were randomised to either watch an educational video or a control group with no video. The video group was shown a 4.5 minute audiovisual presentation to familiarise them with the various aspects of visual field examination with particular emphasis on sources of unreliability. Reliability was determined using standard criteria of fixation loss rate less than 20%, false positive responses less than 33%, and false negative responses less than 33%. Results: 244 patients were recruited; 112 in the video group and 132 in the control group with no significant between group difference in age, sex, and density of field defects. A significant improvement in reliability (p=0.015) was observed in the group exposed to the video with 85 (75.9%) patients having reliable results compared to 81 (61.4%) in the control group. The difference was not significant for the right (first tested) eye with 93 (83.0%) of the visual fields reliable in the video group compared to 106 (80.0%) in the control group (p = 0.583), but was significant for the left (second tested) eye with 97 (86.6 %) of the video group reliable versus 97 (73.5%) of the control group (p = 0.011). Conclusions: The use of a brief, audiovisual patient information guide on taking the visual field test produced an improvement in patient reliability for individuals tested for the first time. In this trial the use of the video had most of its impact by reducing the number of unreliable fields from the second tested eye. PMID:12543740
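The overall comparison of reliable results, 85 of 112 patients in the video group against 81 of 132 in the control group, can be checked with a test for two proportions. The abstract does not name the test used, so the sketch below assumes a Pearson chi-square test without continuity correction; under that assumption the result is close to the reported p = 0.015.

```python
import math


def two_proportion_chi2(success_a, total_a, success_b, total_b):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    fail_a = total_a - success_a
    fail_b = total_b - success_b
    n = total_a + total_b
    numerator = n * (success_a * fail_b - fail_a * success_b) ** 2
    denominator = total_a * total_b * (success_a + success_b) * (fail_a + fail_b)
    chi2 = numerator / denominator
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the abstract: reliable patients / group size.
chi2, p_value = two_proportion_chi2(85, 112, 81, 132)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```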
Reliability analysis of the objective structured clinical examination using generalizability theory.

PubMed

Trejo-Mejía, Juan Andrés; Sánchez-Mendiola, Melchor; Méndez-Ramírez, Ignacio; Martínez-González, Adrián

2016-01-01

Background The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies of OSCE reliability measurement with generalizability theory (G-theory) in Latin America. The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and explore its usefulness for quality improvement. Methods An observational cross-sectional study was conducted at National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossover random effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered as a single facet of analysis. Results The exam was applied to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. The sites and the versions of the tests had minimum variance. Conclusions Our study achieved a G coefficient similar to that found in other reports, which is acceptable for summative tests. G-theory allows the estimation of the magnitude of multiple sources of error and helps decision makers to determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
Validity and reliability of the session-RPE method for quantifying training in Australian football: a comparison of the CR10 and CR100 scales.

PubMed

Scott, Tannath J; Black, Cameron R; Quinn, John; Coutts, Aaron J

2013-01-01

The purpose of this study was to examine and compare the criterion validity and test-retest reliability of the CR10 and CR100 rating of perceived exertion (RPE) scales for team sport athletes who undertake high-intensity, intermittent exercise. Twenty-one male Australian football (AF) players (age: 19.0 ± 1.8 years, body mass: 83.92 ± 7.88 kg) participated in the first part (part A) of this study, which examined the construct validity of the session-RPE (sRPE) method for quantifying training load in AF. Ten male athletes (age: 16.1 ± 0.5 years) participated in the second part of the study (part B), which compared the test-retest reliability of the CR10 and CR100 RPE scales. In part A, the validity of the sRPE method was assessed by examining the relationships between sRPE and objective measures of internal (i.e., heart rate) and external training load (i.e., distance traveled), collected from AF training sessions. Part B of the study assessed the reliability of sRPE by examining the test-retest reliability of sRPE during 3 different intensities of controlled intermittent running (10, 11.5, and 13 km·h⁻¹). Results from part A demonstrated strong correlations for CR10- and CR100-derived sRPE with measures of internal training load (Banister's TRIMP and Edwards' TRIMP) (CR10: r = 0.83 and 0.83, and CR100: r = 0.80 and 0.81, p < 0.05). Correlations between sRPE and external training load (distance, higher speed running and player load) for both the CR10 (r = 0.81, 0.71, and 0.83) and CR100 (r = 0.78, 0.69, and 0.80) were significant (p < 0.05). Results from part B demonstrated poor reliability for both the CR10 (31.9% CV) and CR100 (38.6% CV) RPE scales after short bouts of intermittent running. Collectively, these results suggest both CR10- and CR100-derived sRPE methods have good construct validity for assessing training load in AF.
The poor levels of reliability revealed under field testing indicate that the sRPE method may not be sensitive enough to detect small changes in exercise intensity during brief intermittent running bouts. Despite this limitation, the sRPE remains a valid method to quantify training loads in high-intensity, intermittent team sport.
Measurement of Latent Variables with Different Rating Scales: Testing Reliability and Measurement Equivalence by Varying the Verbalization and Number of Categories

ERIC Educational Resources Information Center

Menold, Natalja; Tausch, Anja

2016-01-01

Effects of rating scale forms on cross-sectional reliability and measurement equivalence were investigated. A randomized experimental design was implemented, varying category labels and number of categories. The participants were 800 students at two German universities. In contrast to previous research, reliability assessment method was used,…
Examination of the Test-Retest Reliability of a Computerized Neurocognitive Test Battery.

PubMed

Nakayama, Yusuke; Covassin, Tracey; Schatz, Philip; Nogle, Sally; Kovan, Jeff

2014-08-01

Test-retest reliability is a critical issue in the utility of computer-based neurocognitive assessment paradigms employing baseline and postconcussion tests. Researchers have reported low test-retest reliability for the Immediate Post Concussion Assessment and Cognitive Testing (ImPACT) across an interval of 45 and 50 days. To re-examine the test-retest reliability of the ImPACT between baseline, 45 days, and 50 days. Descriptive laboratory study. Eighty-five physically active college students (51 male, 34 female) volunteered for this study. Participants completed the ImPACT as well as a 15-item memory test at baseline, 45 days, and 50 days. Intraclass correlation coefficients (ICCs) were calculated for ImPACT composite scores, and change scores were calculated using reliable change indices (RCIs) and regression-based methods (RBMs) at 80% and 95% confidence intervals (CIs). The respective ICCs for baseline to day 45, day 45 to day 50, baseline to day 50, and overall were as follows: verbal memory (0.76, 0.69, 0.65, and 0.78), visual memory (0.72, 0.66, 0.60, and 0.74), visual motor (processing) speed (0.87, 0.88, 0.85, and 0.91), and reaction time (0.67, 0.81, 0.71, and 0.80). All ICCs exceeded the threshold value of 0.60 for acceptable test-retest reliability. All cases fell well within the 80% CI for both the RCI and RBM, while 1% to 5% of cases fell outside the 95% CI for the RCI and 1% for the RBM. Results suggest that the ImPACT is a reliable neurocognitive test battery at 45 and 50 days after the baseline assessment. The current findings agree with those of other reliability studies that have reported acceptable ICCs across 30-day to 1-year testing intervals, and they support the utility of the ImPACT for the multidisciplinary approach to concussion management. This study suggests that the computerized neurocognitive test battery, ImPACT, is a reliable test for postconcussion serial assessments. 
However, when managing concussed athletes, the ImPACT should not be used as a stand-alone measure. © 2014 The Author(s).
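The reliable change index (RCI) mentioned in this abstract is commonly computed from the baseline standard deviation and a test-retest reliability coefficient. The sketch below uses the widely cited Jacobson and Truax form and is an assumption about the general technique, not a reproduction of the authors' calculation; the function name and all scores are hypothetical.

```python
import math

def reliable_change_index(baseline, retest, sd_baseline, reliability):
    """RCI = (retest - baseline) / S_diff, where
    SEM    = SD * sqrt(1 - r)   (standard error of measurement)
    S_diff = sqrt(2) * SEM      (standard error of the difference)
    """
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0) * sem
    return (retest - baseline) / s_diff

# Hypothetical composite scores; r = 0.76 echoes the verbal-memory ICC
# reported for baseline to day 45, used here as the reliability estimate.
rci = reliable_change_index(baseline=88.0, retest=79.0, sd_baseline=9.5, reliability=0.76)

# Two-sided critical values: about 1.282 for an 80% CI, 1.96 for a 95% CI.
for label, z in (("80%", 1.282), ("95%", 1.96)):
    print(label, "reliable change" if abs(rci) > z else "within expected variation")
```

A change is flagged only when it exceeds what measurement error alone would be expected to produce at the chosen confidence level.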
Reliability analysis of component of affination centrifugal 1 machine by using reliability engineering

NASA Astrophysics Data System (ADS)

Sembiring, N.; Ginting, E.; Darnello, T.

2017-12-01

A company that produces refined sugar has not reached the required level of machine availability on its production floor because a critical machine frequently breaks down. This results in sudden losses of production time and production opportunities. The problem can be addressed with the Reliability Engineering method, in which a statistical analysis of historical failure data is performed to identify the underlying distribution. The method provides the reliability, failure rate, and availability of a machine over the scheduled maintenance interval. Distribution tests on the time-between-failures data (MTTF) showed a lognormal distribution for the flexible hose component and a Weibull distribution for the teflon cone lifting component. Distribution tests on the mean-time-to-repair data (MTTR) showed an exponential distribution for the flexible hose component and a Weibull distribution for the teflon cone lifting component. Under the actual replacement schedule of 720 hours, the flexible hose component has a reliability of 0.2451 and an availability of 0.9960. Under the actual replacement schedule of 1944 hours, the critical teflon cone lifting component has a reliability of 0.4083 and an availability of 0.9927.
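The quantities named in this abstract (reliability at a replacement interval and availability) follow from standard textbook formulas once a distribution has been fitted. The sketch below shows those formulas only. The abstract does not report the fitted parameters, so every numeric value is hypothetical, and the authors may have computed availability differently.

```python
import math

def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape) for a two-parameter Weibull."""
    return math.exp(-((t / scale) ** shape))

def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma), written with erfc."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def steady_state_availability(mttf, mttr):
    """A = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# Hypothetical parameters, chosen only to exercise the functions.
r_cone = weibull_reliability(t=1944.0, shape=1.5, scale=2000.0)
r_hose = lognormal_reliability(t=720.0, mu=6.0, sigma=0.8)
a_cone = steady_state_availability(mttf=1800.0, mttr=12.0)
print(f"R_cone = {r_cone:.4f}, R_hose = {r_hose:.4f}, A_cone = {a_cone:.4f}")
```

The pattern reported in the abstract, low reliability at the replacement interval alongside availability above 0.99, is plausible whenever repair times are short relative to the time between failures.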
Intra and Inter-Rater Reliability of Screening for Movement Impairments: Movement Control Tests from The Foundation Matrix

PubMed Central

Mischiati, Carolina R.; Comerford, Mark; Gosford, Emma; Swart, Jacqueline; Ewings, Sean; Botha, Nadine; Stokes, Maria; Mottram, Sarah L.

2015-01-01

Pre-season screening is well established within the sporting arena, and aims to enhance performance and reduce injury risk. With the increasing need to identify potential injury with greater accuracy, a new risk assessment process has been produced; The Performance Matrix (battery of movement control tests). As with any new method of objective testing, it is fundamental to establish whether the same results can be reproduced between examiners and by the same examiner on consecutive occasions. This study aimed to determine the intra-rater test re-test and inter-rater reliability of tests from a component of The Performance Matrix, The Foundation Matrix. Twenty participants were screened by two experienced musculoskeletal therapists using nine tests to assess the ability to control movement during specific tasks. Movement evaluation criteria for each test were rated as pass or fail. The therapists observed participants real-time and tests were recorded on video to enable repeated ratings four months later to examine intra-rater reliability (videos rated two weeks apart). Overall test percentage agreement was 87% for inter-rater reliability; 98% Rater 1, 94% Rater 2 for test re-test reliability; and 75% for real-time versus video. Intraclass-correlation coefficients (ICCs) were excellent between raters (0.81) and within raters (Rater 1, 0.96; Rater 2, 0.88) but poor for real-time versus video (0.23). Reliability for individual components of each test was more variable: inter-rater, 68-100%; intra-rater, 88-100% Rater 1, 75-100% Rater 2; and real-time versus video 31-100%. Cohen’s Kappa values for inter-rater reliability were 0.0-1.0; intra-rater 0.6-1.0 for Rater 1; -0.1-1.0 for Rater 2; and -0.1-1 for real-time versus video. It is concluded that both inter and intra-rater reliability of tests in The Foundation Matrix are acceptable when rated by experienced therapists. 
Recommendations are made for modifying some of the criteria to improve reliability where excellence was not reached. Key points The movement control tests of The Foundation Matrix had acceptable reliability between raters and within raters on different days. Agreement between observations made on tests performed real-time and on video recordings was low, indicating poor validity of use of video recordings. Some movement evaluation criteria related to specific tests that did not achieve excellent agreement could be modified to improve reliability. PMID:25983594
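This abstract reports both percentage agreement and Cohen's kappa for pass/fail ratings, and the two can diverge sharply. The sketch below shows the standard definitions on hypothetical ratings; the function names and data are illustrative and are not taken from the study.

```python
def percent_agreement(rater1, rater2):
    """Fraction of items on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohens_kappa(rater1, rater2):
    """kappa = (p_o - p_e) / (1 - p_e), with p_e the chance agreement
    implied by each rater's marginal category frequencies."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    p_o = percent_agreement(rater1, rater2)
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    if p_e == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so report perfect agreement by convention in this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical pass/fail ratings for ten movement criteria.
r1 = ["pass"] * 6 + ["fail"] * 4
r2 = ["pass"] * 5 + ["fail"] * 5
print(percent_agreement(r1, r2), cohens_kappa(r1, r2))
```

Because kappa corrects for chance agreement, a criterion that nearly everyone passes can show high percentage agreement and a kappa near zero, which is consistent with the wide kappa ranges quoted in the abstract.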
Clinical Usefulness of the Pendulum Test Using a NK Table to Measure the Spasticity of Patients with Brain Lesions

PubMed Central

Kim, Yong-Wook

2013-01-01

[Purpose] The purpose of the present study was to investigate the clinical usefulness (reliability and validity) of the pendulum test using a Noland-Kuckhoff (NK) table with an attached electrogoniometer to measure the spasticity of patients with brain lesions. [Subjects] The subjects were 31 patients with stroke or traumatic brain injury. [Methods] The intraclass correlation coefficient (ICC) was used to verify the test–retest reliability of spasticity measures obtained using the pendulum test. Pearson's product-moment correlation coefficient was used to examine the validity of the pendulum test against the amplitude of the patellar tendon reflex (PTR) test, an objective and quantitative measure of spasticity. [Results] The test–retest reliability was high, reflecting a significant correlation between the test and the retest (ICCs = 0.95–0.97). A significant negative correlation was found between the amplitude of the PTR test and the four variables measured in the pendulum test (r = −0.77 to −0.85). [Conclusion] The pendulum test using an NK table is an objective measure of spasticity and can be used in the clinical setting in place of more expensive and complicated equipment. Further studies are needed to investigate the therapeutic effect of this method on spasticity. PMID:24259775
Reliability-based econometrics of aerospace structural systems: Design criteria and test options. Ph.D. Thesis - Georgia Inst. of Tech.

NASA Technical Reports Server (NTRS)

Thomas, J. M.; Hanagud, S.

1974-01-01

The design criteria and test options for aerospace structural reliability were investigated. A decision methodology was developed for selecting a combination of structural tests and structural design factors. The decision method involves the use of Bayesian statistics and statistical decision theory. Procedures are discussed for obtaining and updating data-based probabilistic strength distributions for aerospace structures when test information is available and for obtaining subjective distributions when data are not available. The techniques used in developing the distributions are explained.
Practical Issues in Implementing Software Reliability Measurement

NASA Technical Reports Server (NTRS)

Nikora, Allen P.; Schneidewind, Norman F.; Everett, William W.; Munson, John C.; Vouk, Mladen A.; Musa, John D.

1999-01-01

Many ways of estimating software systems' reliability, or reliability-related quantities, have been developed over the past several years. Of particular interest are methods that can be used to estimate a software system's fault content prior to test, or to discriminate between components that are fault-prone and those that are not. The results of these methods can be used to: 1) More accurately focus scarce fault identification resources on those portions of a software system most in need of it. 2) Estimate and forecast the risk of exposure to residual faults in a software system during operation, and develop risk and safety criteria to guide the release of a software system to fielded use. 3) Estimate the efficiency of test suites in detecting residual faults. 4) Estimate the stability of the software maintenance process.
Automation of diagnostic genetic testing: mutation detection by cyclic minisequencing.

PubMed

Alagrund, Katariina; Orpana, Arto K

2014-01-01

The rising role of nucleic acid testing in clinical decision making is creating a need for efficient and automated diagnostic nucleic acid test platforms. Clinical use of nucleic acid testing demands shorter turnaround times (TATs), lower production costs, and robust, reliable methods that can easily adopt new test panels and run rare tests on a random-access basis. Here we present a novel home-brew laboratory automation platform for diagnostic mutation testing. This platform is based on cyclic minisequencing (cMS) and two-color near-infrared (NIR) detection. Pipetting is automated using Tecan Freedom EVO pipetting robots, and all assays are performed in 384-well microplate format. The automation platform includes a data processing system that controls all procedures and automated patient result reporting to the hospital information system. We have found automated cMS to be a reliable, inexpensive and robust method for nucleic acid testing across a wide variety of diagnostic tests. The platform is currently in clinical use for over 80 mutations or polymorphisms. In addition to tests performed on blood samples, the system also performs an epigenetic test for methylation of the MGMT gene promoter, and companion diagnostic tests for analysis of KRAS and BRAF gene mutations in formalin-fixed, paraffin-embedded tumor samples. Automated genetic test reporting was found to be reliable and efficient, decreasing the workload of academic personnel.
The intra- and inter-observer reliability of the physical examination methods used to assess patients with patellofemoral joint instability.

PubMed

Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T

2012-08-01

An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there was very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign showing fair to moderate agreement. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and the Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability, inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups are indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
The Queensland high risk foot form (QHRFF) – is it a reliable and valid clinical research tool for foot disease?

PubMed Central

2014-01-01

Background Foot disease complications, such as foot ulcers and infection, contribute to considerable morbidity and mortality. These complications are typically precipitated by “high-risk factors”, such as peripheral neuropathy and peripheral arterial disease. High-risk factors are more prevalent in specific “at risk” populations such as diabetes, kidney disease and cardiovascular disease. To the best of the authors’ knowledge a tool capturing multiple high-risk factors and foot disease complications in multiple at risk populations has yet to be tested. This study aimed to develop and test the validity and reliability of a Queensland High Risk Foot Form (QHRFF) tool. Methods The study was conducted in two phases. Phase one developed a QHRFF using an existing diabetes foot disease tool, literature searches, stakeholder groups and expert panel. Phase two tested the QHRFF for validity and reliability. Four clinicians, representing different levels of expertise, were recruited to test validity and reliability. Three cohorts of patients were recruited; one tested criterion measure reliability (n = 32), another tested criterion validity and inter-rater reliability (n = 43), and another tested intra-rater reliability (n = 19). Validity was determined using sensitivity, specificity and positive predictive values (PPV). Reliability was determined using Kappa, weighted Kappa and intra-class correlation (ICC) statistics. Results A QHRFF tool containing 46 items across seven domains was developed. Criterion measure reliability of at least moderate categories of agreement (Kappa > 0.4; ICC > 0.75) was seen in 91% (29 of 32) tested items. Criterion validity of at least moderate categories (PPV > 0.7) was seen in 83% (60 of 72) tested items. Inter- and intra-rater reliability of at least moderate categories (Kappa > 0.4; ICC > 0.75) was seen in 88% (84 of 96) and 87% (20 of 23) tested items respectively. 
Conclusions The QHRFF had acceptable validity and reliability across the majority of items; particularly items identifying relevant co-morbidities, high-risk factors and foot disease complications. Recommendations have been made to improve or remove identified weaker items for future QHRFF versions. Overall, the QHRFF possesses suitable practicality, validity and reliability to assess and capture relevant foot disease items across multiple at risk populations. PMID:24468080
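Criterion validity in this abstract is summarised with sensitivity, specificity and positive predictive value (PPV). The sketch below shows how those three figures follow from a 2×2 table comparing one form item against a criterion measure. The counts are hypothetical and the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV from a 2x2 table.

    tp / fn: criterion-positive cases the item did / did not flag.
    tn / fp: criterion-negative cases the item did not / did flag.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# Hypothetical counts for a single item; not data from the study.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# The abstract's "at least moderate" validity threshold is PPV > 0.7.
print("meets PPV threshold:", metrics["ppv"] > 0.7)
```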
Measuring the Environment for Friendliness Toward Physical Activity: A Comparison of the Reliability of 3 Questionnaires

PubMed Central

Brownson, Ross C.; Chang, Jen Jen; Eyler, Amy A.; Ainsworth, Barbara E.; Kirtland, Karen A.; Saelens, Brian E.; Sallis, James F.

2004-01-01

Objectives. We tested the reliability of 3 instruments that assessed social and physical environments. Methods. We conducted a test–retest study among US adults (n = 289). We used telephone survey methods to measure suitableness of the perceived (vs objective) environment for recreational physical activity and nonmotorized transportation. Results. Most questions in our surveys that attempted to measure specific characteristics of the built environment showed moderate to high reliability. Questions about the social environment showed lower reliability than those that assessed the physical environment. Certain blocks of questions appeared to be selectively more reliable for urban or rural respondents. Conclusions. Despite differences in content and in response formats, all 3 surveys showed evidence of reliability, and most items are now ready for use in research and in public health surveillance. PMID:14998817
Examinations of electron temperature calculation methods in Thomson scattering diagnostics.

PubMed

Oh, Seungtae; Lee, Jong Ha; Wi, Hanmin

2012-10-01

Electron temperature from the Thomson scattering diagnostic is derived through indirect calculation based on a theoretical model. The χ-square test is commonly used in the calculation, and the reliability of the calculation method depends strongly on the noise level of the input signals. In the simulations, noise effects on the χ-square test are examined, and a scale factor test is proposed as an alternative method.

[Development of a proverb test for assessment of concrete thinking problems in schizophrenic patients].

PubMed

Barth, A; Küfferle, B

2001-11-01

Concretism is considered an important aspect of schizophrenic thought disorder. Traditionally it is measured using the method of proverb interpretation, in which metaphoric proverbs are presented with the request that the subject explain their meaning. Interpretations are recorded and scored for concretistic tendencies. However, this method has two problems: its reliability is doubtful and it is rather complicated to perform. In this paper, a new version of a multiple choice proverb test is presented which can solve these problems in a reliable and economical manner. Using the new test, it has been shown that schizophrenic patients have greater deficits in proverb interpretation than depressive patients.
COMPARISON OF DIFFERENT TRUNK ENDURANCE TESTING METHODS IN COLLEGE‐AGED INDIVIDUALS

PubMed Central

Krier, Amber D.; Nelson, Julie A.; Rogers, Michael A.; Stuke, Zachariah O.; Smith, Barbara S.

2012-01-01

Objective: Determine the reliability of two different modified (MOD1 and MOD2) testing methods compared to a standard method (ST) for testing trunk flexion and extension endurance. Participants: Twenty‐eight healthy individuals (age 26.4 ± 3.2 years, height 1.75 ± m, weight 71.8 ± 10.3 kg, body mass index 23.6 ± 3.4 kg/m²). Method: Trunk endurance time was measured in seconds for flexion and extension under the three different stabilization conditions. The MOD1 testing procedure utilized a female clinician (70.3 kg) and MOD2 utilized a male clinician (90.7 kg) to provide stabilization, as opposed to the ST method of belt stabilization. Results: No significant differences occurred between flexion and extension times. Intraclass correlations (ICC(3,1)) for the different testing conditions ranged from .79 to .95 (p <.000) and are found in Table 3. Concurrent validity coefficients, using the ST flexion times as the gold standard, were .95 for MOD1 and .90 for MOD2. For ST extension, coefficients were .91 and .80 for MOD1 and MOD2, respectively (p <.01). Conclusions: These methods proved to be a reliable substitute for previously accepted ST testing methods in normal college‐aged individuals. These modified testing procedures can be implemented in athletic training rooms and weight rooms lacking appropriate tables for the ST testing. Level of Evidence: 3 PMID:23091786
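The ICC(3,1) statistic cited in this abstract is the Shrout and Fleiss two-way mixed, single-measure, consistency form. The sketch below computes it from a subjects-by-conditions table using the usual ANOVA mean squares. The endurance times are hypothetical and the function name is illustrative.

```python
def icc_3_1(scores):
    """ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) * EMS).

    scores: one row per subject, each row holding k measurements
    (one per rater or testing condition).
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between conditions
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)

# Hypothetical endurance times in seconds: columns are ST, MOD1, MOD2.
times = [
    [120.0, 118.0, 125.0],
    [95.0, 99.0, 92.0],
    [150.0, 146.0, 155.0],
    [80.0, 84.0, 79.0],
    [110.0, 107.0, 113.0],
]
print(f"ICC(3,1) = {icc_3_1(times):.2f}")
```

Because ICC(3,1) measures consistency rather than absolute agreement, a constant offset between conditions (for example, one stabilization method always adding a few seconds) does not lower it.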
Evaluation of Explosive Strength for Young and Adult Athletes

ERIC Educational Resources Information Center

Viitasalo, Jukka T.

1988-01-01

The reliability of new electrical measurements of vertical jumping height and of throwing velocity was tested. These results were compared to traditional measurement techniques. The new method was found to give reliable results from children to adults. Methodology is discussed. (Author/JL)
Clinical assessment of effusion in knee osteoarthritis—A systematic review

PubMed Central

Maricar, Nasimah; Callaghan, Michael J.; Parkes, Matthew J.; Felson, David T.; O'Neill, Terence W.

2016-01-01

Objective The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. Methods MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. Results A total of 10 publications were reviewed. Eight of these considered reliability and four considered the validity of clinical assessments against ultrasound effusion. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); for bulge sign 0.47 and balloon sign 0.37. Inter-observer kappa agreement for visible swelling ranged from −0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign −0.11 to 0.82, patellar tap −0.02 to 0.75 and bulge sign kappa −0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few data were available on the performance of individual clinical tests, with sensitivity ranging 18.2–85.7% and specificity 35.3–93.3%, both higher with larger effusions. Conclusion The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. PMID:26581486
Test-retest reliability and construct validity of the ENERGY-child questionnaire on energy balance-related behaviours and their potential determinants: the ENERGY-project

PubMed Central

2011-01-01

Background Insight in children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. Objective To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. Methods We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Results Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. From the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Conclusions Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items. PMID:22152048
Patient simulation: a literary synthesis of assessment tools in anesthesiology.

PubMed

Edler, Alice A; Fanning, Ruth G; Chen, Michael I; Claure, Rebecca; Almazan, Dondee; Struyk, Brain; Seiden, Samuel C

2009-12-20

High-fidelity patient simulation (HFPS) has been hypothesized as a modality for assessing competency of knowledge and skill in patient simulation, but uniform methods for HFPS performance assessment (PA) have not yet been completely achieved. Anesthesiology as a field founded the HFPS discipline and also leads in its PA. This project reviews the types, quality, and designated purpose of HFPS PA tools in anesthesiology. We systematically reviewed the anesthesiology literature referenced in PubMed to assess the quality and reliability of available PA tools in HFPS. Of 412 articles identified, 50 met our inclusion criteria. Seventy-seven percent of studies have been published since 2000; more recent studies demonstrated higher quality. Investigators reported a variety of test construction and validation methods. The most commonly reported test construction methods included "modified Delphi Techniques" for item selection, reliability measurement using inter-rater agreement, and intra-class correlations between test items or subtests. Modern test theory, in particular generalizability theory, was used in nine (18%) of the studies. Test score validity has been addressed in multiple investigations and has shown a significant improvement in reporting accuracy. However, the assessment of predictive validity has been low across the majority of studies. Usability and practicality of testing occasions and tools were only anecdotally reported. To more completely comply with the gold standards for PA design, both shared experience of experts and recognition of test construction standards, including reliability and validity measurements, instrument piloting, rater training, and explicit identification of the purpose and proposed use of the assessment tool, are required.
Test-retest reliability and comparability of paper and computer questionnaires for the Finnish version of the Tampa Scale of Kinesiophobia.

PubMed

Koho, P; Aho, S; Kautiainen, H; Pohjolainen, T; Hurri, H

2014-12-01

To estimate the internal consistency, test-retest reliability and comparability of paper and computer versions of the Finnish version of the Tampa Scale of Kinesiophobia (TSK-FIN) among patients with chronic pain. In addition, patients' personal experiences of completing both versions of the TSK-FIN and preferences between these two methods of data collection were studied. Test-retest reliability study. Paper and computer versions of the TSK-FIN were completed twice on two consecutive days. The sample comprised 94 consecutive patients with chronic musculoskeletal pain participating in a pain management or individual rehabilitation programme. The group rehabilitation design consisted of physical and functional exercises, evaluation of the social situation, psychological assessment of pain-related stress factors, and personal pain management training in order to regain overall function and mitigate the inconvenience of pain and fear-avoidance behaviour. The mean TSK-FIN score was 37.1 [standard deviation (SD) 8.1] for the computer version and 35.3 (SD 7.9) for the paper version. The mean difference between the two versions was 1.9 (95% confidence interval 0.8 to 2.9). Test-retest reliability was 0.89 for the paper version and 0.88 for the computer version. Internal consistency was considered to be good for both versions. The intraclass correlation coefficient for comparability was 0.77 (95% confidence interval 0.66 to 0.85), indicating substantial reliability between the two methods. Both versions of the TSK-FIN demonstrated substantial intertest reliability, good test-retest reliability, good internal consistency and acceptable limits of agreement, suggesting their suitability for clinical use. However, subjects tended to score higher when using the computer version. As such, in an ideal situation, data should be collected in a similar manner throughout the course of rehabilitation or clinical research. Copyright © 2014 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
A medical record review for functional somatic symptoms in children.

PubMed

Rask, Charlotte Ulrikka; Borg, Carsten; Søndergaard, Charlotte; Schulz-Pedersen, Søren; Thomsen, Per Hove; Fink, Per

2010-04-01

The objectives of this study were to develop and test a systematic medical record review for functional somatic symptoms (FSSs) in paediatric patients and to estimate the inter-rater reliability of paediatricians' recognition of FSSs and their associated impairments while using this method. We developed the Medical Record Review for Functional Somatic Symptoms in Children (MRFC) for retrospective medical record review. Described symptoms were categorised as probably, definitely, or not FSSs. FSS-associated impairment was also determined. Three paediatricians performed the MRFC on the medical records of 54 children with a diagnosed, well-defined physical disease and 59 with 'symptom' diagnoses. The inter-rater reliabilities of the recognition and associated impairment of FSSs were tested on 20 of these records. The MRFC allowed identification of subgroups of children with multisymptomatic FSSs, long-term FSSs, and/or impairing FSSs. The FSS inter-rater reliability was good (combined kappa=0.69) but only fair as far as associated impairment was concerned (combined kappa=0.29). In the hands of skilled paediatricians, the MRFC is a reliable method for identifying paediatric patients with diverse types of FSSs for clinical research. However, additional information is needed for reliable judgement of impairment. The method may also prove useful in clinical practice. Copyright 2010 Elsevier Inc. All rights reserved.
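The kappa values quoted above measure agreement beyond chance. The abstract reports a "combined kappa" across three paediatricians and does not say how it was combined, so the following is only a minimal sketch of the simpler two-rater Cohen's kappa. The function name and the ratings are hypothetical and are not taken from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical FSS ratings of six records by two raters (not study data).
rater_1 = ["definite", "probable", "none", "none", "definite", "none"]
rater_2 = ["definite", "none", "none", "none", "definite", "probable"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

A multi-rater statistic such as Fleiss' kappa would be closer to a three-rater design, but whether the study used it is not stated.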
Normalized Rotational Multiple Yield Surface Framework (NRMYSF) stress-strain curve prediction method based on small strain triaxial test data on undisturbed Auckland residual clay soils

NASA Astrophysics Data System (ADS)

Noor, M. J. Md; Ibrahim, A.; Rahman, A. S. A.

2018-04-01

Small strain triaxial test measurement is considered significantly more accurate than external strain measurement using the conventional method, because of the systematic errors normally associated with the conventional test. Three submersible miniature linear variable differential transducers (LVDTs) were mounted on yokes clamped directly onto the soil sample, spaced equally at 120° from one another. The device setup, using a 0.4 N resolution load cell and a 16-bit AD converter, was capable of consistently resolving displacements of less than 1 µm and measuring axial strains ranging from less than 0.001% to 2.5%. Further analysis of the small strain local measurement data was performed using the new Normalized Rotational Multiple Yield Surface Framework (NRMYSF) method and compared with the existing Rotational Multiple Yield Surface Framework (RMYSF) prediction method. The prediction of shear strength based on the combined intrinsic curvilinear shear strength envelope using small strain triaxial test data confirmed the significant improvement and reliability of the measurement and analysis methods. Moreover, the NRMYSF method shows excellent data prediction and a significant improvement toward more reliable prediction of soil strength, which can reduce the cost and time of experimental laboratory tests.
A new compound control method for sine-on-random mixed vibration test

NASA Astrophysics Data System (ADS)

Zhang, Buyun; Wang, Ruochen; Zeng, Falin

2017-09-01

Vibration environmental testing (VET) is one of the important and effective methods for supporting the strength design, reliability and durability testing of mechanical products. A new separation control strategy is proposed for multiple-input multiple-output (MIMO) sine-on-random (SOR) mixed-mode vibration testing, an advanced and intensive type of VET. As the key problem of the strategy, a correlation integral method was applied to separate the mixed signals, which include random and sinusoidal components. The feedback control formula of the MIMO linear random vibration system was systematically deduced in the frequency domain, and a Jacobi control algorithm was proposed in view of the elements of the power spectral density (PSD) matrix, such as self-spectrum, coherence, and phase. Because excitation can be over-corrected in sine vibration tests, a compression factor was introduced to reduce the excitation correction and avoid damage to the vibration table or other devices. The two methods were combined and applied in a MIMO SOR vibration test system. Finally, a verification test system with the vibration of a cantilever beam as the control object was established to verify the reliability and effectiveness of the proposed methods. The test results show that the exceeding values can be controlled accurately within the tolerance range of the references, and that the method can provide theoretical and practical support for mechanical engineering.
Agreement between digital image analysis and clinical spectrophotometer in CIEL*C*h° coordinate differences and total color difference (ΔE) measurements of dental ceramic shade tabs.

PubMed

Farah, Ra'fat I

2016-01-01

The objectives of this in vitro study were: 1) to test the agreement among color coordinate differences and total color difference (ΔL*, ΔC*, Δh°, and ΔE) measurements obtained by digital image analysis (DIA) and spectrophotometer, and 2) to test the reliability of each method for obtaining color differences. A digital camera was used to record standardized images of each of the 15 shade tabs from the IPS e.max shade guide placed edge-to-edge in a phantom head with a reference shade tab. The images were analyzed using image-editing software (Adobe Photoshop) to obtain the color differences between the middle area of each test shade tab and the corresponding area of the reference tab. The color differences for the same shade tab areas were also measured using a spectrophotometer. To assess the reliability, measurements for the 15 shade tabs were repeated twice using the two methods. The Intraclass Correlation Coefficient (ICC) and the Dahlberg index were used to calculate agreement and reliability. The total agreement of the two methods for measuring ΔL*, ΔC*, Δh°, and ΔE, according to the ICC, exceeded 0.82. The Dahlberg indices for ΔL* and ΔE were 2.18 and 2.98, respectively. For the reliability calculation, the ICCs for the DIA and the spectrophotometer ΔE were 0.91 and 0.94, respectively. High agreement was obtained between the DIA and spectrophotometer results for the ΔL*, ΔC*, Δh°, and ΔE measurements. Further, the reliability of the measurements for the spectrophotometer was slightly higher than the reliability of all measurements in the DIA.
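The Dahlberg index mentioned above summarises the error between duplicate measurements. As a minimal sketch, assuming the usual form sqrt(sum(d_i^2) / (2n)) for n pairs of repeated readings, it can be computed as follows. The function name and the readings are hypothetical, not values from the study.

```python
import math

def dahlberg_index(first_pass, second_pass):
    """Dahlberg error for paired repeated measurements (illustrative helper)."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("measurement lists must be non-empty and of equal length")
    # Sum of squared differences between the first and second measurement of each item.
    squared = sum((a - b) ** 2 for a, b in zip(first_pass, second_pass))
    return math.sqrt(squared / (2 * len(first_pass)))

# Hypothetical repeated colour-difference readings for three shade tabs.
first_pass = [1.0, 2.0, 3.0]
second_pass = [2.0, 2.0, 1.0]
error = dahlberg_index(first_pass, second_pass)
print(round(error, 3))
```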
Test-retest reliability at the item level and total score level of the Norwegian version of the Spinal Cord Injury Falls Concern Scale (SCI-FCS).

PubMed

Roaldsen, Kirsti Skavberg; Måøy, Åsa Blad; Jørgensen, Vivien; Stanghelle, Johan Kvalvik

2016-05-01

Translation of the Spinal Cord Injury Falls Concern Scale (SCI-FCS), and investigation of test-retest reliability on item-level and total-score-level. Translation, adaptation and test-retest study. A specialized rehabilitation setting in Norway. Fifty-four wheelchair users with a spinal cord injury. The median age of the cohort was 49 years, and the median number of years after injury was 13. Interventions/measurements: The SCI-FCS was translated and back-translated according to guidelines. Individuals answered the SCI-FCS twice over the course of one week. We investigated item-level test-retest reliability using Svensson's rank-based statistical method for disagreement analysis of paired ordinal data. For relative reliability, we analyzed the total-score-level test-retest reliability with intraclass correlation coefficients (ICC2.1), the standard error of measurement (SEM), and the smallest detectable change (SDC) for absolute reliability/measurement-error assessment and Cronbach's alpha for internal consistency. All items showed satisfactory percentage agreement (≥69%) between test and retest. There were small but non-negligible systematic disagreements among three items; we recovered an 11-13% higher chance for a lower second score. There was no disagreement due to random variance. The test-retest agreement (ICC2.1) was excellent (0.83). The SEM was 2.6 (12%), and the SDC was 7.1 (32%). The Cronbach's alpha was high (0.88). The Norwegian SCI-FCS is highly reliable for wheelchair users with chronic spinal cord injuries.
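The abstract reports the SEM and SDC without stating how they were derived. The sketch below assumes the commonly used definitions SEM = SD × sqrt(1 − ICC) and SDC = 1.96 × sqrt(2) × SEM; the function names are illustrative.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and the ICC (assumed definition)."""
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_change(sem, z=1.96):
    """SDC at the 95% level for a difference between two measurements (assumed definition)."""
    return z * math.sqrt(2.0) * sem

# Using the rounded SEM of 2.6 quoted in the abstract.
sdc = smallest_detectable_change(2.6)
print(round(sdc, 2))
```

With the rounded SEM this gives a value near 7.2, close to the reported 7.1; the small gap is presumably due to rounding of the SEM, although the abstract does not confirm this.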
Testing Historical Skills.

ERIC Educational Resources Information Center

Baillie, Ray

1980-01-01

Outlines methods for including skill testing in teacher-made history tests. Focuses on distinguishing fact and fiction, evaluating the reliability of a source, distinguishing between primary and secondary sources, recognizing statements which support generalizations, testing with media, mapping geo-politics, and applying knowledge to new…
Reliably detectable flaw size for NDE methods that use calibration

NASA Astrophysics Data System (ADS)

Koshti, Ajay M.

2017-04-01

Probability of detection (POD) analysis is used in assessing reliably detectable flaw size in nondestructive evaluation (NDE). MIL-HDBK-1823 and the associated mh1823 POD software give the most common methods of POD analysis. In this paper, POD analysis is applied to an NDE method, such as eddy current testing, where calibration is used. NDE calibration standards have artificial flaws of known size, such as electro-discharge machined (EDM) notches and flat bottom hole (FBH) reflectors, which are used to set instrument sensitivity for detection of real flaws. Real flaws such as cracks and crack-like flaws are desired to be detected using these NDE methods. A reliably detectable crack size is required for safe life analysis of fracture critical parts. Therefore, it is important to correlate signal responses from real flaws with signal responses from artificial flaws used in the calibration process to determine reliably detectable flaw size.
Reliably Detectable Flaw Size for NDE Methods that Use Calibration

NASA Technical Reports Server (NTRS)

Koshti, Ajay M.

2017-01-01

Probability of detection (POD) analysis is used in assessing reliably detectable flaw size in nondestructive evaluation (NDE). MIL-HDBK-1823 and the associated mh1823 POD software give the most common methods of POD analysis. In this paper, POD analysis is applied to an NDE method, such as eddy current testing, where calibration is used. NDE calibration standards have artificial flaws of known size, such as electro-discharge machined (EDM) notches and flat bottom hole (FBH) reflectors, which are used to set instrument sensitivity for detection of real flaws. Real flaws such as cracks and crack-like flaws are desired to be detected using these NDE methods. A reliably detectable crack size is required for safe life analysis of fracture critical parts. Therefore, it is important to correlate signal responses from real flaws with signal responses from artificial flaws used in the calibration process to determine reliably detectable flaw size.
First Order Reliability Application and Verification Methods for Semistatic Structures

NASA Technical Reports Server (NTRS)

Verderaime, Vincent

1994-01-01

Escalating risks of aerostructures stimulated by increasing size, complexity, and cost should no longer be ignored by conventional deterministic safety design methods. The deterministic pass-fail concept is incompatible with probability and risk assessments, its stress audits are shown to be arbitrary and incomplete, and it compromises high strength materials performance. A reliability method is proposed which combines first order reliability principles with deterministic design variables and conventional test technique to surmount current deterministic stress design and audit deficiencies. Accumulative and propagation design uncertainty errors are defined and appropriately implemented into the classical safety index expression. The application is reduced to solving for a factor that satisfies the specified reliability and compensates for uncertainty errors, and then using this factor as, and instead of, the conventional safety factor in stress analyses. The resulting method is consistent with current analytical skills and verification practices, the culture of most designers, and with the pace of semistatic structural designs.
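The "classical safety index expression" is not written out in the abstract. A minimal sketch, assuming the usual first-order form for independent, normally distributed strength and stress, beta = (mu_R − mu_S) / sqrt(sigma_R² + sigma_S²), is given below. The function names and numbers are hypothetical, and the uncertainty-error terms the report adds are not modelled.

```python
import math
from statistics import NormalDist

def safety_index(mean_strength, sd_strength, mean_stress, sd_stress):
    """First-order safety index for independent normal strength and stress (assumed form)."""
    return (mean_strength - mean_stress) / math.hypot(sd_strength, sd_stress)

def reliability(beta):
    """Probability that strength exceeds stress under the same normality assumption."""
    return NormalDist().cdf(beta)

# Hypothetical values in consistent stress units.
beta = safety_index(60.0, 4.0, 45.0, 3.0)
print(beta, round(reliability(beta), 5))
```

In the approach described above the calculation runs the other way: a target reliability fixes beta, and the design factor is solved for.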
Development and Reliability of Items Measuring the Nonmedical Use of Prescription Drugs for the Youth Risk Behavior Survey: Results From an Initial Pilot Test

ERIC Educational Resources Information Center

Howard, Melissa M.; Weiler, Robert M.; Haddox, J. David

2009-01-01

Background: The purpose of this study was to develop and test the reliability of self-report survey items designed to monitor the nonmedical use of prescription drugs among adolescents. Methods: Eighteen nonmedical prescription drug items designed to be congruent with the substance abuse items in the US Centers for Disease Control and Prevention's…
Transformation of arbitrary distributions to the normal distribution with application to EEG test-retest reliability.

PubMed

van Albada, S J; Robinson, P A

2007-04-15

Many variables in the social, physical, and biosciences, including neuroscience, are non-normally distributed. To improve the statistical properties of such data, or to allow parametric testing, logarithmic or logit transformations are often used. Box-Cox transformations or ad hoc methods are sometimes used for parameters for which no transformation is known to approximate normality. However, these methods do not always give good agreement with the Gaussian. A transformation is discussed that maps probability distributions as closely as possible to the normal distribution, with exact agreement for continuous distributions. To illustrate, the transformation is applied to a theoretical distribution, and to quantitative electroencephalographic (qEEG) measures from repeat recordings of 32 subjects which are highly non-normal. Agreement with the Gaussian was better than using logarithmic, logit, or Box-Cox transformations. Since normal data have previously been shown to have better test-retest reliability than non-normal data under fairly general circumstances, the implications of our transformation for the test-retest reliability of parameters were investigated. Reliability was shown to improve with the transformation, where the improvement was comparable to that using Box-Cox. An advantage of the general transformation is that it does not require laborious optimization over a range of parameters or a case-specific choice of form.
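A transformation that maps a distribution onto the normal can be written as y = Φ⁻¹(F(x)), where F is the cumulative distribution of x. The abstract does not give the authors' estimator, so the sketch below uses one common empirical stand-in, a rank-based inverse normal transform. The (rank − 0.5)/n offset and the function name are assumptions.

```python
from statistics import NormalDist

def to_normal_scores(values):
    """Map a sample to approximately standard-normal scores via its ranks.

    Ties are not averaged here; tied values receive consecutive ranks.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    std_normal = NormalDist()
    # Indices of the sample sorted by value, smallest first.
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Empirical CDF estimate kept strictly inside (0, 1).
        scores[idx] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores

# Hypothetical, strongly right-skewed measurements.
skewed = [0.1, 0.5, 1.2, 8.0, 40.0]
scores = to_normal_scores(skewed)
print([round(s, 3) for s in scores])
```

Unlike Box-Cox, this needs no optimisation over a parameter, which matches the advantage claimed in the abstract.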
A comparative study of first-derivative spectrophotometry and column high-performance liquid chromatography applied to the determination of repaglinide in tablets and for dissolution testing.

PubMed

AlKhalidi, Bashar A; Shtaiwi, Majed; AlKhatib, Hatim S; Mohammad, Mohammad; Bustanji, Yasser

2008-01-01

A fast and reliable method for the determination of repaglinide is highly desirable to support formulation screening and quality control. A first-derivative UV spectroscopic method was developed for the determination of repaglinide in tablet dosage form and for dissolution testing. First-derivative UV absorbance was measured at 253 nm. The developed method was validated for linearity, accuracy, precision, limit of detection (LOD), and limit of quantitation (LOQ) in comparison to the U.S. Pharmacopeia (USP) column high-performance liquid chromatographic (HPLC) method. The first-derivative UV spectrophotometric method showed excellent linearity [correlation coefficient (r) = 0.9999] in the concentration range of 1-35 microg/mL and precision (relative standard deviation < 1.5%). The LOD and LOQ were 0.23 and 0.72 microg/mL, respectively, and good recoveries were achieved (98-101.8%). Statistical comparison of results of the first-derivative UV spectrophotometric and the USP HPLC methods using the t-test showed that there was no significant difference between the 2 methods. Additionally, the method was successfully used for the dissolution test of repaglinide and was found to be reliable, simple, fast, and inexpensive.
Methods for the Identification of Aircraft Tubing of Plain Carbon Steel and Chromium-Molybdenum Steel

NASA Technical Reports Server (NTRS)

Mutchler, W H; Buzzard, R W

1930-01-01

The survey of the possibilities for distinguishing between plain carbon and chromium-molybdenum steel tubing included the Herbert pendulum hardness, magnetic, spark, and chemical tests. The Herbert pendulum test has the disadvantages of all hardness tests in being limited to factory use and being applicable only to scale-free, normalized material. The small difference in the range of hardness values between plain carbon and chromium-molybdenum steels is likewise a disadvantage. The Rockwell hardness test, at present used in the industry for this purpose, is much more reliable. It may be concluded on the basis of the experiments performed that of all methods surveyed, spark testing appears to be, at present, the most suitable for factory use from the standpoint of speed, accuracy, nondestructiveness and reliability. It is also applicable for field use.

A New Volumetric Radiologic Method to Assess Indirect Decompression After Extreme Lateral Interbody Fusion Using High-Resolution Intraoperative Computed Tomography.

PubMed

Navarro-Ramirez, Rodrigo; Berlin, Connor; Lang, Gernot; Hussain, Ibrahim; Janssen, Insa; Sloan, Stephen; Askin, Gulce; Avila, Mauricio J; Zubkov, Micaella; Härtl, Roger

2018-01-01

Two-dimensional radiographic methods have been proposed to evaluate the radiographic outcome after indirect decompression through extreme lateral interbody fusion (XLIF). However, the assessment of neural decompression in a single plane may underestimate the effect of indirect decompression on central canal and foraminal volumes. The present study aimed to assess the reliability and consistency of a novel 3-dimensional radiographic method that assesses neural decompression by volumetric analysis using a new generation of intraoperative fan-beam computed tomography scanner in patients undergoing XLIF. Prospectively collected data from 7 patients (9 levels) undergoing XLIF was retrospectively analyzed. Three independent, blind raters using imaging analysis software performed volumetric measurements pre- and postoperatively to determine central canal and foraminal volumes. Intrarater and Interrater reliability tests were performed to assess the reliability of this novel volumetric method. The interrater reliability between the three raters ranged from 0.800 to 0.952, P < 0.0001. The test-retest analysis on a randomly selected subset of three patients showed good to excellent internal reliability (range of 0.78-1.00) for all 3 raters. There was a significant increase in mean volume ≈20% for right foramen, left foramen, and central canal volumes postoperatively (P = 0.0472; P = 0.0066; P = 0.0003, respectively). Here we demonstrate a new volumetric analysis technique that is feasible, reliable, and reproducible amongst independent raters for central canal and foraminal volumes in the lumbar spine using an intraoperative computed tomography scanner. Copyright © 2017. Published by Elsevier Inc.
Identifying dyslexia in adults: an iterative method using the predictive value of item scores and self-report questions.

PubMed

Tamboer, Peter; Vorst, Harrie C M; Oort, Frans J

2014-04-01

Methods for identifying dyslexia in adults vary widely between studies. Researchers have to decide how many tests to use, which tests are considered to be the most reliable, and how to determine cut-off scores. The aim of this study was to develop an objective and powerful method for diagnosing dyslexia. We took various methodological measures, most of which are new compared to previous methods. We used a large sample of Dutch first-year psychology students, we considered several options for exclusion and inclusion criteria, we collected as many cognitive tests as possible, we used six independent sources of biographical information for a criterion of dyslexia, we compared the predictive power of discriminant analyses and logistic regression analyses, we used both sum scores and item scores as predictor variables, we used self-report questions as predictor variables, and we retested the reliability of predictions with repeated prediction analyses using an adjusted criterion. We were able to identify 74 dyslexic and 369 non-dyslexic students. For 37 students, various predictions were too inconsistent for a final classification. The most reliable predictions were acquired with item scores and self-report questions. The main conclusion is that it is possible to identify dyslexia with a high reliability, although the exact nature of dyslexia is still unknown. We therefore believe that this study yielded valuable information for future methods of identifying dyslexia in Dutch as well as in other languages, and that this would be beneficial for comparing studies across countries.
A simple behavioral test for locomotor function after brain injury in mice.

PubMed

Tabuse, Masanao; Yaguchi, Masae; Ohta, Shigeki; Kawase, Takeshi; Toda, Masahiro

2010-11-01

To establish a simple and reliable test for assessing locomotor function in mice with brain injury, we developed a new method, the rotarod slip test, in which the number of slips of the paralytic hind limb from a rotarod is counted. Brain injuries of different severity were created in adult C57BL/6 mice, by inflicting 1-point, 2-point and 4-point cryo-injuries. These mice were subjected to the rotarod slip test, the accelerating rotarod test and the elevated body swing test (EBST). Histological analyses were performed to assess the severity of the brain damage. Significant and consistent correlations between test scores and severity were observed for the rotarod slip test and the EBST. Only the rotarod slip test detected the mild hindlimb paresis in the acute and sub-acute phase after injury. Our results suggest that the rotarod slip test is the most sensitive and reliable method for assessing locomotor function after brain damage in mice. Copyright © 2010 Elsevier Ltd. All rights reserved.
A critical analysis of test-retest reliability in instrument validation studies of cancer patients under palliative care: a systematic review

PubMed Central

2014-01-01

Background Patient-reported outcome validation needs to achieve validity and reliability standards. Among reliability analysis parameters, test-retest reliability is an important psychometric property. Retested patients must be in a clinically stable condition. This is particularly problematic in palliative care (PC) settings because advanced cancer patients are prone to a faster rate of clinical deterioration. The aim of this study was to evaluate the methods by which multi-symptom and health-related quality of life (HRQoL) instruments based on patient-reported outcomes (PROs) have been validated in oncological PC settings with regard to test-retest reliability. Methods A systematic search of PubMed (1966 to June 2013), EMBASE (1980 to June 2013), PsychInfo (1806 to June 2013), CINAHL (1980 to June 2013), and SCIELO (1998 to June 2013), and specific PRO databases was performed. Studies were included if they described a set of validation studies for an instrument developed to measure multi-symptom or multidimensional HRQoL in advanced cancer patients under PC. The COSMIN checklist was used to rate the methodological quality of the study designs. Results We identified 89 validation studies from 746 potentially relevant articles. From those 89 articles, 31 measured test-retest reliability and were included in this review. Upon critical analysis of the overall quality of the criteria used to determine the test-retest reliability, 6 (19.4%), 17 (54.8%), and 8 (25.8%) of these articles were rated as good, fair, or poor, respectively, and no article was classified as excellent. Multi-symptom instruments were retested over a shortened interval when compared to the HRQoL instruments (median values 24 hours and 168 hours, respectively; p = 0.001). Validation studies that included objective confirmation of clinical stability in their design yielded better results for the test-retest analysis with regard to both pain and global HRQoL scores (p < 0.05). The quality of the statistical analysis and its description were of great concern. Conclusion Test-retest reliability has been infrequently and poorly evaluated. The confirmation of clinical stability was an important factor in our analysis, and we suggest that special attention be focused on clinical stability when designing a PRO validation study that includes advanced cancer patients under PC. PMID:24447633
Validity and reliability of the abdominal test and evaluation systems tool (ABTEST) to accurately measure abdominal force.

PubMed

Glenn, Jordan M; Galey, Madeline; Edwards, Abigail; Rickert, Bradley; Washington, Tyrone A

2015-07-01

Ability to generate force from the core musculature is a critical factor for sports and general activities with insufficiencies predisposing individuals to injury. This study evaluated isometric force production as a valid and reliable method of assessing abdominal force using the abdominal test and evaluation systems tool (ABTEST). Secondary analysis estimated 1-repetition maximum on commercially available abdominal machine compared to maximum force and average power on ABTEST system. This study utilized test-retest reliability and comparative analysis for validity. Reliability was measured using test-retest design on ABTEST. Validity was measured via comparison to estimated 1-repetition maximum on a commercially available abdominal device. Participants applied isometric, abdominal force against a transducer and muscular activation was evaluated measuring normalized electromyographic activity at the rectus-abdominus, rectus-femoris, and erector-spinae. Test, re-test force production on ABTEST was significantly correlated (r=0.84; p<0.001). Mean electromyographic activity for the rectus-abdominus (72.93% and 75.66%), rectus-femoris (6.59% and 6.51%), and erector-spinae (6.82% and 5.48%) were observed for trial-1 and trial-2, respectively. Significant correlations for the estimated 1-repetition maximum were found for average power (r=0.70, p=0.002) and maximum force (r=0.72, p<0.001). Data indicate the ABTEST can accurately measure rectus-abdominus force isolated from hip-flexor involvement. Negligible activation of erector-spinae substantiates little subjective effort among participants in the lower back. Results suggest ABTEST is a valid and reliable method of evaluating abdominal force. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Choosing a reliability inspection plan for interval censored data

DOE PAGES

Lu, Lu; Anderson-Cook, Christine Michaela

2017-04-19

Reliability test plans are important for producing precise and accurate assessment of reliability characteristics. This paper explores different strategies for choosing between possible inspection plans for interval censored data given a fixed testing timeframe and budget. A new general cost structure is proposed for guiding precise quantification of total cost in inspection test plan. Multiple summaries of reliability are considered and compared as the criteria for choosing the best plans using an easily adapted method. Different cost structures and representative true underlying reliability curves demonstrate how to assess different strategies given the logistical constraints and nature of the problem. Results show several general patterns exist across a wide variety of scenarios. Given the fixed total cost, plans that inspect more units with less frequency based on equally spaced time points are favored due to the ease of implementation and consistent good performance across a large number of case study scenarios. Plans with inspection times chosen based on equally spaced probabilities offer improved reliability estimates for the shape of the distribution, mean lifetime, and failure time for a small fraction of population only for applications with high infant mortality rates. The paper uses a Monte Carlo simulation based approach in addition to the common evaluation based on the asymptotic variance and offers comparison and recommendation for different applications with different objectives. Additionally, the paper outlines a variety of different reliability metrics to use as criteria for optimization, presents a general method for evaluating different alternatives, as well as provides case study results for different common scenarios.
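The two families of inspection plans contrasted above can be illustrated with a short sketch. It assumes a Weibull lifetime with planning values for shape and scale, and it places the last inspection at the end of the test window in both plans. The paper's exact construction is not given in the abstract, so these choices and the function names are assumptions.

```python
import math

def weibull_cdf(t, shape, scale):
    """Probability of failure by time t under a Weibull lifetime."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_quantile(p, shape, scale):
    """Time by which a fraction p of units is expected to have failed."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def equal_time_plan(horizon, inspections):
    """Inspection times equally spaced in time up to the test horizon."""
    return [horizon * j / inspections for j in range(1, inspections + 1)]

def equal_probability_plan(horizon, inspections, shape, scale):
    """Inspection times equally spaced in failure probability up to the horizon."""
    p_end = weibull_cdf(horizon, shape, scale)
    return [weibull_quantile(p_end * j / inspections, shape, scale)
            for j in range(1, inspections + 1)]

# Hypothetical planning values; shape < 1 represents high infant mortality.
by_time = equal_time_plan(100.0, 4)
by_prob = equal_probability_plan(100.0, 4, shape=0.5, scale=100.0)
print(by_time)
print([round(t, 2) for t in by_prob])
```

With a shape below 1 the equal-probability plan concentrates inspections early, which is where most failures occur in an infant-mortality setting.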
Choosing a reliability inspection plan for interval censored data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lu, Lu; Anderson-Cook, Christine Michaela

Reliability test plans are important for producing precise and accurate assessment of reliability characteristics. This paper explores different strategies for choosing between possible inspection plans for interval censored data given a fixed testing timeframe and budget. A new general cost structure is proposed for guiding precise quantification of total cost in inspection test plan. Multiple summaries of reliability are considered and compared as the criteria for choosing the best plans using an easily adapted method. Different cost structures and representative true underlying reliability curves demonstrate how to assess different strategies given the logistical constraints and nature of the problem. Results show several general patterns exist across a wide variety of scenarios. Given the fixed total cost, plans that inspect more units with less frequency based on equally spaced time points are favored due to the ease of implementation and consistent good performance across a large number of case study scenarios. Plans with inspection times chosen based on equally spaced probabilities offer improved reliability estimates for the shape of the distribution, mean lifetime, and failure time for a small fraction of population only for applications with high infant mortality rates. The paper uses a Monte Carlo simulation based approach in addition to the common evaluation based on the asymptotic variance and offers comparison and recommendation for different applications with different objectives. Additionally, the paper outlines a variety of different reliability metrics to use as criteria for optimization, presents a general method for evaluating different alternatives, as well as provides case study results for different common scenarios.
Developing Confidence Limits For Reliability Of Software

NASA Technical Reports Server (NTRS)

Hayhurst, Kelly J.

1991-01-01

Technique developed for estimating reliability of software by use of Moranda geometric de-eutrophication model. Pivotal method enables straightforward construction of exact bounds with associated degree of statistical confidence about reliability of software. Confidence limits thus derived provide precise means of assessing quality of software. Limits take into account number of bugs found while testing and effects of sampling variation associated with random order of discovering bugs.
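In the Moranda geometric de-eutrophication model, as it is usually stated, the failure rate drops by a constant ratio each time a bug is found and removed: after i removals the hazard rate is D·k^i with 0 < k < 1. The sketch below shows only that rate sequence. The parameter names and values are hypothetical, and the pivotal-method confidence limits described above are not reproduced.

```python
def hazard_rate(initial_rate, ratio, faults_removed):
    """Failure rate after `faults_removed` bugs have been fixed (geometric decay)."""
    if initial_rate <= 0.0 or not 0.0 < ratio < 1.0:
        raise ValueError("need initial_rate > 0 and 0 < ratio < 1")
    return initial_rate * ratio ** faults_removed

def expected_time_to_next_failure(initial_rate, ratio, faults_removed):
    """Mean of the exponential waiting time at the current failure rate."""
    return 1.0 / hazard_rate(initial_rate, ratio, faults_removed)

# Hypothetical parameters: 0.5 failures per hour initially, 20% drop per fix.
rates = [hazard_rate(0.5, 0.8, i) for i in range(4)]
print([round(r, 3) for r in rates])
```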
Feasibility, Reliability and Validity of the Dutch Translation of the Anxiety, Depression and Mood Scale in Older Adults with Intellectual Disabilities

ERIC Educational Resources Information Center

Hermans, Heidi; Jelluma, Naftha; van der Pas, Femke H.; Evenhuis, Heleen M.

2012-01-01

Background: The informant-based Anxiety, Depression And Mood Scale was translated into Dutch and its feasibility, reliability and validity in older adults (aged greater than or equal to 50 years) with intellectual disabilities (ID) was studied. Method: Test-retest (n = 93) and interrater reliability (n = 83), and convergent (n = 202 and n = 787),…
THE NAVICULAR POSITION TEST – A RELIABLE MEASURE OF THE NAVICULAR BONE POSITION DURING REST AND LOADING

PubMed Central

Spörndly-Nees, Søren; Dåsberg, Brian; Nielsen, Rasmus Oestergaard; Boesen, Morten Ilum

2011-01-01

Background: Lower limb injuries are a large problem in athletes. However, there is a paucity of knowledge on the relationship between alignment of the medial longitudinal arch (MLA) of the foot and development of such injuries. A reliable and valid test to quantify foot type is needed to be able to investigate the relationship between arch type and injury likelihood. Feiss Line is a valid clinical measure of the MLA. However, no study has investigated the reliability of the test. Objectives: The purpose was to describe a modified version of the Feiss Line test and to determine the intra- and inter-tester reliability of this new foot alignment test. To emphasize the purpose of the modified test, the authors have named it The Navicular Position Test. Methods: Intra- and inter-tester reliability of The Navicular Position Test were evaluated using the ICC (intraclass correlation coefficient) and Bland-Altman limits of agreement in 43 healthy, young subjects. Results: Inter-tester mean difference –0.35 degrees [–1.32; 0.62] p = 0.47. Bland-Altman limits of agreement –6.55 to 5.85 degrees, ICC = 0.94. Intra-tester mean difference 0.47 degrees [–0.57; 1.50] p = 0.37. Bland-Altman limits of agreement –6.15 to 7.08 degrees, ICC = 0.91. Discussion: The present data support The Navicular Position Test as a reliable test of the navicular bone position during rest and loading measured in a simple test set-up. Conclusion: The Navicular Position Test was shown to have a high intraday-, intra- and inter-tester reliability. Once cut-off values to categorize the MLA into planus, rectus, or cavus feet have been determined and presented, the test could be used in prospective observational studies investigating the role of the arch type on the development of various lower limb injuries. PMID:21904698
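Note on the Bland-Altman limits of agreement reported in the record above: the limits are conventionally the mean of the paired differences plus or minus 1.96 sample standard deviations of those differences. The following is a minimal sketch of that calculation, assuming the conventional 1.96 multiplier; the function name and the angle values are hypothetical and are not taken from the study.

```python
import statistics


def bland_altman_limits(first, second, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences (first - second); the
    limits are bias -/+ z sample standard deviations of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    return bias, bias - z * sd, bias + z * sd


# Hypothetical angles (degrees) recorded by two testers on the same feet.
tester_a = [12.0, 15.5, 9.0, 20.0, 17.5]
tester_b = [13.0, 14.0, 10.5, 19.0, 18.0]
bias, lower, upper = bland_altman_limits(tester_a, tester_b)
print(f"bias={bias:.2f}, limits of agreement={lower:.2f} to {upper:.2f}")
```

Read this way, the inter-tester limits quoted above (–6.55 to 5.85 degrees) are centred on the reported mean difference of –0.35 degrees and, if the 1.96 multiplier was used, imply a standard deviation of the differences of roughly 3.2 degrees.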
Psychometrics of the Home Safety Self-Assessment Tool (HSSAT) to prevent falls in community-dwelling older adults.

PubMed

Tomita, Machiko R; Saharan, Sumandeep; Rajendran, Sheela; Nochajski, Susan M; Schweitzer, Jo A

2014-01-01

OBJECTIVE. To identify psychometric properties of the Home Safety Self-Assessment Tool (HSSAT) to prevent falls in community-dwelling older adults. METHOD. We tested content validity, test-retest reliability, interrater reliability, construct validity, convergent and discriminant validity, and responsiveness to change. RESULTS. The content validity index was .98, the intraclass correlation coefficient for test-retest reliability was .97, and the interrater reliability was .89. The difference on identified risk factors between the use and nonuse of the HSSAT was significant (p = .005). Convergent validity with the Centers for Disease Control and Prevention Home Safety Checklist was high (r = .65), and discriminant validity with fear of falling was very low (r = .10). The responsiveness to change was moderate (standardized response mean = 0.57). CONCLUSION. The HSSAT is a reliable and valid instrument to identify fall risks in a home environment, and the HSSAT booklet is effective as educational material leading to improvement in home safety. Copyright © 2014 by the American Occupational Therapy Association, Inc.
Gearbox Reliability Collaborative Phase 3 Gearbox 2 Test Plan

DOE Office of Scientific and Technical Information (OSTI.GOV)

Link, H.; Keller, J.; Guo, Y.

2013-04-01

Gearboxes in wind turbines have not been achieving their expected design life even though they commonly meet or exceed the design criteria specified in current design standards. One of the basic premises of the National Renewable Energy Laboratory (NREL) Gearbox Reliability Collaborative (GRC) is that the low gearbox reliability results from the absence of critical elements in the design process or insufficient design tools. Key goals of the GRC are to improve design approaches and analysis tools and to recommend practices and test methods resulting in improved design standards for wind turbine gearboxes that lower the cost of energy (COE) through improved reliability. The GRC uses a combined gearbox testing, modeling and analysis approach, along with a database of information from gearbox failures collected from overhauls and investigation of gearbox condition monitoring techniques to improve wind turbine operations and maintenance practices. Testing of Gearbox 2 (GB2) using the two-speed turbine controller that has been used in prior testing. This test series will investigate non-torque loads, high-speed shaft misalignment, and reproduction of field conditions in the dynamometer. This test series will also include vibration testing using an eddy-current brake on the gearbox's high speed shaft.
Psychometric instrumentation: reliability and validity of instruments used for clinical practice, evidence-based practice projects and research studies.

PubMed

Mayo, Ann M

2015-01-01

It is important for CNSs and other APNs to consider the reliability and validity of instruments chosen for clinical practice, evidence-based practice projects, or research studies. Psychometric testing uses specific research methods to evaluate the amount of error associated with any particular instrument. Reliability estimates explain more about how well the instrument is designed, whereas validity estimates explain more about scores that are produced by the instrument. An instrument may be architecturally sound overall (reliable), but the same instrument may not be valid. For example, if a specific group does not understand certain well-constructed items, then the instrument does not produce valid scores when used with that group. Many instrument developers may conduct reliability testing only once, yet continue validity testing in different populations over many years. All CNSs should be advocating for the use of reliable instruments that produce valid results. Clinical nurse specialists may find themselves in situations where reliability and validity estimates for some instruments that are being utilized are unknown. In such cases, CNSs should engage key stakeholders to sponsor nursing researchers to pursue this most important work.
We need more replication research - A case for test-retest reliability.

PubMed

Leppink, Jimmie; Pérez-Fuster, Patricia

2017-06-01

Following debates in psychology on the importance of replication research, we have also started to see pleas for a more prominent role for replication research in medical education. To enable replication research, it is of paramount importance to carefully study the reliability of the instruments we use. Cronbach's alpha has been the most widely used estimator of reliability in the field of medical education, notably as some kind of quality label of test or questionnaire scores based on multiple items or of the reliability of assessment across exam stations. However, as this narrative review outlines, Cronbach's alpha or alternative reliability statistics may complement but not replace psychometric methods such as factor analysis. Moreover, multiple-item measurements should be preferred above single-item measurements, and when using single-item measurements, coefficients as Cronbach's alpha should not be interpreted as indicators of the reliability of a single item when that item is administered after fundamentally different activities, such as learning tasks that differ in content. Finally, if we want to follow up on recent pleas for more replication research, we have to start studying the test-retest reliability of the instruments we use.
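Note on Cronbach's alpha as discussed in the record above: for k items the coefficient is k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The following is a minimal sketch of that standard formula; the function name and the response data are hypothetical and do not come from the review.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores is a list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [statistics.variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_var = statistics.variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four respondents, three items.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

As the review cautions, a high value from this calculation describes consistency across items only; it does not stand in for factor analysis or for test-retest evidence.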
How effective are selection methods in medical education? A systematic review.

PubMed

Patterson, Fiona; Knight, Alec; Dowell, Jon; Nicholson, Sandra; Cousans, Fran; Cleland, Jennifer

2016-01-01

Selection methods used by medical schools should reliably identify whether candidates are likely to be successful in medical training and ultimately become competent clinicians. However, there is little consensus regarding methods that reliably evaluate non-academic attributes, and longitudinal studies examining predictors of success after qualification are insufficient. This systematic review synthesises the extant research evidence on the relative strengths of various selection methods. We offer a research agenda and identify key considerations to inform policy and practice in the next 50 years. A formalised literature search was conducted for studies published between 1997 and 2015. A total of 194 articles met the inclusion criteria and were appraised in relation to: (i) selection method used; (ii) research question(s) addressed, and (iii) type of study design. Eight selection methods were identified: (i) aptitude tests; (ii) academic records; (iii) personal statements; (iv) references; (v) situational judgement tests (SJTs); (vi) personality and emotional intelligence assessments; (vii) interviews and multiple mini-interviews (MMIs), and (viii) selection centres (SCs). The evidence relating to each method was reviewed against four evaluation criteria: effectiveness (reliability and validity); procedural issues; acceptability, and cost-effectiveness. Evidence shows clearly that academic records, MMIs, aptitude tests, SJTs and SCs are more effective selection methods and are generally fairer than traditional interviews, references and personal statements. However, achievement in different selection methods may differentially predict performance at the various stages of medical education and clinical practice. Research into selection has been over-reliant on cross-sectional study designs and has tended to focus on reliability estimates rather than validity as an indicator of quality. 
A comprehensive framework of outcome criteria should be developed to allow researchers to interpret empirical evidence and compare selection methods fairly. This review highlights gaps in evidence for the combination of selection tools that is most effective and the weighting to be given to each tool. © 2015 John Wiley & Sons Ltd.
Measuring verbal and non-verbal communication in aphasia: reliability, validity, and sensitivity to change of the Scenario Test.

PubMed

van der Meulen, Ineke; van de Sandt-Koenderman, W Mieke E; Duivenvoorden, Hugo J; Ribbers, Gerard M

2010-01-01

This study explores the psychometric qualities of the Scenario Test, a new test to assess daily-life communication in severe aphasia. The test is innovative in that it: (1) examines the effectiveness of verbal and non-verbal communication; and (2) assesses patients' communication in an interactive setting, with a supportive communication partner. To determine the reliability, validity, and sensitivity to change of the Scenario Test and discuss its clinical value. The Scenario Test was administered to 122 persons with aphasia after stroke and to 25 non-aphasic controls. Analyses were performed for the entire group of persons with aphasia, as well as for a subgroup of persons unable to communicate verbally (n = 43). Reliability (internal consistency, test-retest reliability, inter-judge, and intra-judge reliability) and validity (internal validity, convergent validity, known-groups validity) and sensitivity to change were examined using standard psychometric methods. The Scenario Test showed high levels of reliability. Internal consistency (Cronbach's alpha = 0.96; item-rest correlations = 0.58-0.82) and test-retest reliability (ICC = 0.98) were high. Agreement between judges in total scores was good, as indicated by the high inter- and intra-judge reliability (ICC = 0.86-1.00). Agreement in scores on the individual items was also good (square-weighted kappa values 0.61-0.92). The test demonstrated good levels of validity. A principal component analysis for categorical data identified two dimensions, interpreted as general communication and communicative creativity. Correlations with three other instruments measuring communication in aphasia, that is, Spontaneous Speech interview from the Aachen Aphasia Test (AAT), Amsterdam-Nijmegen Everyday Language Test (ANELT), and Communicative Effectiveness Index (CETI), were moderate to strong (0.50-0.85) suggesting good convergent validity. 
Group differences were observed between persons with aphasia and non-aphasic controls, as well as between persons with aphasia unable to use speech to convey information and those able to communicate verbally; this indicates good known-groups validity. The test was sensitive to changes in performance, measured over a period of 6 months. The data support the reliability and validity of the Scenario Test as an instrument for examining daily-life communication in aphasia. The test focuses on multimodal communication; its psychometric qualities enable future studies on the effect of Alternative and Augmentative Communication (AAC) training in aphasia.
Evaluating abdominal core muscle fatigue: Assessment of the validity and reliability of the prone bridging test.

PubMed

De Blaiser, C; De Ridder, R; Willems, T; Danneels, L; Vanden Bossche, L; Palmans, T; Roosen, P

2018-02-01

The aims of this study were to research the amplitude and median frequency characteristics of selected abdominal, back, and hip muscles of healthy subjects during a prone bridging endurance test, based on surface electromyography (sEMG), (a) to determine if the prone bridging test is a valid field test to measure abdominal muscle fatigue, and (b) to evaluate if the current method of administering the prone bridging test is reliable. Thirty healthy subjects participated in this experiment. The sEMG activity of seven abdominal, back, and hip muscles was bilaterally measured. Normalized median frequencies were computed from the EMG power spectra. The prone bridging tests were repeated on separate days to evaluate inter- and intratester reliability. Significant differences in normalized median frequency slope (NMF slope) values between several abdominal, back, and hip muscles could be demonstrated. Moderate-to-high correlation coefficients were shown between NMF slope values and endurance time. Multiple backward linear regression revealed that the test endurance time could only be significantly predicted by the NMF slope of the rectus abdominis. Statistical analysis showed excellent reliability (ICC=0.87-0.89). The findings of this study support the validity and reliability of the prone bridging test for evaluating abdominal muscle fatigue. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The Arthroscopic Surgical Skill Evaluation Tool (ASSET).

PubMed

Koehler, Ryan J; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Bramen, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J; Nicandri, Gregg T

2013-06-01

Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability when used to assess the technical ability of surgeons performing diagnostic knee arthroscopic surgery on cadaveric specimens. Cross-sectional study; Level of evidence, 3. Content validity was determined by a group of 7 experts using the Delphi method. Intra-articular performance of a right and left diagnostic knee arthroscopic procedure was recorded for 28 residents and 2 sports medicine fellowship-trained attending surgeons. Surgeon performance was assessed by 2 blinded raters using the ASSET. Concurrent criterion-oriented validity, interrater reliability, and test-retest reliability were evaluated. Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in the total ASSET score (P < .05) between novice, intermediate, and advanced experience groups were identified. Interrater reliability: The ASSET scores assigned by each rater were strongly correlated (r = 0.91, P < .01), and the intraclass correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: There was a significant correlation between ASSET scores for both procedures attempted by each surgeon (r = 0.79, P < .01). The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopic surgery in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live operating room and other simulated environments.
Advanced Guidance and Control Methods for Reusable Launch Vehicles: Test Results

NASA Technical Reports Server (NTRS)

Hanson, John M.; Jones, Robert E.; Krupp, Don R.; Fogle, Frank R. (Technical Monitor)

2002-01-01

There are a number of approaches to advanced guidance and control (AG&C) that have the potential for achieving the goals of significantly increasing reusable launch vehicle (RLV) safety/reliability and reducing the cost. In this paper, we examine some of these methods and compare the results. We briefly introduce the various methods under test, list the test cases used to demonstrate that the desired results are achieved, show an automated test scoring method that greatly reduces the evaluation effort required, and display results of the tests. Results are shown for the algorithms that have entered testing so far.
NDE detectability of fatigue type cracks in high strength alloys

NASA Technical Reports Server (NTRS)

Christner, B. K.; Rummel, W. D.

1983-01-01

Specimens suitable for investigating the reliability of production nondestructive evaluation (NDE) to detect tightly closed fatigue cracks in high strength alloys representative of those materials used in spacecraft engine/booster construction were produced. Inconel 718 was selected as representative of nickel base alloys and Haynes 188 was selected as representative of cobalt base alloys used in this application. Cleaning procedures were developed to ensure the reusability of the test specimens. A flaw detection reliability assessment of the fluorescent penetrant inspection method was performed using the test specimens produced, both to characterize their use for future reliability assessments and to provide additional NDE flaw detection reliability data for high strength alloys. The statistical analysis of the fluorescent penetrant inspection data was performed to determine the detection reliabilities for each inspection at a 90% probability/95% confidence level.

Reliability of the Roussel Uclaf Causality Assessment Method for Assessing Causality in Drug-Induced Liver Injury

PubMed Central

Rochon, James; Protiva, Petr; Seeff, Leonard B.; Fontana, Robert J.; Liangpunsakul, Suthat; Watkins, Paul B.; Davern, Timothy; McHutchison, John G.

2013-01-01

The Roussel Uclaf Causality Assessment Method (RUCAM) was developed to quantify the strength of association between a liver injury and the medication implicated as causing the injury. However, its reliability in a research setting has never been fully explored. The aim of this study was to determine test-retest and interrater reliabilities of RUCAM in retrospectively-identified cases of drug-induced liver injury. The Drug-Induced Liver Injury Network is enrolling well-defined cases of hepatotoxicity caused by isoniazid, phenytoin, clavulanate/amoxicillin, or valproate occurring since 1994. Each case was adjudicated by three reviewers working independently; after an interval of at least 5 months, cases were readjudicated by the same reviewers. A total of 40 drug-induced liver injury cases were enrolled including individuals treated with isoniazid (nine), phenytoin (five), clavulanate/amoxicillin (15), and valproate (11). Mean ± standard deviation age at protocol-defined onset was 44.8 ± 19.5 years; patients were 68% female and 78% Caucasian. Cases were classified as hepatocellular (44%), mixed (28%), or cholestatic (28%). Test-retest differences ranged from −7 to +8 with complete agreement in only 26% of cases. On average, the maximum absolute difference among the three reviewers was 3.1 on the first adjudication and 2.7 on the second, although much of this variability could be attributed to differences between the enrolling investigator and the external reviewers. The test-retest reliability by the same assessors was 0.54 (upper 95% confidence limit = 0.77); the interrater reliability was 0.45 (upper 95% confidence limit = 0.58). Categorizing the RUCAM to a five-category scale improved these reliabilities but only marginally. Conclusion: The mediocre reliability of the RUCAM is problematic for future studies of drug-induced liver injury. 
Alternative methods, including modifying the RUCAM, developing drug-specific instruments, or causality assessment based on expert opinion, may be more appropriate. PMID:18798340
Between-day reliability of a method for non-invasive estimation of muscle composition.

PubMed

Simunič, Boštjan

2012-08-01

Tensiomyography is a method for valid and non-invasive estimation of skeletal muscle fibre type composition. The validity of selected temporal tensiomyographic measures has been well established recently; there is, however, no evidence regarding the method's between-day reliability. Therefore it is the aim of this paper to establish the between-day repeatability of tensiomyographic measures in three skeletal muscles. For three consecutive days, 10 healthy male volunteers (mean±SD: age 24.6 ± 3.0 years; height 177.9 ± 3.9 cm; weight 72.4 ± 5.2 kg) were examined in a supine position. Four temporal measures (delay, contraction, sustain, and half-relaxation time) and maximal amplitude were extracted from the displacement-time tensiomyogram. A reliability analysis was performed with calculations of bias, random error, coefficient of variation (CV), standard error of measurement, and intra-class correlation coefficient (ICC) with a 95% confidence interval. An analysis of ICC demonstrated excellent agreement (ICC were over 0.94 in 14 out of 15 tested parameters). However, lower CV was observed in half-relaxation time, presumably because of the specifics of the parameter definition itself. These data indicate that for the three muscles tested, tensiomyographic measurements were reproducible across consecutive test days. Furthermore, we indicated the most possible origin of the lowest reliability detected in half-relaxation time. Copyright © 2012 Elsevier Ltd. All rights reserved.
An empirical study of flight control software reliability

NASA Technical Reports Server (NTRS)

Dunham, J. R.; Pierce, J. L.

1986-01-01

The results of a laboratory experiment in flight control software reliability are reported. The experiment tests a small sample of implementations of a pitch axis control law for a PA28 aircraft with over 14 million pitch commands with varying levels of additive input and feedback noise. The testing which uses the method of n-version programming for error detection surfaced four software faults in one implementation of the control law. The small number of detected faults precluded the conduct of the error burst analyses. The pitch axis problem provides data for use in constructing a model in the prediction of the reliability of software in systems with feedback. The study is undertaken to find means to perform reliability evaluations of flight control software.
Reliability of the test of gross motor development second edition (TGMD-2) for Kindergarten children in Myanmar

PubMed Central

Aye, Thanda; Oo, Khin Saw; Khin, Myo Thuzar; Kuramoto-Ahuja, Tsugumi; Maruyama, Hitoshi

2017-01-01

[Purpose] The purpose of this study was to investigate the reliability of the test of gross motor development second edition (TGMD-2) for Kindergarten children in Myanmar. [Subjects and Methods] Fifty healthy Kindergarten children (23 males, 27 females) whose parents/guardians had given written consent participated. All 12 gross motor skills of the TGMD-2 were explained and demonstrated to the subjects before the assessment. Each subject individually performed two trials for each gross motor skill and the performance was video recorded. Three raters separately watched the video recordings and rated them for inter-rater reliability. The second assessment was done one month later with 25 of the 50 subjects for test-retest reliability. The video recordings of 12 subjects were randomly selected from the first 50 recordings for intra-rater reliability six weeks after the first assessment. The agreement on the locomotor and object control raw scores and the gross motor quotient (GMQ) was calculated. [Results] All the reliability coefficients for the locomotor and object control raw scores and the GMQ were interpreted as good to excellent reliability. [Conclusion] The results indicated that the TGMD-2 is a highly reliable and appropriate assessment tool for assessing gross motor skill development of Kindergarten children in Myanmar. PMID:29184278
Reliability of the test of gross motor development second edition (TGMD-2) for Kindergarten children in Myanmar.

PubMed

Aye, Thanda; Oo, Khin Saw; Khin, Myo Thuzar; Kuramoto-Ahuja, Tsugumi; Maruyama, Hitoshi

2017-10-01

[Purpose] The purpose of this study was to investigate the reliability of the test of gross motor development second edition (TGMD-2) for Kindergarten children in Myanmar. [Subjects and Methods] Fifty healthy Kindergarten children (23 males, 27 females) whose parents/guardians had given written consent participated. All 12 gross motor skills of the TGMD-2 were explained and demonstrated to the subjects before the assessment. Each subject individually performed two trials for each gross motor skill and the performance was video recorded. Three raters separately watched the video recordings and rated them for inter-rater reliability. The second assessment was done one month later with 25 of the 50 subjects for test-retest reliability. The video recordings of 12 subjects were randomly selected from the first 50 recordings for intra-rater reliability six weeks after the first assessment. The agreement on the locomotor and object control raw scores and the gross motor quotient (GMQ) was calculated. [Results] All the reliability coefficients for the locomotor and object control raw scores and the GMQ were interpreted as good to excellent reliability. [Conclusion] The results indicated that the TGMD-2 is a highly reliable and appropriate assessment tool for assessing gross motor skill development of Kindergarten children in Myanmar.
Oxygen uptake during functional activities after stroke—Reliability and validity of a portable ergospirometry system

PubMed Central

Brurok, Berit; Tjønna, Arnt Erik; Tørhaug, Tom; Askim, Torunn

2017-01-01

Background: People with stroke have a low peak aerobic capacity and experience increased effort during performance of daily activities. The purpose of this study was to examine test-retest reliability of a portable ergospirometry system in people with stroke during performance of functional activities in a field-test. Secondary aims were to examine the proportion of oxygen consumed during the field-test in relation to the peak-test and to analyse the correlation between the oxygen uptake during the field-test and peak-test in order to support the validity of the field-test. Methods: With simultaneous measurement of oxygen consumption, participants performed a standardized field-test consisting of five activities: walking over ground, stair walking, stepping over obstacles, walking slalom between cones, and, from a standing position, lifting objects from one height to another. All activities were performed at self-selected speed. Prior to the field-test, a peak aerobic capacity test was performed. The field-test was repeated with a minimum of 2 and a maximum of 14 days between the tests. ICC2,1 and Bland Altman tests (Limits of Agreement, LoA) were used to analyse test-retest reliability. Results: In total 31 participants (39% women, mean (SD) age 54.5 (12.7) years and 21.1 (14.3) months post-stroke) were included. The ICC2,1 was ≥ 0.80 for absolute V̇O2, relative V̇O2, minute ventilation, CO2, respiratory exchange ratio, heart rate and Borg's rating of perceived exertion. ICC2,1 for total time to complete the field-test was 0.99. Mean difference in steady state V̇O2 during Test 1 and Test 2 was -0.40 (2.12). The LoAs were -3.75 and 4.51. Participants spent 60.7% of their V̇O2peak performing functional activities. Correlation between field-test and peak-test was 0.689, p = 0.001 for absolute and 0.733, p = 0.001 for relative V̇O2. 
Conclusions: This study presents first evidence on reliability of oxygen uptake during performance of functional activities after stroke, showing very good test-retest reliability. The secondary analysis showed that the amount of energy spent during the field-test relative to the peak-test was high and the correlation between the two tests was good, supporting the validity of this method. PMID:29065164
A comparison of the reliability of the trochanteric prominence angle test and the alternative method in healthy subjects.

PubMed

Yoon, Tae-Lim; Park, Kyung-Mi; Choi, Sil-Ah; Lee, Ji-Hyun; Jeong, Hyo-Jung; Cynn, Heon-Seock

2014-04-01

A wide range of intra- and inter-rater reliabilities of the trochanteric prominence angle test (TPAT) has been reported. We introduced the transcondylar angle test (TCAT) as an alternative to the TPAT and the use of a smartphone as a reliable measurement tool for femoral neck anteversion (FNA) measurement. The reliabilities of the TPAT and the TCAT, the reliability of using a smartphone as a clinical measurement tool, and the correlation between the difference value of medial knee joint space (KJS) between rest and tested positions and the difference value between the TPAT and TCAT were assessed. Two physical therapists independently determined the reliabilities of the TPAT with a digital inclinometer, the TCAT with a digital inclinometer, and the TCAT with a smartphone in 19 hips of 10 healthy subjects (5 male and 5 female, 22.2 ± 1.69 years). The medial KJS in rest and the tested position were assessed using sonography. The intra-class correlation coefficients (ICC) for the intra-rater reliabilities of TPAT with a digital inclinometer (ICC = 0.92), TCAT with a digital inclinometer (ICC = 0.94) and a smartphone (ICC = 0.95) in both testers were substantial. The inter-rater reliability of TPAT with a digital inclinometer was fair (ICC = 0.48) while TCAT with a digital inclinometer (ICC = 0.89) and a smartphone (ICC = 0.85) were substantial. The correlation between the difference value of medial KJS between rest and tested positions and the difference value between TPAT and TCAT was low and statistically non-significant (r = 0.114; p = 0.325). The TCAT appears to be more reliable than the TPAT in inter-rater testing. A smartphone is a clinically comparable measuring tool to a digital inclinometer. Copyright © 2013 Elsevier Ltd. All rights reserved.
Cross-cultural Adaptation of the "Functional Activities Questionnaire - FAQ" for use in Brazil

PubMed Central

Sanchez, Maria Angélica dos Santos; Correa, Pricila Cristina Ribeiro; Lourenço, Roberto Alves

2011-01-01

Objective: The aim of this paper was to present the results of the first stage of cross-cultural adaptation of the Functional Activities Questionnaire (FAQ). Methods: The tool was subjected to translation and re-translation, and the test-retest reliability of a proposed version for use in Brazil was analyzed. Results: Of the 548 questionnaire respondents, a convenience sample of 68 informants was selected for retesting. Internal consistency was measured by Cronbach's alpha (0.95) while test-retest reliability was assessed using intra-class correlation (0.97). The findings show that the FAQ is brief (averaging seven minutes to apply), easily understood, and has good intra-rater test-retest reliability. Conclusion: Our results suggest this adapted version of the FAQ is a reliable and stable tool which may be useful for assessing function in Brazilian elderly. Nevertheless, the version should be subjected to further analysis with the aim of reaching functional equivalence. PMID:29213759
Failure analysis of solid rocket apogee motors

NASA Technical Reports Server (NTRS)

Martin, P. J.

1972-01-01

The analysis followed five selected motors through initial design, development, test, qualification, manufacture, and final flight reports. An audit was conducted at the manufacturing plants to complement the literature search with firsthand observations of the current philosophies and practices that affect reliability of the motors. A second literature search emphasized acquisition of spacecraft and satellite data bearing on solid motor reliability. It was concluded that present practices at the plants yield highly reliable flight hardware. Reliability can be further improved by new developments of aft-end bonding and initiator/igniter nondestructive test methods, a safe/arm device, and an insulation formulation. Minimum diagnostic instrumentation is recommended for all motor flights. Surplus motors should be used in margin testing. Criteria should be established for pressure and zone curing. The motor contractor should be represented at launch. New design analyses should be made of stretched motors and spacecraft/motor pairs.
A novel method to remotely measure food intake of free-living individuals in real time: the remote food photography method.

PubMed

Martin, Corby K; Han, Hongmei; Coulon, Sandra M; Allen, H Raymond; Champagne, Catherine M; Anton, Stephen D

2009-02-01

The aim of the present study was to report the first reliability and validity tests of the remote food photography method (RFPM), which consists of camera-enabled cell phones with data transfer capability. Participants take and transmit photographs of food selection and plate waste to researchers/clinicians for analysis. Following two pilot studies, adult participants (n 52; BMI 20-35 kg/m2 inclusive) were randomly assigned to the dine-in or take-out group. Energy intake (EI) was measured for 3 d. The dine-in group ate lunch and dinner in the laboratory. The take-out group ate lunch in the laboratory and dinner in free-living conditions (participants received a cooler with pre-weighed food that they returned the following morning). EI was measured with the RFPM and by directly weighing foods. The RFPM was tested in laboratory and free-living conditions. Reliability was tested over 3 d and validity was tested by comparing directly weighed EI to EI estimated with the RFPM using Bland-Altman analysis. The RFPM produced reliable EI estimates over 3 d in laboratory (r 0.62; P < 0.0001) and free-living (r 0.68; P < 0.0001) conditions. Weighed EI correlated highly with EI estimated with the RFPM in laboratory and free-living conditions (r>0.93; P < 0.0001). In two laboratory-based validity tests, the RFPM underestimated EI by - 4.7 % (P = 0.046) and - 5.5 % (P = 0.076). In free-living conditions, the RFPM underestimated EI by - 6.6 % (P = 0.017). Bias did not differ by body weight or age. The RFPM is a promising new method for accurately measuring the EI of free-living individuals. Error associated with the method is small compared with self-report methods.
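The Bland-Altman comparison mentioned above can be sketched as follows. This is a minimal illustration of the standard technique (mean difference plus 95% limits of agreement), assuming paired measurements from two methods; the function name and the energy-intake values are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman(estimated, reference):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of (estimated - reference)
    limits = bias +/- 1.96 * standard deviation of the differences
    """
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical energy intake in kcal (photo-based estimate vs weighed food).
photo_estimate = [610, 540, 720, 480, 655]
weighed = [640, 560, 735, 520, 690]
bias, lower, upper = bland_altman(photo_estimate, weighed)
```

A negative bias here would correspond to the estimating method underestimating intake relative to the weighed reference.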
Validity and reliability of balance assessment software using the Nintendo Wii balance board: usability and validation

PubMed Central

2014-01-01

Background A balance test provides important information such as the standard to judge an individual’s functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Methods Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). Results The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. Conclusion The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769
ASSOCIATIONS BETWEEN THREE CLINICAL ASSESSMENT TOOLS FOR POSTURAL STABILITY

PubMed Central

Saxion, Casie E.; Cameron, Kenneth L.; Gerber, J. Parry

2010-01-01

Study Design: Clinical Measurement, Correlation, Reliability Objectives: To assess the relationship between the Single Leg Balance (SLB), modified Balance Error Scoring System (mBESS), and modified Star Excursion Balance (mSEBT) tests and secondarily to assess inter-rater and test-retest reliability of these tests. Background: Ankle sprains often result in chronic instability and dysfunction. Several clinical tests assess postural deficits as a potential cause of this dysfunction; however, limited information exists pertaining to the relationship that these tests have with one another. Methods: Two independent examiners measured the performance of 34 healthy participants completing the SLB Test, mBESS test, and mSEBT at two different time periods. The relationship between tests was assessed using the Pearson Correlation and Fisher's Exact Tests. Inter-rater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Kappa statistics. Results: A significant correlation (r = -0.35) was observed between the mSEBT and the mBESS. Fisher's Exact Test showed a significant association between the SLB Test and mBESS (P = .048), but no association between the SLB and mSEBT (P = 1.000). Inter-rater reliability was excellent for the mSEBT and fair for the mBESS (ICCs of .91 and .61 respectively). Excellent agreement was observed between raters for the SLB test (k = 1.00). Test-retest reliability was excellent for the mSEBT (ICC = 0.98) and fair for the mBESS (ICC = 0.74). There was poor test-retest agreement for the SLB test (k = .211). Conclusion: There was a significant relationship observed between the SLB Test, mBESS test, and mSEBT; however, strength of association measures showed limited overlap between these tests. This suggests that these tests are interrelated but may not assess equal components of postural stability. PMID:21589668
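The kappa statistics reported above measure agreement on a categorical outcome after correcting for chance. A minimal sketch of unweighted Cohen's kappa follows; whether the authors used this exact variant is an assumption, and the function name and pass/fail ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    Undefined (division by zero) when both raters use a single identical label.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in labels) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail ratings of six participants by two raters.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohen_kappa(rater_1, rater_2)
```

A kappa of 1 means the two sets of ratings are identical, while a value near 0 means agreement is no better than chance.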
On the reliability of Fusarium oxysporum f. sp. niveum research: Do we need standardized testing methods?

USDA-ARS's Scientific Manuscript database

Fusarium oxysporum f. sp. niveum (Fon) is a pathogen highly variable in aggressiveness that requires a standardized testing method to more accurately define isolate aggressiveness (races) and to identify resistant watermelon lines. Isolates of Fon vary in aggressiveness from weakly to highly aggres...
A Novel Test Method for Fuel Thermal Stability

DTIC Science & Technology

1993-02-01

1.1. The Problem ... 1.2. The Innovations ... OPPORTUNITY ... 2.1. The Problem ... 2.2. The Innovations and Opportunity for Sol... ...reliable instrument and test method to evaluate these fuels. 1.2. The Innovations The first innovation is the application of Fourier-Transform Infrared
Eddy current crack detection capability assessment approach using crack specimens with differing electrical conductivity

NASA Astrophysics Data System (ADS)

Koshti, Ajay M.

2018-03-01

Like other NDE methods, eddy current surface crack detectability is determined using probability of detection (POD) demonstration. The POD demonstration involves eddy current testing of surface crack specimens with known crack sizes. Reliably detectable flaw size, denoted by a90/95, is determined by statistical analysis of POD test data. The surface crack specimens shall be made from a similar material with electrical conductivity close to the part conductivity. A calibration standard with electro-discharged machined (EDM) notches is typically used in eddy current testing for surface crack detection. The calibration standard conductivity shall be within +/- 15% of the part conductivity. This condition is also applicable to the POD demonstration crack set. Here, a case is considered where conductivity of the crack specimens available for POD testing differs by more than 15% from that of the part to be inspected. Therefore, a direct POD demonstration of reliably detectable flaw size is not applicable. Additional testing is necessary to use the demonstrated POD test data. An approach to estimate the reliably detectable flaw size in eddy current testing for a part made from material A using POD crack specimens made from material B with different conductivity is provided. The approach uses additional test data obtained on EDM notch specimens made from materials A and B. EDM notch test data from the two materials is used to create a transfer function between the demonstrated a90/95 size on crack specimens made of material B and the estimated a90/95 size for a part made of material A. Two methods are given. For method A, the a90/95 crack size for material B is given and POD data is available. The objective of method A is to determine the a90/95 crack size for material A using the same relative decision threshold that was used for material B. For method B, the target crack size a90/95 for material A is known. The objective is to determine the decision threshold for inspecting material A.
An overview of the mathematical and statistical analysis component of RICIS

NASA Technical Reports Server (NTRS)

Hallum, Cecil R.

1987-01-01

Mathematical and statistical analysis components of RICIS (Research Institute for Computing and Information Systems) can be used in the following problem areas: (1) quantification and measurement of software reliability; (2) assessment of changes in software reliability over time (reliability growth); (3) analysis of software-failure data; and (4) decision logic for whether to continue or stop testing software. Other areas of interest to NASA/JSC where mathematical and statistical analysis can be successfully employed include: math modeling of physical systems, simulation, statistical data reduction, evaluation methods, optimization, algorithm development, and mathematical methods in signal processing.
An In vitro evaluation of the reliability of QR code denture labeling technique

PubMed Central

Poovannan, Sindhu; Jain, Ashish R.; Krishnan, Cakku Jalliah Venkata; Chandran, Chitraa R.

2016-01-01

Statement of Problem: Positive identification of the dead after accidents and disasters through labeled dentures plays a key role in forensic scenario. A number of denture labeling methods are available, and studies evaluating their reliability under drastic conditions are vital. Aim: This study was conducted to evaluate the reliability of QR (Quick Response) Code labeled at various depths in heat-cured acrylic blocks after acid treatment, heat treatment (burns), and fracture in forensics. It was an in vitro study. Materials and Methods: This study included 160 specimens of heat-cured acrylic blocks (1.8 cm × 1.8 cm) and these were divided into 4 groups (40 samples per group). QR Codes were incorporated in the samples using clear acrylic sheet and they were assessed for reliability under various depths, acid, heat, and fracture. Data were analyzed using Chi-square test, test of proportion. Results: The QR Code inclusion technique was reliable under various depths of acrylic sheet, acid (sulfuric acid 99%, hydrochloric acid 40%) and heat (up to 370°C). Results were variable with fracture of QR Code labeled acrylic blocks. Conclusion: Within the limitations of the study, by analyzing the results, it was clearly indicated that the QR Code technique was reliable under various depths of acrylic sheet, acid, and heat (370°C). Effectiveness varied in fracture and depended on the level of distortion. This study thus suggests that QR Code is an effective and simpler denture labeling method. PMID:28123284
Reliability and validity of a Chinese version of the Diagnostic Interview for Borderlines-Revised.

PubMed

Wang, Lanlan; Yuan, Chenmei; Qiu, Jianying; Gunderson, John; Zhang, Min; Jiang, Kaida; Leung, Freedom; Zhong, Jie; Xiao, Zeping

2014-09-01

Borderline personality disorder (BPD) is the most studied of the axis II disorders. One of the most widely used diagnostic instruments is the Diagnostic Interview for Borderline Patients-Revised (DIB-R). The aim of this study was to test the reliability and validity of DIB-R for use in the Chinese culture. The reliability and validity of the DIB-R Chinese version were assessed in a sample of 236 outpatients with a probable BPD diagnosis. The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II) was used as a standard. Test-retest reliability was tested six months later with 20 patients, and inter-rater reliability was tested on 32 patients. The Chinese version of the DIB-R showed good internal global consistency (Cronbach's α of 0.916), good test-retest reliability (Pearson correlation of 0.704), good inter-rater reliability (intra-class correlation coefficient of 0.892 and kappa of 0.861). When compared with the DSM-IV diagnosis as measured by the SCID-II, the DIB-R showed relatively good sensitivity (0.768) and specificity (0.891) at the cutoff of 7, moderate diagnostic convergence (kappa of 0.631), as well as good discriminating validity. The Chinese version of the DIB-R has good psychometric properties, which renders it a valuable method for examining the presence, the severity, and component phenotypes of BPD in Chinese samples. © 2013 Wiley Publishing Asia Pty Ltd.
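The sensitivity and specificity quoted above depend on the chosen cutoff. The sketch below shows how such figures are derived from interview scores and a reference diagnosis. It assumes that a score at or above the cutoff counts as a positive result; the function name and the data are hypothetical and are not taken from the study.

```python
def sens_spec_at_cutoff(scores, has_disorder, cutoff):
    """Sensitivity and specificity of `score >= cutoff` against a reference.

    scores       : instrument scores, one per patient
    has_disorder : reference-standard diagnosis (True/False), same order
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, has_disorder):
        predicted = score >= cutoff
        if truth and predicted:
            tp += 1          # true positive
        elif truth:
            fn += 1          # missed case
        elif predicted:
            fp += 1          # false alarm
        else:
            tn += 1          # true negative
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical scores and reference diagnoses for eight patients.
example_scores = [8, 9, 6, 7, 3, 5, 8, 2]
example_truth = [True, True, True, True, False, False, False, False]
sens, spec = sens_spec_at_cutoff(example_scores, example_truth, cutoff=7)
```

Raising the cutoff generally trades sensitivity for specificity, which is why a validation study reports both at a stated threshold.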
Development and validation of a German version of the joint protection behavior assessment in patients with rheumatoid arthritis.

PubMed

Niedermann, K; Forster, A; Hammond, A; Uebelhart, D; de Bie, R

2007-03-15

Joint protection (JP) is an important part of the treatment concept for patients with rheumatoid arthritis (RA). The Joint Protection Behavior Assessment short form (JPBA-S) assesses the use of hand JP methods by patients with RA while preparing a hot drink. The purpose of this study was to develop a German version of the JPBA-S (D-JPBA-S) and to test its validity and reliability. A manual was developed through consensus with 8 occupational therapist (OT) experts as the reference for assessing patients' JP behavior. Twenty-four patients with RA and 10 healthy individuals were videotaped while performing 10 tasks reflecting the activity of preparing instant coffee. Recordings were repeated after 3 months for test-retest analysis. One rater assessed all available patient recordings (n = 23, recorded twice) for test-retest reliability. The video recordings of 10 randomly selected patients and all healthy individuals were independently assessed for interrater reliability by 6 OTs who were explicitly asked to follow the manual. Rasch analysis was performed to test construct validity and transform ordinal raw data into interval data for reliability calculations. Nine of the 10 tasks fit the Rasch model. The D-JPBA-S, consisting of 9 valid tasks, had an intraclass correlation coefficient of 0.77 for interrater reliability and 0.71 for test-retest reliability. The D-JPBA-S provides a valid and reliable instrument for assessing JP behavior of patients with RA and can be used in German-speaking countries.
Development of the adult PedsQL™ neurofibromatosis type 1 module: initial feasibility, reliability and validity.

PubMed

Nutakki, Kavitha; Hingtgen, Cynthia M; Monahan, Patrick; Varni, James W; Swigonski, Nancy L

2013-02-21

Neurofibromatosis type 1 (NF1) is a common autosomal dominant genetic disorder with significant impact on health-related quality of life (HRQOL). Research in understanding the pathogenetic mechanisms of neurofibroma development has led to the use of new clinical trials for the treatment of NF1. One of the most important outcomes of a trial is improvement in quality of life; however, no condition-specific HRQOL instrument for NF1 exists. The objective of this study was to develop an NF1 HRQOL instrument as a module of PedsQL™ and to test for its initial feasibility, internal consistency reliability and validity in adults with NF1. The NF1-specific HRQOL instrument was developed using a standard method of PedsQL™ module development - literature review, focus group/semi-structured interviews, cognitive interviews and experts' review of initial draft, pilot testing and field testing. Field testing involved 134 adults with NF1. Feasibility was measured by the percentage of missing responses, internal consistency reliability was measured with Cronbach's alpha and validity was measured by the known-groups method. Feasibility, measured by the percentage of missing responses, was 4.8% for all subscales on the adult version of the NF1-specific instrument. Internal consistency reliability for the Total Score (alpha = 0.97) and subscale reliabilities ranging from 0.72 to 0.96 were acceptable for group comparisons. The PedsQL™ NF1 module distinguished between NF1 adults with excellent to very good, good, and fair to poor health status. The results demonstrate the initial feasibility, reliability and validity of the PedsQL™ NF1 module in adult patients. The PedsQL™ NF1 Module can be used to understand the multidimensional nature of NF1 on the HRQOL of patients with this disorder.

An accelerated stress testing program for determining the reliability sensitivity of silicon solar cells to encapsulation and metallization systems

NASA Technical Reports Server (NTRS)

Lathrop, J. W.; Davis, C. W.; Royal, E.

1982-01-01

The use of accelerated testing methods in a program to determine the reliability attributes of terrestrial silicon solar cells is discussed. Different failure modes are to be expected when cells with and without encapsulation are subjected to accelerated testing and separate test schedules for each are described. Unencapsulated test cells having slight variations in metallization are used to illustrate how accelerated testing can highlight different diffusion related failure mechanisms. The usefulness of accelerated testing when applied to encapsulated cells is illustrated by results showing that moisture related degradation may be many times worse with some forms of encapsulation than with no encapsulation at all.
The Reliability, Validity, and Normative Data of Interpupillary Distance and Pupil Diameter Using Eye-Tracking Technology

PubMed Central

Murray, Nicholas P.; Hunfalvay, Melissa; Bolte, Takumi

2017-01-01

Purpose The purpose of this study was to determine the reliability of interpupillary distance (IPD) and pupil diameter (PD) measures using an infrared eye tracker and central point stimuli. Validity of the test compared to known clinical tools was determined, and normative data was established against which individuals can measure themselves. Methods Participants (416) across various demographics were examined for normative data. Of these, 50 were examined for reliability and validity. Validity for IPD measured the test (RightEye IPD/PD) against the PL850 Pupilometer and the Essilor Digital CRP. For PD, the test was measured against the Rosenbaum Pocket Vision Screener (RPVS). Reliability was analyzed with intraclass correlation coefficients (ICC) between trials with Cronbach's alpha (CA) and the standard error of measurement for each ICC. Convergent validity was investigated by calculating the bivariate correlation coefficient. Results Reliability results were strong (CA > 0.7) for all measures. High positive significant correlations were found between the RightEye IPD test and the PL850 Pupilometer (P < 0.001) and Essilor Digital CRP (P < 0.001) and for the RightEye PD test and the RPVS (P < 0.001). Conclusions Using infrared eye tracking and the RightEye IPD/PD test stimuli, reliable and accurate measures of IPD and PD were found. Results from normative data showed an adequate comparison for people with normal vision development. Translational Relevance Results revealed a central point of fixation may remove variability in examining PD reliably using infrared eye tracking when consistent environmental and experimental procedures are conducted. PMID:28685104
Reliability, precision, and gender differences in knee internal/external rotation proprioception measurements.

PubMed

Nagai, Takashi; Sell, Timothy C; Abt, John P; Lephart, Scott M

2012-11-01

To develop and assess the reliability and precision of knee internal/external rotation (IR/ER) threshold to detect passive motion (TTDPM) and determine if gender differences exist. Test-retest for the reliability/precision and cross-sectional for gender comparisons. University neuromuscular and human performance research laboratory. Ten subjects for the reliability and precision aim. Twenty subjects (10 males and 10 females) for gender comparisons. All TTDPM tests were performed using a multi-mode dynamometer. Subjects performed TTDPM at two knee positions (near IR or ER end-range). Intraclass correlation coefficient (ICC (3,k)) and standard error of measurement (SEM) were used to evaluate the reliability and precision. Independent t-tests were used to compare genders. TTDPM toward IR and ER at two knee positions. Intrasession and intersession reliability and precision were good (ICC=0.68-0.86; SEM=0.22°-0.37°). Females had significantly diminished TTDPM toward IR at IR-test position (males: 0.77°±0.14°, females: 1.18°±0.46°, p=0.021) and TTDPM toward IR at the ER-test position (males: 0.87°±0.13°, females: 1.36°±0.58°, p=0.026). No other significant gender differences were found (p>0.05). The current IR/ER TTDPM methods are reliable and accurate for the test-retest or cross-section research design. Gender differences were found toward IR where the ACL acts as the secondary restraint. Copyright © 2011 Elsevier Ltd. All rights reserved.
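The abstract above reports precision as a standard error of measurement (SEM) but does not state how it was computed. A common formula is SEM = SD × sqrt(1 − ICC), and the sketch below assumes that formula. The function name and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability
    coefficient, using the common formula SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


# Hypothetical values: SD of 0.60 degrees and ICC of 0.80 (not study data).
sem_degrees = standard_error_of_measurement(sd=0.60, icc=0.80)
```

Under this formula a higher ICC or a smaller spread of scores both shrink the SEM, so the two statistics are usually reported together.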
Reliability, Validity, and Sensitivity of a Novel Smartphone-Based Eccentric Hamstring Strength Test in Professional Football Players.

PubMed

Lee, Justin W Y; Cai, Ming-Jing; Yung, Patrick S H; Chan, Kai-Ming

2018-05-01

To evaluate the test-retest reliability, sensitivity, and concurrent validity of a smartphone-based method for assessing eccentric hamstring strength among male professional football players. A total of 25 healthy male professional football players performed the Chinese University of Hong Kong (CUHK) Nordic break-point test, hamstring fatigue protocol, and isokinetic hamstring strength test. The CUHK Nordic break-point test is based on a Nordic hamstring exercise. The Nordic break-point angle was defined as the maximum point where the participant could no longer support the weight of his body against gravity. The criterion for the sensitivity test was the presprinting and postsprinting difference of the Nordic break-point angle with a hamstring fatigue protocol. The hamstring fatigue protocol consists of 12 repetitions of the 30-m sprint with 30-s recoveries between sprints. Hamstring peak torque of the isokinetic hamstring strength test was used as the criterion for validity. A high test-retest reliability (intraclass correlation coefficient = .94; 95% confidence interval, .82-.98) was found in the Nordic break-point angle measurements. The Nordic break-point angle significantly correlated with isokinetic hamstring peak torques at eccentric action of 30°/s (r = .88, r² = .77, P < .001). The minimal detectable difference was 8.03°. The sensitivity of the measure was good enough that a significant difference (effect size = 0.70, P < .001) was found between presprinting and postsprinting values. The CUHK Nordic break-point test is a simple, portable, quick smartphone-based method to provide reliable and accurate eccentric hamstring strength measures among male professional football players.
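A minimal detectable difference such as the one reported above is often derived from the standard error of measurement as MDD95 = 1.96 × sqrt(2) × SEM. The abstract does not say which formula the authors used, so the sketch below is an assumption, and the function name and SEM value are hypothetical.

```python
import math


def minimal_detectable_difference(sem, z=1.96):
    """Smallest change exceeding measurement error at the confidence level
    implied by z (1.96 for 95%): MDD = z * sqrt(2) * SEM.

    The sqrt(2) term accounts for error in both of the two measurements
    being compared.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical SEM of 2.0 degrees (not a value reported in the abstract).
mdd_degrees = minimal_detectable_difference(sem=2.0)
```

A change between two test sessions smaller than this threshold cannot be distinguished from measurement noise under the stated assumptions.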
Workplace-based assessment of communication skills: A pilot project addressing feasibility, acceptance and reliability

PubMed Central

Weyers, Simone; Jemi, Iman; Karger, André; Raski, Bianca; Rotthoff, Thomas; Pentzek, Michael; Mortsiefer, Achim

2016-01-01

Background: Imparting communication skills has been given great importance in medical curricula. In addition to standardized assessments, students should communicate with real patients in actual clinical situations during workplace-based assessments and receive structured feedback on their performance. The aim of this project was to pilot a formative testing method for workplace-based assessment. Our investigation centered in particular on whether or not physicians view the method as feasible and how high acceptance is among students. In addition, we assessed the reliability of the method. Method: As part of the project, 16 students held two consultations each with chronically ill patients at the medical practice where they were completing GP training. These consultations were video-recorded. The trained mentoring physician rated the student’s performance and provided feedback immediately following the consultations using the Berlin Global Rating scale (BGR). Two impartial, trained raters also evaluated the videos using BGR. For qualitative and quantitative analysis, information on how physicians and students viewed feasibility and their levels of acceptance was collected in written form in a partially standardized manner. To test for reliability, the test-retest reliability was calculated for both of the overall evaluations given by each rater. The inter-rater reliability was determined for the three evaluations of each individual consultation. Results: The formative assessment method was rated positively by both physicians and students. It is relatively easy to integrate into daily routines. Its significant value lies in the personal, structured and recurring feedback. The two overall scores for each patient consultation given by the two impartial raters correlate moderately. The degree of uniformity among the three raters in respect to the individual consultations is low. 
Discussion: Within the scope of this pilot project, only a small sample of physicians and students could be surveyed to a limited extent. There are indications that the assessment can be improved by integrating more information on medical context and student self-assessments. Despite the current limitations regarding test criteria, it is clear that workplace-based assessment of communication skills in the clinical setting is a valuable addition to the communication curricula of medical schools. PMID:27990466
Workplace-based assessment of communication skills: A pilot project addressing feasibility, acceptance and reliability.

PubMed

Weyers, Simone; Jemi, Iman; Karger, André; Raski, Bianca; Rotthoff, Thomas; Pentzek, Michael; Mortsiefer, Achim

2016-01-01

Background: Imparting communication skills has been given great importance in medical curricula. In addition to standardized assessments, students should communicate with real patients in actual clinical situations during workplace-based assessments and receive structured feedback on their performance. The aim of this project was to pilot a formative testing method for workplace-based assessment. Our investigation centered in particular on whether or not physicians view the method as feasible and how high acceptance is among students. In addition, we assessed the reliability of the method. Method: As part of the project, 16 students held two consultations each with chronically ill patients at the medical practice where they were completing GP training. These consultations were video-recorded. The trained mentoring physician rated the student's performance and provided feedback immediately following the consultations using the Berlin Global Rating scale (BGR). Two impartial, trained raters also evaluated the videos using BGR. For qualitative and quantitative analysis, information on how physicians and students viewed feasibility and their levels of acceptance was collected in written form in a partially standardized manner. To test for reliability, the test-retest reliability was calculated for both of the overall evaluations given by each rater. The inter-rater reliability was determined for the three evaluations of each individual consultation. Results: The formative assessment method was rated positively by both physicians and students. It is relatively easy to integrate into daily routines. Its significant value lies in the personal, structured and recurring feedback. The two overall scores for each patient consultation given by the two impartial raters correlate moderately. The degree of uniformity among the three raters in respect to the individual consultations is low. 
Discussion: Within the scope of this pilot project, only a small sample of physicians and students could be surveyed to a limited extent. There are indications that the assessment can be improved by integrating more information on medical context and student self-assessments. Despite the current limitations regarding test criteria, it is clear that workplace-based assessment of communication skills in the clinical setting is a valuable addition to the communication curricula of medical schools.
The establisment of an achievement test for determination of primary teachers’ knowledge level of earthquake

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aydin, Süleyman, E-mail: yupul@hotmail.com; Haşiloğlu, M. Akif, E-mail: mehmet.hasiloglu@hotmail.com; Kunduraci, Ayşe, E-mail: ayse-kndrc@hotmail.com

The aim of this study was to develop an academic achievement test to establish students' knowledge about earthquakes and ways of protecting against them. The steps described by Webb (1994) for developing an academic achievement test for a unit were followed. During development, a multiple-choice test of 25 questions was prepared to measure pre-service teachers' knowledge of earthquakes and ways of protecting against them. The test was reviewed by six academics (one from the field of geography and five science educators) and two expert science teachers. The prepared test was administered to 93 pre-service teachers studying in the elementary education department in the 2014-2015 academic year. Following the validity and reliability analyses, the final test comprised 20 items. The split-half reliability coefficient, computed as a Pearson product-moment correlation, was 0.94; adjusted with the Spearman-Brown formula, the reliability coefficient was 0.97.
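The step from 0.94 to 0.97 reported above is consistent with the Spearman-Brown correction, which estimates full-test reliability from the correlation between two half-tests: r_full = 2r / (1 + r). A minimal sketch, with a hypothetical function name:

```python
def spearman_brown(half_test_r, length_factor=2.0):
    """Spearman-Brown prophecy formula.

    Predicts the reliability of a test lengthened by `length_factor`
    from the reliability of the shorter form. With length_factor=2
    this is the split-half correction: 2r / (1 + r).
    """
    n = length_factor
    return n * half_test_r / (1.0 + (n - 1.0) * half_test_r)


# Split-half coefficient reported in the abstract.
full_test_r = spearman_brown(0.94)
print(round(full_test_r, 2))  # prints 0.97
```

The correction is needed because each half contains only half the items, so the raw half-to-half correlation understates the reliability of the full-length test.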
Design, application and testing of the Work Observation Method by Activity Timing (WOMBAT) to measure clinicians' patterns of work and communication.

PubMed

Westbrook, Johanna I; Ampt, Amanda

2009-04-01

Evidence regarding how health information technologies influence clinicians' patterns of work and support efficient practices is limited. Traditional paper-based data collection methods are unable to capture clinical work complexity and communication patterns. The use of electronic data collection tools for such studies is emerging yet is rarely assessed for reliability or validity. Our aim was to design, apply and test an observational method which incorporated the use of an electronic data collection tool for work measurement studies which would allow efficient, accurate and reliable data collection, and capture greater degrees of work complexity than current approaches. We developed an observational method and software for personal digital assistants (PDAs) which captures multiple dimensions of clinicians' work tasks, namely what task, with whom, and with what; tasks conducted in parallel (multi-tasking); interruptions and task duration. During field-testing over 7 months across four hospital wards, fifty-two nurses were observed for 250 h. Inter-rater reliability was tested and validity was measured by (i) assessing whether observational data reflected known differences in clinical role work tasks and (ii) by comparing observational data with participants' estimates of their task time distribution. Observers took 15-20 h of training to master the method and data collection process. Only 1% of tasks observed did not match the classification developed and were classified as 'other'. Inter-rater reliability scores of observers were maintained at over 85%. The results discriminated between the work patterns of enrolled and registered nurses consistent with differences in their roles. Survey data (n=27) revealed consistent ratings of tasks by nurses, and their rankings of most to least time-consuming tasks were significantly correlated with those derived from the observational data. 
Over 40% of nurses' time was spent in direct care or professional communication, with 11.8% of time spent multi-tasking. Nurses were interrupted approximately every 49 min. One quarter of interruptions occurred while nurses were preparing or administering medications. This method efficiently produces reliable and valid data. The multi-dimensional nature of the data collected provides greater insights into patterns of clinicians' work and communication than has previously been possible using other methods.
Reliability and Validity of Isometric Knee Extensor Strength Test With Hand-Held Dynamometer Depending on Its Fixation: A Pilot Study

PubMed Central

Kim, Won Kuel; Seo, Kyung Mook; Kang, Si Hyun

2014-01-01

Objective To determine the reliability and validity of hand-held dynamometer (HHD) depending on its fixation in measuring isometric knee extensor strength by comparing the results with an isokinetic dynamometer. Methods Twenty-seven healthy female volunteers participated in this study. The subjects were tested in seated and supine position using three measurement methods: isometric knee extension by isokinetic dynamometer, non-fixed HHD, and fixed HHD. During the measurement, the knee joints of subjects were fixed at a 35° angle from the extended position. The fixed HHD measurement was conducted with the HHD fixed to distal tibia with a Velcro strap; non-fixed HHD was performed with a hand-held method without Velcro fixation. All the measurements were repeated three times and among them, the maximum values of peak torque were used for the analysis. Results The data from the fixed HHD method showed higher validity than the non-fixed method compared with the results of the isokinetic dynamometer. Pearson correlation coefficients (r) between fixed HHD and isokinetic dynamometer method were statistically significant (supine-right: r=0.806, p<0.05; seating-right: r=0.473, p<0.05; supine-left: r=0.524, p<0.05), whereas Pearson correlation coefficients between non-fixed dynamometer and isokinetic dynamometer methods were not statistically significant, except for the result of the supine position of the left leg (r=0.384, p<0.05). Both fixed and non-fixed HHD methods showed excellent inter-rater reliability. However, the fixed HHD method showed a higher reliability than the non-fixed HHD method by considering the intraclass correlation coefficient (fixed HHD, 0.952-0.984; non-fixed HHD, 0.940-0.963). Conclusion Fixation of HHD during measurement in the supine position increases the reliability and validity in measuring the quadriceps strength. PMID:24639931
Simulation-Based Acceptance Testing for Unmanned Ground Vehicles

DTIC Science & Technology

2011-05-12

Ground Robotic Reliability Center (GRRC) at the University of Michigan in 2010, the focus of his research has been on unmanned ground vehicles...Jong Lee is a former student of the University of Michigan’s Ground Robotics Reliability Center (GRRC). He received his Bachelor’s and Master’s degree...methods to improve reliability of Unmanned Ground Vehicle (UGV) systems. His primary research interests include robotic systems and control
Development Of Methodologies Using PhabrOmeter For Fabric Drape Evaluation

NASA Astrophysics Data System (ADS)

Lin, Chengwei

Evaluation of fabric drape is important for the textile industry as it reveals the aesthetics and functionality of cloth and apparel. Although many fabric drape measuring methods have been developed over several decades, they are falling behind the industry's need for fast product development. To meet the requirements of industry, it is necessary to develop an effective and reliable method to evaluate fabric drape. The purpose of the present study is to determine whether the PhabrOmeter can be applied to fabric drape evaluation. The PhabrOmeter is a fabric sensory performance evaluation instrument developed to provide fast and reliable quality testing results. This study sought to determine the relationship between fabric drape and other fabric attributes. In addition, a series of conventional methods including AATCC, ASTM and ISO standards were used to characterize the fabric samples. All the data were compared and analyzed with a linear correlation method. The results indicate that the PhabrOmeter is a reliable and effective instrument for fabric drape evaluation. In addition, factors including fabric structure and testing direction were examined for their impact on fabric drape.
Test-Retest Reliability of Diffusion Tensor Imaging in Huntington's Disease.

PubMed

Cole, James H; Farmer, Ruth E; Rees, Elin M; Johnson, Hans J; Frost, Chris; Scahill, Rachael I; Hobbs, Nicola Z

2014-03-21

Diffusion tensor imaging (DTI) has shown microstructural abnormalities in patients with Huntington's Disease (HD) and work is underway to characterise how these abnormalities change with disease progression. Using methods that will be applied in longitudinal research, we sought to establish the reliability of DTI in early HD patients and controls. Test-retest reliability, quantified using the intraclass correlation coefficient (ICC), was assessed using region-of-interest (ROI)-based white matter atlas and voxelwise approaches on repeat scan data from 22 participants (10 early HD, 12 controls). T1 data was used to generate further ROIs for analysis in a reduced sample of 18 participants. The results suggest that fractional anisotropy (FA) and other diffusivity metrics are generally highly reliable, with ICCs indicating considerably lower within-subject compared to between-subject variability in both HD patients and controls. Where ICC was low, particularly for the diffusivity measures in the caudate and putamen, this was partly influenced by outliers. The analysis suggests that the specific DTI methods used here are appropriate for cross-sectional research in HD, and give confidence that they can also be applied longitudinally, although this requires further investigation. An important caveat for DTI studies is that test-retest reliability may not be evenly distributed throughout the brain whereby highly anisotropic white matter regions tended to show lower relative within-subject variability than other white or grey matter regions.
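As an illustration of how an intraclass correlation coefficient relates within-subject to between-subject variability, the following is a minimal sketch of the one-way random-effects form, ICC(1,1). The abstract does not state which ICC form the authors used, so the choice of form, the function name icc_oneway and the fractional anisotropy values are assumptions made only for this example.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores[i][j] is the j-th repeat measurement of subject i; every
    subject must have the same number of repeats (k >= 2).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of repeats")

    grand_mean = mean(v for row in scores for v in row)
    subject_means = [mean(row) for row in scores]

    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (v - m) ** 2 for row, m in zip(scores, subject_means) for v in row
    ) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical scan/rescan FA values for four subjects (not from the study).
fa_values = [[0.40, 0.41], [0.50, 0.49], [0.60, 0.61], [0.45, 0.45]]
icc = icc_oneway(fa_values)
```

When repeat scans of the same subject agree closely relative to the spread between subjects, the within-subject mean square is small and the ICC approaches 1.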
Concordance of DSM-IV Axis I and II diagnoses by personal and informant's interview.

PubMed

Schneider, Barbara; Maurer, Konrad; Sargk, Dieter; Heiskel, Harald; Weber, Bernhard; Frölich, Lutz; Georgi, Klaus; Fritze, Jürgen; Seidler, Andreas

2004-06-30

The validity and reliability of using psychological autopsies to diagnose a psychiatric disorder is a critical issue. Therefore, interrater and test-retest reliability of the Structured Clinical Interview for DSM-IV Axis I and Personality Disorders and the usefulness of these instruments for the psychological autopsy method were investigated. Diagnoses by informant's interview were compared with diagnoses generated by a personal interview of 35 persons. Interrater reliability and test-retest reliability were assessed in 33 and 29 persons, respectively. Chi-square analysis, kappa and intraclass correlation coefficients, and Kendall's tau were used to determine agreement of diagnoses. Kappa coefficients were above 0.84 for substance-related disorders, mood disorders, and anxiety and adjustment disorders, and above 0.65 for Axis II disorders for interrater and test-retest reliability. Agreement by personal and relative's interview generated kappa coefficients above 0.79 for most Axis I and above 0.65 for most personality disorder diagnoses; Kendall's tau for dimensional individual personality disorder scores ranged from 0.22 to 0.72. Despite the small number of psychiatric disorders in the selected population, the present results provide support for the validity of most diagnoses obtained through the best-estimate method using the Structured Clinical Interview for DSM-IV Axis I and Personality Disorders. This instrument can be recommended as a tool for the psychological autopsy procedure in post-mortem research. Copyright 2004 Elsevier Ireland Ltd.
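The kappa coefficients reported above measure agreement between two sets of categorical diagnoses after correcting for agreement expected by chance. A minimal sketch of unweighted Cohen's kappa follows. The function name cohens_kappa and the diagnosis labels are hypothetical, and the abstract does not say whether an unweighted or weighted kappa was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses from a personal and an informant interview.
personal = ["mood", "mood", "anxiety", "none", "substance", "none"]
informant = ["mood", "mood", "anxiety", "none", "substance", "mood"]
kappa = cohens_kappa(personal, informant)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.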
Scoring Method of a Situational Judgment Test: Influence on Internal Consistency Reliability, Adverse Impact and Correlation with Personality?

ERIC Educational Resources Information Center

De Leng, W. E.; Stegers-Jager, K. M.; Husbands, A.; Dowell, J. S.; Born, M. Ph.; Themmen, A. P.

2017-01-01

Situational Judgment Tests (SJTs) are increasingly used for medical school selection. Scoring an SJT is more complicated than scoring a knowledge test, because there are no objectively correct answers. The scoring method of an SJT may influence the construct and concurrent validity and the adverse impact with respect to non-traditional students.…
Comparison of Susceptibility Testing of Mycobacterium tuberculosis Using the ESP Culture System II with That Using the BACTEC Method

PubMed Central

Ruiz, P.; Zerolo, F. J.; Casal, M. J.

2000-01-01

The ESP Culture System II was evaluated for its capacity to test the susceptibility of 389 cultures of Mycobacterium tuberculosis to streptomycin, rifampin, ethambutol, and isoniazid. Good agreement with results with the BACTEC TB 460 was found. ESP II is a reliable, rapid, and automated method for performing susceptibility testing. PMID:11101619
Test-retest and between-site reliability in a multicenter fMRI study.

PubMed

Friedman, Lee; Stern, Hal; Brown, Gregory G; Mathalon, Daniel H; Turner, Jessica; Glover, Gary H; Gollub, Randy L; Lauriello, John; Lim, Kelvin O; Cannon, Tyrone; Greve, Douglas N; Bockholt, Henry Jeremy; Belger, Aysenil; Mueller, Bryon; Doty, Michael J; He, Jianchun; Wells, William; Smyth, Padhraic; Pieper, Steve; Kim, Seyoung; Kubicki, Marek; Vangel, Mark; Potkin, Steven G

2008-08-01

In the present report, estimates of test-retest and between-site reliability of fMRI assessments were produced in the context of a multicenter fMRI reliability study (FBIRN Phase 1, www.nbirn.net). Five subjects were scanned on 10 MRI scanners on two occasions. The fMRI task was a simple block design sensorimotor task. The impulse response functions to the stimulation block were derived using an FIR-deconvolution analysis with FMRISTAT. Six functionally-derived ROIs covering the visual, auditory and motor cortices, created from a prior analysis, were used. Two dependent variables were compared: percent signal change and contrast-to-noise-ratio. Reliability was assessed with intraclass correlation coefficients derived from a variance components analysis. Test-retest reliability was high, but initially, between-site reliability was low, indicating a strong contribution from site and site-by-subject variance. However, a number of factors that can markedly improve between-site reliability were uncovered, including increasing the size of the ROIs, adjusting for smoothness differences, and inclusion of additional runs. By employing multiple steps, between-site reliability for 3T scanners was increased by 123%. Dropping one site at a time and assessing reliability can be a useful method of assessing the sensitivity of the results to particular sites. These findings should provide guidance to others on the best practices for future multicenter studies.
Measuring the Characteristic Topography of Brain Stiffness with Magnetic Resonance Elastography

PubMed Central

Murphy, Matthew C.; Huston, John; Jack, Clifford R.; Glaser, Kevin J.; Senjem, Matthew L.; Chen, Jun; Manduca, Armando; Felmlee, Joel P.; Ehman, Richard L.

2013-01-01

Purpose To develop a reliable magnetic resonance elastography (MRE)-based method for measuring regional brain stiffness. Methods First, simulation studies were used to demonstrate how stiffness measurements can be biased by changes in brain morphometry, such as those due to atrophy. Adaptive postprocessing methods were created that significantly reduce the spatial extent of edge artifacts and eliminate atrophy-related bias. Second, a pipeline for regional brain stiffness measurement was developed and evaluated for test-retest reliability in 10 healthy control subjects. Results This technique indicates high test-retest repeatability with a typical coefficient of variation of less than 1% for global brain stiffness and less than 2% for the lobes of the brain and the cerebellum. Furthermore, this study reveals that the brain possesses a characteristic topography of mechanical properties, and also that lobar stiffness measurements tend to correlate with one another within an individual. Conclusion The methods presented in this work are resistant to noise- and edge-related biases that are common in the field of brain MRE, demonstrate high test-retest reliability, and provide independent regional stiffness measurements. This pipeline will allow future investigations to measure changes to the brain’s mechanical properties and how they relate to the characteristic topographies that are typical of many neurologic diseases. PMID:24312570
Analyzing Test-As-You-Fly Single Event Upset (SEU) Responses using SEU Data, Classical Reliability Models, and Space Environment Data

NASA Technical Reports Server (NTRS)

Berg, Melanie; Label, Kenneth; Campola, Michael; Xapsos, Michael

2017-01-01

We propose a method for the application of single event upset (SEU) data towards the analysis of complex systems using transformed reliability models (from the time domain to the particle fluence domain) and space environment data.
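To make the idea of moving a reliability model from the time domain to the particle fluence domain concrete, here is a minimal sketch. It assumes the classical exponential model R(t) = exp(-λt) with an upset rate λ equal to the SEU cross-section multiplied by a constant particle flux, so that λt becomes cross-section times fluence. The abstract does not give the authors' actual transformation, and the function names and numerical values are hypothetical.

```python
import math


def reliability_at_fluence(cross_section_cm2, fluence_per_cm2):
    """Probability of no upset after a given fluence (assumed exponential model).

    Time-domain form:    R(t)   = exp(-lambda * t), lambda = sigma * flux
    Fluence-domain form: R(Phi) = exp(-sigma * Phi), Phi = flux * t
    """
    if cross_section_cm2 < 0 or fluence_per_cm2 < 0:
        raise ValueError("cross-section and fluence must be non-negative")
    return math.exp(-cross_section_cm2 * fluence_per_cm2)


def mean_fluence_to_failure(cross_section_cm2):
    """Fluence-domain analogue of mean time to failure under the same model."""
    if cross_section_cm2 <= 0:
        raise ValueError("cross-section must be positive")
    return 1.0 / cross_section_cm2


# Hypothetical cross-section of 1e-6 cm^2 and a fluence of 1e5 particles/cm^2.
r = reliability_at_fluence(1e-6, 1e5)
```

Under this assumption, the reliability at the mean fluence to failure is exp(-1), about 0.37, mirroring the time-domain result at the mean time to failure.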
Reliability of cervical lordosis measurement techniques on long-cassette radiographs.

PubMed

Janusz, Piotr; Tyrakowski, Marcin; Yu, Hailong; Siemionow, Kris

2016-11-01

Lateral radiographs are commonly used to assess cervical sagittal alignment. Three assessment methods have been described and are commonly utilized in clinical practice. These methods are described for perfect lateral cervical radiographs; however, in everyday practice radiograph quality varies. The aim of this study was to compare the reliability and reproducibility of 3 cervical lordosis (CL) measurement methods. Forty-four standing lateral radiographs were randomly chosen from a lateral long-cassette radiograph database. Measurements of CL were performed with: Cobb method C2-C7 (CM), C2-C7 posterior tangent method (PTM), sum of posterior tangent method for each segment (SPTM). Three independent orthopaedic surgeons measured CL using the three methods on 44 lateral radiographs. One researcher used the three methods to measure CL three times at 4-week time intervals. Agreement between the methods as well as their intra- and interobserver reliability were tested and quantified by intraclass correlation coefficient (ICC) and median error for a single measurement (SEM). ICC of 0.75 or more reflected an excellent agreement/reliability. The results were compared with repeated ANOVA test, with p < 0.05 considered as significant. All methods revealed excellent intra- and interobserver reliability. Agreement (ICC, SEM) between three methods was (0.89, 3.44°), between CM and SPTM was (0.82, 4.42°), between CM and PTM was (0.80, 4.80°) and between PTM and SPTM was (0.99, 1.10°). Mean CL values for CM, PTM and SPTM were 10.5° ± 13.9°, 17.5° ± 15.6° and 17.7° ± 15.9° (p < 0.0001), respectively. The significant difference was between CM vs PTM (p < 0.0001) and CM vs SPTM (p < 0.0001), but not between PTM vs SPTM (p > 0.05). All three methods appeared to be highly reliable. Although high agreement between all measurement methods was shown, we do not recommend using the Cobb measurement method interchangeably with PTM or SPTM within a single study, as this could lead to error, whereas such a comparison between tangent methods can be considered.
Iosipescu shear properties of graphite fabric/epoxy composite laminates

NASA Technical Reports Server (NTRS)

Walrath, D. E.; Adams, D. F.

1985-01-01

The Iosipescu shear test method is used to measure the in-plane and interlaminar shear properties of four T300 graphite fabric/934 epoxy composite materials. Different weave geometries tested include an Oxford weave, a 5-harness satin weave, an 8-harness satin weave, and a plain weave with auxiliary warp yarns. Both orthogonal and quasi-isotropic layup laminates were tested. In-plane and interlaminar shear properties are obtained for laminates of all four fabric types. Overall, little difference in shear properties attributable to the fabric weave pattern is observed. The auxiliary warp material is significantly weaker and less stiff in interlaminar shear parallel to its fill direction. A conventional strain gage extensometer is modified to measure shear strains for use with the Iosipescu shear test. While preliminary results are encouraging, several design iterations failed to produce a reliable shear transducer prototype. Strain gages are still the most reliable shear strain transducers for use with this test method.

Validity and test-retest reliability in assessing current body size with figure drawings in Chinese adolescents.

PubMed

Lo, Wing-Sze; Ho, Sai-Yin; Wong, Bonny Yee-Man; Mak, Kwok-Kei; Lam, Tai-Hing

2011-06-01

The reliability and validity of Stunkard's Figure Rating Scale (FRS) as a measure of current body size (CBS) was established in Western adolescent girls but not in non-Western population. We examined the validity and test-retest reliability of Stunkard's FRS in assessing CBS among Chinese adolescents. Methods. In a school-based survey in Hong Kong, 5666 adolescents (boys: 45.1%; mean age 14.7 years) provided data on self-reported height and weight, CBS, perceived weight status, and health-related quality of life using the Medical Outcomes Study Short-Form version 2 (SF-12v2). Height and weight were also objectively measured. Spearman's correlation was used to assess construct validity, concurrent validity and test-retest reliability. Convergent and discriminant validity were good: CBS correlated strongly with weight and self-reported/measured BMI, but only weakly with SF-12v2. CBS correlated strongly with perceived weight status, showing concurrent validity. Spearman's correlation (r) for CBS was 0.78 for girls and 0.72 for boys indicating good test-retest reliability. Validity and reliability results did not differ significantly between senior and junior grade adolescents. Our findings support the use of Stunkard's FRS to measure body size among Chinese adolescents.
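Spearman's correlation, used above for both validity and test-retest reliability, is the Pearson correlation computed on ranks, which suits ordinal ratings such as figure-drawing choices. The sketch below shows one way to compute it with average ranks for ties. The helper names and the test/retest ratings are hypothetical and are not data from the survey.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation with average ranks for ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical figure choices (1-9) at test and retest for eight respondents.
test_ratings = [3, 4, 4, 5, 2, 6, 7, 3]
retest_ratings = [3, 4, 5, 5, 2, 6, 6, 4]
rho = spearman(test_ratings, retest_ratings)
```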
A study of fault prediction and reliability assessment in the SEL environment

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Patnaik, Debabrata

1986-01-01

An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
A novel evaluation strategy for fatigue reliability of flexible nanoscale films

NASA Astrophysics Data System (ADS)

Zheng, Si-Xue; Luo, Xue-Mei; Wang, Dong; Zhang, Guang-Ping

2018-03-01

In order to evaluate the fatigue reliability of nanoscale metal films on flexible substrates, we propose an effective evaluation method that obtains the critical fatigue cracking strain from direct observation of fatigue damage sites using a conventional dynamic bending testing technique. By this method, the fatigue properties and damage behaviors of 930 nm-thick Au films and 600 nm-thick Mo-W multilayers with an individual layer thickness of 100 nm on flexible polyimide substrates were investigated. The Coffin-Manson relationship between the fatigue life and the applied strain range was obtained for the Au films and Mo-W multilayers. The characterization of fatigue damage behaviors verifies the feasibility of this method, which seems easier and more effective compared with other testing methods.
Three-dimensional assessment of the asymptomatic and post-stroke shoulder: intra-rater test-retest reliability and within-subject repeatability of the palpation and digitization approach.

PubMed

Pain, Liza A M; Baker, Ross; Sohail, Qazi Zain; Richardson, Denyse; Zabjek, Karl; Mogk, Jeremy P M; Agur, Anne M R

2018-03-23

Altered three-dimensional (3D) joint kinematics can contribute to shoulder pathology, including post-stroke shoulder pain. Reliable assessment methods enable comparative studies between asymptomatic shoulders of healthy subjects and painful shoulders of post-stroke subjects, and could inform treatment planning for post-stroke shoulder pain. The study purpose was to establish intra-rater test-retest reliability and within-subject repeatability of a palpation/digitization protocol, which assesses 3D clavicular/scapular/humeral rotations, in asymptomatic and painful post-stroke shoulders. Repeated measurements of 3D clavicular/scapular/humeral joint/segment rotations were obtained using palpation/digitization in 32 asymptomatic and six painful post-stroke shoulders during four reaching postures (rest/flexion/abduction/external rotation). Intra-class correlation coefficients (ICCs), standard error of the measurement and 95% confidence intervals were calculated. All ICC values indicated high to very high test-retest reliability (≥0.70), with lower reliability for scapular anterior/posterior tilt during external rotation in asymptomatic subjects, and scapular medial/lateral rotation, humeral horizontal abduction/adduction and axial rotation during abduction in post-stroke subjects. All standard error of measurement values demonstrated within-subject repeatability error ≤5° for all clavicular/scapular/humeral joint/segment rotations (asymptomatic ≤3.75°; post-stroke ≤5.0°), except for humeral axial rotation (asymptomatic ≤5°; post-stroke ≤15°). This noninvasive, clinically feasible palpation/digitization protocol was reliable and repeatable in asymptomatic shoulders, and in a smaller sample of painful post-stroke shoulders. Implications for Rehabilitation In the clinical setting, a reliable and repeatable noninvasive method for assessment of three-dimensional (3D) clavicular/scapular/humeral joint orientation and range of motion (ROM) is currently required. 
The established reliability and repeatability of this proposed palpation/digitization protocol will enable comparative 3D ROM studies between asymptomatic and post-stroke shoulders, which will further inform treatment planning. Intra-rater test-retest repeatability, which is measured by the standard error of the measure, indicates the range of error associated with a single test measure. Therefore, clinicians can use the standard error of the measure to determine the "true" differences between pre-treatment and post-treatment test scores.
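One common way to turn a standard error of measurement into a threshold for "true" change is sketched below. It uses the widely cited relations SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The abstract does not state that the authors computed these quantities this way, so the formulas, function names and example angles are assumptions made for illustration.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (ICC)."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be >= 0 and icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem):
    """Smallest pre/post difference exceeding measurement error at 95% confidence."""
    return 1.96 * math.sqrt(2.0) * sem


def is_true_change(pre, post, sem):
    """True when the pre/post difference is larger than the MDC95."""
    return abs(post - pre) > minimal_detectable_change_95(sem)


# Hypothetical rotation angles in degrees: SD of 10 and ICC of 0.75 give SEM = 5.
sem = standard_error_of_measurement(10.0, 0.75)
mdc = minimal_detectable_change_95(sem)
changed = is_true_change(30.0, 45.0, sem)
```

With these illustrative numbers, a 15° change exceeds the threshold of roughly 13.9°, while a 10° change would not.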
Reliability of a science admission test (HAM-Nat) at Hamburg medical school

PubMed Central

Hissbach, Johanna; Klusmann, Dietrich; Hampe, Wolfgang

2011-01-01

Objective: The University Hospital in Hamburg (UKE) started to develop a test of knowledge in natural sciences for admission to medical school in 2005 (Hamburger Auswahlverfahren für Medizinische Studiengänge, Naturwissenschaftsteil, HAM-Nat). This study is a step towards establishing the HAM-Nat. We are investigating parallel forms reliability, the effect of a crash course in chemistry on test results, and correlations of HAM-Nat test results with a test of scientific reasoning (similar to a subtest of the "Test for Medical Studies", TMS). Methods: 316 first-year students participated in the study in 2007. They completed different versions of the HAM-Nat test which consisted of items that had already been used (HN2006) and new items (HN2007). Four weeks later half of the participants were tested on the HN2007 version of the HAM-Nat again, while the other half completed the test of scientific reasoning. Within this four-week interval students were offered a five-day chemistry course. Results: Parallel forms reliability for four different test versions ranged from rtt=.53 to rtt=.67. The retest reliabilities of the HN2007 halves were rtt=.54 and rtt=.61. Correlations of the two HAM-Nat versions with the test of scientific reasoning were r=.34 and r=.21. The crash course in chemistry had no effect on HAM-Nat scores. Conclusions: The results suggest that further versions of the test of natural sciences will not easily conform to the standards of internal consistency, parallel-forms reliability and retest reliability. Much care has to be taken in order to assemble items which could be used interchangeably for the construction of new test versions. The test of scientific reasoning and the HAM-Nat are tapping different constructs. Participation in a chemistry course did not improve students’ achievement, probably because the content of the course was not coordinated with the test and many students lacked motivation to do well in the second test. PMID:21866246
Thermal Cycling Life Prediction of Sn-3.0Ag-0.5Cu Solder Joint Using Type-I Censored Data

PubMed Central

Mi, Jinhua; Yang, Yuan-Jian; Huang, Hong-Zhong

2014-01-01

Because solder joint interconnections are the weaknesses of microelectronic packaging, their reliability has great influence on the reliability of the entire packaging structure. Based on an accelerated life test the reliability assessment and life prediction of lead-free solder joints using Weibull distribution are investigated. The type-I interval censored lifetime data were collected from a thermal cycling test, which was implemented on microelectronic packaging with lead-free ball grid array (BGA) and fine-pitch ball grid array (FBGA) interconnection structures. The number of cycles to failure of lead-free solder joints is predicted by using a modified Engelmaier fatigue life model and a type-I censored data processing method. Then, the Pan model is employed to calculate the acceleration factor of this test. A comparison of life predictions between the proposed method and the ones calculated directly by Matlab and Minitab is conducted to demonstrate the practicability and effectiveness of the proposed method. At last, failure analysis and microstructure evolution of lead-free solders are carried out to provide useful guidance for the regular maintenance, replacement of substructure, and subsequent processing of electronic products. PMID:25121138
[Modification and evaluation of assessment of medication literacy].

PubMed

Zheng, Feng; Zhong, Zhuqing; Ding, Siqing; Luo, Aijing; Liu, Zina

2016-11-28

To translate and revise the Medication Literacy Assessment in English (MedLitRxSE-English) and evaluate its validity and reliability. Methods: We introduced MedLitRxSE-English from abroad. According to the principles of Brislin and culture adjustment, we revised it as a Chinese edition. Using a random sampling method, from Oct, 2014 to Jan, 2015, 461 non-hospitalized patients from the outpatient departments of the top three hospitals in Changsha city were investigated. The reliability and validity of the scale were tested. Results: The test-retest reliability of the Chinese version of the medication literacy scale was 0.885; the split-half reliability was 0.840; K-R was 0.820; the correlations between the assessment of medication literacy and the corresponding items were 0.427-0.587; the confirmatory factor analysis revealed overall good fit. Root mean square error of approximation (RMSEA), χ2/df, goodness of fit index (GFI) and comparative fit index (CFI) were 0.08, 3.06, 0.91 and 0.94, respectively. Conclusion: The Chinese version of the assessment of medication literacy has good reliability and validity, and it can be used to evaluate medication literacy in our country.
A Simplified and Reliable Damage Method for the Prediction of the Composites Pieces

NASA Astrophysics Data System (ADS)

Viale, R.; Coquillard, M.; Seytre, C.

2012-07-01

Structural engineers are often faced with test results showing composite structures to be considerably tougher than predicted. In an attempt to reduce this frequent gap, a survey was conducted of several extensive synthesis works on prediction methods and failure criteria. This inquiry dealt with the plane stress state only. All classical methods have strong and weak points with respect to practicality and reliability. The main conclusion is that, in the plane stress case, the best usual industrial methods give rather similar predictions. Very generally, however, they do not explain the often large discrepancies with respect to the tests, mainly in cases of strong stress gradients or of bi-axial laminate loadings. It seems that only the methods that consider the complexity of composite damage (so-called physical methods or Continuum Damage Mechanics, “CDM”) bring a clear improvement over the usual methods. The only drawback of these methods is their relative intricacy, mainly under time-pressured industrial conditions. A method with an approximate but simplified representation of the CDM phenomenology is presented. It was compared to tests and other methods: it brings a fair improvement in the correlation with tests relative to the usual industrial methods, and it gives results very similar to the painstaking CDM methods and very close to the test results. Several examples are provided. In addition, this method is economical with respect to material characterization as well as modeling and computation effort.
Reliability and Validity of Gaze-Dependent Functional Vision Space: A Novel Metric Quantifying Visual Function in Infantile Nystagmus Syndrome.

PubMed

Roberts, Tawna L; Kester, Kristi N; Hertle, Richard W

2018-04-01

This study presents test-retest reliability of optotype visual acuity (OVA) across 60° of horizontal gaze position in patients with infantile nystagmus syndrome (INS). Also, the validity of the metric gaze-dependent functional vision space (GDFVS) is shown in patients with INS. In experiment 1, OVA was measured twice in seven horizontal gaze positions from 30° left to right in 10° steps in 20 subjects with INS and 14 without INS. Test-retest reliability was assessed using intraclass correlation coefficient (ICC) in each gaze. OVA area under the curve (AUC) was calculated with horizontal eye position on the x-axis and logMAR visual acuity on the y-axis, and then converted to GDFVS. In experiment 2, validity of GDFVS was determined over 40° horizontal gaze by applying the 95% limits of agreement from experiment 1 to pre- and post-treatment GDFVS values from 85 patients with INS. In experiment 1, test-retest reliability for OVA was high (ICC ≥ 0.88) as the difference in test-retest was on average less than 0.1 logMAR in each gaze position. In experiment 2, as a group, INS subjects had a significant increase (P < 0.001) in the size of their GDFVS that exceeded the 95% limits of agreement found during test-retest. OVA is a reliable measure in INS patients across 60° of horizontal gaze position. GDFVS is a valid clinical method to be used to quantify OVA as a function of eye position in INS patients. This method captures the dynamic nature of OVA in INS patients and may be a valuable measure to quantify visual function in patients with INS, particularly in quantifying change as part of clinical studies.
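The area-under-the-curve step described above, with horizontal eye position on the x-axis and logMAR acuity on the y-axis, can be sketched with the trapezoidal rule. The abstract does not specify the integration rule or the conversion from AUC to GDFVS, so the trapezoidal rule is an assumption, the conversion is not attempted, and the function name and logMAR values are hypothetical.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1)
    )


# Seven gaze positions from 30 degrees left (-30) to 30 degrees right in 10-degree steps.
gaze_deg = [-30, -20, -10, 0, 10, 20, 30]
# Hypothetical logMAR acuity at each gaze position (lower is better).
logmar = [0.5, 0.4, 0.3, 0.2, 0.3, 0.4, 0.5]

auc = trapezoid_auc(gaze_deg, logmar)  # units: logMAR x degrees
```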
Prospective patients rate practice factors: development of a questionnaire.

PubMed

St Louis, Brian Lingg; Firestone, Allen R; Johnston, William; Shanker, Shiva; Vig, Katherine W L

2011-02-01

The importance that prospective patients place on practice characteristics when choosing an orthodontic practice has not been extensively reported. The objective of this research was to develop a valid and reliable questionnaire to address the relative importance of orthodontic office and doctor characteristics for prospective patients or parents of child patients during the initial orthodontic office consultation. An initial questionnaire, based on published literature, was field-tested on 16 subjects to assess its validity. Based on the field test, the questionnaire was modified and tested for reliability by using a test-retest method. The questionnaire covered the following areas: doctor, office, staff, and finances. The reliability study included 2 groups of subjects: 12 consecutive prospective adult patients and 41 consecutive parents of prospective child patients. The questionnaires consisted of 43 and 50 questions for the adult patients and the parents of patients, respectively. The subjects rated the importance of practice characteristics in their selection of an orthodontic practice using a 100-mm visual analog scale anchored at "not important at all" and "most important." Reliability was analyzed by using the intraclass correlation coefficient (ICC). Summary scores of all 53 subjects showed excellent reliability (ICC, 0.88; range, 0.61-1.0). Summary scores of all 50 questions showed acceptable reliability (ICC, 0.70; range, 0.45-0.88). Twenty-one questions had excellent reliability (ICC, >.75), and 29 questions had fair-to-good reliability (ICC, 0.41-0.75). No questions showed poor reliability (ICC, <0.4). The pilot study data indicated that the overall reliability of the questionnaire is acceptable. Copyright © 2011 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
Validation of highly reliable, real-time knowledge-based systems

NASA Technical Reports Server (NTRS)

Johnson, Sally C.

1988-01-01

Knowledge-based systems have the potential to greatly increase the capabilities of future aircraft and spacecraft and to significantly reduce support manpower needed for the space station and other space missions. However, a credible validation methodology must be developed before knowledge-based systems can be used for life- or mission-critical applications. Experience with conventional software has shown that the use of good software engineering techniques and static analysis tools can greatly reduce the time needed for testing and simulation of a system. Since exhaustive testing is infeasible, reliability must be built into the software during the design and implementation phases. Unfortunately, many of the software engineering techniques and tools used for conventional software are of little use in the development of knowledge-based systems. Therefore, research at Langley is focused on developing a set of guidelines, methods, and prototype validation tools for building highly reliable, knowledge-based systems. The use of a comprehensive methodology for building highly reliable, knowledge-based systems should significantly decrease the time needed for testing and simulation. A proven record of delivering reliable systems at the beginning of the highly visible testing and simulation phases is crucial to the acceptance of knowledge-based systems in critical applications.
Statistical complex fatigue data for SAE 4340 steel and its use in design by reliability

NASA Technical Reports Server (NTRS)

Kececioglu, D.; Smith, J. L.

1970-01-01

A brief description of the complex fatigue machines used in the test program is presented. The data generated from these machines are given and discussed. Two methods of obtaining strength distributions from the data are also discussed. Then follows a discussion of the construction of statistical fatigue diagrams and their use in designing by reliability. Finally, some of the problems encountered in the test equipment and a corrective modification are presented.
Reproducibility assessment of brain responses to visual food stimuli in adults with overweight and obesity

PubMed Central

Sayer, R Drew; Tamer, Gregory G; Chen, Ningning; Tregellas, Jason R; Cornier, Marc-Andre; Kareken, David A; Talavage, Thomas M; McCrory, Megan A; Campbell, Wayne W

2016-01-01

Objective The brain’s reward system influences ingestive behavior and subsequently, obesity risk. Functional magnetic resonance imaging (fMRI) is a common method for investigating brain reward function. We sought to assess the reproducibility of fasting-state brain responses to visual food stimuli using BOLD fMRI. Methods A priori brain regions of interest included bilateral insula, amygdala, orbitofrontal cortex, caudate, and putamen. Fasting-state fMRI and appetite assessments were completed by 28 women (n=16) and men (n=12) with overweight or obesity on 2 days. Reproducibility was assessed by comparing mean fasting-state brain responses and measuring test-retest reliability of these responses on the 2 testing days. Results Mean fasting-state brain responses on Day 2 were reduced compared to Day 1 in the left insula and right amygdala, but mean Day 1 and Day 2 responses were not different in the other regions of interest. With the exception of the left orbitofrontal cortex response (fair reliability), test-retest reliabilities of brain responses were poor or unreliable. Conclusion fMRI-measured responses to visual food cues in adults with overweight or obesity show relatively good mean-level reproducibility, but considerable within-subject variability. Poor test-retest reliability reduces the likelihood of observing true correlations and increases the necessary sample sizes for studies. PMID:27542906
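The closing point of this abstract, that poor reliability hides true correlations and inflates sample sizes, can be illustrated with two textbook formulas. The sketch below is not taken from the paper: it assumes Spearman's attenuation formula and the Fisher z approximation for the sample size needed to detect a correlation, and the function names and numbers are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuated_r(true_r, reliability_x, reliability_y=1.0):
    """Expected observed correlation given the reliability of each measure."""
    return true_r * sqrt(reliability_x * reliability_y)


def n_required(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect correlation r (two-sided, Fisher z)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Hypothetical case: a true correlation of 0.5 between a brain response and
# a behavioral outcome, with the brain response measured at reliability 0.36.
observed = attenuated_r(0.5, 0.36)
print(observed, n_required(0.5), n_required(observed))
```

Under these assumptions the required sample size grows quickly as reliability falls, because the detectable effect shrinks with the square root of the reliability.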
Reproducibility of manual pressure force on provocation of the sacroiliac joint.

PubMed

Levin, U; Nilsson-Wikmar, L; Stenström, C H; Lundeberg, T

1998-01-01

Previous studies of pain-provocation sacroiliac (SI) joint tests have revealed conflicting results. The aim of the present study was to evaluate the intra- and inter-test reliability of pressure force applied during distraction test, compression test and pressure on the apex sacralis. Seventeen physiotherapists (PTs), median age 43 years and median clinical experience 11 years, all experienced in musculoskeletal evaluation and therapy, participated in the study. Each PT performed each test on the same healthy volunteer for 20 s, on three separate occasions, at intervals of one week using a specially constructed examination table which registered pressure force. The PTs were capable of maintaining a relatively constant pressure force for 20 s. The intra-test reliability was acceptable even though there were individual differences on different occasions between those PTs who used the SI joint tests often and those who seldom or never used them. The inter-test reliability was insufficient. The findings indicate the advantage of registering pressure force as a complement for standardized methods for pain-provoking tests and when learning provocation tests, since individual variability was considerable.
Reliability demonstration test for load-sharing systems with exponential and Weibull components

PubMed Central

Hu, Qingpei; Yu, Dan; Xie, Min

2017-01-01

Conducting a Reliability Demonstration Test (RDT) is a crucial step in production. Products are tested under certain schemes to demonstrate whether their reliability indices reach pre-specified thresholds. Test schemes for RDT have been studied in different situations, e.g., lifetime testing, degradation testing and accelerated testing. Systems designed with several structures are also investigated in many RDT plans. Despite the availability of a range of test plans for different systems, RDT planning for load-sharing systems hasn’t yet received the attention it deserves. In this paper, we propose a demonstration method for two specific types of load-sharing systems with components subject to two distributions: exponential and Weibull. Based on the assumptions and interpretations made in several previous works on such load-sharing systems, we set the mean time to failure (MTTF) of the total system as the demonstration target. We represent the MTTF as a summation of mean time between successive component failures. Next, we introduce generalized test statistics for both the underlying distributions. Finally, RDT plans for the two types of systems are established on the basis of these test statistics. PMID:29284030
Reliability demonstration test for load-sharing systems with exponential and Weibull components.

PubMed

Xu, Jianyu; Hu, Qingpei; Yu, Dan; Xie, Min

2017-01-01

Conducting a Reliability Demonstration Test (RDT) is a crucial step in production. Products are tested under certain schemes to demonstrate whether their reliability indices reach pre-specified thresholds. Test schemes for RDT have been studied in different situations, e.g., lifetime testing, degradation testing and accelerated testing. Systems designed with several structures are also investigated in many RDT plans. Despite the availability of a range of test plans for different systems, RDT planning for load-sharing systems hasn't yet received the attention it deserves. In this paper, we propose a demonstration method for two specific types of load-sharing systems with components subject to two distributions: exponential and Weibull. Based on the assumptions and interpretations made in several previous works on such load-sharing systems, we set the mean time to failure (MTTF) of the total system as the demonstration target. We represent the MTTF as a summation of mean time between successive component failures. Next, we introduce generalized test statistics for both the underlying distributions. Finally, RDT plans for the two types of systems are established on the basis of these test statistics.
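The idea of writing the system MTTF as a sum of mean times between successive component failures can be sketched for the exponential case. The model below is an assumption made for illustration, not the authors' exact formulation: the system has n identical components, it fails when the last component fails, and after j failures each survivor has a constant failure rate `stage_rates[j]` (usually higher than before, because the load is shared among fewer components). The function name and the rates are hypothetical.

```python
def load_sharing_mttf(stage_rates):
    """MTTF of an n-component load-sharing system with exponential lifetimes.

    stage_rates[j] is the per-component failure rate after j components have
    failed. With (n - j) survivors, the time to the next failure is
    exponential with rate (n - j) * stage_rates[j], so its mean is the
    reciprocal of that rate. The system MTTF is the sum over all stages.
    """
    n = len(stage_rates)
    return sum(1.0 / ((n - j) * rate) for j, rate in enumerate(stage_rates))


# Hypothetical three-component system: the per-component failure rate
# (per 1000 h) rises as survivors take on more load.
print(load_sharing_mttf([0.5, 0.8, 1.5]))
```

When every stage has the same rate, the expression reduces to the familiar MTTF of a parallel system of independent exponential components.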
Requirements for diagnosis of malaria at different levels of the laboratory network in Africa.

PubMed

Long, Earl G

2009-06-01

The rapid increase of resistance to cheap, reliable antimalarials, the increasing cost of effective drugs, and the low specificity of clinical diagnosis have increased the need for more reliable diagnostic methods for malaria. The most commonly used and most reliable method remains microscopic examination of stained blood smears, but this technique requires skilled personnel, precision instruments, and ideally a source of electricity. Microscopy has the advantage of enabling the examiner to identify the species, stage, and density of an infection. An alternative to microscopy is the rapid diagnostic test (RDT), which uses a labeled monoclonal antibody to detect circulating parasitic antigens. This test is most commonly used to detect Plasmodium falciparum infections and is available in a plastic cassette format. Both microscopy and RDTs should be available at all levels of laboratory service in endemic areas, but in peripheral laboratories with minimally trained staff, the RDT may be a more practical diagnostic method.
The validation of Huffaz Intelligence Test (HIT)

NASA Astrophysics Data System (ADS)

Rahim, Mohd Azrin Mohammad; Ahmad, Tahir; Awang, Siti Rahmah; Safar, Ajmain

2017-08-01

In general, a hafiz who can memorize the Quran has many specialties especially in respect to their academic performances. In this study, the theory of multiple intelligences introduced by Howard Gardner is embedded in a developed psychometric instrument, namely Huffaz Intelligence Test (HIT). This paper presents the validation and the reliability of HIT of some tahfiz students in Malaysia Islamic schools. A pilot study was conducted involving 87 huffaz who were randomly selected to answer the items in HIT. The analysis method used includes Partial Least Square (PLS) on reliability, convergence and discriminant validation. The study has validated nine intelligences. The findings also indicated that the composite reliabilities for the nine types of intelligences are greater than 0.8. Thus, the HIT is a valid and reliable instrument to measure the multiple intelligences among huffaz.
Validity and reliability of a scale to measure genital body image.

PubMed

Zielinski, Ruth E; Kane-Low, Lisa; Miller, Janis M; Sampselle, Carolyn

2012-01-01

Women's body image dissatisfaction extends to body parts usually hidden from view--their genitals. Ability to measure genital body image is limited by lack of valid and reliable questionnaires. We subjected a previously developed questionnaire, the Genital Self Image Scale (GSIS) to psychometric testing using a variety of methods. Five experts determined the content validity of the scale. Then using four participant groups, factor analysis was performed to determine construct validity and to identify factors. Further construct validity was established using the contrasting groups approach. Internal consistency and test-retest reliability were determined. Twenty one of 29 items were considered content valid. Two items were added based on expert suggestions. Factor analysis was undertaken resulting in four factors, identified as Genital Confidence, Appeal, Function, and Comfort. The revised scale (GSIS-20) included 20 items explaining 59.4% of the variance. Women indicating an interest in genital cosmetic surgery exhibited significantly lower scores on the GSIS-20 than those who did not. The final 20 item scale exhibited internal reliability across all sample groups as well as test-retest reliability. The GSIS-20 provides a measure of genital body image demonstrating reliability and validity across several populations of women.
Construction of Response Surface with Higher Order Continuity and Its Application to Reliability Engineering

NASA Technical Reports Server (NTRS)

Krishnamurthy, T.; Romero, V. J.

2002-01-01

The usefulness of piecewise polynomials with C1 and C2 derivative continuity for response surface construction is examined. A Moving Least Squares (MLS) method is developed and compared with four other interpolation methods, including kriging. First the selected methods are applied and compared with one another in a problem with two design variables and a known theoretical response function. Next the methods are tested in a problem with four design variables from a reliability-based design application. In general the piecewise polynomial methods with higher order derivative continuity produce less error in the response prediction. The MLS method was found to be superior for response surface construction among the methods evaluated.

Noncontact spirometry with a webcam

NASA Astrophysics Data System (ADS)

Liu, Chenbin; Yang, Yuting; Tsow, Francis; Shao, Dangdang; Tao, Nongjian

2017-05-01

We present an imaging-based method for noncontact spirometry. The method tracks the subtle respiratory-induced shoulder movement of a subject, builds a calibration curve, and determines the flow-volume spirometry curve and vital respiratory parameters, including forced expiratory volume in the first second, forced vital capacity, and peak expiratory flow rate. We validate the accuracy of the method by comparing the data with those simultaneously recorded with a gold standard reference method and examine the reliability of the noncontact spirometry with a pilot study including 16 subjects. This work demonstrates that the noncontact method can provide accurate and reliable spirometry tests with a webcam. Compared to the traditional spirometers, the present noncontact spirometry does not require using a spirometer, breathing into a mouthpiece, or wearing a nose clip, thus making spirometry test more easily accessible for the growing population of asthma and chronic obstructive pulmonary diseases.
Noncontact spirometry with a webcam.

PubMed

Liu, Chenbin; Yang, Yuting; Tsow, Francis; Shao, Dangdang; Tao, Nongjian

2017-05-01

We present an imaging-based method for noncontact spirometry. The method tracks the subtle respiratory-induced shoulder movement of a subject, builds a calibration curve, and determines the flow-volume spirometry curve and vital respiratory parameters, including forced expiratory volume in the first second, forced vital capacity, and peak expiratory flow rate. We validate the accuracy of the method by comparing the data with those simultaneously recorded with a gold standard reference method and examine the reliability of the noncontact spirometry with a pilot study including 16 subjects. This work demonstrates that the noncontact method can provide accurate and reliable spirometry tests with a webcam. Compared to the traditional spirometers, the present noncontact spirometry does not require using a spirometer, breathing into a mouthpiece, or wearing a nose clip, thus making spirometry test more easily accessible for the growing population of asthma and chronic obstructive pulmonary diseases.
Reliability and validity of a nutrition and physical activity environmental self-assessment for child care

PubMed Central

Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S

2007-01-01

Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
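The two per-question agreement statistics named in this abstract, weighted kappa and percent agreement, can be sketched as follows. The abstract does not state the weighting scheme, so linear disagreement weights are assumed here; the function names and the example ratings are hypothetical. Response options are coded as integers 0 to n_categories - 1.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of questions (in percent) on which both ratings are identical."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's weighted kappa with linear weights (an assumed scheme)."""
    n = len(ratings_a)
    k = n_categories
    # Observed joint proportions of (rating_a, rating_b).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
        return abs(i - j) / (k - 1)

    observed_dis = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_dis = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    return 1.0 - observed_dis / expected_dis


# Hypothetical answers to one four-option question from six centers:
# director versus staff member.
director = [0, 1, 2, 3, 2, 1]
staff = [0, 1, 3, 3, 1, 1]
print(percent_agreement(director, staff), weighted_kappa(director, staff, 4))
```

Percent agreement ignores agreement expected by chance, which is one reason the two statistics reported in the abstract can diverge for the same question.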
Development and psychometric evaluation of an information literacy self-efficacy survey and an information literacy knowledge test

PubMed Central

Tepe, Rodger; Tepe, Chabha

2015-01-01

Objective To develop and psychometrically evaluate an information literacy (IL) self-efficacy survey and an IL knowledge test. Methods In this test–retest reliability study, a 25-item IL self-efficacy survey and a 50-item IL knowledge test were developed and administered to a convenience sample of 53 chiropractic students. Item analyses were performed on all questions. Results The IL self-efficacy survey demonstrated good reliability (test–retest correlation = 0.81) and good/very good internal consistency (mean κ = .56 and Cronbach's α = .92). A total of 25 questions with the best item analysis characteristics were chosen from the 50-item IL knowledge test, resulting in a 25-item IL knowledge test that demonstrated good reliability (test–retest correlation = 0.87), very good internal consistency (mean κ = .69, KR20 = 0.85), and good item discrimination (mean point-biserial = 0.48). Conclusions This study resulted in the development of three instruments: a 25-item IL self-efficacy survey, a 50-item IL knowledge test, and a 25-item IL knowledge test. The information literacy self-efficacy survey and the 25-item version of the information literacy knowledge test have shown preliminary evidence of adequate reliability and validity to justify continuing study with these instruments. PMID:25517736
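Two of the internal consistency statistics reported in this abstract, Cronbach's α and KR-20, share one formula. The sketch below uses the standard textbook definition with population variances; for items scored 0/1 the same function returns KR-20. The function name and the response matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per respondent.

    Each row holds that respondent's score on every item. For items scored
    0/1 the result equals KR-20, because the population variance of a 0/1
    item is p * (1 - p).
    """
    k = len(responses[0])  # number of items
    item_variances = [
        pvariance([row[j] for row in responses]) for j in range(k)
    ]
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong answers of four students on a three-item test.
answers = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(cronbach_alpha(answers))
```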
A novel method to remotely measure food intake of free-living people in real-time

PubMed Central

Martin, Corby K.; Han, Hongmei; Coulon, Sandra M.; Allen, H. Raymond; Champagne, Catherine M.; Anton, Stephen D.

2008-01-01

The aim of this study was to report the first reliability and validity tests of the Remote Food Photography Method (RFPM), which consists of camera-enabled cell phones with data transfer capability. Participants take and transmit photographs of food selection and plate waste to researchers/clinicians for analysis. Following two pilot studies, adult participants (N=52, 20≤BMI≤35) were randomly assigned to the dine-in or take-out group. Energy intake (EI) was measured for three days. The dine-in group ate lunch and dinner in the laboratory. The take-out group ate lunch in the laboratory and dinner in free-living conditions (participants received a cooler with pre-weighed food that they returned the following morning). Energy intake was measured with the RFPM and by directly weighing foods. The RFPM was tested in laboratory and free-living conditions. Reliability was tested over three days and validity was tested by comparing directly weighed EI to EI estimated with the RFPM using Bland-Altman analysis. The RFPM produced reliable EI estimates over three days in laboratory (r=.62, p<.0001) and free-living (r=.68, p<.0001) conditions. Weighed EI correlated highly with EI estimated with the RFPM in laboratory and free-living conditions (r’s>.93, p<.0001). In two laboratory-based validity tests, the RFPM underestimated EI by -4.7% (p=.046) and -5.5% (p=.076). In free-living conditions, the RFPM underestimated EI by -6.6% (p=.017). Bias did not differ by body weight or age. The RFPM is a promising new method for accurately measuring the EI of free-living people. Error associated with the method is small compared to self-report methods. PMID:18616837
Translating the semi-structured assessment for drug dependence and alcoholism in the Western Pacific: rationale, study design and reliability of alcohol dependence.

PubMed

Quinn, Amity E; Rosen, Rochelle K; McGeary, John E; Amoa, Francine; Kranzler, Henry R; Francazio, Sarah; McGarvey, Stephen T; Swift, Robert M

2014-01-01

The aims of this study were to develop a bilingual version of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA) in English and Samoan and determine the reliability of assessments of alcohol dependence in American Samoa. The study consisted of development and reliability-testing phases. In the development phase, the SSADDA alcohol module was translated and the translation was evaluated through cognitive interviews. In the reliability-testing phase, the bilingual SSADDA was administered to 40 ethnic Samoans, including a sub-sample of 26 individuals who were retested. Cognitive interviews indicated the initial translation was culturally and linguistically appropriate except items pertaining to alcohol tolerance, which were modified to reflect Samoan concepts. SSADDA reliability testing indicated diagnoses of DSM-III-R and DSM-IV alcohol dependence were reliable. Reliability varied by language of administration. The English/Samoan version of the SSADDA is appropriate for the diagnosis of DSM-III-R alcohol dependence, which may be useful in advancing research and public health efforts to address alcohol problems in American Samoa and the Western Pacific. The translation methods may inform researchers translating diagnostic and assessment tools into different languages and cultures. © The Author 2014. Medical Council on Alcohol and Oxford University Press. All rights reserved.
The Arthroscopic Surgical Skill Evaluation Tool (ASSET)

PubMed Central

Koehler, Ryan J.; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J.; Nicandri, Gregg T.

2014-01-01

Background Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. Hypothesis The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability, when used to assess the technical ability of surgeons performing diagnostic knee arthroscopy on cadaveric specimens. Study Design Cross-sectional study; Level of evidence, 3. Methods Content validity was determined by a group of seven experts using a Delphi process. Intra-articular performance of a right and left diagnostic knee arthroscopy was recorded for twenty-eight residents and two sports medicine fellowship trained attending surgeons. Subject performance was assessed by two blinded raters using the ASSET. Concurrent criterion-oriented validity, inter-rater reliability, and test-retest reliability were evaluated. Results Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in total ASSET score (p<0.05) between novice, intermediate, and advanced experience groups were identified. Inter-rater reliability: The ASSET scores assigned by each rater were strongly correlated (r=0.91, p <0.01) and the intra-class correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: there was a significant correlation between ASSET scores for both procedures attempted by each individual (r = 0.79, p<0.01). Conclusion The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopy in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live OR and other simulated environments. PMID:23548808
Reliability of cognitive tests of ELSA-Brasil, the Brazilian longitudinal study of adult health

PubMed Central

Batista, Juliana Alves; Giatti, Luana; Barreto, Sandhi Maria; Galery, Ana Roscoe Papini; Passos, Valéria Maria de Azeredo

2013-01-01

Cognitive function evaluation entails the use of neuropsychological tests, applied exclusively or in sequence. The results of these tests may be influenced by factors related to the environment, the interviewer or the interviewee. OBJECTIVES We examined the test-retest reliability of some tests of the Brazilian version from the Consortium to Establish a Registry for Alzheimer's disease. METHODS The ELSA-Brasil is a multicentre study of civil servants (35-74 years of age) from public institutions across six Brazilian States. The same tests were applied, in different order of appearance, by the same trained and certified interviewer, with an approximate 20-day interval, to 160 adults (51% men, mean age 52 years). The Intraclass Correlation Coefficient (ICC) was used to assess the reliability of the measures; and a dispersion graph was used to examine the patterns of agreement between them. RESULTS We observed higher retest scores in all tests as well as a shorter test completion time for the Trail Making Test B. ICC values for each test were as follows: Word List Learning Test (0.56), Word Recall (0.50), Word Recognition (0.35), Phonemic Verbal Fluency Test (VFT, 0.61), Semantic VFT (0.53) and Trail B (0.91). The Bland-Altman plot showed better correlation of executive function (VFT and Trail B) than of memory tests. CONCLUSIONS Better performance in retest may reflect a learning effect, and suggests that retest should be repeated using alternate forms or after longer periods. In this sample of adults with high schooling level, reliability was only moderate for memory tests whereas the measurement of executive function proved more reliable. PMID:29213860
FUNCTIONAL PERFORMANCE TESTING OF THE HIP IN ATHLETES: A SYSTEMATIC REVIEW FOR RELIABILITY AND VALIDITY

PubMed Central

Martin, RobRoy L.

2012-01-01

Purpose/Background: The purpose of this study was to systematically review the literature for functional performance tests with evidence of reliability and validity that could be used for a young, athletic population with hip dysfunction. Methods: A search of the PubMed and SPORTDiscus databases was performed to identify movement, balance, hop/jump, or agility functional performance tests from the current peer-reviewed literature used to assess function of the hip in young, athletic subjects. Results: The single-leg stance, deep squat, single-leg squat, and star excursion balance tests (SEBT) demonstrated evidence of validity and normative data for score interpretation. The single-leg stance test and SEBT have evidence of validity with association to hip abductor function. The deep squat test demonstrated evidence as a functional performance test for evaluating femoroacetabular impingement. Hop/Jump tests and agility tests have no reported evidence of reliability or validity in a population of subjects with hip pathology. Conclusions: Use of functional performance tests in the assessment of hip dysfunction has not been well established in the current literature. Diminished squat depth and provocation of pain during the single-leg balance test have been associated with patients diagnosed with FAI and gluteal tendinopathy, respectively. The SEBT and single-leg squat tests provided evidence of convergent validity through an analysis of kinematics and muscle function in normal subjects. Reliability of functional performance tests has not been established on patients with hip dysfunction. Further study is needed to establish reliability and validity of functional performance tests that can be used in a young, athletic population with hip dysfunction. Level of Evidence: 2b (Systematic Review of Literature) PMID:22893860
Two-colour chewing gum mixing ability test for evaluating masticatory performance in children with mixed dentition: validity and reliability study.

PubMed

Kaya, M S; Güçlü, B; Schimmel, M; Akyüz, S

2017-11-01

The unappealing taste of the chewing material and the time-consuming repetitive task in masticatory performance tests using artificial foodstuff may discourage children from performing natural chewing movements. Therefore, the aim was to determine the validity and reliability of a two-colour chewing gum mixing ability test for masticatory performance (MP) assessment in mixed dentition children. Masticatory performance was tested in two groups: systemically healthy fully dentate young adults and children in mixed dentition. Median particle size was assessed using a comminution test, and a two-colour chewing gum mixing ability test was applied for MP analysis. Validity was tested with Pearson correlation, and reliability was tested with intra-class correlation coefficient, Pearson correlation and Bland-Altman plots. Both comminution and two-colour chewing gum mixing ability tests revealed statistically significant MP differences between children (n = 25) and adults (n = 27, both P < 0·01). Pearson correlation between comminution and two-colour chewing gum mixing ability tests was positive and significant (r = 0·418, P = 0·002). Correlations for interobserver reliability and test-retest values were significant (r = 0·990, P = 0·0001 and r = 0·995, P = 0·0001). Although both methods could discriminate MP differences, the comminution test detected these differences generally in a wider range compared to two-colour chewing gum mixing ability test. However, considering the high reliability of the results, the two-colour chewing gum mixing ability test can be used to assess masticatory performance in children, especially at non-clinical settings. © 2017 John Wiley & Sons Ltd.
Attitudes about Advances in Sweat Patch Testing in Drug Courts: Insights from a Case Study in Southern California

ERIC Educational Resources Information Center

Polzer, Katherine

2010-01-01

Drug courts are reinventing the drug testing framework by experimenting with new methods, including use of the sweat patch. The sweat patch is a band-aid like strip used to monitor drug court participants. The validity and reliability of the sweat patch as an effective testing method was examined, as well as the effectiveness, meaning how likely…
An approach to operating system testing

NASA Technical Reports Server (NTRS)

Sum, R. N., Jr.; Campbell, R. H.; Kubitz, W. J.

1984-01-01

To ensure the reliability and performance of a new system, it must be verified or validated in some manner. Currently, testing is the only reasonable technique available for doing this. Part of this testing process is the high level system test. System testing is considered with respect to operating systems and in particular UNIX. This consideration results in the development and presentation of a good method for performing the system test. The method includes derivations from the system specifications and ideas for management of the system testing project. Results of applying the method to the IBM System/9000 XENIX operating system test and the development of a UNIX test suite are presented.
First-order reliability application and verification methods for semistatic structures

NASA Astrophysics Data System (ADS)

Verderaime, V.

1994-11-01

Escalating risks of aerostructures stimulated by increasing size, complexity, and cost should no longer be ignored in conventional deterministic safety design methods. The deterministic pass-fail concept is incompatible with probability and risk assessments; stress audits are shown to be arbitrary and incomplete, and the concept compromises the performance of high-strength materials. A reliability method is proposed that combines first-order reliability principles with deterministic design variables and conventional test techniques to surmount current deterministic stress design and audit deficiencies. Accumulative and propagation design uncertainty errors are defined and appropriately implemented into the classical safety-index expression. The application is reduced to solving for a design factor that satisfies the specified reliability and compensates for uncertainty errors, and then using this design factor as, and instead of, the conventional safety factor in stress analyses. The resulting method is consistent with current analytical skills and verification practices, the culture of most designers, and the development of semistatic structural designs.
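The abstract above refers to the classical safety-index expression. A minimal sketch of the textbook form is given below, assuming independent, normally distributed strength and stress; it does not include the accumulative and propagation uncertainty errors that the paper adds, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_index(mean_strength, sd_strength, mean_stress, sd_stress):
    """Textbook safety index: margin between mean strength and mean stress,
    measured in standard deviations of that margin."""
    return (mean_strength - mean_stress) / sqrt(sd_strength ** 2 + sd_stress ** 2)


def reliability(beta):
    """Probability that strength exceeds stress under the normality assumption."""
    return NormalDist().cdf(beta)


if __name__ == "__main__":
    # Hypothetical values in consistent stress units.
    beta = safety_index(mean_strength=100.0, sd_strength=3.0,
                        mean_stress=80.0, sd_stress=4.0)
    print(beta, reliability(beta))
```

In a design setting the calculation is typically run in reverse: a target reliability fixes the required safety index, which in turn fixes the margin that the design factor must provide.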
An Application of the Rasch Model to Computerized Adaptive Testing.

ERIC Educational Resources Information Center

Wisniewski, Dennis R.

Three questions concerning the Binary Search Method (BSM) of computerized adaptive testing were studied: (1) whether it provided a reliable and valid estimation of examinee ability; (2) its effect on examinee attitudes toward computerized adaptive testing and conventional paper-and-pencil testing; and (3) the relationship between item response…
Skeletal age estimation for forensic purposes: A comparison of GP, TW2 and TW3 methods on an Italian sample.

PubMed

Pinchi, Vilma; De Luca, Federica; Ricciardi, Federico; Focardi, Martina; Piredda, Valentina; Mazzeo, Elena; Norelli, Gian-Aristide

2014-05-01

Paediatricians, radiologists, anthropologists and medico-legal specialists are often called as experts in order to provide age estimation (AE) for forensic purposes. The literature recommends performing X-rays of the left hand and wrist (HW-XR) for skeletal age estimation. The method most frequently employed is the Greulich and Pyle (GP) method. In addition, the so-called bone-specific techniques are also applied, including the method of Tanner Whitehouse (TW) in the latest versions TW2 and TW3. The aim was to compare skeletal age and chronological age in a large sample of children and adolescents using the GP, TW2 and TW3 methods in order to establish which of these is the most reliable for forensic purposes. The sample consisted of 307 HW-XRs of Italian children or adolescents, 145 females and 162 males aged between 6 and 20 years. The radiographs were scored according to the GP, TW2RUS and TW3RUS methods by one investigator. The reliability of the results was assessed using the intraclass correlation coefficient. The Wilcoxon signed-rank test and Student t-test were performed to search for significant differences between skeletal and chronological ages. The distributions of the differences between estimated and chronological age, shown by means of boxplots, indicate that median differences for the TW3 and GP methods are generally very close to 0. Hypothesis test results were obtained by sex, both for the entire group of individuals and for individuals grouped by age. Results show no significant differences between estimated and chronological age for TW3 and, to a lesser extent, GP. TW2 proved to be the worst of the three methods. Our results support the conclusion that the TW2 method is not reliable for AE for forensic purposes. The GP and TW3 methods have proved to be reliable in males. For females, the best method was found to be TW3. When performing forensic age estimation in subjects around 14 years of age, it could be advisable to use and associate the TW3 and GP methods.
Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Statistical Bayesian method for reliability evaluation based on ADT data

NASA Astrophysics Data System (ADS)

Lu, Dawei; Wang, Lizhi; Sun, Yusheng; Wang, Xiaohong

2018-05-01

Accelerated degradation testing (ADT) is frequently conducted in the laboratory to predict the products’ reliability under normal operating conditions. Two kinds of methods, degradation path models and stochastic process models, are used to analyze degradation data, and the latter is the more popular. However, some limitations remain, such as an imprecise solution process and imprecise estimates of the degradation ratio, which may affect the accuracy of the acceleration model and the extrapolated value. Moreover, the usual solution to this problem, the Bayesian method, loses key information when unifying the degradation data. In this paper, a new data processing and parameter inference method based on the Bayesian method is proposed to handle degradation data and solve the problems above. First, a Wiener process and an acceleration model are chosen. Second, the initial values of the degradation model and the parameters of the prior and posterior distributions under each level are calculated by updating and iterating the estimates. Third, the lifetime and reliability values are estimated from the estimated parameters. Finally, a case study is provided to demonstrate the validity of the proposed method. The results illustrate that the proposed method is quite effective and accurate in estimating the lifetime and reliability of a product.
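The third step in the abstract above estimates lifetime and reliability from the fitted degradation parameters. The sketch below is not the paper's Bayesian procedure; it only shows the standard first-passage result for a Wiener process with constant positive drift, whose time to reach a failure threshold follows an inverse Gaussian distribution. The parameter names (drift, sigma, threshold) and the numbers are assumptions made for illustration.

```python
from math import erfc, exp, sqrt


def phi(x):
    """Standard normal CDF, written with erfc to keep accuracy in the lower tail."""
    return 0.5 * erfc(-x / sqrt(2.0))


def wiener_reliability(t, drift, sigma, threshold):
    """Probability that a Wiener degradation path with the given drift and
    diffusion coefficient has not yet crossed the threshold at time t."""
    if drift <= 0 or sigma <= 0 or threshold <= 0:
        raise ValueError("drift, sigma and threshold must be positive")
    if t <= 0:
        return 1.0
    s = sigma * sqrt(t)
    return (phi((threshold - drift * t) / s)
            - exp(2.0 * drift * threshold / sigma ** 2)
            * phi(-(threshold + drift * t) / s))


def mean_lifetime(drift, threshold):
    """Mean first-passage time of the same process."""
    return threshold / drift


if __name__ == "__main__":
    # Hypothetical parameters at the normal operating stress level.
    for t in (1.0, 5.0, 10.0, 20.0):
        print(t, wiener_reliability(t, drift=1.0, sigma=1.0, threshold=10.0))
    print(mean_lifetime(drift=1.0, threshold=10.0))
```

In an accelerated test, the drift at the normal operating condition would come from the fitted acceleration model rather than being supplied directly as it is here.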
Development and Validation of the User Version of the Mobile Application Rating Scale (uMARS).

PubMed

Stoyanov, Stoyan R; Hides, Leanne; Kavanagh, David J; Wilson, Hollie

2016-06-10

The Mobile Application Rating Scale (MARS) provides a reliable method to assess the quality of mobile health (mHealth) apps. However, training and expertise in mHealth and the relevant health field are required to administer it. This study describes the development and reliability testing of an end-user version of the MARS (uMARS). The MARS was simplified and piloted with 13 young people to create the uMARS. The internal consistency and test-retest reliability of the uMARS were then examined in a second sample of 164 young people participating in a randomized controlled trial of an mHealth app. App ratings were collected using the uMARS at 1-, 3-, and 6-month follow-up. The uMARS had excellent internal consistency (alpha = .90), with high individual alphas for all subscales. The total score and subscales had good test-retest reliability over both 1-2 months and 3 months. The uMARS is a simple tool that can be reliably used by end-users to assess the quality of mHealth apps.
Small sample estimation of the reliability function for technical products

NASA Astrophysics Data System (ADS)

Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.

2017-12-01

It is demonstrated that, in the absence of large statistical samples obtained from failure testing of complex technical products, statistical estimation of the reliability function of initial elements can be made by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for the implementation of the moments method with the use of only the moments at which the failures of initial elements occur.
Unreliability as a Threat to Understanding Psychopathology: The Cautionary Tale of Attentional Bias

PubMed Central

Rodebaugh, Thomas L.; Scullin, Rachel B.; Langer, Julia K.; Dixon, David J.; Huppert, Jonathan D.; Bernstein, Amit; Zvielli, Ariel; Lenze, Eric J.

2016-01-01

The use of unreliable measures constitutes a threat to our understanding of psychopathology, because advancement of science using both behavioral and biologically-oriented measures can only be certain if such measurements are reliable. Two pillars of NIMH’s portfolio – the Research Domain Criteria (RDoC) initiative for psychopathology and the target engagement initiative in clinical trials – cannot succeed without measures that possess the high reliability necessary for tests involving mediation and selection based on individual differences. We focus on the historical lack of reliability of attentional bias measures as an illustration of how reliability can pose a threat to our understanding. Our own data replicate previous findings of poor reliability for traditionally-used scores, which suggests a serious problem with the ability to test theories regarding attentional bias. This lack of reliability may also suggest problems with the assumption (in both theory and the formula for the scores) that attentional bias is consistent and stable across time. In contrast, measures accounting for attention as a dynamic process in time show good reliability in our data. The field is sorely in need of research reporting findings and reliability for attentional bias scores using multiple methods, including those focusing on dynamic processes over time. We urge researchers to test and report reliability of all measures, considering findings of low reliability not just as a nuisance but as an opportunity to modify and improve upon the underlying theory. Full assessment of reliability of measures will maximize the possibility that RDoC (and psychological science more generally) will succeed. PMID:27322741
Translation, Cultural Adaptation and Validation of the Simple Shoulder Test to Spanish

PubMed Central

Arcuri, Francisco; Barclay, Fernando; Nacul, Ivan

2015-01-01

Background: The validation of widely used scales facilitates the comparison across international patient samples. Objective: The objective was to translate, culturally adapt and validate the Simple Shoulder Test into Argentinian Spanish. Methods: The Simple Shoulder Test was translated from English into Argentinian Spanish by two independent translators, translated back into English and evaluated for accuracy by an expert committee to correct possible discrepancies. It was then administered to 50 patients with different shoulder conditions. Psychometric properties were analyzed, including internal consistency, measured with Cronbach's alpha, and test-retest reliability at 15 days, measured with the intraclass correlation coefficient. Results: The internal consistency was a Cronbach's alpha of 0.808, evaluated as good. The test-retest reliability index as measured by intra-class correlation coefficient (ICC) was 0.835, evaluated as excellent. Conclusion: The Simple Shoulder Test translation and its cultural adaptation to Argentinian Spanish demonstrated adequate internal reliability and validity, ultimately allowing for its use in the comparison with international patient samples.
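The abstract above reports internal consistency as Cronbach's alpha. A minimal sketch of the standard formula follows, assuming complete data with one row per respondent and one column per questionnaire item. The function name and the example scores are hypothetical and unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if len(scores) < 2:
        raise ValueError("need at least 2 respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same k >= 2 items")
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical answers of three respondents to a two-item scale.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Alpha approaches 1 when items vary together across respondents and falls toward 0 (or below) when they do not.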

Experimental Protocol to Determine the Chloride Threshold Value for Corrosion in Samples Taken from Reinforced Concrete Structures

PubMed Central

Angst, Ueli M.; Boschmann, Carolina; Wagner, Matthias; Elsener, Bernhard

2017-01-01

The aging of reinforced concrete infrastructure in developed countries imposes an urgent need for methods to reliably assess the condition of these structures. Corrosion of the embedded reinforcing steel is the most frequent cause for degradation. While it is well known that the ability of a structure to withstand corrosion depends strongly on factors such as the materials used or the age, it is common practice to rely on threshold values stipulated in standards or textbooks. These threshold values for corrosion initiation (Ccrit) are independent of the actual properties of a certain structure, which clearly limits the accuracy of condition assessments and service life predictions. The practice of using tabulated values can be traced to the lack of reliable methods to determine Ccrit on-site and in the laboratory. Here, an experimental protocol to determine Ccrit for individual engineering structures or structural members is presented. A number of reinforced concrete samples are taken from structures and laboratory corrosion testing is performed. The main advantage of this method is that it ensures real conditions concerning parameters that are well known to greatly influence Ccrit, such as the steel-concrete interface, which cannot be representatively mimicked in laboratory-produced samples. At the same time, the accelerated corrosion test in the laboratory permits the reliable determination of Ccrit prior to corrosion initiation on the tested structure; this is a major advantage over all common condition assessment methods that only permit estimating the conditions for corrosion after initiation, i.e., when the structure is already damaged. The protocol yields the statistical distribution of Ccrit for the tested structure. This serves as a basis for probabilistic prediction models for the remaining time to corrosion, which is needed for maintenance planning. 
This method can potentially be used in material testing of civil infrastructures, similar to established methods used for mechanical testing. PMID:28892023
Experimental Protocol to Determine the Chloride Threshold Value for Corrosion in Samples Taken from Reinforced Concrete Structures.

PubMed

Angst, Ueli M; Boschmann, Carolina; Wagner, Matthias; Elsener, Bernhard

2017-08-31

The aging of reinforced concrete infrastructure in developed countries imposes an urgent need for methods to reliably assess the condition of these structures. Corrosion of the embedded reinforcing steel is the most frequent cause for degradation. While it is well known that the ability of a structure to withstand corrosion depends strongly on factors such as the materials used or the age, it is common practice to rely on threshold values stipulated in standards or textbooks. These threshold values for corrosion initiation (Ccrit) are independent of the actual properties of a certain structure, which clearly limits the accuracy of condition assessments and service life predictions. The practice of using tabulated values can be traced to the lack of reliable methods to determine Ccrit on-site and in the laboratory. Here, an experimental protocol to determine Ccrit for individual engineering structures or structural members is presented. A number of reinforced concrete samples are taken from structures and laboratory corrosion testing is performed. The main advantage of this method is that it ensures real conditions concerning parameters that are well known to greatly influence Ccrit, such as the steel-concrete interface, which cannot be representatively mimicked in laboratory-produced samples. At the same time, the accelerated corrosion test in the laboratory permits the reliable determination of Ccrit prior to corrosion initiation on the tested structure; this is a major advantage over all common condition assessment methods that only permit estimating the conditions for corrosion after initiation, i.e., when the structure is already damaged. The protocol yields the statistical distribution of Ccrit for the tested structure. This serves as a basis for probabilistic prediction models for the remaining time to corrosion, which is needed for maintenance planning. 
This method can potentially be used in material testing of civil infrastructures, similar to established methods used for mechanical testing.
Validating the Alcohol Use Disorders Identification Test with Persons Who Have a Serious Mental Illness

ERIC Educational Resources Information Center

O'Hare, Thomas; Sherrer, Margaret V.; LaButti, Annamaria; Emrick, Kelly

2004-01-01

Objective/Method: The use of brief, reliable, valid, and practical measures of substance use is critical for conducting individual assessments and program evaluation for integrated mental health-substance abuse services for persons with serious mental illness. This investigation examines the internal consistency reliability, concurrent validity,…
Measuring Recognition Performance Using Computer-Based and Paper-Based Methods.

ERIC Educational Resources Information Center

Federico, Pat-Anthony

1991-01-01

Using a within-subjects design, computer-based and paper-based tests of aircraft silhouette recognition were administered to 83 male naval pilots and flight officers to determine the relative reliabilities and validities of 2 measurement modes. Relative reliabilities and validities of the two modes were contingent on the multivariate measurement…
Identifying Dyslexia in Adults: An Iterative Method Using the Predictive Value of Item Scores and Self-Report Questions

ERIC Educational Resources Information Center

Tamboer, Peter; Vorst, Harrie C. M.; Oort, Frans J.

2014-01-01

Methods for identifying dyslexia in adults vary widely between studies. Researchers have to decide how many tests to use, which tests are considered to be the most reliable, and how to determine cut-off scores. The aim of this study was to develop an objective and powerful method for diagnosing dyslexia. We took various methodological measures,…
A Retrospective Performance Assessment of the Developmental Neurotoxicity Study in Support of OECD Test Guideline 426

PubMed Central

Makris, Susan L.; Raffaele, Kathleen; Allen, Sandra; Bowers, Wayne J.; Hass, Ulla; Alleva, Enrico; Calamandrei, Gemma; Sheets, Larry; Amcoff, Patric; Delrue, Nathalie; Crofton, Kevin M.

2009-01-01

Objective We conducted a review of the history and performance of developmental neurotoxicity (DNT) testing in support of the finalization and implementation of Organisation of Economic Co-operation and Development (OECD) DNT test guideline 426 (TG 426). Information sources and analysis In this review we summarize extensive scientific efforts that form the foundation for this testing paradigm, including basic neurotoxicology research, interlaboratory collaborative studies, expert workshops, and validation studies, and we address the relevance, applicability, and use of the DNT study in risk assessment. Conclusions The OECD DNT guideline represents the best available science for assessing the potential for DNT in human health risk assessment, and data generated with this protocol are relevant and reliable for the assessment of these end points. The test methods used have been subjected to an extensive history of international validation, peer review, and evaluation, which is contained in the public record. The reproducibility, reliability, and sensitivity of these methods have been demonstrated, using a wide variety of test substances, in accordance with OECD guidance on the validation and international acceptance of new or updated test methods for hazard characterization. Multiple independent, expert scientific peer reviews affirm these conclusions. PMID:19165382
Structural Optimization for Reliability Using Nonlinear Goal Programming

NASA Technical Reports Server (NTRS)

El-Sayed, Mohamed E.

1999-01-01

This report details the development of a reliability based multi-objective design tool for solving structural optimization problems. Based on two different optimization techniques, namely sequential unconstrained minimization and nonlinear goal programming, the developed design method has the capability to take into account the effects of variability on the proposed design through a user specified reliability design criterion. In its sequential unconstrained minimization mode, the developed design tool uses a composite objective function, in conjunction with weight ordered design objectives, in order to take into account conflicting and multiple design criteria. The design criteria of interest include structural weight, load induced stress and deflection, and mechanical reliability. The nonlinear goal programming mode, on the other hand, provides for a design method that eliminates the difficulty of having to define an objective function and constraints, while at the same time having the capability of handling rank ordered design objectives or goals. For simulation purposes the design of a pressure vessel cover plate was undertaken as a test bed for the newly developed design tool. The formulation of this structural optimization problem into sequential unconstrained minimization and goal programming form is presented. The resulting optimization problem was solved using: (i) the linear extended interior penalty function method algorithm; and (ii) Powell's conjugate directions method. Both single and multi-objective numerical test cases are included demonstrating the design tool's capabilities as it applies to this design problem.
Insightful practice: a reliable measure for medical revalidation

PubMed Central

Guthrie, Bruce; Sullivan, Frank M; Mercer, Stewart W; Russell, Andrew; Bruce, David A

2012-01-01

Background Medical revalidation decisions need to be reliable if they are to reassure on the quality and safety of professional practice. This study tested an innovative method in which general practitioners (GPs) were assessed on their reflection and response to a set of externally specified feedback. Setting and participants 60 GPs and 12 GP appraisers in the Tayside region of Scotland, UK. Methods A feedback dataset was specified as (1) GP-specific data collected by GPs themselves (patient and colleague opinion; open book self-evaluated knowledge test; complaints) and (2) Externally collected practice-level data provided to GPs (clinical quality and prescribing safety). GPs' perceptions of whether the feedback covered UK General Medical Council specified attributes of a ‘good doctor’ were examined using a mapping exercise. GPs' professionalism was examined in terms of appraiser assessment of GPs' level of insightful practice, defined as: engagement with, insight into and appropriate action on feedback data. The reliability of assessment of insightful practice and subsequent recommendations on GPs' revalidation by face-to-face and anonymous assessors were investigated using Generalisability G-theory. Main outcome measures Coverage of General Medical Council attributes by specified feedback and reliability of assessor recommendations on doctors' suitability for revalidation. Results Face-to-face assessment proved unreliable. Anonymous global assessment by three appraisers of insightful practice was highly reliable (G=0.85), as were revalidation decisions using four anonymous assessors (G=0.83). Conclusions Unlike face-to-face appraisal, anonymous assessment of insightful practice offers a valid and reliable method to decide GP revalidation. Further validity studies are needed. PMID:22653078
Validity and reliability of the Myotest accelerometric system for the assessment of vertical jump height.

PubMed

Casartelli, Nicola; Müller, Roland; Maffiuletti, Nicola A

2010-11-01

The aim of the present study was to verify the validity and reliability of the Myotest accelerometric system (Myotest SA, Sion, Switzerland) for the assessment of vertical jump height. Forty-four male basketball players (age range: 9-25 years) performed series of squat, countermovement and repeated jumps during 2 identical test sessions separated by 2-15 days. Flight height was simultaneously quantified with the Myotest system and validated photoelectric cells (Optojump). Two calculation methods were used to estimate the jump height from Myotest recordings: flight time (Myotest-T) and vertical takeoff velocity (Myotest-V). Concurrent validity was investigated comparing Myotest-T and Myotest-V to the criterion method (Optojump), and test-retest reliability was also examined. As regards validity, Myotest-T overestimated jumping height compared to Optojump (p < 0.001) with a systematic bias of approximately 7 cm, even though random errors were low (2.7 cm) and intraclass correlation coefficients (ICCs) were high (>0.98), that is, excellent validity. Myotest-V overestimated jumping height compared to Optojump (p < 0.001), with high random errors (>12 cm), high limits of agreement ratios (>36%), and low ICCs (<0.75), that is, poor validity. As regards reliability, Myotest-T showed high ICCs (range: 0.92-0.96), whereas Myotest-V showed low ICCs (range: 0.56-0.89), and high random errors (>9 cm). In conclusion, Myotest-T is a valid and reliable method for the assessment of vertical jump height, and its use is legitimate for field-based evaluations, whereas Myotest-V is neither valid nor reliable.
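The abstract above distinguishes jump height estimated from flight time and from vertical takeoff velocity. The device's internal calculations are not described in the abstract, so the sketch below only shows the textbook projectile relations that such estimates are usually based on, assuming that takeoff and landing occur at the same body position and that air resistance is negligible. The function names and example values are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def height_from_flight_time(flight_time_s):
    """Jump height in metres from flight time: h = g * t^2 / 8."""
    return G * flight_time_s ** 2 / 8.0


def height_from_takeoff_velocity(velocity_m_s):
    """Jump height in metres from vertical takeoff velocity: h = v^2 / (2 g)."""
    return velocity_m_s ** 2 / (2.0 * G)


if __name__ == "__main__":
    t = 0.5                # hypothetical flight time in seconds
    v = G * t / 2.0        # takeoff velocity implied by that flight time
    print(height_from_flight_time(t), height_from_takeoff_velocity(v))
```

Under these idealised assumptions the two formulas agree exactly, so differences between the two estimates in practice come from how flight time and takeoff velocity are measured.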
Reliability of visual acuity measurements taken with a notebook and a tablet computer in participants who were illiterate to Roman characters.

PubMed

Ruamviboonsuk, Paisan; Sudsakorn, Napitchareeya; Somkijrungroj, Thanapong; Engkagul, Chayanee; Tiensuwan, Montip

2012-03-01

Electronic measurement of visual acuity (VA) has been proposed and adopted as a method of determining VA scores in clinical research. Characters (optotypes) are displayed on a monitor screen and the examinee selects a match and inputs his choice to another electronic device. Unfortunately, the optotypes, called Sloan letters, in the standard protocol are 10 Roman characters. This limits their practicability for measuring VA of patients who are illiterate to these characters. The authors introduced a method of displaying the Sloan letters one by one on a notebook and all 10 Sloan letters on a tablet computer screen. The former is for testing the patients whereas the latter is for them to input their responses by tapping on a letter that matches the one on the notebook screen. The objective was to assess the test-retest reliability of VA scores determined with this method. Participants without ocular abnormality were recruited to have their right eyes measured with the same VA measurement method twice, one week apart. Those who were illiterate to Roman characters were enrolled for the aforementioned method for measuring their VA (Tablet group). A 15-inch display notebook computer and a 9-inch display tablet computer (iPad) communicated via a local wireless data network provided by a Wi-Fi router. Those who understood Roman characters were enrolled to have measurements with a 17-inch desktop computer and an infrared wireless keyboard (Keyboard group). Both methods used the same protocols and software for VA measurements. Reliability of VA scores obtained from each group was assessed by the confidence interval (CI) of the difference of the scores from the test and retest. The t test was used to analyze differences in mean VA scores between the test and retest in each group with p < 0.05 determined as statistically significant. There were 49 and 50 participants in the Tablet and Keyboard group respectively.
The 95% CI of the difference between the scores from the test and retest in each group was 2 letters. Approximately 95% of participants in each group had an absolute difference of the scores between the test and retest of 7 letters. The mean of VA scores from the first test was significantly different from that of the second test in the Keyboard group (one-letter difference, p = 0.049); there was no significant difference between these scores in the Tablet group (0.1-letter difference, p = 0.86). Tablet computers may be used to assist patients who are illiterate to Roman characters in having their VA measured with the standard electronic protocol. This preliminary study suggested that the proposed method should be useful for reliably measuring VA outcomes in multicenter international clinical trials without encountering a language barrier.
A Novel Method for Quantifying Helmeted Field of View of a Spacesuit - And What It Means for Constellation

NASA Technical Reports Server (NTRS)

McFarland, Shane M.

2010-01-01

Field of view has always been a design feature paramount to helmet design, and in particular spacesuit design, where the helmet must provide an adequate field of view for a large range of activities, environments, and body positions. Historically, suited field of view has been evaluated either qualitatively in parallel with design or quantitatively using various test methods and protocols. As such, oftentimes legacy suit field of view information is either ambiguous for lack of supporting data or contradictory to other field of view tests performed with different subjects and test methods. This paper serves to document a new field of view testing method that is more reliable and repeatable than its predecessors. It borrows heavily from standard ophthalmologic field of vision tests such as the Goldmann kinetic perimetry test, but is designed specifically for evaluating field of view of a spacesuit helmet. In this test, four suits utilizing three different helmet designs were tested for field of view. Not only do these tests provide more reliable field of view data for legacy and prototype helmet designs, they also provide insight into how helmet design impacts field of view and what this means for the Constellation Project spacesuit helmet, which must meet stringent field of view requirements that are more generous to the crewmember than legacy designs.
Validation of the Simple Shoulder Test in a Portuguese-Brazilian Population. Is the Latent Variable Structure and Validation of the Simple Shoulder Test Stable across Cultures?

PubMed Central

Neto, Jose Osni Bruggemann; Gesser, Rafael Lehmkuhl; Steglich, Valdir; Bonilauri Ferreira, Ana Paula; Gandhi, Mihir; Vissoci, João Ricardo Nickenig; Pietrobon, Ricardo

2013-01-01

Background The validation of widely used scales facilitates the comparison across international patient samples. Objective The objective of this study was to translate, culturally adapt and validate the Simple Shoulder Test into Brazilian Portuguese. We also tested the stability of factor analysis across different cultures. Methods The Simple Shoulder Test was translated from English into Brazilian Portuguese, translated back into English, and evaluated for accuracy by an expert committee. It was then administered to 100 patients with shoulder conditions. Psychometric properties were analyzed including factor analysis, internal reliability, test-retest reliability at seven days, and construct validity in relation to the Short Form 36 health survey (SF-36). Results Factor analysis demonstrated a three factor solution. Cronbach’s alpha was 0.82. Test-retest reliability index as measured by intra-class correlation coefficient (ICC) was 0.84. Associations were observed in the hypothesized direction with all subscales of SF-36 questionnaire. Conclusion The Simple Shoulder Test translation and cultural adaptation to Brazilian-Portuguese demonstrated adequate factor structure, internal reliability, and validity, ultimately allowing for its use in the comparison with international patient samples. PMID:23675436
Reliability and validity of a brief method to assess nociceptive flexion reflex (NFR) threshold.

PubMed

Rhudy, Jamie L; France, Christopher R

2011-07-01

The nociceptive flexion reflex (NFR) is a physiological tool to study spinal nociception. However, NFR assessment can take several minutes and expose participants to repeated suprathreshold stimulations. The 4 studies reported here assessed the reliability and validity of a brief method to assess NFR threshold that uses a single ascending series of stimulations (Peak 1 NFR), by comparing it to a well-validated method that uses 3 ascending/descending staircases of stimulations (Staircase NFR). Correlations between the NFR definitions were high, were on par with test-retest correlations of Staircase NFR, and were not affected by participant sex or chronic pain status. Results also indicated the test-retest reliabilities for the 2 definitions were similar. Using larger stimulus increments (4 mAs) to assess Peak 1 NFR tended to result in higher NFR threshold estimates than using the Staircase NFR definition, whereas smaller stimulus increments (2 mAs) tended to result in lower NFR threshold estimates than the Staircase NFR definition. Neither NFR definition was correlated with anxiety, pain catastrophizing, or anxiety sensitivity. In sum, a single ascending series of electrical stimulations results in a reliable and valid estimate of NFR threshold. However, caution may be warranted when comparing NFR thresholds across studies that differ in the ascending stimulus increments. This brief method to assess NFR threshold is reliable and valid; therefore, it should be useful to clinical pain researchers interested in quickly assessing inter- and intra-individual differences in spinal nociceptive processes. Copyright © 2011 American Pain Society. Published by Elsevier Inc. All rights reserved.
Reliability Testing of NASA Piezocomposite Actuators

NASA Technical Reports Server (NTRS)

Wilkie, W.; High, J.; Bockman, J.

2002-01-01

NASA Langley Research Center has developed a low-cost piezocomposite actuator which has application for controlling vibrations in large inflatable smart space structures, space telescopes, and high performance aircraft. Tests show the NASA piezocomposite device is capable of producing large, directional, in-plane strains on the order of 2000 parts-per-million peak-to-peak, with no reduction in free-strain performance to 100 million electrical cycles. This paper describes methods, measurements, and preliminary results from our reliability evaluation of the device under externally applied mechanical loads and at various operational temperatures. Tests performed to date show no net reductions in actuation amplitude while the device was moderately loaded through 10 million electrical cycles. Tests were performed at both room temperature and at the maximum operational temperature of the epoxy resin system used in manufacture of the device. Initial indications are that actuator reliability is excellent, with no actuator failures or large net reduction in actuator performance.
Evaluation of the reliability of maize reference assays for GMO quantification.

PubMed

Papazova, Nina; Zhang, David; Gruden, Kristina; Vojvoda, Jana; Yang, Litao; Buh Gasparic, Meti; Blejec, Andrej; Fouilloux, Stephane; De Loose, Marc; Taverniers, Isabel

2010-03-01

A reliable PCR reference assay for relative genetically modified organism (GMO) quantification must be specific for the target taxon and amplify uniformly along the commercialised varieties within the considered taxon. Different reference assays for maize (Zea mays L.) are used in official methods for GMO quantification. In this study, we evaluated the reliability of eight existing maize reference assays, four of which are used in combination with an event-specific polymerase chain reaction (PCR) assay validated and published by the Community Reference Laboratory (CRL). We analysed the nucleotide sequence variation in the target genomic regions in a broad range of transgenic and conventional varieties and lines: MON 810 varieties cultivated in Spain and conventional varieties from various geographical origins and breeding history. In addition, the reliability of the assays was evaluated based on their PCR amplification performance. A single base pair substitution, corresponding to a single nucleotide polymorphism (SNP) reported in an earlier study, was observed in the forward primer of one of the studied alcohol dehydrogenase 1 (Adh1) (70) assays in a large number of varieties. The SNP presence is consistent with a poor PCR performance observed for this assay along the tested varieties. The obtained data show that the Adh1 (70) assay used in the official CRL NK603 assay is unreliable. Based on our results from both the nucleotide stability study and the PCR performance test, we can conclude that the Adh1 (136) reference assay (T25 and Bt11 assays) as well as the tested high mobility group protein gene assay, which also form parts of CRL methods for quantification, are highly reliable. Despite the observed uniformity in the nucleotide sequence of the invertase gene assay, the PCR performance test reveals that this target sequence might occur in more than one copy. 
Finally, although currently not forming a part of official quantification methods, zein and SSIIb assays are found to be highly reliable in terms of nucleotide stability and PCR performance and are proposed as good alternative targets for a reference assay for maize.
Reliability of Upright and Supine Power Measurements Using an Inertial Load Cycle Ergometer

NASA Technical Reports Server (NTRS)

Wickwire, P. J.; Leach, M.; Ryder, J.; Ploutz-Snyder, R.; Ploutz-Snyder, L.

2011-01-01

Practical, reliable, and time efficient methods of measuring muscular power are desirable for both research and applied testing situations. The inertial-load cycling method (ILC; Power/Cycle, Austin, TX) requires subjects to pedal as fast as possible against the inertial load of a flywheel for only 3-5 seconds, which could help reduce the time and effort required for maximal power testing. PURPOSE: 1) To test the intramachine reliability of ILC over 3 separate sessions, 2) to compare postural stance (upright vs. supine) during testing, and 3) to compare the maximal power (Pmax) output measured using ILC to that obtained from traditional isokinetic and leg press testing. METHODS: Subjects (n = 12) were tested on 4 non-consecutive days. The following tests were done on the first day of testing: isometric knee extension, isokinetic knee extension at several speeds, isokinetic power/endurance at 180°/sec (Biodex System 4), leg press maximal isometric force, and leg press power/endurance. The other 3 days consisted exclusively of ILC testing. Subjects performed 6 ILC tests in an upright position and 6 ILC tests in a supine position on each day. The starting position was counterbalanced. Mixed-effects linear modeling was used to determine if any differences existed between testing days and between upright and supine for Pmax and revolutions per minute at Pmax (RPMpk). Mixed-modeling was also used to calculate intraclass correlation coefficients (ICC) to determine the reliability of the ILC on each testing day for Pmax and RPMpk (ICCs were calculated separately for upright and supine). Kendall's Tau-a was used to determine the association between ILC Pmax and isokinetic and leg press data. RESULTS: For Pmax, significant differences were found between days 1 and 2 (upright: p = 0.018; supine: p = 0.014) and between days 1 and 3 (upright: p = 0.001; supine: p = 0.002), but not between days 2 and 3 (upright: p = 0.422; supine: p = 0.501). 
Pmax ICC values were greater than or equal to 0.97 for all days in both positions. Also, no significant differences between upright and supine postures were found for Pmax. No significant differences between days were found for RPMpk; however, there was a significant posture effect (upright greater than supine). Moderate correlations were observed between ILC Pmax and isokinetic and leg press tests (upright: 0.64-0.79, supine: 0.52-0.82). CONCLUSIONS: Overall, ILC is a very reliable test. Since a significant difference was found between day 1 and the other ILC testing days, it is suggested that day 1 of ILC testing should be used as a familiarization session to allow for subject learning. No significant difference in Pmax was seen from test 3 to test 6. However, an increase of 1.3% was observed from test 4 to test 6. Therefore, although 4 tests may be sufficient for most subjects to produce Pmax, in some cases 6 tests may be required. PRACTICAL APPLICATIONS: No differences were seen in Pmax between upright and supine positions despite differing RPMpk. This suggests that ILC testing can be used to provide reliable testing both in an upright position (appropriate for athletes) and in research (e.g., bed rest) or rehabilitation settings where supine testing is necessary. Future research should evaluate whether peak power measurements obtained with the ILC are sensitive to changes such as that observed with training and de-training.
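The association measure named in this record, Kendall's Tau-a, can be sketched as follows. This is a generic textbook implementation with hypothetical example values, not the authors' analysis code; Tau-a counts concordant minus discordant pairs and divides by the total number of pairs, with no correction for ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's Tau-a: (concordant - discordant) / (n * (n - 1) / 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count as neither concordant nor discordant.
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical rankings of four subjects on two tests.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

With tied observations Tau-a is pulled toward zero, which is one reason tie-corrected variants such as Tau-b are often preferred for data with many ties.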
Assessing system reliability and allocating resources: a bayesian approach that integrates multi-level data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Graves, Todd L; Hamada, Michael S

2008-01-01

Good estimates of the reliability of a system make use of test data and expert knowledge at all available levels. Furthermore, by integrating all these information sources, one can determine how best to allocate scarce testing resources to reduce uncertainty. Both of these goals are facilitated by modern Bayesian computational methods. We apply these tools to examples that were previously solvable only through the use of ingenious approximations, and use genetic algorithms to guide resource allocation.
Resimulation of noise: a precision estimator for least square error curve-fitting tested for axial strain time constant imaging

NASA Astrophysics Data System (ADS)

Nair, S. P.; Righetti, R.

2015-05-01

Recent elastography techniques focus on imaging information on properties of materials which can be modeled as viscoelastic or poroelastic. These techniques often require the fitting of temporal strain data, acquired from either a creep or stress-relaxation experiment, to a mathematical model using least square error (LSE) parameter estimation. It is known that the strain versus time relationships for tissues undergoing creep compression are non-linear. In non-linear cases, devising a measure of estimate reliability can be challenging. In this article, we have developed and tested a method to provide a measure of non-linear LSE parameter estimate reliability, which we call Resimulation of Noise (RoN). RoN provides a measure of reliability by estimating the spread of parameter estimates from a single experiment realization. We have tested RoN specifically for the case of axial strain time constant parameter estimation in poroelastic media. Our tests show that the RoN estimated precision has a linear relationship to the actual precision of the LSE estimator. We have also compared results from the RoN derived measure of reliability against a commonly used reliability measure: the correlation coefficient (CorrCoeff). Our results show that CorrCoeff is a poor measure of estimate reliability for non-linear LSE parameter estimation. While the RoN is specifically tested only for axial strain time constant imaging, a general algorithm is provided for use in all LSE parameter estimation.
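The abstract does not spell out the RoN algorithm, so the following is only a rough sketch of the general idea it describes (estimating the spread of parameter estimates from a single realization), written as a parametric resimulation. Everything specific here is an assumption: the saturating-exponential model, the Gaussian noise model, the grid-plus-golden-section fit, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def model(t, amp, tau):
    """Assumed creep-like curve: amp * (1 - exp(-t / tau))."""
    return amp * (1.0 - math.exp(-t / tau))


def fit_time_constant(ts, ys, tau_lo=0.05, tau_hi=50.0, n_grid=100):
    """Least-squares fit of (amp, tau); requires some t > 0 in ts."""
    def profile(tau):
        # For fixed tau the best amplitude has a closed form.
        f = [1.0 - math.exp(-t / tau) for t in ts]
        amp = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - amp * fi) ** 2 for y, fi in zip(ys, f))
        return sse, amp

    # Coarse log-spaced grid to bracket the minimum over tau.
    grid = [tau_lo * (tau_hi / tau_lo) ** (i / n_grid) for i in range(n_grid + 1)]
    best = min(range(n_grid + 1), key=lambda i: profile(grid[i])[0])
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid)]
    # Golden-section refinement inside the bracket.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(50):
        c, d = b - g * (b - a), a + g * (b - a)
        if profile(c)[0] < profile(d)[0]:
            b = d
        else:
            a = c
    tau = 0.5 * (a + b)
    return profile(tau)[1], tau


def resimulation_spread(ts, ys, n_resim=30, seed=0):
    """Fit once, resimulate noise around the fitted curve, refit each time,
    and return (tau estimate, standard deviation of the refitted taus)."""
    rng = random.Random(seed)
    amp, tau = fit_time_constant(ts, ys)
    fitted = [model(t, amp, tau) for t in ts]
    # Noise level estimated from residuals (two fitted parameters).
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sigma = math.sqrt(sse / (len(ts) - 2))
    taus = []
    for _ in range(n_resim):
        resim = [f + rng.gauss(0.0, sigma) for f in fitted]
        taus.append(fit_time_constant(ts, resim)[1])
    return tau, statistics.stdev(taus)
```

A larger returned spread suggests a less reliable time-constant estimate at that location; how closely this tracks the published RoN measure is not something the abstract allows one to check.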
The influence of validity criteria on Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) test-retest reliability among high school athletes.

PubMed

Brett, Benjamin L; Solomon, Gary S

2017-04-01

Research findings to date on the stability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) Composite scores have been inconsistent, requiring further investigation. The use of test validity criteria across these studies also has been inconsistent. Using multiple measures of stability, we examined test-retest reliability of repeated ImPACT baseline assessments in high school athletes across various validity criteria reported in previous studies. A total of 1146 high school athletes completed baseline cognitive testing using the online ImPACT test battery at two time periods of approximately two-year intervals. No participant sustained a concussion between assessments. Five forms of validity criteria used in previous test-retest studies were applied to the data, and differences in reliability were compared. Intraclass correlation coefficients (ICCs) ranged in composite scores from .47 (95% confidence interval, CI [.38, .54]) to .83 (95% CI [.81, .85]) and showed little change across a two-year interval for all five sets of validity criteria. Regression based methods (RBMs) examining the test-retest stability demonstrated a lack of significant change in composite scores across the two-year interval for all forms of validity criteria, with no cases falling outside the expected range of 90% confidence intervals. The application of more stringent validity criteria does not alter test-retest reliability, nor does it account for some of the variation observed across previously performed studies. As such, use of the ImPACT manual validity criteria should be utilized in the determination of test validity and in the individualized approach to concussion management. Potential future efforts to improve test-retest reliability are discussed.
Test-retest reliability of biodex system 4 pro for isometric ankle-eversion and -inversion measurement.

PubMed

Tankevicius, Gediminas; Lankaite, Doanata; Krisciunas, Aleksandras

2013-08-01

Context: The lack of knowledge about isometric ankle testing indicates the need for research in this area. Objective: To assess test-retest reliability and to determine the optimal position for isometric ankle-eversion and -inversion testing. Design: Test-retest reliability study. Methods: Isometric ankle eversion and inversion were assessed in 3 different dynamometer foot-plate positions: 0°, 7°, and 14° of inversion. Two maximal repetitions were performed at each angle. Both limbs were tested (40 ankles in total). The test was performed 2 times with a period of 7 d between the tests. Setting: University hospital. Participants: The study was carried out on 20 healthy athletes with no history of ankle sprains. Main Outcome Measures: Reliability was assessed using intraclass correlation coefficient (ICC2,1); minimal detectable change (MDC) was calculated using a 95% confidence interval. Paired t test was used to measure statistically significant changes, and P < .05 was considered statistically significant. Results: Eversion and inversion peak torques showed high ICCs in all 3 angles (ICC values .87-.96, MDC values 3.09-6.81 Nm). Eversion peak torque was the smallest when testing at the 0° angle and gradually increased, reaching maximum values at 14° angle. The increase of eversion peak torque was statistically significant at 7° and 14° of inversion. Inversion peak torque showed an opposite pattern: it was the smallest when measured at the 14° angle and increased at the other 2 angles; statistically significant changes were seen only between measures taken at 0° and 14°. Conclusions: Isometric eversion and inversion testing using the Biodex 4 Pro system is a reliable method. The authors suggest that the angle of 7° of inversion is the best for isometric eversion and inversion testing.
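For readers unfamiliar with the two statistics reported here, the sketch below computes ICC(2,1) from a subjects-by-sessions table using the usual two-way ANOVA mean squares, and a 95% minimal detectable change from a standard deviation and an ICC. The MDC formula used (1.96 × √2 × SEM, with SEM = SD × √(1 − ICC)) is a common convention and is an assumption here; the abstract does not state which formula or which SD the authors used. Function names are hypothetical.

```python
import math


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: list of n subjects, each a list of k repeated measurements.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def mdc95(sd, icc):
    """Assumed convention: MDC95 = 1.96 * sqrt(2) * SD * sqrt(1 - ICC)."""
    sem = sd * math.sqrt(1.0 - icc)
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical ratings table: 6 subjects measured 4 times.
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(table)
```

Because ICC(2,1) penalizes systematic shifts between sessions, a test with a consistent learning effect can show a lower ICC(2,1) than a consistency-type ICC on the same data.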

Reliability of doming and toe flexion testing to quantify foot muscle strength.

PubMed

Ridge, Sarah Trager; Myrer, J William; Olsen, Mark T; Jurgensmeier, Kevin; Johnson, A Wayne

2017-01-01

Quantifying the strength of the intrinsic foot muscles has been a challenge for clinicians and researchers. The reliable measurement of this strength is important in order to assess weakness, which may contribute to a variety of functional issues in the foot and lower leg, including plantar fasciitis and hallux valgus. This study reports 3 novel methods for measuring foot strength - doming (previously unmeasured), hallux flexion, and flexion of the lesser toes. Twenty-one healthy volunteers performed the strength tests during two testing sessions which occurred one to five days apart. Each participant performed each series of strength tests (doming, hallux flexion, and lesser toe flexion) four times during the first testing session (twice with each of two raters) and two times during the second testing session (once with each rater). Intra-class correlation coefficients were calculated to test for reliability for the following comparisons: between raters during the same testing session on the same day (inter-rater, intra-day, intra-session), between raters on different days (inter-rater, inter-day, inter-session), between days for the same rater (intra-rater, inter-day, inter-session), and between sessions on the same day by the same rater (intra-rater, intra-day, inter-session). ICCs showed good to excellent reliability for all tests between days, raters, and sessions. Average doming strength was 99.96 ± 47.04 N. Average hallux flexion strength was 65.66 ± 24.5 N. Average lateral toe flexion was 50.96 ± 22.54 N. These simple tests using relatively low cost equipment can be used for research or clinical purposes. If repeated testing will be conducted on the same participant, it is suggested that the same researcher or clinician perform the testing each time for optimal reliability.
Assessing the reliability of ecotoxicological studies: An overview of current needs and approaches.

PubMed

Moermond, Caroline; Beasley, Amy; Breton, Roger; Junghans, Marion; Laskowski, Ryszard; Solomon, Keith; Zahner, Holly

2017-07-01

In general, reliable studies are well designed and well performed, and enough details on study design and performance are reported to assess the study. For hazard and risk assessment in various legal frameworks, many different types of ecotoxicity studies need to be evaluated for reliability. These studies vary in study design, methodology, quality, and level of detail reported (e.g., reviews, peer-reviewed research papers, or industry-sponsored studies documented under Good Laboratory Practice [GLP] guidelines). Regulators have the responsibility to make sound and verifiable decisions and should evaluate each study for reliability in accordance with scientific principles regardless of whether they were conducted in accordance with GLP and/or standardized methods. Thus, a systematic and transparent approach is needed to evaluate studies for reliability. In this paper, 8 different methods for reliability assessment were compared using a number of attributes: categorical versus numerical scoring methods, use of exclusion and critical criteria, weighting of criteria, whether methods are tested with case studies, domain of applicability, bias toward GLP studies, incorporation of standard guidelines in the evaluation method, number of criteria used, type of criteria considered, and availability of guidance material. Finally, some considerations are given on how to choose a suitable method for assessing reliability of ecotoxicity studies. Integr Environ Assess Manag 2017;13:640-651. © 2016 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC).
Reliability and validity of three pain provocation tests used for the diagnosis of chronic proximal hamstring tendinopathy.

PubMed

Cacchio, Angelo; Borra, Fabrizio; Severini, Gabriele; Foglia, Andrea; Musarra, Frank; Taddio, Nicola; De Paulis, Fosco

2012-09-01

The clinical assessment of chronic proximal hamstring tendinopathy (PHT) in athletes is a challenge to sports medicine. To be able to compare the results of research and treatments, the methods used to diagnose and evaluate PHT must be clearly defined and reproducible. To assess the reliability and validity of three pain provocation tests used for the diagnosis of PHT. Ninety-two athletes with (N=46) and without (N=46) PHT were examined by one physician and two physiotherapists, who were trained in the examination techniques before the study. The examiners were blinded to the symptoms and identity of the athletes. The three pain provocation tests examined were the Puranen-Orava, bent-knee stretch and modified bent-knee stretch tests. Intraclass correlation coefficients (ICCs) based on the repeated measures analysis of variance were used to analyse the intraexaminer and interexaminer reliability, while sensitivity, specificity, predictive values and likelihood ratios were used to determine the validity of the three tests. The ICC values in all three tests revealed a high correlation (range 0.82 to 0.88) for the interexaminer reliability and a high-to-very high correlation (range 0.87 to 0.93) for the intraexaminer reliability. All three tests displayed a moderate-to-high validity, with the highest degree of validity being yielded by the modified bent-knee stretch test. All three pain provocation tests proved to be of potential value in assessing chronic PHT in athletes. However, we recommend that they be used in conjunction with other objective measures, such as MRI.
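The validity statistics listed in this record (sensitivity, specificity, predictive values and likelihood ratios) all derive from a 2×2 table of test result against reference diagnosis. A minimal sketch follows; the counts in the example are hypothetical and are not the study's data, and the function name is illustrative.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Diagnostic accuracy measures from 2x2 counts.

    tp/fn: diseased subjects testing positive/negative;
    fp/tn: non-diseased subjects testing positive/negative.
    Raises ZeroDivisionError when specificity is exactly 1 (LR+ undefined)
    or exactly 0 (LR- undefined).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
    }


# Hypothetical counts for 46 affected and 46 unaffected athletes.
result = diagnostic_accuracy(tp=40, fp=4, fn=6, tn=42)
```

Predictive values depend on the proportion of affected subjects in the sample (here 50% by design), whereas likelihood ratios do not, which is why both are usually reported.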
THE 6-MINUTE WALK TEST AND OTHER CLINICAL ENDPOINTS IN DUCHENNE MUSCULAR DYSTROPHY: RELIABILITY, CONCURRENT VALIDITY, AND MINIMAL CLINICALLY IMPORTANT DIFFERENCES FROM A MULTICENTER STUDY

PubMed Central

McDonald, Craig M; Henricson, Erik K; Abresch, R Ted; Florence, Julaine; Eagle, Michelle; Gappmaier, Eduard; Glanzman, Allan M; Spiegel, Robert; Barth, Jay; Elfring, Gary; Reha, Allen; Peltz, Stuart W

2013-01-01

Introduction: An international clinical trial enrolled 174 ambulatory males ≥5 years old with nonsense mutation Duchenne muscular dystrophy (nmDMD). Pretreatment data provide insight into reliability, concurrent validity, and minimal clinically important differences (MCIDs) of the 6-minute walk test (6MWT) and other endpoints. Methods: Screening and baseline evaluations included the 6-minute walk distance (6MWD), timed function tests (TFTs), quantitative strength by myometry, the PedsQL, heart rate–determined energy expenditure index, and other exploratory endpoints. Results: The 6MWT proved feasible and reliable in a multicenter context. Concurrent validity with other endpoints was excellent. The MCID for 6MWD was 28.5 and 31.7 meters based on 2 statistical distribution methods. Conclusions: The ratio of MCID to baseline mean is lower for 6MWD than for other endpoints. The 6MWD is an optimal primary endpoint for Duchenne muscular dystrophy (DMD) clinical trials that are focused therapeutically on preservation of ambulation and slowing of disease progression. Muscle Nerve 48: 357–368, 2013 PMID:23674289
Comparison of Medical and Consumer Wireless EEG Systems for Use in Clinical Trials.

PubMed

Ratti, Elena; Waninger, Shani; Berka, Chris; Ruffini, Giulio; Verma, Ajay

2017-01-01

Objectives: To compare quantitative EEG signal and test-retest reliability of medical grade and consumer EEG systems. Methods: Resting state EEG was acquired by two medical grade (B-Alert, Enobio) and two consumer (Muse, Mindwave) EEG systems in five healthy subjects during two study visits. EEG patterns, power spectral densities (PSDs) and test/retest reliability in eyes closed and eyes open conditions were compared across the four systems, focusing on Fp1, the only common electrode. Fp1 PSDs were obtained using Welch's modified periodogram method and averaged for the five subjects for each visit. The test/retest results were calculated as a ratio of Visit 1/Visit 2 Fp1 channel PSD at each 1 s epoch. Results: B-Alert, Enobio, and Mindwave Fp1 power spectra were similar. Muse showed a broadband increase in power spectra and the highest relative variation across test-retest acquisitions. Consumer systems were more prone to artifact due to eye blinks and muscle movement in the frontal region. Conclusions: EEG data can be successfully collected from all four systems tested. Although there was slightly more time required for application, medical systems offer clear advantages in data quality, reliability, and depth of analysis over the consumer systems. Significance: This evaluation provides evidence for informed selection of EEG systems appropriate for clinical trials.
Test-retest reliability of computer-based video analysis of general movements in healthy term-born infants.

PubMed

Valle, Susanne Collier; Støen, Ragnhild; Sæther, Rannei; Jensenius, Alexander Refsum; Adde, Lars

2015-10-01

A computer-based video analysis has recently been presented for quantitative assessment of general movements (GMs). This method's test-retest reliability, however, has not yet been evaluated. The aim of the current study was to evaluate the test-retest reliability of computer-based video analysis of GMs, and to explore the association between computer-based video analysis and the temporal organization of fidgety movements (FMs). Test-retest reliability study. 75 healthy, term-born infants were recorded twice the same day during the FMs period using a standardized video set-up. The computer-based movement variables "quantity of motion mean" (Qmean), "quantity of motion standard deviation" (QSD) and "centroid of motion standard deviation" (CSD) were analyzed, reflecting the amount of motion and the variability of the spatial center of motion of the infant, respectively. In addition, the association between the variable CSD and the temporal organization of FMs was explored. Intraclass correlation coefficients (ICC 1.1 and ICC 3.1) were calculated to assess test-retest reliability. The ICC values for the variables CSD, Qmean and QSD were 0.80, 0.80 and 0.86 for ICC (1.1), respectively; and 0.80, 0.86 and 0.90 for ICC (3.1), respectively. There were significantly lower CSD values in the recordings with continual FMs compared to the recordings with intermittent FMs (p<0.05). This study showed high test-retest reliability of computer-based video analysis of GMs, and a significant association between our computer-based video analysis and the temporal organization of FMs. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
W5″ Test: A simple method for measuring mean power output in the bench press exercise.

PubMed

Tous-Fajardo, Julio; Moras, Gerard; Rodríguez-Jiménez, Sergio; Gonzalo-Skok, Oliver; Busquets, Albert; Mujika, Iñigo

2016-11-01

The aims of the present study were to assess the validity and reliability of a novel simple test [Five Seconds Power Test (W5″ Test)] for estimating the mean power output during the bench press exercise at different loads, and its sensitivity to detect training-induced changes. Thirty trained young men completed as many repetitions as possible in a time of ≈5 s at 25%, 45%, 65% and 85% of one-repetition maximum (1RM) in two test sessions separated by four days. The number of repetitions, linear displacement of the bar and time needed to complete the test were recorded by two independent testers, and a linear encoder was used as the criterion measure. For each load, the mean power output was calculated in the W5″ Test as mechanical work per time unit and compared with that obtained from the linear encoder. Subsequently, 20 additional subjects (10 training group vs. 10 control group) were assessed before and after completing a seven-week training programme designed to improve maximal power. Results showed that both assessment methods correlated highly in estimating mean power output at different loads (r range: 0.86-0.94; p < .01) and detecting training-induced changes (R(2): 0.78). Good to excellent intra-tester (intraclass correlation coefficient (ICC) range: 0.81-0.97) and excellent inter-tester (ICC range: 0.96-0.99; coefficient of variation range: 2.4-4.1%) reliability was found for all loads. The W5″ Test was shown to be a valid, reliable and sensitive method for measuring mean power output during the bench press exercise in subjects who have previous resistance training experience.
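The abstract says mean power was obtained as mechanical work per unit time from the repetition count, bar displacement and test duration. One plausible reading is sketched below, assuming work is counted only for lifting the load against gravity over the measured displacement on each repetition; the exact formula used by the authors is not given, and the function name and example values are hypothetical.

```python
G = 9.80665  # standard gravity, m/s^2


def mean_power_watts(load_kg, displacement_m, repetitions, duration_s):
    """Mean power = total lifting work / elapsed time.

    Assumption: work per repetition = load * g * vertical bar displacement
    (lifting phase only).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    work_joules = load_kg * G * displacement_m * repetitions
    return work_joules / duration_s


# Hypothetical trial: 60 kg bar moved 0.40 m, 6 repetitions in 5.0 s.
power = mean_power_watts(60.0, 0.40, 6, 5.0)
```

Under this assumption the estimate ignores the work done accelerating the bar, which is one reason a comparison against a linear encoder, as in the study, is needed.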
Reliability analysis of the objective structured clinical examination using generalizability theory.

PubMed

Trejo-Mejía, Juan Andrés; Sánchez-Mendiola, Melchor; Méndez-Ramírez, Ignacio; Martínez-González, Adrián

2016-01-01

The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies of OSCE reliability measurement with generalizability theory (G-theory) in Latin America. The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and explore its usefulness for quality improvement. An observational cross-sectional study was conducted at National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossover random effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered as a single facet of analysis. The exam was applied to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. The sites and the versions of the tests had minimum variance. Our study achieved a G coefficient similar to that found in other reports, which is acceptable for summative tests. G-theory allows the estimation of the magnitude of multiple sources of error and helps decision makers to determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
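To show how a generalizability coefficient relates to the number of stations, the sketch below uses a deliberately simplified single-facet design (students crossed with stations). The study's actual design had more facets, and the variance components used here are hypothetical placeholders, not values from the paper.

```python
def g_coefficient(var_student, var_residual, n_stations):
    """Relative G coefficient for a students x stations design:
    var_student / (var_student + var_residual / n_stations)."""
    return var_student / (var_student + var_residual / n_stations)


def stations_needed(var_student, var_residual, target):
    """Smallest number of stations whose G coefficient reaches the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while g_coefficient(var_student, var_residual, n) < target:
        n += 1
    return n


# Hypothetical variance components for an 18-station exam.
g18 = g_coefficient(var_student=1.0, var_residual=2.0, n_stations=18)
n_for_085 = stations_needed(var_student=1.0, var_residual=2.0, target=0.85)
```

This kind of projection (a decision study) is what allows planners to trade off exam length against measurement precision before the exam is run.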
Validation and Test-Retest Reliability of New Thermographic Technique Called Thermovision Technique of Dry Needling for Gluteus Minimus Trigger Points in Sciatica Subjects and TrPs-Negative Healthy Volunteers

PubMed Central

Rychlik, Michał; Samborski, Włodzimierz

2015-01-01

The aim of this study was to assess the validity and test-retest reliability of Thermovision Technique of Dry Needling (TTDN) for the gluteus minimus muscle. TTDN is a new thermography approach used to support trigger points (TrPs) diagnostic criteria by presence of short-term vasomotor reactions occurring in the area where TrPs refer pain. Method. Thirty chronic sciatica patients (n=15 TrP-positive and n=15 TrPs-negative) and 15 healthy volunteers were evaluated by TTDN three times during two consecutive days based on TrPs of the gluteus minimus muscle confirmed additionally by referred pain presence. TTDN employs average temperature (T avr), maximum temperature (T max), low/high isothermal-area, and autonomic referred pain phenomenon (AURP) that reflects vasodilatation/vasoconstriction. Validity and test-retest reliability were assessed concurrently. Results. Two components of TTDN validity and reliability, T avr and AURP, had almost perfect agreement according to κ (e.g., thigh: 0.880 and 0.938; calf: 0.902 and 0.956, resp.). The sensitivity for T avr, T max, AURP, and high isothermal-area was 100% for everyone, but specificity of 100% was for T avr and AURP only. Conclusion. TTDN is a valid and reliable method for T avr and AURP measurement to support TrPs diagnostic criteria for the gluteus minimus muscle when digitally evoked referred pain pattern is present. PMID:26137486
Establishing Reliability and Validity of the Criterion Referenced Exam of GeoloGy Standards EGGS

NASA Astrophysics Data System (ADS)

Guffey, S. K.; Slater, S. J.; Slater, T. F.; Schleigh, S.; Burrows, A. C.

2016-12-01

Discipline-based geoscience education researchers have considerable need for a criterion-referenced, easy-to-administer and -score conceptual diagnostic survey for undergraduates taking introductory science survey courses, so that faculty can better monitor the learning impacts of various interactive teaching approaches. To support ongoing education research across the geosciences, we are continuing to work rigorously and systematically to firmly establish the reliability and validity of the recently released Exam of GeoloGy Standards, EGGS. In educational testing, reliability refers to the consistency or stability of test scores whereas validity refers to the accuracy of the inferences or interpretations one makes from test scores. There are several types of reliability measures being applied to the iterative refinement of the EGGS survey, including test-retest, alternate form, split-half, internal consistency, and interrater reliability measures. EGGS rates strongly on most measures of reliability. For one, Cronbach's alpha measures inter-item correlations and provides a quantitative index of the extent to which students answer items consistently throughout the test. Traditional item analysis methods, including item difficulty and item discrimination, further quantify the degree to which a particular item reliably assesses students. Validity, on the other hand, is perhaps best described by the word accuracy. For example, content validity is the extent to which a measurement reflects the specific intended domain of the content, stemming from judgments of people who are either experts in the testing of that particular content area or are content experts. Perhaps more importantly, face validity is a judgement of how well an instrument reflects the science "at face value" and refers to the extent to which a test appears to measure the targeted scientific domain as viewed by laypersons, examinees, test users, the public, and other invested stakeholders.
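The item difficulty and item discrimination statistics mentioned in this record can be illustrated with a short sketch for dichotomously scored items (1 = correct, 0 = incorrect). Difficulty is taken as the proportion correct, and discrimination as the correlation between an item and the total of the remaining items; these are common textbook definitions and an assumption here, since the abstract does not say which indices were used for EGGS. The response matrix is hypothetical.

```python
import math


def item_difficulty(responses, item):
    """Proportion of examinees answering the item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item):
    """Pearson correlation between the item score and the rest-of-test score.

    Undefined (ZeroDivisionError) if the item or the rest score is constant.
    """
    x = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]
    mx, my = sum(x) / len(x), sum(rest) / len(rest)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, rest))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in rest)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical responses: 4 examinees, 3 items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
p0 = item_difficulty(answers, 0)
d0 = item_discrimination(answers, 0)
```

Items with very high or very low difficulty, or with discrimination near zero or negative, are the usual candidates for revision during iterative refinement of a survey.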
Reliability and Validity Evidence of Multiple Balance Assessments in Athletes With a Concussion

PubMed Central

Murray, Nicholas; Salvatore, Anthony; Powell, Douglas; Reed-Jones, Rebecca

2014-01-01

Context: An estimated 300 000 sport-related concussion injuries occur in the United States annually. Approximately 30% of individuals with concussions experience balance disturbances. Common methods of balance assessment include the Clinical Test of Sensory Organization and Balance (CTSIB), the Sensory Organization Test (SOT), the Balance Error Scoring System (BESS), and the Romberg test; however, the National Collegiate Athletic Association recommended the Wii Fit as an alternative measure of balance in athletes with a concussion. A central concern regarding the implementation of the Wii Fit is whether it is reliable and valid for measuring balance disturbance in athletes with concussion. Objective: To examine the reliability and validity evidence for the CTSIB, SOT, BESS, Romberg test, and Wii Fit for detecting balance disturbance in athletes with a concussion. Data Sources: Literature considered for review included publications with reliability and validity data for the assessments of balance (CTSIB, SOT, BESS, Romberg test, and Wii Fit) from PubMed, PsycINFO, and CINAHL. Data Extraction: We identified 63 relevant articles for consideration in the review. Of the 63 articles, 28 were considered appropriate for inclusion and 35 were excluded. Data Synthesis: No current reliability or validity information supports the use of the CTSIB, SOT, Romberg test, or Wii Fit for balance assessment in athletes with a concussion. The BESS demonstrated moderate to high reliability (interclass correlation coefficient = 0.87) and low to moderate validity (sensitivity = 34%, specificity = 87%). However, the Romberg test and Wii Fit have been shown to be reliable tools in the assessment of balance in Parkinson patients. Conclusions: The BESS can evaluate balance problems after a concussion. However, it lacks the ability to detect balance problems after the third day of recovery. 
Further investigation is needed to establish the use of the CTSIB, SOT, Romberg test, and Wii Fit for assessing balance in athletes with concussions. PMID:24933431
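For readers less familiar with the validity figures quoted for the BESS, the sketch below shows how sensitivity and specificity are obtained from a 2x2 classification table, along with the positive likelihood ratio they imply. The counts are hypothetical, chosen only so that the ratios match the reported 34% and 87%; they are not the data from the reviewed studies.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Return (sensitivity, specificity, positive likelihood ratio)."""
    sensitivity = tp / (tp + fn)  # share of true cases the test flags
    specificity = tn / (tn + fp)  # share of non-cases the test clears
    # LR+ is unbounded when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    return sensitivity, specificity, lr_pos


# Hypothetical counts per 100 concussed and 100 non-concussed athletes.
sens, spec, lr_pos = diagnostic_accuracy(tp=34, fn=66, tn=87, fp=13)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} LR+={lr_pos:.2f}")
```

With these values a positive result is only about 2.6 times as likely in an athlete with a balance deficit as in one without, which is in line with the abstract's description of the validity as low to moderate.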
Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire

PubMed

Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra

2018-05-29

Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Methodological and cross sectional study. A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain.
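The split-half figure above can be related to full-scale reliability through the Spearman-Brown formula. The sketch below is illustrative only: the abstract does not say whether the reported 0.89 is the raw correlation between halves or an already corrected coefficient, so treating it as the raw value is an assumption, and the function name is hypothetical.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Predicted reliability when a test is lengthened by length_factor.

    With length_factor=2 this steps a half-test correlation up to the
    reliability of the full-length scale.
    """
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


# Assumption: 0.89 is the uncorrected correlation between the two halves.
full_scale = spearman_brown(0.89)
print(round(full_scale, 3))
```

Under that assumption the stepped-up value is about 0.94, close to the internal consistency coefficient reported in the abstract.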
An In vitro evaluation of the reliability of QR code denture labeling technique.

PubMed

Poovannan, Sindhu; Jain, Ashish R; Krishnan, Cakku Jalliah Venkata; Chandran, Chitraa R

2016-01-01

Positive identification of the dead after accidents and disasters through labeled dentures plays a key role in forensic scenario. A number of denture labeling methods are available, and studies evaluating their reliability under drastic conditions are vital. This study was conducted to evaluate the reliability of QR (Quick Response) Code labeled at various depths in heat-cured acrylic blocks after acid treatment, heat treatment (burns), and fracture in forensics. It was an in vitro study. This study included 160 specimens of heat-cured acrylic blocks (1.8 cm × 1.8 cm) and these were divided into 4 groups (40 samples per group). QR Codes were incorporated in the samples using clear acrylic sheet and they were assessed for reliability under various depths, acid, heat, and fracture. Data were analyzed using Chi-square test, test of proportion. The QR Code inclusion technique was reliable under various depths of acrylic sheet, acid (sulfuric acid 99%, hydrochloric acid 40%) and heat (up to 370°C). Results were variable with fracture of QR Code labeled acrylic blocks. Within the limitations of the study, by analyzing the results, it was clearly indicated that the QR Code technique was reliable under various depths of acrylic sheet, acid, and heat (370°C). Effectiveness varied in fracture and depended on the level of distortion. This study thus suggests that QR Code is an effective and simpler denture labeling method.
The relationship between ground reaction force in sit-to-stand movement and lower extremity function in community-dwelling Japanese older adults using long-term care insurance services

PubMed Central

Shen, Shaoshuai; Abe, Takumi; Tsuji, Taishi; Fujii, Keisuke; Ma, Jingyu; Okura, Tomohiro

2017-01-01

[Purpose] The purpose of this study was to investigate which of the four chair-rising methods has low-load and the highest success rate, and whether the GRF parameters in that method are useful for measuring lower extremity function among physically frail Japanese older adults. [Subjects and Methods] Fifty-two individuals participated in this study. The participants voluntarily attempted four types of Sit-to-stand test (one variation without and three variations with the use of their arms). The following parameters were measured: peak reaction force (F/w), two force development rate parameters (RFD1.25/w, RFD8.75/w) and two time-related parameters (T1, T2). Three additional commonly employed clinical tests (One-leg balance with eyes open, Timed up and go and 5-meter walk test) were also conducted. [Results] “Hands on a chair” chair-rising method produced the highest success rate among the four methods. All parameters were highly reliable between testing occasions. T2 showed strongly significant associations with Timed up and go and 5-meter walk test in males. RFD8.75/w showed significant associations with Timed up and go and 5-meter walk test in females. [Conclusion] Ground reaction force parameters in the Sit-to-stand test are a reliable and useful method for assessment of lower extremity function in physically frail Japanese older adults. PMID:28931988
Intra-rater reliability of hallux flexor strength measures using the Nintendo Wii Balance Board.

PubMed

Quek, June; Treleaven, Julia; Brauer, Sandra G; O'Leary, Shaun; Clark, Ross A

2015-01-01

The purpose of this study was to investigate the intra-rater reliability of a new method in combination with the Nintendo Wii Balance Board (NWBB) to measure the strength of hallux flexor muscle. Thirty healthy individuals (age: 34.9 ± 12.9 years, height: 170.4 ± 10.5 cm, weight: 69.3 ± 15.3 kg, female = 15) participated. Repeated testing was completed within 7 days. Participants performed strength testing in sitting using a wooden platform in combination with the NWBB. This new method was set up to selectively recruit an intrinsic muscle of the foot, specifically the flexor hallucis brevis muscle. Statistical analysis was performed using intra-class coefficients and ordinary least product analysis. To estimate measurement error, standard error of measurement (SEM), minimal detectable change (MDC) and percentage error were calculated. Results indicate excellent intra-rater reliability (ICC = 0.982, CI = 0.96-0.99) with an absence of systematic bias. SEM, MDC and percentage error value were 0.5, 1.4 and 12 % respectively. This study demonstrates that a new method in combination with the NWBB application is reliable to measure hallux flexor strength and has potential to be used for future research and clinical application.
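The measurement-error figures in this abstract can be related to one another with commonly used formulas: SEM = SD * sqrt(1 - ICC) and MDC at the 95% level = 1.96 * sqrt(2) * SEM. The abstract does not state which formulas the authors applied, so the sketch below is an assumption; the function names and the standard deviation in the first call are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest design."""
    return 1.96 * math.sqrt(2) * sem


# Hypothetical SD of 10 units with an ICC of 0.75.
print(sem_from_icc(10, 0.75))

# Using the SEM reported in the abstract (0.5).
print(round(mdc95(0.5), 1))
```

Applied to the reported SEM of 0.5, this MDC formula gives roughly 1.4, which agrees with the value in the abstract.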
An Examination of Rater Performance on a Local Oral English Proficiency Test: A Mixed-Methods Approach

ERIC Educational Resources Information Center

Yan, Xun

2014-01-01

This paper reports on a mixed-methods approach to evaluate rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of…
What does an MRI scan cost?

PubMed

Young, David W

2015-11-01

Historically, hospital departments have computed the costs of individual tests or procedures using the ratio of cost to charges (RCC) method, which can produce inaccurate results. To determine a more accurate cost of a test or procedure, the activity-based costing (ABC) method must be used. Accurate cost calculations will ensure reliable information about the profitability of a hospital's DRGs.
Testing the Self-Efficacy Questionnaire with Korean Children in Institutionalized Care

ERIC Educational Resources Information Center

Kim, Youngmi; Kim, Kyeongmo; Lee, Shinhye

2017-01-01

Purpose: We tested the reliability and validity of the Self-Efficacy Questionnaire for Children (SEQ-C) in a sample of children living in orphanages in South Korea. Methods: Our study sample consisted of 334 children aged 13-18 obtained using a convenience sampling method. We conducted a confirmatory factor analysis to identify the factor…
Analysis instrument test on mathematical power the material geometry of space flat side for grade 8

NASA Astrophysics Data System (ADS)

Kusmaryono, Imam; Suyitno, Hardi; Dwijanto, Karomah, Nur

2017-08-01

The main research problem was to determine the quality of test items on flat-sided solid geometry used to assess students' mathematical power. The method used was quantitative descriptive. The subjects were 20 grade 8 students. The object of research was the quality of the test items in terms of mathematical power: validity, reliability, level of difficulty, and discriminating power. The mathematical power instruments tested included written tests and questionnaires about mathematical disposition. Data were obtained from the field in the form of test data on flat-sided solid geometry and questionnaire responses. The results show that the reliability of the test items is influenced by many factors. The factors affecting the reliability of the instrument are the number of items, the homogeneity of the test questions, the time required, the uniformity of conditions for the test takers, the homogeneity of the group, the variability of the problems, and the motivation of the individual taking the test. Overall, the evaluation results of this study indicate that the test instrument can be used as a tool to measure students' mathematical power.
A New Tool for Nutrition App Quality Evaluation (AQEL): Development, Validation, and Reliability Testing

PubMed Central

Huang, Wenhao; Chapman-Novakofski, Karen M

2017-01-01

Background The extensive availability and increasing use of mobile apps for nutrition-based health interventions makes evaluation of the quality of these apps crucial for integration of apps into nutritional counseling. Objective The goal of this research was the development, validation, and reliability testing of the app quality evaluation (AQEL) tool, an instrument for evaluating apps’ educational quality and technical functionality. Methods Items for evaluating app quality were adapted from website evaluations, with additional items added to evaluate the specific characteristics of apps, resulting in 79 initial items. Expert panels of nutrition and technology professionals and app users reviewed items for face and content validation. After recommended revisions, nutrition experts completed a second AQEL review to ensure clarity. On the basis of 150 sets of responses using the revised AQEL, principal component analysis was completed, reducing AQEL into 5 factors that underwent reliability testing, including internal consistency, split-half reliability, test-retest reliability, and interrater reliability (IRR). Two additional modifiable constructs for evaluating apps based on the age and needs of the target audience as selected by the evaluator were also tested for construct reliability. IRR testing using intraclass correlations (ICC) with all 7 constructs was conducted, with 15 dietitians evaluating one app. Results Development and validation resulted in the 51-item AQEL. These were reduced to 25 items in 5 factors after principal component analysis, plus 9 modifiable items in two constructs that were not included in principal component analysis. Internal consistency and split-half reliability of the following constructs derived from principal components analysis was good (Cronbach alpha >.80, Spearman-Brown coefficient >.80): behavior change potential, support of knowledge acquisition, app function, and skill development. 
App purpose split half-reliability was .65. Test-retest reliability showed no significant change over time (P>.05) for all but skill development (P=.001). Construct reliability was good for items assessing age appropriateness of apps for children, teens, and a general audience. In addition, construct reliability was acceptable for assessing app appropriateness for various target audiences (Cronbach alpha >.70). For the 5 main factors, ICC (1,k) was >.80, with a P value of <.05. When 15 nutrition professionals evaluated one app, ICC (2,15) was .98, with a P value of <.001 for all 7 constructs when the modifiable items were specified for adults seeking weight loss support. Conclusions Our preliminary effort shows that AQEL is a valid, reliable instrument for evaluating nutrition apps’ qualities for clinical interventions by nutrition clinicians, educators, and researchers. Further efforts in validating AQEL in various contexts are needed. PMID:29079554

Validity and reliability of wii fit balance board for the assessment of balance of healthy young adults and the elderly.

PubMed

Chang, Wen-Dien; Chang, Wan-Yi; Lee, Chia-Lun; Feng, Chi-Yen

2013-10-01

[Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86-0.99) for the elderly people and positive correlations (r = 0.58-0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability.
[Authorization, translation, back translation and language modification of the simplified Chinese adult comorbidity-27 index].

PubMed

Gao, L; Mao, C; Yu, G Y; Peng, X

2016-10-09

Objective: To translate the adult comorbidity evaluation-27(ACE-27) index authored by professor JF Piccirillo into Chinese and for the purpose of assessing the possible impact of comorbidity on survival of oral cancer patients and improving cancer staging. Methods: The translation included the following steps, obtaining permission from professor Piccirillo, translation, back translation, language modification, adjusted by the advice from the professors of oral and maxillofacial surgery. The test population included 154 patients who were admitted to Peking University of Stomatology during March 2011. Questionnaire survey was conducted on these patients. Retest of reliability, internal consistency reliability, content validity, and structure validity were performed. Results: The simplified Chinese ACE-27 index was established. The Cronbach's α was 0.821 in the internal consistency reliability test. The Kaiser-Meyer-Olkin (KMO) value of 8 items was 0.859 in the structure validity test. Conclusions: The simplified Chinese ACE-27 index has good feasibility and reliability. It is useful to assess the comorbidity of oral cancer patients.
INTRA-RATER RELIABILITY OF THE MULTIPLE SINGLE-LEG HOP-STABILIZATION TEST AND RELATIONSHIPS WITH AGE, LEG DOMINANCE AND TRAINING.

PubMed

Sawle, Leanne; Freeman, Jennifer; Marsden, Jonathan

2017-04-01

Balance is a complex construct, affected by multiple components such as strength and co-ordination. However, whilst assessing an athlete's dynamic balance is an important part of clinical examination, there is no gold standard measure. The multiple single-leg hop-stabilization test is a functional test which may offer a method of evaluating the dynamic attributes of balance, but it needs to show adequate intra-tester reliability. The purpose of this study was to assess the intra-rater reliability of a dynamic balance test, the multiple single-leg hop-stabilization test on the dominant and non-dominant legs. Intra-rater reliability study. Fifteen active participants were tested twice with a 10-minute break between tests. The outcome measure was the multiple single-leg hop-stabilization test score, based on a clinically assessed numerical scoring system. Results were analysed using an Intraclass Correlation Coefficient (ICC 2,1) and Bland-Altman plots. Regression analyses explored relationships between test scores, leg dominance, age and training (an alpha level of p = 0.05 was selected). ICCs for intra-rater reliability were 0.85 for the dominant and non-dominant legs (confidence intervals = 0.62-0.95 and 0.61-0.95 respectively). Bland-Altman plots showed scores within two standard deviations. A significant correlation was observed between the dominant and non-dominant leg on balance scores (R² = 0.49, p<0.05), and better balance was associated with younger participants in their non-dominant leg (R² = 0.28, p<0.05) and their dominant leg (R² = 0.39, p<0.05), and a higher number of hours spent training for the non-dominant leg (R² = 0.37, p<0.05). The multiple single-leg hop-stabilisation test demonstrated strong intra-tester reliability with active participants. Younger participants who trained more had better balance scores. This test may be a useful measure for evaluating the dynamic attributes of balance.
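The two agreement statistics named in this abstract can be sketched in a few lines. The code below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) from two-way ANOVA mean squares, and the Bland-Altman bias with 95% limits of agreement. The function names and the ratings are illustrative and are not the study's data; the exact software and options the authors used are not stated in the abstract.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per session or rater."""
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]
    # Two-way ANOVA sums of squares: subjects, sessions/raters, and total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Illustrative ratings: 6 subjects scored on 4 occasions.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
print(bland_altman([10, 12, 14], [9, 12, 15]))
```

ICC(2,1) penalises systematic differences between sessions as well as random error, which is why it is often chosen when the absolute score, rather than only the ranking of participants, matters clinically.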
Manual unloading of the lumbar spine: can it identify immediate responders to mechanical traction in a low back pain population? A study of reliability and criterion referenced predictive validity

PubMed Central

Swanson, Brian T.; Riley, Sean P.; Cote, Mark P.; Leger, Robin R.; Moss, Isaac L.; Carlos, John

2016-01-01

Background To date, no research has examined the reliability or predictive validity of manual unloading tests of the lumbar spine to identify potential responders to lumbar mechanical traction. Purpose To determine: (1) the intra and inter-rater reliability of a manual unloading test of the lumbar spine and (2) the criterion referenced predictive validity for the manual unloading test. Methods Ten volunteers with low back pain (LBP) underwent a manual unloading test to establish reliability. In a separate procedure, 30 consecutive patients with LBP (age 50·86±11·51) were assessed for pain in their most provocative standing position (visual analog scale (VAS) 49·53±25·52 mm). Patients were assessed with a manual unloading test in their most provocative position followed by a single application of intermittent mechanical traction. Post traction, pain in the provocative position was reassessed and utilized as the outcome criterion. Results The test of unloading demonstrated substantial intra and inter-rater reliability K = 1·00, P = 0·002, K = 0·737, P = 0·001, respectively. There were statistically significant within group differences for pain response following traction for patients with a positive manual unloading test (P<0·001), while patients with a negative manual unloading test did not demonstrate a statistically significant change (P>0·05). There were significant between group differences for proportion of responders to traction based on manual unloading response (P = 0·031), and manual unloading response demonstrated a moderate to strong relationship with traction response Phi = 0·443, P = 0·015. Discussion and conclusion The manual unloading test appears to be a reliable test and has a moderate to strong correlation with pain relief that exceeds minimal clinically important difference (MCID) following traction supporting the validity of this test. PMID:27559274
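The reliability values in this abstract are reported as K, which is assumed here to be Cohen's kappa for a positive/negative test result. The sketch below shows how kappa corrects raw agreement between two raters for the agreement expected by chance. The function name and the ratings are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical unloading-test results from two raters on four patients.
rater_1 = ["pos", "pos", "neg", "neg"]
rater_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(rater_1, rater_2))
```

Here the raters agree on 3 of 4 cases (75%), but half of that agreement is expected by chance, so kappa is 0.5.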
Reliability of a device for the knee and ankle isometric and isokinetic strength testing in older adults

PubMed Central

Bergamin, Marco; Gobbo, Stefano; Bullo, Valentina; Vendramin, Barbara; Duregon, Federica; Frizziero, Antonio; Di Blasio, Andrea; Cugusi, Lucia; Zaccaria, Marco; Ermolao, Andrea

2017-01-01

Summary Background Lower extremity muscle mass, strength, power, and physical performance are critical determinants of independent functioning in later life. Isokinetic dynamometers are becoming very common in assessing different features of muscle strength, in both research and clinical practice; however, reliability studies are still needed to support the extended use of those devices. Objective The purpose of this study is to assess the test-retest reliability of knee and ankle isokinetic and isometric strength testing protocols in a sample of older healthy subjects, using a new and untested isokinetic multi-joint evaluation system. Methods Sixteen male and fourteen female older adults (mean age 65.2 ± 4.6 years) were assessed in two testing sessions. Each participant performed a randomized testing procedure that included different isometric and isokinetic tests for knee and ankle joints. Results All participants completed the trial safely and no subject reported any discomfort throughout the overall assessment. Coefficients of correlation between measures were calculated, showing moderate to strong effects among all test-retest assessments, and a paired-sample t test showed only one significant difference (p<0.05), in the maximal isokinetic bilateral knee flexion torque. Conclusions The multi-joint evaluation system for the assessment of knee and ankle isokinetic and isometric strength provided reliable test-retest measures in healthy older adults. Level of evidence Ib. PMID:29264344
A COMPARISON OF BULK SEDIMENT TOXICITY TESTING METHODS AND SEDIMENT ELUTRIATE TOXICITY

EPA Science Inventory

Bulk sediment toxicity tests are routinely used to assess the level and extent of contamination in natural sediments. While reliable, these tests can be resource intensive, requiring significant outlays of time and materials. The purpose of this study was to compare the results ...
Small-Scale System for Evaluation of Stretch-Flangeability with Excellent Reliability

NASA Astrophysics Data System (ADS)

Yoon, Jae Ik; Jung, Jaimyun; Lee, Hak Hyeon; Kim, Hyoung Seop

2018-02-01

We propose a system for evaluating the stretch-flangeability of small-scale specimens based on the hole-expansion ratio (HER). The system has no size effect and shows excellent reproducibility, reliability, and economic efficiency. To verify the reliability and reproducibility of the proposed hole-expansion testing (HET) method, the deformation behavior of the conventional standard stretch-flangeability evaluation method was compared with the proposed method using finite-element method simulations. The distribution of shearing defects in the hole-edge region of the specimen, which has a significant influence on the HER, was investigated using scanning electron microscopy. The stretch-flangeability of several kinds of advanced high-strength steel determined using the conventional standard method was compared with that using the proposed small-scale HET method. It was verified that the deformation behavior, morphology and distribution of shearing defects, and stretch-flangeability results for the specimens were the same for the conventional standard method and the proposed small-scale stretch-flangeability evaluation system.
The Reliability of Recorded Text Test Scores: Widespread Inconsistent Intelligibility Testing in Minority Languages

ERIC Educational Resources Information Center

Yoder, Zachariah

2017-01-01

The recorded text test (RTT) is commonly used to test dialect intelligibility, often to inform language development decisions. More than 25 papers using the RTT method were published on www.sil.org/silesr from January 2009 to March 2013. As introduced by Casad [1974. "Dialect Intelligibility Testing." Summer Institute of Linguistics…
The Pareidolia Test: A Simple Neuropsychological Test Measuring Visual Hallucination-Like Illusions

PubMed Central

Mamiya, Yasuyuki; Nishio, Yoshiyuki; Watanabe, Hiroyuki; Yokoi, Kayoko; Uchiyama, Makoto; Baba, Toru; Iizuka, Osamu; Kanno, Shigenori; Kamimura, Naoto; Kazui, Hiroaki; Hashimoto, Mamoru; Ikeda, Manabu; Takeshita, Chieko; Shimomura, Tatsuo; Mori, Etsuro

2016-01-01

Background Visual hallucinations are a core clinical feature of dementia with Lewy bodies (DLB), and this symptom is important in the differential diagnosis and prediction of treatment response. The pareidolia test is a tool that evokes visual hallucination-like illusions, and these illusions may be a surrogate marker of visual hallucinations in DLB. We created a simplified version of the pareidolia test and examined its validity and reliability to establish the clinical utility of this test. Methods The pareidolia test was administered to 52 patients with DLB, 52 patients with Alzheimer’s disease (AD) and 20 healthy controls (HCs). We assessed the test-retest/inter-rater reliability using the intra-class correlation coefficient (ICC) and the concurrent validity using the Neuropsychiatric Inventory (NPI) hallucinations score as a reference. A receiver operating characteristic (ROC) analysis was used to evaluate the sensitivity and specificity of the pareidolia test to differentiate DLB from AD and HCs. Results The pareidolia test required approximately 15 minutes to administer, exhibited good test-retest/inter-rater reliability (ICC of 0.82), and moderately correlated with the NPI hallucinations score (rs = 0.42). Using an optimal cut-off score set according to the ROC analysis, the pareidolia test differentiated DLB from AD with a sensitivity of 81% and a specificity of 92%. Conclusions Our study suggests that the simplified version of the pareidolia test is a valid and reliable surrogate marker of visual hallucinations in DLB. PMID:27171377
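The abstract does not say which criterion defined the "optimal" cut-off. One common choice is Youden's J (sensitivity + specificity - 1), and the sketch below applies it by scanning every observed score as a candidate threshold. The function name and the scores are hypothetical, and the code assumes that higher scores indicate the target condition.

```python
def best_cutoff(case_scores, control_scores):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J.

    A score at or above the cutoff is classified as positive.
    """
    best = None
    for cutoff in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= cutoff for s in case_scores) / len(case_scores)
        spec = sum(s < cutoff for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        # Keep the lowest cutoff when several share the same J.
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical illusion counts for patients (cases) and comparison subjects.
cases = [5, 6, 7, 8]
controls = [1, 2, 3, 6]
print(best_cutoff(cases, controls))
```

Other criteria, such as the point closest to the top-left corner of the ROC curve or a cut-off fixed at a required specificity, can select a different threshold from the same data.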
Motivational Interviewing Skills in Health Care Encounters (MISHCE): Development and psychometric testing of an assessment tool.

PubMed

Petrova, Tatjana; Kavookjian, Jan; Madson, Michael B; Dagley, John; Shannon, David; McDonough, Sharon K

2015-01-01

Motivational interviewing (MI) has demonstrated a significant impact as an intervention strategy for addiction management, change in lifestyle behaviors, and adherence to prescribed medication and other treatments. Key elements to studying MI include training in MI of professionals who will use it, assessment of skills acquisition in trainees, and the use of a validated skills assessment tool. The purpose of this research project was to develop a psychometrically valid and reliable tool designed to assess MI skills competence in health care provider trainees. The goal was to develop an assessment tool that would evaluate the acquisition and use of specific MI skills and principles, as well as the quality of the patient-provider therapeutic alliance in brief health care encounters. To address this purpose, specific steps were followed, beginning with a literature review. This review contributed to the development of relevant conceptual and operational definitions, the selection of a scaling technique and response format, and the choice of methods for analyzing validity and reliability. Internal consistency reliability was established on 88 video recorded interactions. The inter-rater and test-retest reliability were established using 18 interactions randomly selected from the 88. The assessment tool Motivational Interviewing Skills for Health Care Encounters (MISHCE) and a manual for use of the tool were developed. Validity and reliability of MISHCE were examined. Face and content validity were supported with well-defined conceptual and operational definitions and feedback from an expert panel. Reliability was established through internal consistency, inter-rater reliability, and test-retest reliability. The overall internal consistency reliability (Cronbach's alpha) for all fifteen items was 0.75. MISHCE demonstrated good inter-rater reliability and good to excellent test-retest reliability. 
MISHCE assesses the health provider's level of knowledge and skills in brief disease management encounters. MISHCE also evaluates quality of the patient-provider therapeutic alliance, i.e., the "flow" of the interaction.
A method for recording verbal behavior in free-play settings.

PubMed

Nordquist, V M

1971-01-01

The present study attempted to test the reliability of a new method of recording verbal behavior in a free-play preschool setting. Six children, three normal and three speech impaired, served as subjects. Videotaped records of verbal behavior were scored by two experimentally naive observers. The results suggest that the system provides a means of obtaining reliable records of both normal and impaired speech, even when the subjects exhibit nonverbal behaviors (such as hyperactivity) that interfere with direct observation techniques.
Geotechnical Descriptions of Rock and Rock Masses.

DTIC Science & Technology

1985-04-01

determined in the field on core specimens by the standard Rock Testing Handbook Methods...to provide rock strength descriptions from the field. The point-load test has proven to be a reliable method of determining rock strength properties...report should qualify the reported spacing values by stating the methods used to determine spacing. Preferably the report should make the determination
RELIABILITY OF ANKLE-FOOT MORPHOLOGY, MOBILITY, STRENGTH, AND MOTOR PERFORMANCE MEASURES.

PubMed

Fraser, John J; Koldenhoven, Rachel M; Saliba, Susan A; Hertel, Jay

2017-12-01

Assessments of foot posture, morphology, intersegmental mobility, strength and motor control of the ankle-foot complex are commonly used clinically, but measurement properties of many assessments are unclear. To determine test-retest and inter-rater reliability, standard error of measurement, and minimal detectable change of morphology, joint excursion and play, strength, and motor control of the ankle-foot complex. Reliability study. 24 healthy, recreationally-active young adults without history of ankle-foot injury were assessed by two clinicians on two occasions, three to ten days apart. Measurement properties were assessed for foot morphology (foot posture index, total and truncated length, width, arch height), joint excursion (weight-bearing dorsiflexion, rearfoot and hallux goniometry, forefoot inclinometry, 1st metatarsal displacement) and joint play, strength (handheld dynamometry), and motor control rating during intrinsic foot muscle (IFM) exercises. Clinician order was randomized using a Latin Square. The clinicians performed independent examinations and did not confer on the findings for the duration of the study. Test-retest and inter-tester reliability and agreement were assessed using intraclass correlation coefficients (ICC(2,k)) and weighted kappa (Kw). Test-retest reliability ICC were as follows: morphology: .80 to 1.00, joint excursion: .58 to .97, joint play: -.67 to .84, strength: .67 to .92; IFM motor rating: Kw -.01 to .71. Inter-rater reliability ICC were as follows: morphology: .81 to 1.00, joint excursion: .32 to .97, joint play: -1.06 to 1.00, strength: .53 to .90; IFM motor rating: Kw .02 to .56. Measures of ankle-foot posture, morphology, joint excursion, and strength demonstrated fair to excellent test-retest and inter-rater reliability. Test-retest reliability for rating of perceived difficulty and motor performance was good to excellent for short-foot, toe-spread-out, and hallux exercises and poor to fair for lesser toe extension. 
Joint play measures had poor to fair reliability overall. The findings of this study should be considered when choosing methods of clinical assessment and outcome measures in practice and research. Level of evidence: 3.
A hydrostatic weighing method using total lung capacity and a small tank.

PubMed Central

Warner, J G; Yeater, R; Sherwood, L; Weber, K

1986-01-01

The purpose of this study was to establish the validity and reliability of a hydrostatic weighing method using total lung capacity (measuring vital capacity with a respirometer at the time of weighing), the prone position, and a small oblong tank. The validity of the method was established by comparing the TLC prone (tank) method against three hydrostatic weighing methods administered in a pool. The three methods included residual volume seated, TLC seated and TLC prone. Eighty male and female subjects were underwater weighed using each of the four methods. Validity coefficients for per cent body fat between the TLC prone (tank) method and the RV seated (pool), TLC seated (pool) and TLC prone (pool) methods were .98, .99 and .99, respectively. A randomised complete block ANOVA found significant differences between the RV seated (pool) method and each of the three TLC methods with respect to both body density and per cent body fat. The differences were negligible with respect to HW error. Reliability of the TLC prone (tank) method was established by weighing twenty subjects three different times with ten-minute time intervals between testing. Multiple correlations yielded reliability coefficients for body density and per cent body fat values of .99 and .99, respectively. It was concluded that the TLC prone (tank) method is valid, reliable and a favourable method of hydrostatic weighing. PMID:3697596
Community Food Environment, Home Food Environment, and Fruit and Vegetable Intake of Children and Adolescents

ERIC Educational Resources Information Center

Ding, Ding; Sallis, James F.; Norman, Gregory J.; Saelens, Brian E.; Harris, Sion Kim; Kerr, Jacqueline; Rosenberg, Dori; Durant, Nefertiti; Glanz, Karen

2012-01-01

Objectives: To determine (1) reliability of new food environment measures; (2) association between home food environment and fruit and vegetable (FV) intake; and (3) association between community and home food environment. Methods: In 2005, a cross-sectional survey was conducted with readministration to assess test-retest reliability. Adolescents,…
Validity and Reliability of Internalized Stigma of Mental Illness (Cantonese)

ERIC Educational Resources Information Center

Young, Daniel Kim-Wan; Ng, Petrus Y. N.; Pan, Jia-Yan; Cheng, Daphne

2017-01-01

Purpose: This study aims to translate and test the reliability and validity of the Internalized Stigma of Mental Illness-Cantonese (ISMI-C). Methods: The original English version of ISMI is translated into the ISMI-C by going through forward and backward translation procedure. A cross-sectional research design is adopted that involved 295…
Reliability of an experimental method to analyse the impact point on a golf ball during putting.

PubMed

Richardson, Ashley K; Mitchell, Andrew C S; Hughes, Gerwyn

2015-06-01

This study aimed to examine the reliability of an experimental method identifying the location of the impact point on a golf ball during putting. Forty trials were completed using a mechanical putting robot set to reproduce a putt of 3.2 m, with four different putter-ball combinations. After locating the centre of the dimple pattern (centroid), the following variables were tested: distance of the impact point from the centroid, angle of the impact point from the centroid, and distance of the impact point from the centroid derived from the X, Y coordinates. Good to excellent reliability was demonstrated for all impact variables, reflected in very strong relative (ICC = 0.98-1.00) and absolute reliability (SEM% = 0.9-4.3%). The highest SEM% observed was 7% for the angle of the impact point from the centroid. In conclusion, the experimental method was shown to be reliable at locating the centroid of a golf ball, therefore allowing for the identification of the point of impact with the putter head, and is suitable for use in subsequent studies.
The reliability and reproducibility of cephalometric measurements: a comparison of conventional and digital methods

PubMed Central

AlBarakati, SF; Kula, KS; Ghoneima, AA

2012-01-01

Objective: The aim of this study was to assess the reliability and reproducibility of angular and linear measurements of conventional and digital cephalometric methods. Methods: A total of 13 landmarks and 16 skeletal and dental parameters were defined and measured on pre-treatment cephalometric radiographs of 30 patients. The conventional and digital tracings and measurements were performed twice by the same examiner with a 6-week interval between measurements. The reliability within the method was determined using Pearson's correlation coefficient (r²). The reproducibility between methods was calculated by paired t-test. The level of statistical significance was set at p < 0.05. Results: All measurements for each method were above r² = 0.90 (strong correlation) except maxillary length, which had a correlation of 0.82 for conventional tracing. Significant differences between the two methods were observed in most angular and linear measurements except for ANB angle (p = 0.5), angle of convexity (p = 0.09), anterior cranial base (p = 0.3) and the lower anterior facial height (p = 0.6). Conclusion: In general, both methods of conventional and digital cephalometric analysis are highly reliable. Although the reproducibility of the two methods showed some statistically significant differences, most differences were not clinically significant. PMID:22184624

Reliability and Validity of the Hip Stability Isometric Test (HipSIT): A New Method to Assess Hip Posterolateral Muscle Strength.

PubMed

Almeida, Gabriel Peixoto Leão; das Neves Rodrigues, Helena Larissa; de Freitas, Bruno Wesley; de Paula Lima, Pedro Olavo

2017-12-01

Study Design: Cross-sectional study. Background: The Hip Stability Isometric Test (HipSIT) evaluates the strength of the hip posterolateral stabilizers in a position that favors greater activation of the gluteus maximus and gluteus medius and lower activation of the tensor fascia lata. Objectives: To check the validity and reliability of the HipSIT and to evaluate the HipSIT in women with patellofemoral pain (PFP). Methods: The HipSIT was evaluated with a handheld dynamometer. During testing, the participants were sidelying, with their legs positioned at 45° of hip flexion and 90° of knee flexion. Participants were instructed to raise the knee of the upper leg while keeping the upper and lower heels in contact. To establish reliability and validity, 49 women were tested with the HipSIT by 2 different evaluators on day 1, and then again 7 days later. The strength of the hip extensors, abductors, and external rotators was also evaluated. Twenty women with unilateral PFP were also evaluated. Results: The HipSIT has excellent intrarater and interrater reliability. The standard error of measurement was 0.01 kgf/kg, and the minimal detectable change was 0.036 kgf/kg. The HipSIT showed good validity in isolated hip abduction, external rotation, and extension (P<.01). Women with PFP showed a 10% deficit in the HipSIT results for the symptomatic limb (P = .01). Conclusion: The HipSIT showed excellent interrater and intrarater reliability, moderate to good validity in women, and was able to identify strength deficits in women with PFP. J Orthop Sports Phys Ther 2017;47(12):906-913. Epub 9 Oct 2017. doi:10.2519/jospt.2017.7274.
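Illustrative note on the standard error of measurement (SEM) and minimal detectable change (MDC) reported above. The abstract does not say how these values were computed, so the sketch below assumes the commonly used formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The function names and the input values are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC). Assumed formula; the abstract does not confirm it."""
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be non-negative and icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """MDC = z * sqrt(2) * SEM; z = 1.96 gives the 95% level (MDC95)."""
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the calculation.
    sem = standard_error_of_measurement(sd=10.0, icc=0.91)
    mdc95 = minimal_detectable_change(sem)
    print(f"SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")
```

Other conventions exist (for example, deriving the SEM from the ANOVA error term or using a 90% level), so published SEM and MDC values need not satisfy this exact relation.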
Reliability of measures of transient evoked otoacoustic emissions with contralateral suppression.

PubMed

Stuart, Andrew; Cobb, Kensi M

2015-01-01

The reliability of measures of transient evoked otoacoustic emissions (TEOAEs) with contralateral suppression was examined. The effect of test session (i.e., initial test; retest without probe removal; retest with probe removal; and retest 1-2 days post initial test), gender, and ear was examined in 14 young adult females and 14 young adult males. TEOAEs were obtained bilaterally with 60 dB peSPL linear click stimuli with and without a contralateral 65 dB SPL broadband noise suppressor. Absolute TEOAE suppression and a normalized index of TEOAE suppression (i.e., percentage of suppression) were examined. Reliability of these measures was assessed with repeated measures linear mixed model analysis of variance, a coefficient of reliability, and Bland-Altman analyses. There were no statistically significant (p>0.05) main effects of test, gender, and ear or interactions for both absolute dB and % TEOAE suppression values. Cronbach's α values were greater than 0.90 across the four tests for both TEOAE measures. Mean test differences or bias (i.e., between the initial and subsequent tests) for absolute and % TEOAE suppression ranged from -0.05 to 0.11 dB and -1.5% to 1.1%, respectively. There was no proportional/systematic bias with the mean differences of the first and subsequent measurements. Data herein were consistent with the view that bilateral TEOAE suppression measures are reliable across test sessions of 1-2 days among females and males and may provide a method to monitor medial olivocochlear efferent reflex status over time. Copyright © 2015 Elsevier Inc. All rights reserved.
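Illustrative note on the Bland-Altman analysis mentioned above. The sketch below shows the usual calculation of the bias (mean difference between two sessions) and the 95% limits of agreement (bias ± 1.96 × SD of the differences). It is a minimal example under those standard definitions; the function name and the sample values are hypothetical and are not the study's data or code.

```python
import statistics


def bland_altman(first, second, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (second - first); the limits of agreement are
    bias -/+ z * sample standard deviation of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    # Hypothetical suppression values (dB) from an initial test and a retest.
    initial = [1.2, 0.8, 1.5, 1.1, 0.9]
    retest = [1.3, 0.7, 1.4, 1.2, 1.0]
    bias, lower, upper = bland_altman(initial, retest)
    print(f"bias = {bias:.3f} dB, limits of agreement = [{lower:.3f}, {upper:.3f}] dB")
```

A bias near zero with narrow limits of agreement is what a claim of good test-retest agreement would typically rest on; checking for proportional bias additionally requires relating the differences to the pairwise means.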
Can Physicians Identify Inappropriate Nuclear Stress Tests? An Examination of Inter-rater Reliability for the 2009 Appropriate Use Criteria for Radionuclide Imaging

PubMed Central

Ye, Siqin; Rabbani, LeRoy E.; Kelly, Christopher R.; Kelly, Maureen R.; Lewis, Matthew; Paz, Yehuda; Peck, Clara L.; Rao, Shaline; Bokhari, Sabahat; Weiner, Shepard D.; Einstein, Andrew J.

2014-01-01

Background: We sought to determine inter-rater reliability of the 2009 Appropriate Use Criteria (AUC) for radionuclide imaging (RNI) and whether physicians at various levels of training can effectively identify nuclear stress tests with inappropriate indications. Methods and Results: Four hundred patients were randomly selected from a consecutive cohort of patients undergoing nuclear stress testing at an academic medical center. Raters with different levels of training (including cardiology attending physicians, cardiology fellows, internal medicine hospitalists, and internal medicine interns) classified individual nuclear stress tests using the 2009 AUC. Consensus classification by two cardiologists was considered the operational gold standard, and sensitivity and specificity of individual raters for identifying inappropriate tests were calculated. Inter-rater reliability of the AUC was assessed using Cohen’s kappa statistics for pairs of different raters. The mean age of patients was 61.5 years; 214 (54%) were female. The cardiologists rated 256 (64%) of 400 NSTs as appropriate, 68 (18%) as uncertain, 55 (14%) as inappropriate; 21 (5%) tests were unable to be classified. Inter-rater reliability for non-cardiologist raters was modest (unweighted Cohen’s kappa, 0.51, 95% confidence interval, 0.45 to 0.55). Sensitivity of individual raters for identifying inappropriate tests ranged from 47% to 82%, while specificity ranged from 85% to 97%. Conclusions: Inter-rater reliability for the 2009 AUC for RNI is modest, and there is considerable variation in the ability of raters at different levels of training to identify inappropriate tests. PMID:25563660
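Illustrative note on the unweighted Cohen's kappa used above for pairs of raters. The statistic compares observed agreement with the agreement expected by chance from each rater's marginal category frequencies: kappa = (p_o − p_e) / (1 − p_e). The sketch below is a minimal implementation of that definition; the function name and the ratings are hypothetical and do not reproduce the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical classifications of 50 tests by two raters.
    a = ["appropriate"] * 25 + ["inappropriate"] * 25
    b = (["appropriate"] * 20 + ["inappropriate"] * 5
         + ["appropriate"] * 10 + ["inappropriate"] * 15)
    print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Because kappa discounts chance agreement, two raters can agree on most items and still obtain only a modest kappa when one category dominates.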
A novel mobile phone application to assess nutrition environment measures in low- and middle-income countries.

PubMed

Kanter, Rebecca; Alvey, Jeniece; Fuentes, Deborah

2014-09-01

Consumer nutrition environment measures are important to understanding the food environment, which affects individual dietary intake. A nutrition environment measures survey for supermarkets (NEMS-S) has been designed on paper for use in Guatemala. However, a paper survey is not an inconspicuous data collection method. To design, pilot test, and validate the Guatemala NEMS-S in the form of a mobile phone application (mobile app). CommCare, a free and open-source software application, was used to design the NEMS-S for Guatemala in the form of a mobile app. Two raters tested the mobile app in a single Guatemalan supermarket. Both the interrater and the test-retest reliability of the mobile app were determined using percent agreement and Cohen's kappa score and compared with the interrater and test-retest reliability of the paper version. Interrater reliability was very high between the paper survey and the mobile app (Cohen's kappa > 0.90). Test-retest reliability ranged from kappa 0.78 to 0.91. Between two certified NEMS-S raters, survey completion time using the mobile app was 5 minutes less than that with the paper form (35 vs. 40 minutes). The NEMS-S mobile app provides for more rapid data collection, with equivalent reliability and validity to the NEMS-S paper version, with advantages over a paper-based survey of multiple language capability and concomitant data entry.
[Turkish validity and reliability study of fear of pain questionnaire-III].

PubMed

Ünver, Seher; Turan, Fatma Nesrin

2018-01-01

This study aimed to develop a Turkish version of the Fear of Pain Questionnaire-III developed by McNeil and Rainwater (1998) and examine its validity and reliability indicators. The study was conducted with 459 university students studying in the nursing department. The Turkish translation of the scale was conducted by language experts and the original scale owner. Expert opinions were taken for language validity, and Lawshe's content validity ratio formula was used to calculate the content validity. Exploratory factor analysis was used to assess the construct validity. The factors were rotated using the Varimax rotation (orthogonal) method. For reliability indicators of the questionnaire, the internal consistency coefficient and test-retest reliability were used. Exploratory factor analyses using the three-factor model (explaining 50.5% of the total variance) revealed that the item factor loadings were above the threshold of 0.30, which indicated that the questionnaire had good construct validity. The Cronbach's alpha value for the total questionnaire was 0.938, and the test-retest value was 0.846 for the total scale. The Turkish version of the Fear of Pain Questionnaire-III had sufficiently high reliability and validity to be used as a tool in evaluating the fear of pain among the young Turkish population.
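Illustrative note on Lawshe's content validity ratio (CVR) mentioned above. For one item, CVR = (n_e − N/2) / (N/2), where N is the number of experts on the panel and n_e is the number who rate the item as essential. The sketch below simply evaluates that formula; the function name and the panel sizes are hypothetical, and the abstract does not report the panel used in the study.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR for a single item; ranges from -1 to +1."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2.0
    return (n_essential - half) / half


if __name__ == "__main__":
    # Hypothetical panel: 9 of 10 experts rate the item as essential.
    print(content_validity_ratio(9, 10))
```

A CVR of 0 means exactly half the panel rated the item essential; whether a given positive value is acceptable depends on the panel size and the critical-value table being applied.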
Validity and Reliability of the Bahasa Melayu Version of the Migraine Disability Assessment Questionnaire

PubMed Central

Shaik, Munvar Miya; Hassan, Norul Badriah; Bhaskar, Shalini; Gan, Siew Hua

2014-01-01

Background. The study was designed to determine the validity and reliability of the Bahasa Melayu version (MIDAS-M) of the Migraine Disability Assessment (MIDAS) questionnaire. Methods. Patients having migraine for more than six months attending the Neurology Clinic, Hospital Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia, were recruited. Standard forward and back translation procedures were used to translate and adapt the MIDAS questionnaire to produce the Bahasa Melayu version. The translated Malay version was tested for face and content validity. Validity and reliability testing were further conducted with 100 migraine patients (1st administration) followed by a retesting session 21 days later (2nd administration). Results. A total of 100 patients between 15 and 60 years of age were recruited. The majority of the patients were single (66%) and students (46%). Cronbach's alpha values were 0.84 (1st administration) and 0.80 (2nd administration). The test-retest reliability for the total MIDAS score was 0.73, indicating that the MIDAS-M questionnaire is stable; for the five disability questions, the test-retest values ranged from 0.77 to 0.87. Conclusion. The MIDAS-M questionnaire is comparable with the original English version in terms of validity and reliability and may be used for the assessment of migraine in clinical settings. PMID:25121099
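Illustrative note on Cronbach's alpha, the internal consistency coefficient reported above. For k items, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below is a minimal implementation of that textbook formula using sample variances; the function name and the response matrix are hypothetical and are not the study's data.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Sample variance of each item (column) across respondents.
    item_variances = [statistics.variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses: 4 respondents x 3 items.
    responses = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]
    print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Alpha rises with both the average inter-item correlation and the number of items, so values from questionnaires of different lengths are not directly comparable.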
Family Self-Efficacy for Diabetes Management: Psychometric Testing

PubMed Central

McEwen, Marylyn M.; Pasvogel, Alice; Murdaugh, Carolyn L.

2017-01-01

Background and Purpose: Type 2 diabetes mellitus (T2DM) self-management among Hispanic adults occurs in a family context. Self-efficacy (SE) affects T2DM self-management behaviors; however, no instruments are available to measure family diabetes self-efficacy. The study’s purpose was to test the psychometric properties of the Family Self-Efficacy for Diabetes Scale (FSE). Methods: Family members (n = 113) of adults with T2DM participated. Psychometric analysis included internal consistency reliability and concurrent and construct validity. Results: Internal consistency reliability was .86. Items loaded on 2 factors, Family SE for Supporting Healthy Behaviors and Family SE for Supporting General Health, accounting for 71% of the variance. FSE correlated significantly with 3 diabetes-related instruments. Conclusions: The FSE is a reliable and valid instrument. Further testing is needed in diverse populations and geographic areas. PMID:27103242
The Unsupported Upper Limb Exercise Test in People Without Disabilities: Assessing the Within-Day Test-Retest Reliability and the Effects of Age and Gender.

PubMed

Oliveira, Ana; Cruz, Joana; Jácome, Cristina; Marques, Alda

2018-01-01

Purpose: To estimate the within-day test-retest reliability and standard error of measurement (SEM) of the unsupported upper limb exercise test (UULEX) in adults without disabilities and to determine the effects of age and gender on performance of the UULEX. Method: A cross-sectional study was conducted with 100 adults without disabilities (44 men, mean age 44.2 [SD 26] y; 56 women, mean age 38.1 [SD 24.1] y). Participants performed three UULEX tests to establish within-day reliability, measured using an intra-class correlation coefficient (ICC) model 2 (two-way random effects) with a single rater (ICC[2,1]) and SEM. The effects of age and gender were examined using two-factor mixed-design analysis of variance (ANOVA) and one-way repeated-measures ANOVA. For analysis purposes, four sub-groups were created: younger adults, older adults, men, and women. Results: Excellent within-day reliability and a small SEM were found in the four sub-groups (younger adults: ICC[2,1]=0.88; 95% CI: 0.82, 0.92; SEM∼40 s; older adults: ICC[2,1]=0.82; 95% CI: 0.72, 0.90; SEM∼50 s; men: ICC[2,1]=0.93; 95% CI: 0.88, 0.96; SEM∼30 s; women: ICC[2,1]=0.85; 95% CI: 0.78, 0.91; SEM∼45 s). Younger adults took, on average, 308.24 seconds longer than older adults to perform the test; older adults performed significantly better on the third test (p < 0.0001; η² = 0.096). Gender effects were not found (p > 0.05). Conclusion: The within-day test-retest reliability and SEM values of the UULEX may be used to define the magnitude of the error obtained with repeated measures. One UULEX test seems to be adequate for younger adults to achieve reliable results, whereas three tests seem to be needed for older adults.
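Illustrative note on ICC[2,1], the two-way random-effects, single-measurement intraclass correlation named above. With n participants (rows) and k repeated tests or raters (columns), the usual ANOVA-based estimate is (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), where MSR, MSC, and MSE are the mean squares for rows, columns, and residual error. The sketch below implements that formula directly; the function name and the ratings are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for an n x k table: n subjects (rows), k tests or raters (columns)."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Illustrative table: 6 subjects, each measured 4 times.
    table = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(f"ICC(2,1) = {icc_2_1(table):.3f}")
```

Because the column mean square enters the denominator, systematic differences between repeated tests (for example, a practice effect) lower ICC(2,1) even when participants keep the same rank order.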
Test-Retest Reliability of a Serious Game for Delirium Screening in the Emergency Department.

PubMed

Tong, Tiffany; Chignell, Mark; Tierney, Mary C; Lee, Jacques S

2016-01-01

Introduction: Cognitive screening in settings such as emergency departments (ED) is frequently carried out using paper-and-pencil tests that require administration by trained staff. These assessments often compete with other clinical duties and thus may not be routinely administered in these busy settings. The literature has shown that cognitive impairments such as dementia and delirium are often missed in older ED patients. Failure to recognize delirium can have devastating consequences including increased mortality (Kakuma et al., 2003). Given the demands on emergency staff, an automated cognitive test to screen for delirium onset could be a valuable tool to support delirium prevention and management. In earlier research we examined the concurrent validity of a serious game, and carried out an initial assessment of its potential as a delirium screening tool (Tong et al., 2016). In this paper, we examine the test-retest reliability of the game, as it is an important criterion in a cognitive test for detecting risk of delirium onset. Objective: To demonstrate the test-retest reliability of the screening tool over time in a clinical sample of older emergency patients. A secondary objective is to assess whether there are practice effects that might make game performance unstable over repeated presentations. Materials and Methods: Adults over the age of 70 were recruited from a hospital ED. Each patient played our serious game in an initial session soon after they arrived in the ED, and in follow-up sessions conducted at 8-h intervals (for each participant there were up to five follow-up sessions, depending on how long the person stayed in the ED). Results: A total of 114 adults (61 females, 53 males) between the ages of 70 and 104 years (M = 81 years, SD = 7) participated in our study after screening out delirious patients. 
We observed a test-retest reliability of the serious game (as assessed by correlation r-values) between 0.5 and 0.8 across adjacent sessions. Conclusion: The game-based assessment for cognitive screening has relatively strong test-retest reliability and little evidence of practice effects among elderly emergency patients, and may be a useful supplement to existing cognitive assessment methods.
Reliability of functional and predictive methods to estimate the hip joint centre in human motion analysis in healthy adults.

PubMed

Kainz, Hans; Hajek, Martin; Modenese, Luca; Saxby, David J; Lloyd, David G; Carty, Christopher P

2017-03-01

In human motion analysis, predictive or functional methods are used to estimate the location of the hip joint centre (HJC). It has been shown that the Harrington regression equations (HRE) and geometric sphere fit (GSF) method are the most accurate predictive and functional methods, respectively. To date, the comparative reliability of both approaches has not been assessed. The aims of this study were to (1) compare the reliability of the HRE and the GSF methods, (2) analyse the impact of the number of thigh markers used in the GSF method on the reliability, (3) evaluate how alterations to the movements that comprise the functional trials impact HJC estimations using the GSF method, and (4) assess the influence of the initial guess in the GSF method on the HJC estimation. Fourteen healthy adults were tested on two occasions using a three-dimensional motion capture system. Skin surface marker positions were acquired while participants performed quiet stance, perturbed and non-perturbed functional trials, and walking trials. Results showed that the HRE were more reliable in locating the HJC than the GSF method. However, comparison of inter-session hip kinematics during gait did not show any significant difference between the approaches. Different initial guesses in the GSF method did not result in significant differences in the final HJC location. The GSF method was sensitive to the functional trial performance and therefore it is important to standardize the functional trial performance to ensure a repeatable estimate of the HJC when using the GSF method. Copyright © 2017 Elsevier B.V. All rights reserved.
Reading Ability as an Estimator of Premorbid Intelligence: Does It Remain Stable Among Ethnically Diverse HIV+ Adults?

PubMed Central

Olsen, J. Pat; Fellows, Robert P.; Rivera-Mindt, Monica; Morgello, Susan; Byrd, Desiree A.

2015-01-01

The Wide Range Achievement Test, 3rd edition, Reading-Recognition subtest (WRAT-3 RR) is an established measure of premorbid ability. However, its long-term reliability is not well documented, particularly in diverse populations with CNS-relevant disease. Objective: We examined test-retest reliability of the WRAT-3 RR over time in an HIV+ sample of predominantly racial/ethnic minority adults. Method: Participants (N = 88) completed a comprehensive neuropsychological battery, including the WRAT-3 RR, on at least two separate study visits. Intraclass correlation coefficients (ICCs) were computed using scores from baseline and follow-up assessments to determine the test-retest reliability of the WRAT-3 RR across racial/ethnic groups and changes in medical (immunological) and clinical (neurocognitive) factors. Additionally, Fisher’s Z tests were used to determine the significance of the differences between ICCs. Results: The average test-retest interval was 58.7 months (SD=36.4). The overall WRAT-3 RR test-retest reliability was high (r = .97, p < .001), and remained robust across all demographic, medical, and clinical variables (all r’s > .92). Intraclass correlation coefficients did not differ significantly between the subgroups tested (all Fisher’s Z p’s > .05). Conclusions: Overall, this study supports the appropriateness of word-reading tests, such as the WRAT-3 RR, for use as stable premorbid IQ estimates among ethnically diverse groups. Moreover, this study supports the reliability of this measure in the context of change in health and neurocognitive status, and in lengthy inter-test intervals. These findings offer strong rationale for reading as a “hold” test, even in the presence of a chronic, variable disease such as HIV. PMID:26689235
Journal: Efficient Hydrologic Tracer-Test Design for Tracer-Mass Estimation and Sample Collection Frequency, 1 Method Development

EPA Science Inventory

Hydrological tracer testing is the most reliable diagnostic technique available for the determination of basic hydraulic and geometric parameters necessary for establishing operative solute-transport processes. Tracer-test design can be difficult because of a lack of prior knowl...
Proof test methodology for composites

NASA Technical Reports Server (NTRS)

Wu, Edward M.; Bell, David K.

1992-01-01

The special requirements for proof test of composites are identified based on the underlying failure process of composites. Two proof test methods are developed to eliminate the inevitable weak fiber sites without also causing flaw clustering which weakens the post-proof-test composite. Significant reliability enhancement by these proof test methods has been experimentally demonstrated for composite strength and composite life in tension. This basic proof test methodology is relevant to the certification and acceptance of critical composite structures. It can also be applied to the manufacturing process development to achieve zero-reject for very large composite structures.
Ocular dominance stability and reading skill: a controversial relationship.

PubMed

Zeri, Fabrizio; De Luca, Maria; Spinelli, Donatella; Zoccolotti, Pierluigi

2011-11-01

Evidence is mixed concerning the relationship between stability of ocular dominance and reading deficits. Contrasting results may be due to the use of different tests of dominance, different samples of readers, and different scoring methods. The aim of this study was to investigate the relationship among ocular dominance, general visual abilities, and reading performance, and to evaluate the consistency and reliability of different tests of ocular dominance and the effects of different types of eye dominance scoring. In a group of young adults, we measured: (a) main optometric parameters; (b) reading time and accuracy; and (c) ocular dominance in two sighting and four motor tests. Dominance was determined using different scoring methods (relative, absolute, and binary scores). All dominance tests showed good levels of internal reliability. Sighting tests were consistent regardless of the scoring method, and all participants had stable dominance. Three of four motor tests were moderately consistent when dominance was measured with relative scores but not when it was measured with absolute or binary scores. No relationship was found between stability of dominance and reading performance, regardless of the type of test or scoring method. No systematic pattern of correlation was found between binocular vision variables and dominance measures. Choosing the type of motor test to measure ocular dominance is crucial, because the level of consistency among tests is low to moderate. Furthermore, motor tests were not correlated with reading performances. Present results suggest caution when trying to link reading difficulties with specific profiles of ocular dominance.
Reliability and validity of pendulum test measures of spasticity obtained with the Polhemus tracking system from patients with chronic stroke

PubMed Central

Bohannon, Richard W; Harrison, Steven; Kinsella-Shaw, Jeffrey

2009-01-01

Background Spasticity is a common impairment accompanying stroke. Spasticity of the quadriceps femoris muscle can be quantified using the pendulum test. The measurement properties of pendular kinematics captured using a magnetic tracking system have not been studied among patients who have experienced a stroke. Therefore, this study describes the test-retest reliability and known groups and convergent validity of the pendulum test measures obtained with the Polhemus tracking system. Methods Eight patients with chronic stroke underwent pendulum tests with their affected and unaffected lower limbs, with and without the addition of a 2.2 kg cuff weight at the ankle, using the Polhemus magnetic tracking system. Also measured bilaterally were knee resting angles, Ashworth scores (grades 0–4) of quadriceps femoris muscles, patellar tendon (knee jerk) reflexes (grades 0–4), and isometric knee extension force. Results Three measures obtained from pendular traces of the affected side were reliable (intraclass correlation coefficient ≥ .844). Known groups validity was confirmed by demonstration of a significant difference in the measurements between sides. Convergent validity was supported by correlations ≥ .57 between pendulum test measures and other measures reflective of spasticity. Conclusion Pendulum test measures obtained with the Polhemus tracking system from the affected side of patients with stroke have good test-retest reliability and both known groups and convergent validity. PMID:19642989
Validity and reliability of Internet-based physiotherapy assessment for musculoskeletal disorders: a systematic review.

PubMed

Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard

2017-04-01

Purpose The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method A comprehensive systematic literature review was conducted using a number of electronic databases (PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL), covering studies published between January 2000 and May 2015. Studies that examined the validity and the inter- and intra-rater reliabilities of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies respectively. Results A total of 898 hits were retrieved, of which 11 articles meeting the inclusion criteria were reviewed. Nine studies explored the concurrent validity, inter- and intra-rater reliabilities, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. The physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, orthopaedic special tests, neurodynamic tests and scar assessment.
Simulation-Based Training for Colonoscopy

PubMed Central

Preisler, Louise; Svendsen, Morten Bo Søndergaard; Nerup, Nikolaj; Svendsen, Lars Bo; Konge, Lars

2015-01-01

Abstract The aim of this study was to create simulation-based tests with credible pass/fail standards for 2 different fidelities of colonoscopy models. Only competent practitioners should perform colonoscopy. Reliable and valid simulation-based tests could be used to establish basic competency in colonoscopy before practicing on patients. Twenty-five physicians (10 consultants with endoscopic experience and 15 fellows with very little endoscopic experience) were tested on 2 different simulator models: a virtual-reality simulator and a physical model. Tests were repeated twice on each simulator model. Metrics with discriminatory ability were identified for both modalities and reliability was determined. The contrasting-groups method was used to create pass/fail standards and the consequences of these were explored. The consultants performed significantly faster and scored higher than the fellows on both the models (P < 0.001). Reliability analysis showed Cronbach α = 0.80 and 0.87 for the virtual-reality and the physical model, respectively. The established pass/fail standards failed one of the consultants (virtual-reality simulator) and allowed one fellow to pass (physical model). The 2 tested simulation-based modalities provided reliable and valid assessments of competence in colonoscopy and credible pass/fail standards were established for both the tests. We propose to use these standards in simulation-based training programs before proceeding to supervised training on patients. PMID:25634177
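The Cronbach α values reported above (0.80 and 0.87) summarize internal consistency. As a rough illustration only, the standard formula α = k/(k−1) × (1 − sum of item variances / variance of total scores) can be sketched as follows. The function name and the score table are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per participant, one column per item
    (hypothetical layout; the study's raw data are not reproduced here).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) and of the per-participant totals.
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up scores for three participants on two items.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 3))  # 0.667
```

Sample variances (n − 1 denominator) are used here; other software may make a different choice, so small numerical differences are possible.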
Test-retest reliability of the safe driving behavior measure for community-dwelling elderly drivers.

PubMed

Song, Chiang-Soon; Lee, Joo-Hyun; Han, Sang-Woo

2016-06-01

[Purpose] The Safe Driving Behavior Measure (SDBM) is a self-report measurement tool that assesses the safe-driving behaviors of the elderly. The purpose of this study was to evaluate the test-retest reliability of the SDBM among community-dwelling elderly drivers. [Subjects and Methods] A total of sixty-one community-dwelling elderly were enrolled to investigate the reliability of the SDBM. The SDBM was assessed in two sessions that were conducted three days apart in a quiet and well-organized assessment room. The test-retest reliability of overall scores and three domain scores of the SDBM was statistically evaluated using intraclass correlation coefficients [ICC (2,1)]. Pearson correlation coefficients were used to quantify bivariate associations among the three domains of the SDBM. [Results] The SDBM demonstrated excellent test-retest reliability for community-dwelling elderly drivers. The Cronbach alpha coefficients of the three domains of person-vehicle (0.979), person-environment (0.944), and person-vehicle-environment (0.971) of the SDBM indicate high internal consistency. [Conclusion] The results of this study suggest that the SDBM is a reliable measure for evaluating the safe driving of automobiles by community-dwelling elderly, and is adequate for detecting changes in scores in clinical settings.
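For readers unfamiliar with the ICC (2,1) statistic named above, the following is a minimal sketch of the usual two-way random-effects, absolute-agreement, single-measure form, computed from ANOVA mean squares. It assumes a complete table with one row per subject and one column per session; the function name and the example table are illustrative and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one per subject, one column per session or rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of sessions or raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(col) / n for col in zip(*ratings)]

    # Sums of squares for subjects (rows), sessions (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects rated in 4 columns (not study data).
table = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(table), 2))  # 0.29
```

In a test-retest design with two sessions, as described in the abstract, the table would have two columns. Large systematic differences between columns lower this coefficient, which is why it is described as an absolute-agreement measure.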
Validity and Reliability of the Turkish Version of Needs Based Biopsychosocial Distress Instrument for Cancer Patients (CANDI)

PubMed Central

Beyhun, Nazim Ercument; Can, Gamze; Tiryaki, Ahmet; Karakullukcu, Serdar; Bulut, Bekir; Yesilbas, Sehbal; Kavgaci, Halil; Topbas, Murat

2016-01-01

Background Needs based biopsychosocial distress instrument for cancer patients (CANDI) is a scale based on needs arising due to the effects of cancer. Objectives The aim of this research was to determine the reliability and validity of the CANDI scale in the Turkish language. Patients and Methods The study was performed with the participation of 172 cancer patients aged 18 and over. Factor analysis (principal components analysis) was used to assess construct validity. Criterion validities were tested by computing Spearman correlation between CANDI and hospital anxiety depression scale (HADS), and brief symptom inventory (BSI) (convergent validity) and quality of life scales (FACT-G) (divergent validity). Test-retest reliabilities and internal consistencies were measured with intraclass correlation (ICC) and Cronbach-α. Results A three-factor solution (emotional, physical and social) was found with factor analysis. Internal reliability (α = 0.94) and test-retest reliability (ICC = 0.87) were significantly high. Correlations between CANDI and HADS (rs = 0.67), and BSI (rs = 0.69) and FACT-G (rs = -0.76) were moderate and significant in the expected direction. Conclusions CANDI is a valid and reliable scale in cancer patients with a three-factor structure (emotional, physical and social) in the Turkish language. PMID:27621931
The spaced antenna drift method

NASA Technical Reports Server (NTRS)

Hocking, W. K.

1983-01-01

The spaced antenna drift method is a simple and relatively inexpensive method for determination of atmospheric wind velocities using radars. The technique has been extensively tested in the mesosphere at high and medium frequencies, and found to give reliable results. Recently, the method has also been applied to VHF observations of the troposphere and stratosphere, and results appear to be reliable. This paper discusses briefly the principle of the method, and investigates both its strengths and weaknesses. Some discussions concerning criticisms of the technique are also given, and it is concluded that while these criticisms may be of some concern at times, appropriate care can ensure that the method is at least as viable as any other method of remote wind measurement. At times, the technique has definite advantages.

Testing Standard Reliability Criteria

ERIC Educational Resources Information Center

Sherry, David

2017-01-01

Maul's paper, "Rethinking Traditional Methods of Survey Validation" (Andrew Maul), contains two stages. First he presents empirical results that cast doubt on traditional methods for validating psychological measurement instruments. These results motivate the second stage, a critique of current conceptions of psychological measurement…
Hypervelocity Impact (HVI). Volume 1; General Introduction

NASA Technical Reports Server (NTRS)

Gorman, Michael R.; Ziola, Steven M.

2007-01-01

During 2003 and 2004, the Johnson Space Center's White Sands Testing Facility in Las Cruces, New Mexico conducted hypervelocity impact tests on the space shuttle wing leading edge. Hypervelocity impact tests were conducted to determine if Micro-Meteoroid/Orbital Debris impacts could be reliably detected and located using simple passive ultrasonic methods. This volume contains an executive summary, overview of the method, brief descriptions of all targets, and highlights of results and conclusions.
Used Solvent Testing and Reclamation. Volume 2. Vapor Degreasing and Precision Cleaning Solvents

DTIC Science & Technology

1988-12-01

of 5 to 500 ppm in halogenated solvents using Karl Fischer reagent. Arbitrary criteria to identify a spent solvent have evolved in various industries... methods of managing waste solvent. Some DOD installations are reclaiming used solvents rather than discarding them. Reclamation is feasible because the...most reliable methods for testing solvent quality. Further testing is necessary for chlorinated solvents to determine the inhibitor con-
Use of the smartphone for end vertebra selection in scoliosis.

PubMed

Pepe, Murad; Kocadal, Onur; Iyigun, Abdullah; Gunes, Zafer; Aksahin, Ertugrul; Aktekin, Cem Nuri

2017-03-01

The aim of our study was to develop a smartphone-aided end vertebra selection method and to investigate its effectiveness in Cobb angle measurement. Pre-operative posteroanterior scoliosis radiographs of twenty-nine adolescent idiopathic scoliosis patients were used for end vertebra selection and Cobb angle measurement by the standard method and the smartphone-aided method. Measurements were performed by 7 examiners. The intraclass correlation coefficient was used to analyze selection and measurement reliability. Summary statistics of variance calculations were used to provide 95% prediction limits for the error in Cobb angle measurements. A paired 2-tailed t test was used to analyze end vertebra selection differences. Mean absolute Cobb angle difference was 3.6° for the manual method and 1.9° for the smartphone-aided method. Both intraobserver and interobserver reliability were found to be excellent in the manual and smartphone sets, for Cobb angle measurement as well as for end vertebra selection, but the reliability values of the manual set were lower than those of the smartphone set. Two observers selected significantly different end vertebrae in their repeated selections with the manual method. The smartphone-aided method for end vertebra selection and Cobb angle measurement showed excellent reliability. We can expect a reduction in measurement error rates with the widespread use of this method in clinical practice. Level III, Diagnostic study. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.
Validation of an Instrument to Measure High School Students' Attitudes toward Fitness Testing

ERIC Educational Resources Information Center

Mercier, Kevin; Silverman, Stephen

2014-01-01

Purpose: The purpose of this investigation was to develop an instrument that has scores that are valid and reliable for measuring students' attitudes toward fitness testing. Method: The method involved the following steps: (a) an elicitation study, (b) item development, (c) a pilot study, and (d) a validation study. The pilot study included 427…
Reliability of TMS phosphene threshold estimation: Toward a standardized protocol.

PubMed

Mazzi, Chiara; Savazzi, Silvia; Abrahamyan, Arman; Ruzzoli, Manuela

Phosphenes induced by transcranial magnetic stimulation (TMS) are a subjectively described visual phenomenon employed in basic and clinical research as an index of the excitability of retinotopically organized areas in the brain. Phosphene threshold estimation is a preliminary step in many TMS experiments in visual cognition for setting the appropriate level of TMS doses; however, the lack of a direct comparison of the available methods for phosphene threshold estimation leaves the reliability of those methods in setting TMS doses unresolved. The present work aims to fill this gap. We compared the most common methods for phosphene threshold calculation, namely the Method of Constant Stimuli (MOCS), the Modified Binary Search (MOBS) and the Rapid Estimation of Phosphene Threshold (REPT). In two experiments we tested the reliability of PT estimation under each of the three methods, considering the day of administration, participants' expertise in phosphene perception and the sensitivity of each method to the initial values used for the threshold calculation. We found that MOCS and REPT have comparable reliability when estimating phosphene thresholds, while MOBS estimations appear less stable. Based on our results, researchers and clinicians can estimate phosphene threshold according to MOCS or REPT equally reliably, depending on their specific investigation goals. We suggest several important factors for consideration when calculating phosphene thresholds and describe strategies to adopt in experimental procedures. Copyright © 2017 Elsevier Inc. All rights reserved.
Wearable Lactate Threshold Predicting Device is Valid and Reliable in Runners.

PubMed

Borges, Nattai R; Driller, Matthew W

2016-08-01

Borges, NR and Driller, MW. Wearable lactate threshold predicting device is valid and reliable in runners. J Strength Cond Res 30(8): 2212-2218, 2016. A commercially available device claiming to be the world's first wearable lactate threshold predicting device (WLT), using near-infrared LED technology, has entered the market. The aim of this study was to determine the levels of agreement between the WLT-derived lactate threshold workload and traditional methods of lactate threshold (LT) calculation and the interdevice and intradevice reliability of the WLT. Fourteen (7 male, 7 female; mean ± SD; age: 18-45 years, height: 169 ± 9 cm, mass: 67 ± 13 kg, V̇O2max: 53 ± 9 ml·kg⁻¹·min⁻¹) subjects ranging from recreationally active to highly trained athletes completed an incremental exercise test to exhaustion on a treadmill. Blood lactate samples were taken at the end of each 3-minute stage during the test to determine lactate threshold using 5 traditional methods from blood lactate analysis which were then compared against the WLT predicted value. In a subset of the population (n = 12), repeat trials were performed to determine both inter-reliability and intrareliability of the WLT device. Intraclass correlation coefficient (ICC) found high to very high agreement between the WLT and traditional methods (ICC > 0.80), with TEMs and mean differences ranging between 3.9-10.2% and 1.3-9.4%. Both interdevice and intradevice reliability resulted in highly reproducible and comparable results (CV < 1.2%, TEM < 0.2 km·h⁻¹, ICC > 0.97). This study suggests that the WLT is a practical, reliable, and noninvasive tool for use in predicting LT in runners.
Reliability and Validity of Dual-Task Mobility Assessments in People with Chronic Stroke

PubMed Central

Yang, Lei; He, Chengqi; Pang, Marco Yiu Chung

2016-01-01

Background The ability to perform a cognitive task while walking simultaneously (dual-tasking) is important in real life. However, the psychometric properties of dual-task walking tests have not been well established in stroke. Objective To assess the test-retest reliability, concurrent and known-groups validity of various dual-task walking tests in people with chronic stroke. Design Observational measurement study with a test-retest design. Methods Eighty-eight individuals with chronic stroke participated. The testing protocol involved four walking tasks (walking forward at self-selected and maximal speed, walking backward at self-selected speed, and crossing over obstacles) performed simultaneously with each of the three attention-demanding tasks (verbal fluency, serial 3 subtractions or carrying a cup of water). For each dual-task condition, the time taken to complete the walking task, the correct response rate (CRR) of the cognitive task, and the dual-task effect (DTE) for the walking time and CRR were calculated. Forty-six of the participants were tested twice within 3–4 days to establish test-retest reliability. Results The walking time in various dual-task assessments demonstrated good to excellent reliability [Intraclass correlation coefficient (ICC2,1) = 0.70–0.93; relative minimal detectable change at 95% confidence level (MDC95%) = 29%-45%]. The reliability of the CRR (ICC2,1 = 0.58–0.81) and the DTE in walking time (ICC2,1 = 0.11–0.80) was more varied. The reliability of the DTE in CRR (ICC2,1 = -0.31–0.40) was poor to fair. The walking time and CRR obtained in various dual-task walking tests were moderately to strongly correlated with those of the dual-task Timed-up-and-Go test, thus demonstrating good concurrent validity. None of the tests could discriminate fallers (those who had sustained at least one fall in the past year) from non-fallers. Limitation The results are generalizable to community-dwelling individuals with chronic stroke only. 
Conclusions The walking time derived from the various dual-task assessments generally demonstrated good to excellent reliability, making them potentially useful in clinical practice and future research endeavors. However, the usefulness of these measurements in predicting falls needs to be further explored. Relatively low reliability was shown in the cognitive outcomes and DTE, which may not be preferred measurements for assessing dual-task performance. PMID:26808662
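The relative MDC95% figures quoted in the Results can be related to the ICC through a widely used convention: SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM, expressed as a percentage of the mean. The sketch below assumes that convention (the abstract does not state the exact formula used), and the input values are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem_from_icc(sd, icc)


def relative_mdc95(sd, icc, mean):
    """MDC95 expressed as a percentage of the group mean."""
    return 100.0 * mdc95(sd, icc) / mean


# Hypothetical walking-time data: mean 40 s, SD 10 s, ICC 0.75.
print(round(mdc95(10.0, 0.75), 1))                 # 13.9
print(round(relative_mdc95(10.0, 0.75, 40.0), 1))  # 34.6
```

Under this convention a lower ICC inflates the SEM and therefore the change needed before a difference between two sessions can be distinguished from measurement noise.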
Testing cognitive function in elderly populations: the PROSPER study. PROspective Study of Pravastatin in the Elderly at Risk.

PubMed

Houx, P J; Shepherd, J; Blauw, G-J; Murphy, M B; Ford, I; Bollen, E L; Buckley, B; Stott, D J; Jukema, W; Hyland, M; Gaw, A; Norrie, J; Kamper, A M; Perry, I J; MacFarlane, P W; Meinders, A Edo; Sweeney, B J; Packard, C J; Twomey, C; Cobbe, S M; Westendorp, R G

2002-10-01

For large scale follow up studies with non-demented patients in which cognition is an endpoint, there is a need for short, inexpensive, sensitive, and reliable neuropsychological tests that are suitable for repeated measurements. The commonly used Mini-Mental-State-Examination fulfils only the first two requirements. In the PROspective Study of Pravastatin in the Elderly at Risk (PROSPER), 5804 elderly subjects aged 70 to 82 years were examined using a learning test (memory), a coding test (general speed), and a short version of the Stroop test (attention). Data presented here were collected at dual baseline, before randomisation for active treatment. The tests proved to be reliable (with test/retest reliabilities ranging from acceptable (r=0.63) to high (r=0.88)) and sensitive enough to detect small differences in subjects from different age categories. All tests showed significant practice effects: performance increased from the first measurement to the first follow up after two weeks. Normative data are provided that can be used for one time neuropsychological testing as well as for assessing individual and group change. Methods for analysing cognitive change are proposed.
Psychometric Properties of Difficulties of Working with Patients with Personality Disorders and Attitudes Towards Patients with Personality Disorders Scales.

PubMed

Eren, Nurhan

2014-12-01

In this study, we aimed to develop two reliable and valid assessment instruments for investigating the level of difficulties mental health workers experience while working with patients with personality disorders and the attitudes they develop toward these patients. The research was carried out based on the general screening model. The study sample consisted of 332 mental health workers in several mental health clinics of Turkey, with a certain amount of experience in working with personality disorders, who were selected with a random assignment method. In order to collect data, the Personal Information Questionnaire, Difficulty of Working with Personality Disorders Scale (PD-DWS), and Attitudes Towards Patients with Personality Disorders Scale (PD-APS), which are being examined for reliability and validity, were applied. To determine construct validity, the Adjective Check List, Maslach Burnout Inventory, and State and Trait Anxiety Inventory were used. Exploratory factor analysis was used for investigating the structural validity, and Cronbach alpha, Spearman-Brown, Guttman Split-Half reliability analyses were utilized to examine the reliability. Also, item reliability and validity computations were carried out by investigating the corrected item-total correlations and discriminative indexes of the items in the scales. For the PD-DWS KMO test, the value was .946; also, a significant difference was found for the Bartlett sphericity test (p<.001). The computed test-retest coefficient reliability was .702; the Cronbach alpha value of the total test score was .952. For PD-APS KMO, the value was .925; a significant difference was found in Bartlett sphericity test (p<.001); the computed reliability coefficient based on continuity was .806; and the Cronbach alpha value of the total test score was .913. Analyses on both scales were based on total scores. 
It was found that PD-DWS and PD-APS have good psychometric properties, measuring the structure that is being investigated, are compatible with other scales, have high levels of internal reliability between their items, and are consistent across time. Therefore, it was concluded that both scales are valid and reliable instruments.
Validation of alternative methods for toxicity testing.

PubMed Central

Bruner, L H; Carr, G J; Curren, R D; Chamberlain, M

1998-01-01

Before nonanimal toxicity tests may be officially accepted by regulatory agencies, it is generally agreed that the validity of the new methods must be demonstrated in an independent, scientifically sound validation program. Validation has been defined as the demonstration of the reliability and relevance of a test method for a particular purpose. This paper provides a brief review of the development of the theoretical aspects of the validation process and updates current thinking about objectively testing the performance of an alternative method in a validation study. Validation of alternative methods for eye irritation testing is a specific example illustrating important concepts. Although discussion focuses on the validation of alternative methods intended to replace current in vivo toxicity tests, the procedures can be used to assess the performance of alternative methods intended for other uses. Images Figure 1 PMID:9599695
Accelerated stress testing of thin film solar cells: Development of test methods and preliminary results

NASA Technical Reports Server (NTRS)

Lathrop, J. W.

1985-01-01

If thin film cells are to be considered a viable option for terrestrial power generation, their reliability attributes will need to be explored and confidence in their stability obtained through accelerated testing. Development of a thin film accelerated test program will be more difficult than was the case for crystalline cells because of the monolithic construction nature of the cells. Specially constructed test samples will need to be fabricated, requiring commitment to the concept of accelerated testing by the manufacturers. A new test schedule appropriate to thin film cells will need to be developed which will be different from that used in connection with crystalline cells. Preliminary work has been started to seek thin film schedule variations to two of the simplest tests: unbiased temperature and unbiased temperature humidity. Still to be examined are tests which involve the passage of current during temperature and/or humidity stress, either by biasing in the forward (or reverse) directions or by the application of light during stress. Investigation of these current (voltage) accelerated tests will involve development of methods of reliably contacting the thin conductive films during stress.
Large-Scale Multiobjective Static Test Generation for Web-Based Testing with Integer Programming

ERIC Educational Resources Information Center

Nguyen, M. L.; Hui, Siu Cheung; Fong, A. C. M.

2013-01-01

Web-based testing has become a ubiquitous self-assessment method for online learning. One useful feature that is missing from today's web-based testing systems is the reliable capability to fulfill different assessment requirements of students based on a large-scale question data set. A promising approach for supporting large-scale web-based…
A Methodology for the Development of a Reliability Database for an Advanced Reactor Probabilistic Risk Assessment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grabaskas, Dave; Brunett, Acacia J.; Bucknor, Matthew

GE Hitachi Nuclear Energy (GEH) and Argonne National Laboratory are currently engaged in a joint effort to modernize and develop probabilistic risk assessment (PRA) techniques for advanced non-light water reactors. At a high level the primary outcome of this project will be the development of next-generation PRA methodologies that will enable risk-informed prioritization of safety- and reliability-focused research and development, while also identifying gaps that may be resolved through additional research. A subset of this effort is the development of a reliability database (RDB) methodology to determine applicable reliability data for inclusion in the quantification of the PRA. The RDB method developed during this project seeks to satisfy the requirements of the Data Analysis element of the ASME/ANS Non-LWR PRA standard. The RDB methodology utilizes a relevancy test to examine reliability data and determine whether it is appropriate to include as part of the reliability database for the PRA. The relevancy test compares three component properties to establish the level of similarity to components examined as part of the PRA. These properties include the component function, the component failure modes, and the environment/boundary conditions of the component. The relevancy test is used to gauge the quality of data found in a variety of sources, such as advanced reactor-specific databases, non-advanced reactor nuclear databases, and non-nuclear databases. The RDB also establishes the integration of expert judgment or separate reliability analysis with past reliability data. This paper provides details on the RDB methodology, and includes an example application of the RDB methodology for determining the reliability of the intermediate heat exchanger of a sodium fast reactor. The example explores a variety of reliability data sources, and assesses their applicability for the PRA of interest through the use of the relevancy test.
Reliability and validity of urinary nerve growth factor measurement in women with lower urinary tract symptoms.

PubMed

Vijaya, Gopalan; Cartwright, Rufus; Bhide, Alka; Derpapas, Alexandros; Fernando, Ruwan; Khullar, Vik

2016-11-01

The validity and reliability of measurement of urinary NGF as a diagnostic biomarker in women with lower urinary tract dysfunction (LUTD) is uncertain. We aimed to evaluate both the diagnostic and discriminant validity, and the test-retest reliability of urinary NGF measurement in women with LUTD. Urinary NGF was measured in women with LUTD (n = 205) and asymptomatic subjects (n = 31). Urinary NGF was assayed using an ELISA method and normalized against urinary creatinine. NGF/creatinine ratios were compared between symptom subgroups using Mann-Whitney U test, and between different urodynamic diagnoses using the Kruskal-Wallis test. Receiver Operator Characteristic (ROC) analysis was employed to evaluate the diagnostic performance of urinary NGF. Test-retest reliability of NGF measurement was assessed using intra-class correlation (ICC). Urinary NGF was significantly but non-specifically increased in symptomatic patients when compared to controls (13.33 vs. 2.05 ng NGF/g Cr, P < 0.001). On multivariate logistic regression NGF was a good predictor of patients having OAB or not; however, the adjusted odds ratio was only 1.006. ROC analysis demonstrated poor discriminant ability between different symptomatic groups and urodynamic groups. Using a cut off of 13.0 ng NGF/g creatinine, the test provides a sensitivity of 81%, but a specificity of only 39% for overactive bladder. The assays demonstrated good test-retest reliability with ICC of 0.889. Although urinary NGF can be reliably assayed, and is increased in various LUTDs, it discriminates poorly between these disorders and therefore has very limited potential as a biomarker. Neurourol. Urodynam. 35:944-948, 2016. © 2015 Wiley Periodicals, Inc.
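One way to see why a sensitivity of 81% paired with a specificity of 39% amounts to poor discrimination is to convert the pair into likelihood ratios and Youden's J. The short sketch below does that arithmetic with the two values reported in the abstract; the function names are illustrative, and these derived quantities are not reported by the authors.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """How much a positive result raises the odds of disease."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """How much a negative result lowers the odds of disease."""
    return (1.0 - sensitivity) / specificity


def youden_j(sensitivity, specificity):
    """Youden's J: 0 means no better than chance, 1 means perfect."""
    return sensitivity + specificity - 1.0


sens, spec = 0.81, 0.39  # values reported for the 13.0 ng NGF/g creatinine cut off
print(round(positive_likelihood_ratio(sens, spec), 2))  # 1.33
print(round(negative_likelihood_ratio(sens, spec), 2))  # 0.49
print(round(youden_j(sens, spec), 2))                   # 0.2
```

A positive likelihood ratio this close to 1 means a positive result changes the odds of overactive bladder only slightly, which is consistent with the abstract's conclusion despite the good test-retest reliability.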
Measuring the Process and Quality of Informed Consent for Clinical Research: Development and Testing

PubMed Central

Cohn, Elizabeth Gross; Jia, Haomiao; Smith, Winifred Chapman; Erwin, Katherine; Larson, Elaine L.

2013-01-01

Purpose/Objectives To develop and assess the reliability and validity of an observational instrument, the Process and Quality of Informed Consent (P-QIC). Design A pilot study of the psychometrics of a tool designed to measure the quality and process of the informed consent encounter in clinical research. The study used professionally filmed, simulated consent encounters designed to vary in process and quality. Setting A major urban teaching hospital in the northeastern region of the United States. Sample 63 students enrolled in health-related programs participated in psychometric testing, 16 students participated in test-retest reliability, and 5 investigator-participant dyads were observed for the actual consent encounters. Methods For reliability and validity testing, students watched and rated videotaped simulations of four consent encounters intentionally varied in process and content and rated them with the proposed instrument. Test-retest reliability was established by raters watching the videotaped simulations twice. Inter-rater reliability was demonstrated by two simultaneous but independent raters observing an actual consent encounter. Main Research Variables The essential elements of information and communication for informed consent. Findings The initial testing of the P-QIC demonstrated reliable and valid psychometric properties in both the simulated standardized consent encounters and actual consent encounters in the hospital setting. Conclusions The P-QIC is an easy-to-use observational tool that provides a quick assessment of the areas of strength and areas that need improvement in a consent encounter. It can be used in the initial trainings of new investigators or consent administrators and in ongoing programs of improvement for informed consent. Implications for Nursing The development of a validated observational instrument will allow investigators to assess the consent process more accurately and evaluate strategies designed to improve it. 
PMID:21708532
The impact of symptom stability on time frame and recall reliability in CFS.

PubMed

Evans, Meredyth; Jason, Leonard A

This study is an investigation of the potential impact of perceived symptom stability on the recall reliability of symptom severity and frequency as reported by individuals with chronic fatigue syndrome (CFS). Symptoms were recalled using three different recall timeframes (the past week, the past month, and the past six months) and at two assessment points (one week apart). Participants were 51 adults (45 women and 6 men), between the ages of 29 and 66, with a current diagnosis of CFS. Multilevel Model (MLM) Analyses were used to determine the optimal recall timeframe (in terms of test-retest reliability) for reporting symptoms perceived as variable and as stable over time. Headaches were recalled more reliably when they were reported as stable over time. Furthermore, the optimal timeframe in terms of test-retest reliability for stable symptoms was highly uniform, such that all Fukuda 1 CFS symptoms were more reliably recalled at the six month timeframe. In contrast, the optimal timeframe for CFS symptoms perceived as variable differed across symptoms. Symptom stability and recall timeframe are important to consider in order to improve the accuracy and reliability of the current methods for diagnosing this illness.
An Adaptation of the Original Fresno Test to Measure Evidence-Based Practice Competence in Pediatric Bedside Nurses.

PubMed

Laibhen-Parkes, Natasha; Kimble, Laura P; Melnyk, Bernadette Mazurek; Sudia, Tanya; Codone, Susan

2018-06-01

Instruments used to assess evidence-based practice (EBP) competence in nurses have been subjective, unreliable, or invalid. The Fresno test was identified as the only instrument to measure all the steps of EBP with supportive reliability and validity data. However, the items and psychometric properties of the original Fresno test are only relevant to measure EBP with medical residents. Therefore, the purpose of this paper is to describe the development of the adapted Fresno test for pediatric nurses, and provide preliminary validity and reliability data for its use with Bachelor of Science in Nursing-prepared pediatric bedside nurses. General adaptations were made to the original instrument's case studies, item content, wording, and format to meet the needs of a pediatric nursing sample. The scoring rubric was also modified to complement changes made to the instrument. Content and face validity, and intrarater reliability of the adapted Fresno test were assessed during a mixed-methods pilot study conducted from October to December 2013 with 29 Bachelor of Science in Nursing-prepared pediatric nurses. Validity data provided evidence for good content and face validity. Intrarater reliability estimates were high. The adapted Fresno test presented here appears to be a valid and reliable assessment of EBP competence in Bachelor of Science in Nursing-prepared pediatric nurses. However, further testing of this instrument is warranted using a larger sample of pediatric nurses in diverse settings. This instrument can be a starting point for evaluating the impact of EBP competence on patient outcomes. © 2018 Sigma Theta Tau International.
R&D of high reliable refrigeration system for superconducting generators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hosoya, T.; Shindo, S.; Yaguchi, H.

1996-12-31

Super-GM carries out R&D of 70 MW class superconducting generators (model machines), refrigeration system and superconducting wires to apply superconducting technology to electric power apparatuses. The helium refrigeration system for keeping field windings of superconducting generator (SCG) in cryogenic environment must meet the requirement of high reliability for uninterrupted long term operation of the SCG. In FY 1992, a highly reliable conventional refrigeration system for the model machines was integrated by combining components such as compressor unit, higher temperature cold box and lower temperature cold box which were manufactured utilizing various fundamental technologies developed in the early stage of the project since 1988. Since FY 1993, its performance tests have been carried out. It has been confirmed that its performance fulfilled the development targets of a liquefaction capacity of 100 L/h and impurity removal in the helium gas to < 0.1 ppm. Furthermore, its operation method and performance were clarified for all different modes, such as how to control the liquefaction rate and how to supply liquid helium from a dewar to the model machine. In addition, the authors have made performance tests and system performance analysis of oil free screw type and turbo type compressors which greatly improve reliability of conventional refrigeration systems. The operation performance and operational control method of the compressors have been clarified through the tests and analysis.
Detection of ingested nitromethane and reliable creatinine assessment using multiple common analytical methods.

PubMed

Murphy, Christine M; Devlin, John J; Beuhler, Michael C; Cheifetz, Paul; Maynard, Susan; Schwartz, Michael D; Kacinko, Sherri

2018-04-01

Nitromethane, found in fuels used for short distance racing, model cars, and model airplanes, produces a falsely elevated serum creatinine with standard creatinine analysis via the Jaffé method. Erroneous creatinine elevation often triggers extensive testing, leads to inaccurate diagnoses, and delayed or inappropriate medical interventions. Multiple reports in the literature identify "enzymatic assays" as an alternative method to detect the true value of creatinine, but this ambiguity does not help providers translate what type of enzymatic assay testing can be done in real time to determine if there is indeed false elevation. We report seven cases of ingested nitromethane where creatinine was determined via Beckman Coulter® analyser using the Jaffé method, Vitros® analyser, or i-STAT® point-of-care testing. Nitromethane was detected and semi-quantified using a common clinical toxic alcohol analysis method, and quantified by headspace-gas chromatography-mass spectrometry. When creatinine was determined using i-STAT® point-of-care testing or a Vitros® analyser, levels were within the normal range. Comparatively, all initial creatinine levels obtained via the Jaffé method were elevated. Nitromethane concentrations ranged from 42 to 310 μg/mL. These cases demonstrate reliable assessment of creatinine through other enzymatic methods using a Vitros® analyser or i-STAT®. Additionally, nitromethane is detectable and quantifiable using routine alcohols gas chromatography analysis and by headspace-gas chromatography-mass spectrometry.

A Test of the DSP Sexing Method on CT Images from a Modern French Sample.

PubMed

Mestekova, Sarka; Bruzek, Jaroslav; Veleminska, Jana; Chaumoitre, Kathia

2015-09-01

The hip bone is considered to be one of the most reliable indicators in sex determination. The aim of this study was to test the reliability of the DSP method for the hip bone proposed by Murail et al. (Bull Mem Soc Anthropol Paris, 17, 2005, 167) on a sample from a present-day population in France (52 males and 54 females). Ten linear measurements were collected from three-dimensional models derived from computed tomography images (CTI). To quantify the proportions of correct sex determinations, a more rigorous posterior probability threshold of 0.95 was applied. Using all 10 measurements, 92.3% of males and 97.2% of females were sexed correctly. The percentage of undetermined specimens varied depending on the used combination of measurements; however, all sexes were assigned with a 100% accuracy. This study proves that DSP is an appropriate and reliable tool for sex determination, based on dimensions obtained from CTI. © 2015 American Academy of Forensic Sciences.
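The abstract accepts a sex estimate only when the posterior probability reaches 0.95 and otherwise leaves the specimen undetermined. A minimal sketch of such a decision rule follows. It is not the DSP software: the function name `assign_sex` is hypothetical, the two-class assumption P(female) = 1 - P(male) is ours, and treating the threshold as inclusive is also an assumption.

```python
def assign_sex(p_male, threshold=0.95):
    """Return 'male', 'female', or 'undetermined' from a posterior probability.

    Hypothetical helper: assumes a two-class posterior, so that
    P(female) = 1 - P(male), and an inclusive threshold (both assumptions).
    """
    if not 0.0 <= p_male <= 1.0:
        raise ValueError("p_male must be a probability in [0, 1]")
    if p_male >= threshold:
        return "male"
    if 1.0 - p_male >= threshold:
        return "female"
    # Neither class reaches the threshold, so no sex is assigned.
    return "undetermined"
```

A stricter threshold trades a larger share of undetermined specimens for fewer wrong assignments, which is the pattern the abstract reports.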
Accuracy and reliability testing of two methods to measure internal rotation of the glenohumeral joint.

PubMed

Hall, Justin M; Azar, Frederick M; Miller, Robert H; Smith, Richard; Throckmorton, Thomas W

2014-09-01

We compared accuracy and reliability of a traditional method of measurement (most cephalad vertebral spinous process that can be reached by a patient with the extended thumb) to estimates made with the shoulder in abduction to determine if there were differences between the two methods. Six physicians with fellowship training in sports medicine or shoulder surgery estimated measurements in 48 healthy volunteers. Three were randomly chosen to make estimates of both internal rotation measurements for each volunteer. An independent observer made objective measurements on lateral scoliosis films (spinous process method) or with a goniometer (abduction method). Examiners were blinded to objective measurements as well as to previous estimates. Intraclass coefficients for interobserver reliability for the traditional method averaged 0.75, indicating good agreement among observers. The difference in vertebral level estimated by the examiner and the actual radiographic level averaged 1.8 levels. The intraclass coefficient for interobserver reliability for the abduction method averaged 0.81 for all examiners, indicating near-perfect agreement. Confidence intervals indicated that estimates were an average of 8° different from the objective goniometer measurements. Pearson correlation coefficients of intraobserver reliability for the abduction method averaged 0.94, indicating near-perfect agreement within observers. Confidence intervals demonstrated repeated estimates between 5° and 10° of the original. Internal rotation estimates made with the shoulder abducted demonstrated interobserver reliability superior to that of spinous process estimates, and reproducibility was high. On the basis of this finding, we now take glenohumeral internal rotation measurements with the shoulder in abduction and use a goniometer to maximize accuracy and objectivity. Copyright © 2014 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Mosby, Inc. All rights reserved.
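The abstract reports intraclass coefficients for interobserver reliability but does not state which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table. The choice of this form and the function name `icc_2_1` are assumptions, and the example ratings are illustrative rather than the study's data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject and one column per rater.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects rated by 4 raters (not the study's data).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)  # approximately 0.29
```

Because ICC(2,1) penalizes systematic offsets between raters, the same table can give a noticeably higher value under a consistency-type ICC, so the form should be reported alongside the coefficient.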
Aerospace Payloads Leak Test Methodology

NASA Technical Reports Server (NTRS)

Lvovsky, Oleg; Grayson, Cynthia M.

2010-01-01

Pressurized and sealed aerospace payloads can leak on orbit. When dealing with toxic or hazardous materials, requirements for fluid and gas leakage rates have to be properly established, and most importantly, reliably verified using the best Nondestructive Test (NDT) method available. Such verification can be implemented through application of various leak test methods that will be the subject of this paper, with a purpose to show what approach to payload leakage rate requirement verification is taken by the National Aeronautics and Space Administration (NASA). The scope of this paper will be mostly a detailed description of 14 leak test methods recommended.
Testing two methods to create comparable scale scores between the Job Content Questionnaire (JCQ) and JCQ-like questionnaires in the European JACE Study.

PubMed

Karasek, Robert; Choi, BongKyoo; Ostergren, Per-Olof; Ferrario, Marco; De Smet, Patrick

2007-01-01

Scale comparative properties of "JCQ-like" questionnaires with respect to the JCQ have been little known. Assessing validity and reliability of two methods for generating comparable scale scores between the Job Content Questionnaire (JCQ) and JCQ-like questionnaires in sub-populations of the large Job Stress, Absenteeism and Coronary Heart Disease European Cooperative (JACE) study: the Swedish version of Demand-Control Questionnaire (DCQ) and a transformed Multinational Monitoring of Trends and Determinants in Cardiovascular Disease Project (MONICA) questionnaire. A random population sample of all Malmo males and females aged 52-58 (n = 682) years was given a new test questionnaire with both instruments (the JCQ and the DCQ). Comparability-facilitating algorithms were created (Method I). For the transformed Milan MONICA questionnaire, a simple weighting system was used (Method II). The converted scale scores from the JCQ-like questionnaires were found to be reliable and highly correlated to those of the original JCQ. However, agreements for the high job strain group between the JCQ and the DCQ, and between the JCQ and the DCQ (Method I applied) were only moderate (Kappa). Use of a multiple level job strain scale generated higher levels of job strain agreement, as did a new job strain definition that excludes the intermediate levels of the job strain distribution. The two methods were valid and generally reliable.
Paediatric Automatic Phonological Analysis Tools (APAT).

PubMed

Saraiva, Daniela; Lousada, Marisa; Hall, Andreia; Jesus, Luis M T

2017-12-01

To develop the pediatric Automatic Phonological Analysis Tools (APAT) and to estimate inter and intrajudge reliability, content validity, and concurrent validity. The APAT were constructed using Excel spreadsheets with formulas. The tools were presented to an expert panel for content validation. The corpus used in the Portuguese standardized test Teste Fonético-Fonológico - ALPE produced by 24 children with phonological delay or phonological disorder was recorded, transcribed, and then inserted into the APAT. Reliability and validity of APAT were analyzed. The APAT present strong inter- and intrajudge reliability (>97%). The content validity was also analyzed (ICC = 0.71), and concurrent validity revealed strong correlations between computerized and manual (traditional) methods. The development of these tools contributes to fill existing gaps in clinical practice and research, since previously there were no valid and reliable tools/instruments for automatic phonological analysis, which allowed the analysis of different corpora.
Lifetime Reliability Evaluation of Structural Ceramic Parts with the CARES/LIFE Computer Program

NASA Technical Reports Server (NTRS)

Nemeth, Noel N.; Powers, Lynn M.; Janosik, Lesley A.; Gyekenyesi, John P.

1993-01-01

The computer program CARES/LIFE calculates the time-dependent reliability of monolithic ceramic components subjected to thermomechanical and/or proof test loading. This program is an extension of the CARES (Ceramics Analysis and Reliability Evaluation of Structures) computer program. CARES/LIFE accounts for the phenomenon of subcritical crack growth (SCG) by utilizing the power law, Paris law, or Walker equation. The two-parameter Weibull cumulative distribution function is used to characterize the variation in component strength. The effects of multiaxial stresses are modeled using either the principle of independent action (PIA), Weibull's normal stress averaging method (NSA), or Batdorf's theory. Inert strength and fatigue parameters are estimated from rupture strength data of naturally flawed specimens loaded in static, dynamic, or cyclic fatigue. Two example problems demonstrating cyclic fatigue parameter estimation and component reliability analysis with proof testing are included.
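The two-parameter Weibull cumulative distribution function mentioned above can be written as P_f = 1 - exp(-(σ/σ₀)^m), where σ₀ is the scale parameter and m the Weibull modulus. The sketch below evaluates only this uniaxial, time-independent form. It is not CARES/LIFE: it ignores volume or area integration, multiaxial stress models, and subcritical crack growth, and the function and parameter names are hypothetical.

```python
import math


def weibull_failure_probability(stress, scale, modulus):
    """Two-parameter Weibull CDF: P_f = 1 - exp(-(stress / scale) ** modulus).

    stress  : applied uniaxial stress (same units as scale)
    scale   : Weibull scale parameter (characteristic strength)
    modulus : Weibull modulus m (dimensionless shape parameter)
    """
    if stress < 0 or scale <= 0 or modulus <= 0:
        raise ValueError("stress must be >= 0; scale and modulus must be > 0")
    return 1.0 - math.exp(-((stress / scale) ** modulus))


def weibull_reliability(stress, scale, modulus):
    """Survival probability, the complement of the failure probability."""
    return 1.0 - weibull_failure_probability(stress, scale, modulus)
```

At a stress equal to the scale parameter the failure probability is 1 - 1/e (about 63.2%) for any modulus, which is why the scale parameter is often called the characteristic strength.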
Reliability and validity of the Attributional Style Questionnaire- Survey in people with multiple sclerosis

PubMed Central

Kneebone, Ian I.; Dewar, Sophie J.

2016-01-01

Background: The current study aimed to examine the psychometric properties of an attributional style measure that can be administered remotely, to people who have multiple sclerosis (MS). Methods: A total of 495 participants with MS were recruited. Participants completed the Attributional Style Questionnaire-Survey (ASQ-S) and two comparison measures of cognitive variables via postal survey on three occasions, each 12 months apart. Internal reliability, test-retest reliability and congruent validity were considered. Results: The internal reliability of the ASQ-S was good (α > 0.7). The test-retest correlations were significant, but failed to reach the 0.7 set. The congruent validity of the ASQ-S was established relative to the comparisons. Conclusions: The psychometric properties of the ASQ-S indicate that it shows promise as a tool for researchers investigating depression in people with MS and is likely sound to use clinically in this population. PMID:28450893
An Evaluation method for C2 Cyber-Physical Systems Reliability Based on Deep Learning

DTIC Science & Technology

2014-06-01

the reliability testing data of the system, we obtain the prior distribution of the reliability [equation garbled in source: LG(R; ...)]. By Bayes theorem ... criticality cyber-physical systems[C]//Proc of ICDCS. Piscataway, NJ: IEEE, 2010:169-178. [17] Zimmer C, Bhat B, Muller F, et al. Time-based intrusion de
A Motor Speech Assessment for Children with Severe Speech Disorders: Reliability and Validity Evidence

ERIC Educational Resources Information Center

Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.

2013-01-01

Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…
Valid and Reliable Measures of Cognitive Behaviors toward Fruits and Vegetables for Children Aged 9 to 11 Years

ERIC Educational Resources Information Center

Lohse, Barbara; Cunningham-Sabo, Leslie; Walters, Lynn M.; Stacey, Jane E.

2011-01-01

Objective: To examine reliability of validity-tested instruments measuring fruit and vegetable (FV) preference and self-efficacy (SE) for and attitude (AT) toward cooking. Methods: In Santa Fe, New Mexico, following cognitive interviews with 123 fourth- and fifth-graders, surveys were administered twice, less than 2 weeks apart, to students in 16…
Chemical dependency and drug testing in the workplace.

PubMed

Osterloh, J D; Becker, C E

1990-01-01

Urine testing for drug use in the workplace is now widespread, with the prevalence of positive drug tests in the work force being 0% to 15%. The prevalence of marijuana use is highest of the illicit drugs being tested. Highly prevalent drugs can be reliably tested. Although it is prudent to rid the workplace of drug use, there is little scientific study on the relationship of drug use and workplace outcomes, such as productivity and safety. Probable-cause testing and preemployment testing are the most common applications. Random testing has been less accepted owing to its higher costs, unresolved legal issues, and predictably poor test reliability. Legal issues have focused on the right to privacy, policy agreements, discrimination, and the lack of due process. The legal cornerstone of a good program is a policy that is planned and agreed on by both labor and management, which serves both as a contract and as a procedure in which expectations and consequences are known. Moreover, NIDA is certifying laboratories doing employee drug testing. Testing methods, when done correctly, are less prone to error than in the past, but screening tests can be defeated by adulterants. Although the incidence of false-positive results is low, such tests are less reliable when the prevalence of drug abuse is also low.
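The closing point, that a positive result is less trustworthy when prevalence is low even if false positives are rare, follows from Bayes' theorem. The sketch below computes the positive predictive value from sensitivity, specificity, and prevalence. The function name is hypothetical and the numbers in the usage lines are illustrative, not taken from the abstract.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(true user | positive test) via Bayes' theorem."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be probabilities in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Illustrative values only: the same test applied at two prevalences.
ppv_low = positive_predictive_value(0.99, 0.99, 0.01)   # about 0.50
ppv_high = positive_predictive_value(0.99, 0.99, 0.15)  # about 0.95
```

With these illustrative inputs, roughly half of the positives at 1% prevalence are false, which is one reason screening results are usually confirmed by a second method.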
Development of an International Odor Identification Test for Children: The Universal Sniff Test.

PubMed

Schriever, Valentin A; Agosin, Eduardo; Altundag, Aytug; Avni, Hadas; Cao Van, Helene; Cornejo, Carlos; de Los Santos, Gonzalo; Fishman, Gad; Fragola, Claudio; Guarneros, Marco; Gupta, Neelima; Hudson, Robyn; Kamel, Reda; Knaapila, Antti; Konstantinidis, Iordanis; Landis, Basile N; Larsson, Maria; Lundström, Johan N; Macchi, Alberto; Mariño-Sánchez, Franklin; Martinec Nováková, Lenka; Mori, Eri; Mullol, Joaquim; Nord, Marie; Parma, Valentina; Philpott, Carl; Propst, Evan J; Rawan, Ahmed; Sandell, Mari; Sorokowska, Agnieszka; Sorokowski, Piotr; Sparing-Paschke, Lisa-Marie; Stetzler, Carolin; Valder, Claudia; Vodicka, Jan; Hummel, Thomas

2018-07-01

To assess olfactory function in children and to create and validate an odor identification test to diagnose olfactory dysfunction in children, which we called the Universal Sniff (U-Sniff) test. This is a multicenter study involving 19 countries. The U-Sniff test was developed in 3 phases including 1760 children age 5-7 years. Phase 1: identification of potentially recognizable odors; phase 2: selection of odorants for the odor identification test; and phase 3: evaluation of the test and acquisition of normative data. Test-retest reliability was evaluated in a subgroup of children (n = 27), and the test was validated using children with congenital anosmia (n = 14). Twelve odors were familiar to children and, therefore, included in the U-Sniff test. Children scored a mean ± SD of 9.88 ± 1.80 points out of 12. Normative data were obtained and reported for each country. The U-Sniff test demonstrated a high test-retest reliability (r27 = 0.83, P < .001) and enabled discrimination between normosmia and children with congenital anosmia with a sensitivity of 100% and specificity of 86%. The U-Sniff is a valid and reliable method of testing olfaction in children and can be used internationally. Copyright © 2018 Elsevier Inc. All rights reserved.
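Sensitivity and specificity, as reported above, are ratios taken from a 2x2 table of test result against true status. A minimal sketch of the calculation follows. The function name is hypothetical, and the control-group counts in the usage line are invented to reproduce the reported percentages, since the abstract does not give the underlying table.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion-matrix counts.

    tp: affected children correctly flagged    fn: affected children missed
    tn: unaffected children correctly cleared  fp: unaffected children flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one affected and one unaffected case")
    return tp / (tp + fn), tn / (tn + fp)


# 14 anosmic children all detected; the 86/14 control split is hypothetical.
sens, spec = sensitivity_specificity(tp=14, fn=0, tn=86, fp=14)
```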
Markov chains for testing redundant software

NASA Technical Reports Server (NTRS)

White, Allan L.; Sjogren, Jon A.

1988-01-01

A preliminary design for a validation experiment has been developed that addresses several problems unique to assuring the extremely high quality of multiple-version programs in process-control software. The procedure uses Markov chains to model the error states of the multiple version programs. The programs are observed during simulated process-control testing, and estimates are obtained for the transition probabilities between the states of the Markov chain. The experimental Markov chain model is then expanded into a reliability model that takes into account the inertia of the system being controlled. The reliability of the multiple version software is computed from this reliability model at a given confidence level using confidence intervals obtained for the transition probabilities during the experiment. An example demonstrating the method is provided.
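The procedure described above, observing error states during simulated testing, estimating transition probabilities, and then computing reliability from the chain, can be sketched in a few lines. This is an illustrative toy and not the authors' model: the state names are hypothetical, failure is assumed to be absorbing, and neither the confidence intervals on the transition probabilities nor the inertia of the controlled system is modeled.

```python
from collections import Counter


def estimate_transitions(sequence, states):
    """Maximum-likelihood transition probabilities from one observed state sequence."""
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states)
        if total == 0:
            # No observed exits from s: treat it as absorbing (an assumption).
            matrix[s] = {t: 1.0 if t == s else 0.0 for t in states}
        else:
            matrix[s] = {t: counts[(s, t)] / total for t in states}
    return matrix


def failure_probability(matrix, start, fail, steps):
    """Probability of having reached the absorbing `fail` state within `steps` steps."""
    dist = {s: 0.0 for s in matrix}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {s: 0.0 for s in matrix}
        for s, mass in dist.items():
            if s == fail:
                nxt[fail] += mass  # failure is absorbing
                continue
            for t, p in matrix[s].items():
                nxt[t] += mass * p
        dist = nxt
    return dist[fail]


# Hypothetical observation log from a simulated test run.
log = ["ok", "ok", "err", "ok", "ok", "err", "fail"]
P = estimate_transitions(log, ["ok", "err", "fail"])
reliability_2_steps = 1.0 - failure_probability(P, "ok", "fail", 2)  # 0.75
```

In a real experiment the point estimates would be replaced by interval bounds on each transition probability, so the computed reliability holds at a stated confidence level, as the abstract describes.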
Sport-specific endurance plank test for evaluation of global core muscle function.

PubMed

Tong, Tom K; Wu, Shing; Nie, Jinlei

2014-02-01

To examine the validity and reliability of a sports-specific endurance plank test for the evaluation of global core muscle function. Repeated-measures study. Laboratory environment. Twenty-eight male and eight female young athletes. Surface electromyography (sEMG) of selected trunk flexors and extensors, and an intervention of pre-fatigue core workout were applied for test validation. Intraclass correlation coefficient (ICC), coefficient of variation (CV), and the measurement bias ratio */÷ ratio limits of agreement (LOA) were calculated to assess reliability and measurement error. Test validity was shown by the sEMG of selected core muscles, which indicated >50% increase in muscle activation during the test; and the definite discrimination of the ∼30% reduction in global core muscle endurance subsequent to a pre-fatigue core workout. For test-retest reliability, when the first attempt of three repeated trials was considered as familiarisation, the ICC was 0.99 (95% CI: 0.98-0.99), CV was 2.0 ± 1.56% and the measurement bias ratio */÷ ratio LOA was 0.99 */÷ 1.07. The findings suggest that the sport-specific endurance plank test is a valid, reliable and practical method for assessing global core muscle endurance in athletes given that at least one familiarisation trial takes place prior to measurement. Copyright © 2013 Elsevier Ltd. All rights reserved.
[Translation and Development of the Chinese-Version Patient Privacy Scale].

PubMed

Chen, Li; Feng, Xian-Qiong; Yang, Xiao-Li; Li, Luo-Hong

2017-06-01

The unauthorized releasing of confidential patient information is a serious problem worldwide. Nurses, the healthcare professionals who are in most frequent contact with patients, have access to a significant amount of confidential patient information and play a key role in protecting patient privacy. However, currently, there is no proper tool to measure the level to which clinical nurses protect the privacy of their patients in China. To translate the patient privacy scale (PPS) into Chinese and to test the reliability and validity of this Chinese version. The original scale was developed by Özturk, Bahcecik, and Özçelik (2014) to identify whether nurses protect or violate patient privacy in the workplace. This study used the "back translation" method to translate the scale. A total of 616 nurses in two tertiary hospitals in the Western region of China were enrolled to test the internal consistency, test-retest reliability, and construct validity of the translated scale. The Cronbach's alpha coefficients of the total scale and its 5 factors ranged from .84 to .94; the split half reliability was .91; the test-retest reliability was .82; and the content validity index was .95. Exploratory factor analysis revealed that the 5 factors explained 64.98% of the total variance. The Chinese version of the PPS is reliable and valid, and may be used to reliably assess the behaviors of nurses with regard to protecting the privacy of their patients. The scale may also be used to evaluate the effects of training on patient privacy protection.
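Internal consistency of the kind reported above is commonly summarized with Cronbach's alpha, α = k/(k-1) · (1 - Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch using sample variances. The function name and the tiny score table are hypothetical and unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    scores: list of rows, one row per respondent and one column per item.
    Raises ZeroDivisionError if every respondent has the same total score.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical 3 respondents x 2 items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])  # 2/3, about 0.67
```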
Reliability of Causality Assessment for Drug, Herbal and Dietary Supplement Hepatotoxicity in the Drug-Induced Liver Injury Network (DILIN)

PubMed Central

Hayashi, Paul H.; Barnhart, Huiman X.; Fontana, Robert J.; Chalasani, Naga; Davern, Timothy J.; Talwalkar, Jayant A.; Reddy, K. Rajender; Stolz, Andrew A.; Hoofnagle, Jay H.; Rockey, Don C.

2014-01-01

Background Due to the lack of objective tests to diagnose drug induced liver injury (DILI), causality assessment is a matter of debate. Expert opinion is often used in research and industry but its test-retest reliability is unknown. Aims To determine the test-retest reliability of the expert opinion process used by the Drug-Induced Liver Injury Network (DILIN). Methods Three DILIN hepatologists adjudicate suspected hepatotoxicity cases to 1 of 5 categories representing levels of likelihood of DILI. Adjudication is based on retrospective assessment of gathered case data that includes prospective follow-up information. One hundred randomly selected DILIN cases were re-assessed using the same processes as for the initial assessment, but by 3 different reviewers in 92% of cases. Results The median time between assessments was 938 days (range: 140–2352). Thirty-one cases involved >1 agent. Weighted kappa statistics for overall case and individual agent category agreement were 0.60 (95% CI: 0.50–0.71) and 0.60 (0.52–0.68), respectively. Overall case adjudications were within one category of each other 93% of the time, while 5% differed by 2 categories and 2% differed by 3 categories. Fourteen percent crossed the 50% threshold of likelihood due to competing diagnoses or atypical timing between drug exposure and injury. Conclusions The DILIN expert opinion causality assessment method has moderate inter-observer reliability but very good agreement within 1 category. A small but important proportion of cases could not be reliably diagnosed as ≥ 50% likely to be DILI. PMID:24661785
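A weighted kappa, as reported above, gives partial credit when two ordinal ratings differ by only a category or two. The abstract does not say which weighting scheme was used, so the sketch below assumes linear weights and categories coded as integers 0 to n_categories - 1 (0 to 4 for a 5-level scale). The function name and the example ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for two sets of ordinal ratings.

    Ratings are integers in range(n_categories). Disagreement between
    categories i and j is weighted |i - j| / (n_categories - 1).
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("rating lists must be non-empty and of equal length")
    span = n_categories - 1

    # Mean observed disagreement across paired ratings.
    observed = sum(abs(a - b) / span for a, b in zip(ratings_a, ratings_b)) / n

    # Disagreement expected by chance from the two marginal distributions.
    marg_a = [ratings_a.count(c) / n for c in range(n_categories)]
    marg_b = [ratings_b.count(c) / n for c in range(n_categories)]
    expected = sum(
        marg_a[i] * marg_b[j] * abs(i - j) / span
        for i in range(n_categories)
        for j in range(n_categories)
    )
    if expected == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed / expected


# Hypothetical first and second adjudications on a 3-level scale.
kappa = weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], n_categories=3)  # 5/7, about 0.71
```

Quadratic weights, |i - j|² / (n_categories - 1)², are a common alternative that penalizes distant disagreements more heavily and usually yield a different value from the same ratings.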
The Assumption of a Reliable Instrument and Other Pitfalls to Avoid When Considering the Reliability of Data

PubMed Central

Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

2012-01-01

The purpose of this article is to help researchers avoid common pitfalls associated with reliability including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it with focus on its impact on effect sizes. Second, we review how reliability is assessed with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data. PMID:22518107
Reliability design and verification for launch-vehicle propulsion systems - Report of an AIAA Workshop, Washington, DC, May 16, 17, 1989

NASA Astrophysics Data System (ADS)

Launch vehicle propulsion system reliability considerations during the design and verification processes are discussed. The tools available for predicting and minimizing anomalies or failure modes are described and objectives for validating advanced launch system propulsion reliability are listed. Methods for ensuring vehicle/propulsion system interface reliability are examined and improvements in the propulsion system development process are suggested to improve reliability in launch operations. Also, possible approaches to streamline the specification and procurement process are given. It is suggested that government and industry should define reliability program requirements and manage production and operations activities in a manner that provides control over reliability drivers. Also, it is recommended that sufficient funds should be invested in design, development, test, and evaluation processes to ensure that reliability is not inappropriately subordinated to other management considerations.
A tonic heat test stimulus yields a larger and more reliable conditioned pain modulation effect compared to a phasic heat test stimulus

PubMed Central

Lie, Marie Udnesseter; Matre, Dagfinn; Hansson, Per; Stubhaug, Audun; Zwart, John-Anker; Nilsen, Kristian Bernhard

2017-01-01

Abstract Introduction: The interest in conditioned pain modulation (CPM) as a clinical tool for measuring endogenously induced analgesia is increasing. There is, however, large variation in the CPM methodology, hindering comparison of results across studies. Research comparing different CPM protocols is needed in order to obtain a standardized test paradigm. Objectives: The aim of the study was to assess whether a protocol with phasic heat stimuli as test-stimulus is preferable to a protocol with tonic heat stimulus as test-stimulus. Methods: In this experimental crossover study, we compared 2 CPM protocols with different test-stimulus; one with tonic test-stimulus (constant heat stimulus of 120-second duration) and one with phasic test-stimuli (3 heat stimulations of 5 seconds duration separated by 10 seconds). Conditioning stimulus was a 7°C water bath in parallel with the test-stimulus. Twenty-four healthy volunteers were assessed on 2 occasions with minimum 1 week apart. Differences in the magnitude and test–retest reliability of the CPM effect in the 2 protocols were investigated with repeated-measures analysis of variance and by relative and absolute reliability indices. Results: The protocol with tonic test-stimulus induced a significantly larger CPM effect compared to the protocol with phasic test-stimuli (P < 0.001). Fair and good relative reliability was found with the phasic and tonic test-stimuli, respectively. Absolute reliability indices showed large intraindividual variability from session to session in both protocols. Conclusion: The present study shows that a CPM protocol with a tonic test-stimulus is preferable to a protocol with phasic test-stimuli. However, we emphasize that one should be cautious to use the CPM effect as biomarker or in clinical decision making on an individual level due to large intraindividual variability. PMID:29392240
Accelerated Test Method for Corrosion Protective Coatings Project

NASA Technical Reports Server (NTRS)

Falker, John; Zeitlin, Nancy; Calle, Luz

2015-01-01

This project seeks to develop a new accelerated corrosion test method that predicts the long-term corrosion protection performance of spaceport structure coatings as accurately and reliably as current long-term atmospheric exposure tests. This new accelerated test method will shorten the time needed to evaluate the corrosion protection performance of coatings for NASA's critical ground support structures. Lifetime prediction for spaceport structure coatings has a 5-year qualification cycle using atmospheric exposure. Current accelerated corrosion tests often provide false positives and negatives for coating performance, do not correlate to atmospheric corrosion exposure results, and do not correlate with atmospheric exposure timescales for lifetime prediction.

[Research progress on mechanical performance evaluation of artificial intervertebral disc].

PubMed

Li, Rui; Wang, Song; Liao, Zhenhua; Liu, Weiqiang

2018-03-01

The mechanical properties of artificial intervertebral disc (AID) are related to long-term reliability of prosthesis. There are three testing methods involved in the mechanical performance evaluation of AID based on different tools: the testing method using mechanical simulator, in vitro specimen testing method and finite element analysis method. In this study, the testing standard, testing equipment and materials of AID were firstly introduced. Then, the present status of AID static mechanical properties test (static axial compression, static axial compression-shear), dynamic mechanical properties test (dynamic axial compression, dynamic axial compression-shear), creep and stress relaxation test, device pushout test, core pushout test, subsidence test, etc. were focused on. The experimental techniques using in vitro specimen testing method and testing results of available artificial discs were summarized. The experimental methods and research status of finite element analysis were also summarized. Finally, the research trends of AID mechanical performance evaluation were forecasted. The simulator, load, dynamic cycle, motion mode, specimen and test standard would be important research fields in the future.
Patch testing in non-immediate cutaneous adverse drug reactions: value of extemporaneous patch tests.

PubMed

Assier, Haudrey; Valeyrie-Allanore, Laurence; Gener, Gwendeline; Verlinde Carvalh, Muriel; Chosidow, Olivier; Wolkenstein, Pierre

2017-11-01

Patch testing following a standardized protocol is reliable for identifying the culprit drug in cutaneous adverse drug reactions (CADRs). However, these patch tests (PTs) require pharmaceutical material and staff, which are not always easily available. To evaluate an extemporaneous PT method in CADRs. We retrospectively analysed data for all patients referred to our department between March 2009 and June 2013 for patch testing after a non-immediate CADR. The patients who supplied their own suspected drugs were tested both with extemporaneous PTs and with conventional PTs. Extemporaneous PTs involved a nurse crushing and diluting the drug in pet. in a ratio of approximately one-third to two-thirds. Standardized PTs were performed according to guidelines, with commercial drugs diluted to 30% or with active ingredients diluted to 10%. We analysed the data for the two PT methods in terms of the number of positive test reactions, drugs tested, and type of CADR for patients in whom the two PT methods were used. In total, 75 of 156 patients underwent the two PT procedures, including 91 double tests. Overall, 21 tests gave positive reactions with the two methods, and 69 other tests gave negative results with the two methods. Our series yielded results similar to those of published series concerning the types of CADR and the drugs responsible. Our results suggest that, for CADRs, if a patient supplies a suspected drug but if the pharmaceutical material and staff are not available for conventional PTs, extemporaneous PTs performed by the nurse with the commercial drug used by the patient can be useful and reliable. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Quantitative method for gait pattern detection based on fiber Bragg grating sensors

NASA Astrophysics Data System (ADS)

Ding, Lei; Tong, Xinglin; Yu, Lie

2017-03-01

This paper presents a method that uses fiber Bragg grating (FBG) sensors to distinguish the temporal gait patterns in gait cycles. Unlike most conventional methods that focus on electronic sensors to collect those physical quantities (i.e., strains, forces, pressure, displacements, velocity, and accelerations), the proposed method utilizes the backreflected peak wavelength from FBG sensors to describe the motion characteristics in human walking. Specifically, the FBG sensors are sensitive to external strain with the result that their backreflected peak wavelength will be shifted according to the extent of the influence of external strain. Therefore, when subjects walk in different gait patterns, the strains on FBG sensors will be different such that the magnitude of the backreflected peak wavelength varies. To test the reliability of the FBG sensor platform for gait pattern detection, the gold standard method using force-sensitive resistors (FSRs) for defining gait patterns is introduced as a reference platform. The reliability of the FBG sensor platform is determined by comparing the detection results between the FBG sensors and FSRs platforms. The experimental results show that the FBG sensor platform is reliable in gait pattern detection and gains high reliability when compared with the reference platform.
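The strain sensitivity described above can be illustrated with the standard first-order relation for a fiber Bragg grating at constant temperature, Δλ_B = λ_B (1 − p_e) ε. The sketch below is only an illustration: the 1550 nm Bragg wavelength and the effective photoelastic coefficient p_e ≈ 0.22 are typical textbook values for silica fibre, assumed here and not taken from the paper, and the function name is hypothetical.

```python
def bragg_shift_nm(bragg_wavelength_nm, strain, photoelastic_coeff=0.22):
    """First-order Bragg wavelength shift (nm) for an axial strain.

    Assumes constant temperature and a typical silica-fibre
    photoelastic coefficient (assumption, not from the abstract).
    """
    return bragg_wavelength_nm * (1.0 - photoelastic_coeff) * strain


# Hypothetical example: a 1550 nm grating under increasing strain.
for microstrain in (1, 100, 500):
    shift_pm = bragg_shift_nm(1550.0, microstrain * 1e-6) * 1000.0
    print(f"{microstrain:4d} microstrain -> shift of about {shift_pm:.1f} pm")
```

Under these assumed values the sensitivity is roughly 1.2 pm per microstrain, which is why different gait patterns, producing different strains, yield distinguishable peak-wavelength shifts.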
Test Review: Prueba del Desarrollo Inicial del Lenguaje.

ERIC Educational Resources Information Center

Crawford, Alan N.

1985-01-01

Concludes that the PDIL (the Spanish version of the Test of Early Language Development) should be used with caution. Since its reliability and validity were determined with the English language version, the method used to translate test items may have some ambiguities, and some illustrations on picture cards may not be culturally appropriate for…
Design and Validation of a Straight-Copy Typewriting Prognostic Test Using Kinesthetic Sensitivity.

ERIC Educational Resources Information Center

Olson, Norma Jean

1979-01-01

Describes the development and application of a kinesthetic sensitivity test to determine whether it is a valid and reliable measure of straight-copy typing speed and accuracy. The author states that this kinesthetic sensitivity instrument may be used as a prognostic aptitude test and recommends administration methods. (MF)
New NREL Method Reduces Uncertainty in Photovoltaic Module Calibrations

Science.gov Websites

calibration traceability to certified test laboratories. This reliable calibration, in turn, determines the of a spire flash simulator, SOMS outdoor test bed, and LACSS continuous simulator. In NREL's Cell and % (k=2 coverage factor). This value is the lowest reported Pmax uncertainty of any accredited test
The Qtracer2 Program for Tracer-Breakthrough Curve Analysis for Tracer Tests in Karstic Aquifers and Other Hydrologic Systems (2002)

EPA Science Inventory

Tracer testing is generally regarded as the most reliable and efficient method of gathering surface and subsurface hydraulic information. This is especially true for karstic and fractured-rock aquifers. Qualitative tracing tests have been conventionally employed in most karst s...
A Psychometric Review of Norm-Referenced Tests Used to Assess Phonological Error Patterns

ERIC Educational Resources Information Center

Kirk, Celia; Vigeland, Laura

2014-01-01

Purpose: The authors provide a review of the psychometric properties of 6 norm-referenced tests designed to measure children's phonological error patterns. Three aspects of the tests' psychometric adequacy were evaluated: the normative sample, reliability, and validity. Method: The specific criteria used for determining the psychometric…
Chemical dependency and drug testing in the workplace.

PubMed

Osterloh, J D; Becker, C E

1990-05-01

Urine testing for drug use in the workplace is now widespread, with the prevalence of positive drug tests in the work force being 0% to 15%. The prevalence of marijuana use is highest, and this can be reliably tested. Though it is prudent to rid the workplace of drug use, there is little scientific study on the relationship of drug use and workplace outcomes, such as productivity and safety. Probable-cause testing and preemployment testing are the most common applications. Random testing has been less accepted owing to its higher costs, unresolved legal issues, and predictably poor test reliability. Legal issues have focused on the right to policy, discrimination, and the lack of due process. The legal cornerstone of a good program is a policy that is planned and agreed on by both labor and management, which serves both as a contract and as a procedure in which expectations and consequences are known. The National Institute on Drug Abuse is certifying laboratories doing employee drug testing. Testing methods when done correctly are less prone to error than in the past, but screening tests can be defeated by adulterants. Although the incidence of false-positive results is low, such tests are less reliable when the prevalence of drug abuse is also low.
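The closing point, that positive results are less trustworthy when prevalence is low, follows from Bayes' rule for the positive predictive value (PPV). The sketch below is illustrative only: the 99% sensitivity and specificity are assumed figures, not values reported in the abstract, and the function name is hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Assumed test characteristics (hypothetical): 99% sensitivity and specificity.
for prevalence in (0.01, 0.05, 0.15):
    ppv = positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}")
```

With these assumed figures, half of all positives are false when only 1% of the workforce uses the drug, even though the false-positive rate of the assay itself is low.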
Reliability and validity of the adolescent health profile-types.

PubMed

Riley, A W; Forrest, C B; Starfield, B; Green, B; Kang, M; Ensminger, M

1998-08-01

The purpose of this study was to demonstrate the preliminary reliability and validity of a set of 13 profiles of adolescent health that describe distinct patterns of health and health service requirements on four domains of health. Reliability and validity were tested in four ethnically diverse population samples of urban and rural youths aged 11 to 17 years in public schools (N = 4,066). The reliability of the classification procedure and construct validity were examined in terms of the predicted and actual distributions of age, gender, race, socioeconomic status, and family type. School achievement, medical conditions, and the proportion of youths with a psychiatric disorder also were examined as tests of construct validity. The classification method was shown to produce consistent results across the four populations in terms of proportions of youths assigned with specific sociodemographic characteristics. Variations in health described by specific profiles showed expected relations to sociodemographic characteristics, family structure, school achievement, medical disorders, and psychiatric disorders. This taxonomy of health profile-types appears to effectively describe a set of patterns that characterize adolescent health. The profile-types provide a unique and practical method for identifying subgroups having distinct needs for health services, with potential utility for health policy and planning. Such integrative reporting methods are critical for more effective utilization of health status instruments in health resource planning and policy development.
The reliability and validity of fatigue measures during multiple-sprint work: an issue revisited.

PubMed

Glaister, Mark; Howatson, Glyn; Pattison, John R; McInnes, Gill

2008-09-01

The ability to repeatedly produce a high-power output or sprint speed is a key fitness component of most field and court sports. The aim of this study was to evaluate the validity and reliability of eight different approaches to quantify this parameter in tests of multiple-sprint performance. Ten physically active men completed two trials of each of two multiple-sprint running protocols with contrasting recovery periods. Protocol 1 consisted of 12 x 30-m sprints repeated every 35 seconds; protocol 2 consisted of 12 x 30-m sprints repeated every 65 seconds. All testing was performed in an indoor sports facility, and sprint times were recorded using twin-beam photocells. All but one of the formulae showed good construct validity, as evidenced by similar within-protocol fatigue scores. However, the assumptions on which many of the formulae were based, combined with poor or inconsistent test-retest reliability (coefficient of variation range: 0.8-145.7%; intraclass correlation coefficient range: 0.09-0.75), suggested many problems regarding logical validity. In line with previous research, the results support the percentage decrement calculation as the most valid and reliable method of quantifying fatigue in tests of multiple-sprint performance.
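The percentage decrement calculation named in the conclusion is commonly written as 100 × (total sprint time / ideal sprint time) − 100, where the ideal time is the fastest sprint multiplied by the number of sprints. The abstract does not spell the formula out, so the sketch below assumes this common form; the sprint times are hypothetical.

```python
def percentage_decrement(sprint_times):
    """Fatigue score (%) relative to repeating the fastest sprint every time.

    Assumed form: 100 * (total time / ideal time) - 100, with
    ideal time = fastest sprint * number of sprints.
    """
    if not sprint_times:
        raise ValueError("at least one sprint time is required")
    ideal_time = min(sprint_times) * len(sprint_times)
    return 100.0 * sum(sprint_times) / ideal_time - 100.0


# Hypothetical 30-m sprint times in seconds.
times = [4.50, 4.55, 4.60, 4.65]
print(f"fatigue = {percentage_decrement(times):.2f}%")
```

A score of 0% means every sprint matched the fastest one; larger values indicate greater slowing across the set.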
Measuring first-line nurse manager work: instrument: development and testing.

PubMed

Cadmus, Edna; Wisniewska, Edyta K

2013-12-01

The objective of this study was to develop and test a 1st-line nurse manager (FLNM) work instrument to measure categories of work and frequency of activities. First-line nurse managers have been demonstrated to be key contributors in meeting organizational outcomes and patient and nurse satisfaction. Identifying the work of FLNMs is essential to help in the development of prioritization and sequence. The need for an instrument that can measure and categorize the work of FLNMs is indicated. The author-developed instrument was administered as a pilot study to 173 FLNMs in New Jersey. Descriptive statistics were analyzed, and validity and reliability were measured. Content validity was established through 2 focus groups using 10 FLNMs and conducting a survey of 5 chief nursing officers. Reliability was assessed by 13 of 16 FLNM participants using the test/retest method and quantified using percent agreement within a 10-day period. Those items with 70% agreement or more were identified as reliable and retained on the instrument. The content validity of the instrument is strong; further refinement and testing of the tool are indicated to improve the reliability and generalizability across multiple populations of leaders and settings.
Validation of the breast evaluation questionnaire for breast hypertrophy and breast reduction.

PubMed

Lewin, Richard; Elander, Anna; Lundberg, Jonas; Hansson, Emma; Thorarinsson, Andri; Claudelin, Malin; Bladh, Helena; Lidén, Mattias

2018-06-13

There is a lack of published, validated questionnaires for evaluating psychosocial morbidity in patients with breast hypertrophy undergoing breast reduction surgery. To validate the breast evaluation questionnaire (BEQ), originally developed for the assessment of breast augmentation patients, for the assessment of psychosocial morbidity in patients with breast hypertrophy undergoing breast reduction surgery. Validation study. Subjects: Women with macromastia. Methods: The validation of the BEQ, adapted to breast reduction, was performed in several steps. Content validity, reliability, construct validity and responsiveness were assessed. The original version was adjusted according to the results for content validity and resulted in item reduction and a modified BEQ (mBEQ) that was then assessed for reliability, construct validity and responsiveness. Internal and external validation was performed for the modified BEQ. Convergent validity was tested against Breast-Q (reduction) and discriminant validity was tested against the SF-36. Known-groups validation revealed significant differences between the normal population and patients undergoing breast reduction surgery. The BEQ showed good reliability by test-retest analysis and high responsiveness. The modified BEQ may be a reliable, valid and responsive instrument for assessing women who undergo breast reduction.
Reliability of Strength Testing using the Advanced Resistive Exercise Device and Free Weights

NASA Technical Reports Server (NTRS)

English, Kirk L.; Loehr, James A.; Laughlin, Mitzi A.; Lee, Stuart M. C.; Hagan, R. Donald

2008-01-01

The Advanced Resistive Exercise Device (ARED) was developed for use on the International Space Station as a countermeasure against muscle atrophy and decreased strength. This investigation examined the reliability of one-repetition maximum (1RM) strength testing using ARED and traditional free weight (FW) exercise. Methods: Six males (180.8 +/- 4.3 cm, 83.6 +/- 6.4 kg, 36 +/- 8 y, mean +/- SD) who had not engaged in resistive exercise for at least six months volunteered to participate in this project. Subjects completed four 1RM testing sessions each for FW and ARED (eight total sessions) using a balanced, randomized, crossover design. All testing using one device was completed before progressing to the other. During each session, 1RM was measured for the squat, heel raise, and deadlift exercises. Generalizability (G) and intraclass correlation coefficients (ICC) were calculated for each exercise on each device and were used to predict the number of sessions needed to obtain a reliable 1RM measurement (G threshold of 0.90). Interclass reliability coefficients and Pearson's correlation coefficients (R) also were calculated for the highest 1RM value (1RMpeak) obtained for each exercise on each device to quantify 1RM relationships between devices.
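Predicting how many sessions are needed to reach a target reliability can be illustrated with the classical Spearman-Brown prophecy formula. This is a simpler stand-in for the generalizability-theory projection the authors describe, not their procedure; the single-session reliability of 0.75 is a hypothetical value and the function names are assumptions.

```python
import math


def reliability_of_mean(single_session_reliability, sessions):
    """Spearman-Brown: reliability of the mean of `sessions` sessions."""
    r = single_session_reliability
    return sessions * r / (1.0 + (sessions - 1) * r)


def sessions_needed(single_session_reliability, target=0.90):
    """Smallest whole number of sessions whose mean reaches `target`."""
    r = single_session_reliability
    if not (0.0 < r < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = target * (1.0 - r) / (r * (1.0 - target))
    # The small tolerance guards against floating-point noise when k is an
    # exact integer (for example 3.0000000000000004 instead of 3).
    return max(1, math.ceil(k - 1e-9))


# Hypothetical single-session reliability of 0.75.
k = sessions_needed(0.75, target=0.90)
print(k, round(reliability_of_mean(0.75, k), 3))
```

The same calculation shows why measures that are already highly reliable in one session need no repetition, while a single-session reliability of 0.50 would call for many more sessions.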
Hypothesis Testing Using Factor Score Regression: A Comparison of Four Methods

ERIC Educational Resources Information Center

Devlieger, Ines; Mayer, Axel; Rosseel, Yves

2016-01-01

In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and…
Psychometric properties of Persian version of the Sustained Auditory Attention Capacity Test in children with attention deficit-hyperactivity disorder.

PubMed

Soltanparast, Sanaz; Jafari, Zahra; Sameni, Seyed Jalal; Salehi, Masoud

2014-01-01

The purpose of the present study was to evaluate the psychometric properties (validity and reliability) of the Persian version of the Sustained Auditory Attention Capacity Test in children with attention deficit hyperactivity disorder. The Persian version of the Sustained Auditory Attention Capacity Test was constructed to assess sustained auditory attention using the method provided by Feniman and colleagues (2007). In this test, comments were provided to assess the child's attentional deficit by determining inattention and impulsiveness error, the total scores of the sustained auditory attention capacity test and attention span reduction index. To determine validity and reliability, 46 normal children and 41 children with attention deficit hyperactivity disorder (ADHD), all right-handed, aged between 7 and 11 years, and of both genders, were evaluated with both the Rey Auditory Verbal Learning Test and the Persian version of the Sustained Auditory Attention Capacity Test (SAACT). In determining convergent validity, a significant negative correlation was found between the three parts of the Rey Auditory Verbal Learning Test (first, fifth, and immediate recall) and all indicators of the SAACT except attention span reduction. By comparing the test scores between the normal and ADHD groups, discriminant validity analysis showed significant differences in all indicators of the test except for attention span reduction (p < 0.001). The Persian version of the Sustained Auditory Attention Capacity Test has good validity and reliability, comparable to other reliable tests, and it can be used to identify children with attention deficits and those suspected of having attention deficit hyperactivity disorder.
Measurement of glenohumeral joint translation using real-time ultrasound imaging: A physiotherapist and sonographer intra-rater and inter-rater reliability study.

PubMed

Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A

2016-12-01

Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (physiotherapist, ICC_P: 0.86-0.98; sonographer, ICC_S: 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
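As an illustration of how an intraclass correlation coefficient is obtained from a subjects-by-raters table, the sketch below computes the two-way random-effects, absolute-agreement, single-measure form, often labelled ICC(2,1). The abstract does not state which ICC form was used, so this choice is an assumption, and the scores are the small six-subject, four-rater example widely used to check ICC code (Shrout and Fleiss, 1979), not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` is a list of rows, one per subject, each holding one score
    per rater. Requires at least two subjects and two raters.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores: 6 subjects (rows) rated by 4 raters (columns).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")
```

Because this form penalizes systematic differences between raters, it yields a low value for these scores even though the raters rank the subjects similarly.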
Development of a reliable method to assess footwear comfort during running.

PubMed

Mündermann, Anne; Nigg, Benno M; Stefanyshyn, Darren J; Humble, R Neil

2002-08-01

The purposes of this study were: (a) to determine whether subjects are able to distinguish between differences in footwear with respect to footwear comfort; and (b) to determine how reliably footwear comfort can be assessed using a visual analogue scale (VAS) and a protocol including a control condition during running. Intraclass correlation coefficients (ICCs) between comfort ratings for repeated conditions were high (ICC = 0.799). Differences in comfort ratings between the insert conditions were significant. A paired t-test revealed a significant difference in overall comfort ratings for the control insert when tested after the soft insert compared to when tested after the hard insert (P = 0.008). The results of this study showed that VASs provide a reliable measure to assess footwear comfort during running under the conditions that: (a) a control condition is included; and (b) the average comfort rating of sessions 4-6 is used. Copyright 2002 Elsevier Science B.V.
Test-retest reliability of knee extensor rate of velocity and power development in older adults using the isotonic mode on a Biodex System 3 dynamometer.

PubMed

Van Driessche, Stijn; Van Roie, Evelien; Vanwanseele, Benedicte; Delecluse, Christophe

2018-01-01

Isotonic testing and measures of rapid power production are emerging as functionally relevant test methods for detection of muscle aging. Our objective was to assess reliability of rapid velocity and power measures in older adults using the isotonic mode of an isokinetic dynamometer. Sixty-three participants (aged 65 to 82 years) underwent a test-retest protocol with one week time interval. Isotonic knee extension tests were performed at four different loads: 0%, 25%, 50% and 75% of maximal isometric strength. Peak velocity (pV) and power (pP) were determined as the highest values of the velocity and power curve. Rate of velocity (RVD) and power development (RPD) were calculated as the linear slopes of the velocity- and power-time curve. Relative and absolute measures of test-retest reliability were analyzed using intraclass correlation coefficients (ICC), standard error of measurement (SEM) and Bland-Altman analyses. Overall, reliability was high for pV, pP, RVD and RPD at 0%, 25% and 50% load (ICC: .85 - .98, SEM: 3% - 10%). A trend for increased reliability at lower loads seemed apparent. The tests at 75% load led to range of motion failure and should be avoided. In addition, results demonstrated that caution is advised when interpreting early phase results (first 50ms). To conclude, our results support the use of the isotonic mode of an isokinetic dynamometer for testing rapid power and velocity characteristics in older adults, which is of high clinical relevance given that these muscle characteristics are emerging as the primary outcomes for preventive and rehabilitative interventions in aging research.
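The absolute reliability indices mentioned above can be sketched with two standard formulas: the standard error of measurement, SEM = SD × √(1 − ICC), and the Bland-Altman 95% limits of agreement, bias ± 1.96 × SD of the test-retest differences. The numbers below are hypothetical and the function names are assumptions; the abstract reports SEM as a percentage, which corresponds to dividing the absolute SEM by the mean score.

```python
import math
from statistics import mean, stdev


def standard_error_of_measurement(sd, icc):
    """SEM in the units of the measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def bland_altman_limits(test, retest):
    """Return (bias, lower limit, upper limit) for paired sessions."""
    diffs = [second - first for first, second in zip(test, retest)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the differences
    return bias, bias - half_width, bias + half_width


# Hypothetical peak power values (W) from two sessions one week apart.
session_1 = [210.0, 185.0, 240.0, 198.0, 225.0]
session_2 = [214.0, 181.0, 246.0, 200.0, 221.0]
bias, lower, upper = bland_altman_limits(session_1, session_2)
print(f"bias = {bias:.1f} W, limits of agreement = [{lower:.1f}, {upper:.1f}] W")
print(f"SEM = {standard_error_of_measurement(sd=20.0, icc=0.91):.1f} W")
```

A high ICC can coexist with wide limits of agreement, which is why relative and absolute indices are reported side by side.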
Ability of walking without a walking device in patients with spinal cord injury as determined using data from functional tests

PubMed Central

Poncumhak, Puttipong; Saengsuwan, Jiamjit; Amatachaya, Sugalya

2014-01-01

Background/Objectives More than half of independent ambulatory patients with spinal cord injury (SCI) need a walking device to promote levels of independence. However, long-lasting use of a walking device may introduce negative impacts for the patients. Using a standard objective test relating to the requirement of a walking device may offer a quantitative criterion to effectively monitor levels of independence of the patients. Therefore, this study investigated (1) ability of the three functional tests, including the five times sit-to-stand test (FTSST), timed up and go test (TUGT), and 10-meter walk test (10MWT) to determine the ability of walking without a walking device, and (2) the inter-tester reliability of the tests to assess functional ability in patients with SCI. Methods Sixty independent ambulatory patients with SCI, who walked with and without a walking device (30 subjects/group), were assessed cross-sectionally for their functional ability using the three tests. The first 20 subjects also participated in the inter-tester reliability test. Results The time required to complete the FTSST <14 seconds, the TUGT < 18 seconds, and the 10MWT < 6 seconds had good-to-excellent capability to determine the ability of walking without a walking device of subjects with SCI. These tests also showed excellent inter-tester reliability. Conclusions Methods of clinical evaluation for walking are likely performed using qualitative observation, which makes the results difficult to compare among testers and test intervals. Findings of this study offer a quantitative target criterion or a clear level of ability that patients with SCI could possibly walk without a walking device, which would benefit monitoring process for the patients. PMID:24621030

Initial Development and Validation of the BullyHARM: The Bullying, Harassment, and Aggression Receipt Measure.

PubMed

Hall, William J

2016-11-01

This article describes the development and preliminary validation of the Bullying, Harassment, and Aggression Receipt Measure (BullyHARM). The development of the BullyHARM involved a number of steps and methods, including a literature review, expert review, cognitive testing, readability testing, data collection from a large sample, reliability testing, and confirmatory factor analysis. A sample of 275 middle school students was used to examine the psychometric properties and factor structure of the BullyHARM, which consists of 22 items and 6 subscales: physical bullying, verbal bullying, social/relational bullying, cyber-bullying, property bullying, and sexual bullying. First-order and second-order factor models were evaluated. Results demonstrate that the first-order factor model had superior fit. Results of reliability testing indicate that the BullyHARM scale and subscales have very good internal consistency reliability. Findings indicate that the BullyHARM has good properties regarding content validation and respondent-related validation and is a promising instrument for measuring bullying victimization in school.
Thermal shock testing for assuring reliability of glass-sealed microelectronic packages

NASA Technical Reports Server (NTRS)

Thomas, Walter B., III; Lewis, Michael D.

1991-01-01

Tests were performed to determine if thermal shocking is destructive to glass-to-metal seal microelectronic packages and if thermal shock step stressing can compare package reliabilities. Thermal shocking was shown to be not destructive to highly reliable glass seals. Pin-pull tests used to compare the interfacial pin glass strengths showed no differences between thermal shocked and not-thermal shocked headers. A 'critical stress resistance temperature' was not exhibited by the 14 pin Dual In-line Package (DIP) headers evaluated. Headers manufactured in cryogenic nitrogen based and exothermically generated atmospheres showed differences in as-received leak rates, residual oxide depths and pin glass interfacial strengths; these were caused by the different manufacturing methods, in particular, by the chemically etched pins used by one manufacturer. Both header types passed thermal shock tests to temperature differentials of 646 C. The sensitivity of helium leak rate measurements was improved up to 70 percent by baking headers for two hours at 200 C after thermal shocking.
Initial Development and Validation of the BullyHARM: The Bullying, Harassment, and Aggression Receipt Measure

PubMed Central

Hall, William J.

2017-01-01

This article describes the development and preliminary validation of the Bullying, Harassment, and Aggression Receipt Measure (BullyHARM). The development of the BullyHARM involved a number of steps and methods, including a literature review, expert review, cognitive testing, readability testing, data collection from a large sample, reliability testing, and confirmatory factor analysis. A sample of 275 middle school students was used to examine the psychometric properties and factor structure of the BullyHARM, which consists of 22 items and 6 subscales: physical bullying, verbal bullying, social/relational bullying, cyber-bullying, property bullying, and sexual bullying. First-order and second-order factor models were evaluated. Results demonstrate that the first-order factor model had superior fit. Results of reliability testing indicate that the BullyHARM scale and subscales have very good internal consistency reliability. Findings indicate that the BullyHARM has good properties regarding content validation and respondent-related validation and is a promising instrument for measuring bullying victimization in school. PMID:28194041
GNSS Single Frequency, Single Epoch Reliable Attitude Determination Method with Baseline Vector Constraint.

PubMed

Gong, Ang; Zhao, Xiubin; Pang, Chunlei; Duan, Rong; Wang, Yong

2015-12-02

For Global Navigation Satellite System (GNSS) single frequency, single epoch attitude determination, this paper proposes a new reliable method with a baseline vector constraint. First, prior knowledge of baseline length, heading, and pitch obtained from other navigation equipment or sensors is used to rigorously reconstruct the objective function. Then, the search strategy is improved: a gradually enlarged ellipsoidal search space is substituted for the non-ellipsoidal search space, which ensures that the correct ambiguity candidates lie within it and allows the search to be carried out directly by the least-squares ambiguity decorrelation adjustment (LAMBDA) method. Among the vector candidates, some are further eliminated using a derived approximate inequality, which accelerates the search. Experimental results show that, compared with the traditional method that uses only a baseline length constraint, the new method can use a priori three-dimensional baseline knowledge to fix ambiguities reliably and achieve a high success rate. Experimental tests also verify that it is not very sensitive to baseline vector error and performs robustly when the angular error is not large.
Inter-rater reliability of three standardized functional tests in patients with low back pain

PubMed Central

Tidstrand, Johan; Horneij, Eva

2009-01-01

Background Of all patients with low back pain, 85% are diagnosed as "non-specific lumbar pain". Lumbar instability has been described as one specific diagnosis, which several authors have characterized by delayed muscular responses, impaired postural control, and impaired muscular coordination among these patients. This has mostly been measured and evaluated in a laboratory setting. There are few standardized and evaluated functional tests examining functional muscular coordination that are also applicable in the non-laboratory setting. In ordinary clinical work, tests of functional muscular coordination should be easy to apply. The aim of the present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular functional coordination of the lumbar spine in patients with low back pain. Methods Nineteen consecutive individuals, ten men and nine women, were included (mean age 42 years, SD ± 12 years). Two independent examiners assessed three tests: "single limb stance", "sitting on a Bobath ball with one leg lifted" and "unilateral pelvic lift" on the same occasion. The standardization procedure took altered positions of the spine or pelvis and compensatory movements of the free extremities into account. The inter-rater reliability was analyzed by Cohen's kappa coefficient (κ) and by percentage agreement. Results The inter-rater reliability for the right and the left leg respectively was: for the single limb stance very good (κ: 0.88–1.0), for sitting on a Bobath ball good (κ: 0.79) and very good (κ: 0.88) and for the unilateral pelvic lift: good (κ: 0.61) and moderate (κ: 0.47). Conclusion The present study showed good to very good inter-rater reliability for two standardized tests, that is, the single-limb stance and sitting on a Bobath-ball with one leg lifted. Inter-rater reliability for the unilateral pelvic lift test was moderate to good. Validation of the tests in their ability to evaluate lumbar stability is required. PMID:19490644
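For readers unfamiliar with the two agreement statistics named in this record, the following minimal sketch computes percentage agreement and Cohen's kappa for two raters who classify the same subjects. The function names and any ratings passed to them are illustrative assumptions, not data or code from the study.

```python
from collections import Counter

def percentage_agreement(rater_a, rater_b):
    """Share of subjects (in percent) on which both raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percentage_agreement(rater_a, rater_b) / 100.0
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance alone, which is why both figures are often reported side by side.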
Reliability of High-Voltage Tantalum Capacitors (Parts 3 and 4)

NASA Technical Reports Server (NTRS)

Teverovsky, Alexander

2010-01-01

The Weibull grading test is a powerful technique that allows selection and reliability rating of solid tantalum capacitors for military and space applications. However, inaccuracies in the existing method and inadequate acceleration factors can result in significant errors, up to three orders of magnitude, in the calculated failure rate of capacitors. This paper analyzes deficiencies of the existing technique and recommends a more accurate method of calculation. A physical model presenting failures of tantalum capacitors as time-dependent dielectric breakdown is used to determine voltage and temperature acceleration factors and select adequate Weibull grading test conditions. This model is verified by highly accelerated life testing (HALT) at different temperature and voltage conditions for three types of solid chip tantalum capacitors. It is shown that parameters of the model and acceleration factors can be calculated using a general log-linear relationship for the characteristic life with two stress levels.
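To make the idea of a log-linear life-stress relationship concrete, the sketch below assumes one common textbook form, ln(eta) = a0 + a_temp / T + a_volt * V, with an Arrhenius-type temperature term and an exponential voltage term. The functional form, the parameter names, and any numeric values supplied by a caller are assumptions for illustration only; the report's own model and fitted parameters may differ.

```python
import math

def characteristic_life(a0, a_temp, a_volt, temp_kelvin, voltage):
    """Weibull characteristic life under an assumed log-linear model of two stresses."""
    return math.exp(a0 + a_temp / temp_kelvin + a_volt * voltage)

def acceleration_factor(a_temp, a_volt, use, test):
    """Ratio of characteristic life at use conditions to that at test conditions.

    use, test -- (temperature in kelvin, voltage in volts) tuples
    """
    t_use, v_use = use
    t_test, v_test = test
    # The intercept a0 cancels in the ratio, so only the stress terms remain.
    return math.exp(a_temp * (1.0 / t_use - 1.0 / t_test) + a_volt * (v_use - v_test))

def weibull_reliability(time, eta, beta):
    """Probability of surviving to `time` for a Weibull life distribution."""
    return math.exp(-((time / eta) ** beta))
```

With a positive temperature coefficient and a negative voltage coefficient, a hotter and higher-voltage test condition yields an acceleration factor greater than one, meaning failures arrive sooner on test than in use.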
A Reliable Method to Measure Lip Height Using Photogrammetry in Unilateral Cleft Lip Patients.

PubMed

van der Zeeuw, Frederique; Murabit, Amera; Volcano, Johnny; Torensma, Bart; Patel, Brijesh; Hay, Norman; Thorburn, Guy; Morris, Paul; Sommerlad, Brian; Gnarra, Maria; van der Horst, Chantal; Kangesu, Loshan

2015-09-01

There is still no reliable tool to determine the outcome of the repaired unilateral cleft lip (UCL). The aim of this study was therefore to develop an accurate, reliable tool to measure vertical lip height from photographs. The authors measured the vertical height of the cutaneous and vermilion parts of the lip in 72 anterior-posterior view photographs of 17 patients with repairs to a UCL. Points on the lip's white roll and vermilion were marked on both the cleft and the noncleft sides on each image. Two new concepts were tested. First, photographs were standardized using the horizontal (medial to lateral) eye fissure width (EFW) for calibration. Second, the authors tested the interpupillary line (IPL) and the alar base line (ABL) for their reliability as horizontal lines of reference. Measurements were taken by 2 independent researchers, at 2 different time points each. Overall, 2304 data points were obtained and analyzed. Results showed that the method was very effective in measuring the height of the lip on the cleft side relative to the noncleft side. When using the IPL, inter- and intra-rater reliability was 0.99 to 1.0; with the ABL, it varied from 0.91 to 0.99, with one exception at 0.84. The IPL was easier to define, because in some subjects the overhanging nasal tip obscured the alar base, and it gave more consistent measurements, possibly because the reconstructed alar base was sometimes indistinct. However, measurements from the IPL can only give the percentage difference between the left and right sides of the lip, whereas those from the ABL can also give exact measurements. Patient examples were given that show how the measurements correlate with clinical assessment. The authors propose this method of photogrammetry, with the innovative use of the IPL as a reliable horizontal plane and use of the EFW for calibration, as a useful and reliable tool to assess the outcome of UCL repair.
A Comprehensive Critique and Review of Published Measures of Acne Severity

PubMed Central

Furber, Gareth; Leach, Matthew; Segal, Leonie

2016-01-01

Objective: Acne vulgaris is a dynamic, complex condition that is notoriously difficult to evaluate. The authors set out to critically evaluate currently available measures of acne severity, particularly in terms of suitability for use in clinical trials. Design: A systematic review was conducted to identify methods used to measure acne severity, using MEDLINE, CINAHL, Scopus, and Wiley Online. Each method was critically reviewed and given a score out of 13 based on eight quality criteria under two broad groupings of psychometric testing and suitability for research and evaluation. Results: Twenty-four methods for assessing acne severity were identified. Four scales received a quality score of zero, and 11 scored ≤3. The highest rated scales achieved a total score of 6. Six scales reported strong inter-rater reliability (ICC>0.75), and four reported strong intra-rater reliability (ICC>0.75). The poor overall performance of most scales, largely characterized by the absence of reliability testing or evidence for independent assessment and validation, indicates that, generally, their application in clinical trials is not supported. Conclusion: This review and appraisal of instruments for measuring acne severity supports previously identified concerns regarding the quality of published measures. It highlights the need for a valid and reliable acne severity scale, especially for use in research and evaluation. The ideal scale would demonstrate adequate validation and reliability and be easily implemented for third-party analysis. The development of such a scale is critical to interpreting results of trials and facilitating the pooling of results for systematic reviews and meta-analyses. PMID:27672410
Lifetime validation of high-reliability (>30,000hr) rotary cryocoolers for specific customer profiles

NASA Astrophysics Data System (ADS)

Cauquil, Jean-Marc; Seguineau, Cédric; Vasse, Christophe; Raynal, Gaetan; Benschop, Tonny

2018-05-01

Cooler reliability is a major performance characteristic requested by customers, especially for 24h/24h applications, which are a growing market. Thales has built a reliability policy based on accelerated ageing and tests to establish robust knowledge of acceleration factors. The current trend seems to prove that the RM2 mean time to failure is now higher than 30,000hr. Even with accelerated ageing, reliability growth becomes hardly manageable for such large figures. The paper focuses on these figures and comments on the robustness of such a method when projections over 30,000hr of MTTF are needed.
Reliability Assessment for Low-cost Unmanned Aerial Vehicles

NASA Astrophysics Data System (ADS)

Freeman, Paul Michael

Existing low-cost unmanned aerospace systems are unreliable, and engineers must blend reliability analysis with fault-tolerant control in novel ways. This dissertation introduces the University of Minnesota unmanned aerial vehicle flight research platform, a comprehensive simulation and flight test facility for reliability and fault-tolerance research. An industry-standard reliability assessment technique, the failure modes and effects analysis, is performed for an unmanned aircraft. Particular attention is afforded to the control surface and servo-actuation subsystem. Maintaining effector health is essential for safe flight; failures may lead to loss of control incidents. Failure likelihood, severity, and risk are qualitatively assessed for several effector failure modes. Design changes are recommended to improve aircraft reliability based on this analysis. Most notably, the control surfaces are split, providing independent actuation and dual-redundancy. The simulation models for control surface aerodynamic effects are updated to reflect the split surfaces using a first-principles geometric analysis. The failure modes and effects analysis is extended by using a high-fidelity nonlinear aircraft simulation. A trim state discovery is performed to identify the achievable steady, wings-level flight envelope of the healthy and damaged vehicle. Tolerance of elevator actuator failures is studied using familiar tools from linear systems analysis. This analysis reveals significant inherent performance limitations for candidate adaptive/reconfigurable control algorithms used for the vehicle. Moreover, it demonstrates how these tools can be applied in a design feedback loop to make safety-critical unmanned systems more reliable. Control surface impairments that do occur must be quickly and accurately detected. 
This dissertation also considers fault detection and identification for an unmanned aerial vehicle using model-based and model-free approaches and applies those algorithms to experimental faulted and unfaulted flight test data. Flight tests are conducted with actuator faults that affect the plant input and sensor faults that affect the vehicle state measurements. A model-based detection strategy is designed and uses robust linear filtering methods to reject exogenous disturbances, e.g. wind, while providing robustness to model variation. A data-driven algorithm is developed to operate exclusively on raw flight test data without physical model knowledge. The fault detection and identification performance of these complementary but different methods is compared. Together, enhanced reliability assessment and multi-pronged fault detection and identification techniques can help to bring about the next generation of reliable low-cost unmanned aircraft.
Measuring eating disorder attitudes and behaviors: a reliability generalization study

PubMed Central

2014-01-01

Background Although score reliability is a sample-dependent characteristic, researchers often only report reliability estimates from previous studies as justification for employing particular questionnaires in their research. The present study followed reliability generalization procedures to determine the mean score reliability of the Eating Disorder Inventory and its most commonly employed subscales (Drive for Thinness, Bulimia, and Body Dissatisfaction) and the Eating Attitudes Test as a way to better identify those characteristics that might impact score reliability. Methods Published studies that used these measures were coded based on their reporting of reliability information and additional study characteristics that might influence score reliability. Results Score reliability estimates were included in 26.15% of studies using the EDI and 36.28% of studies using the EAT. Mean Cronbach’s alphas for the EDI (total score = .91; subscales = .75 to .89), EAT-40 (total score = .81) and EAT-26 (total score = .86; subscales = .56 to .80) suggested variability in estimated internal consistency. Whereas some EDI subscales exhibited higher score reliability in clinical eating disorder samples than in nonclinical samples, other subscales did not exhibit these differences. Score reliability information for the EAT was primarily reported for nonclinical samples, making it difficult to characterize the effect of type of sample on these measures. However, there was a tendency for mean score reliability to be higher in the adult (vs. adolescent) samples and in female (vs. male) samples. Conclusions Overall, this study highlights the importance of assessing and reporting internal consistency during every test administration because reliability is affected by characteristics of the participants being examined. PMID:24764530
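Because the record stresses recomputing internal consistency for each new sample, a minimal sketch of Cronbach's alpha may be useful. It assumes a complete respondents-by-items score matrix with no missing values; the function name and any data passed to it are illustrative and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every respondent needs the same k >= 2 item scores")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)
```

Running this on the data actually collected, rather than citing a coefficient from an earlier study, is the practice the record's conclusion argues for.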
A method for recording verbal behavior in free-play settings

PubMed Central

Nordquist, Vey M.

1971-01-01

The present study attempted to test the reliability of a new method of recording verbal behavior in a free-play preschool setting. Six children, three normal and three speech impaired, served as subjects. Videotaped records of verbal behavior were scored by two experimentally naive observers. The results suggest that the system provides a means of obtaining reliable records of both normal and impaired speech, even when the subjects exhibit nonverbal behaviors (such as hyperactivity) that interfere with direct observation techniques. PMID:16795310
Methods for assessing reliability and validity for a measurement tool: a case study and critique using the WHO haemoglobin colour scale.

PubMed

White, Sarah A; van den Broek, Nynke R

2004-05-30

Before introducing a new measurement tool it is necessary to evaluate its performance. Several statistical methods have been developed, or used, to evaluate the reliability and validity of a new assessment method in such circumstances. In this paper we review some commonly used methods. Data from a study that was conducted to evaluate the usefulness of a specific measurement tool (the WHO Colour Scale) are then used to illustrate the application of these methods. The WHO Colour Scale was developed under the auspices of the WHO to provide a simple, portable and reliable method of detecting anaemia. This Colour Scale is a discrete interval scale, whereas the actual haemoglobin values it is used to estimate are on a continuous interval scale and can be measured accurately using electrical laboratory equipment. The methods we consider are: linear regression; correlation coefficients; paired t-tests; plotting differences against mean values and deriving limits of agreement; kappa and weighted kappa statistics; sensitivity and specificity; an intraclass correlation coefficient; and the repeatability coefficient. We note that, although the definition and properties of each of these methods are well established, inappropriate methods continue to be used in medical literature for assessing reliability and validity, as evidenced in the context of the evaluation of the WHO Colour Scale. Copyright 2004 John Wiley & Sons, Ltd.
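Of the methods listed in this record, limits of agreement are perhaps the least familiar. The sketch below computes the mean difference (bias) between paired measurements from two methods and the conventional 95% limits, bias ± 1.96 standard deviations of the differences. The function name and any values passed to it are illustrative assumptions, not data from the study.

```python
from statistics import mean, stdev

def limits_of_agreement(method_a, method_b):
    """Return (lower, bias, upper) for paired measurements from two methods."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)          # systematic offset between the methods
    spread = stdev(differences)       # sample SD of the paired differences
    return bias - 1.96 * spread, bias, bias + 1.96 * spread
```

The differences are usually also plotted against the pairwise means, which shows whether disagreement grows or shrinks across the measurement range, something a correlation coefficient alone cannot reveal.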
What is the best method for assessing lower limb force-velocity relationship?

PubMed

Giroux, C; Rabita, G; Chollet, D; Guilhem, G

2015-02-01

This study determined the concurrent validity and reliability of force, velocity and power measurements provided by accelerometry, linear position transducer and Samozino's methods, during loaded squat jumps. 17 subjects performed squat jumps on 2 separate occasions in 7 loading conditions (0-60% of the maximal concentric load). Force, velocity and power patterns were averaged over the push-off phase using accelerometry, linear position transducer and a method based on key positions measurements during squat jump, and compared to force plate measurements. Concurrent validity analyses indicated very good agreement with the reference method (CV=6.4-14.5%). Force, velocity and power patterns comparison confirmed the agreement with slight differences for high-velocity movements. The validity of measurements was equivalent for all tested methods (r=0.87-0.98). Bland-Altman plots showed a lower agreement for velocity and power compared to force. Mean force, velocity and power were reliable for all methods (ICC=0.84-0.99), especially for Samozino's method (CV=2.7-8.6%). Our findings showed that present methods are valid and reliable in different loading conditions and permit between-session comparisons and characterization of training-induced effects. While linear position transducer and accelerometer allow for examining the whole time-course of kinetic patterns, Samozino's method benefits from a better reliability and ease of processing. © Georg Thieme Verlag KG Stuttgart · New York.
Validity and Reliability of the Italian Version of the Functioning Assessment Short Test (FAST) in Bipolar Disorder

PubMed Central

Moro, Maria Francesca; Colom, Francesc; Floris, Francesca; Pintus, Elisa; Pintus, Mirra; Contini, Francesca; Carta, Mauro Giovanni

2012-01-01

Background: The Functioning Assessment Short Test (FAST) is a brief instrument designed to assess the main functioning problems experienced by psychiatric patients, specifically bipolar patients. It includes 24 items assessing impairment or disability in six domains of functioning: autonomy, occupational functioning, cognitive functioning, financial issues, interpersonal relationships and leisure time. The aim of this study is to measure the validity and reliability of the Italian version of this instrument. Methods: Twenty-four patients with DSM-IV TR bipolar disorder and 20 healthy controls were recruited and evaluated in three private clinics in Cagliari (Sardinia, Italy). The psychometric properties of FAST (feasibility, internal consistency, concurrent validity, discriminant validity (patients vs controls and euthymic patients vs manic and depressed), and test-retest reliability) were analyzed. Results: The internal consistency obtained was very high, with a Cronbach's alpha of 0.955. A highly significant negative correlation with GAF was obtained (r = -0.9; p < 0.001), pointing to a reasonable degree of concurrent validity. FAST showed good test-retest reliability between two independent evaluations one week apart (mean K = 0.73). The total FAST scores were lower in controls than in bipolar patients, and lower in euthymic patients than in depressed or manic patients. Conclusion: The Italian version of the FAST showed psychometric properties similar to those of the original version with regard to internal consistency and discriminant validity, and showed good test-retest reliability as measured by the K statistic. PMID:22905035
ASR testing : a new approach to aggregate classification and mix design verification : technical report.

DOT National Transportation Integrated Search

2014-04-01

The main objective of this study was to develop a fast, reliable test method to determine the aggregate alkali-silica reactivity : (ASR) with respect to the overall alkalinity of the concrete. A volumetric change measuring device (VCMD) developed at ...
75 FR 29908 - Prothioconazole; Pesticide Tolerances

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-28

....gpoaccess.gov/ecfr . To access the harmonized test guidelines referenced in this document electronically, please go http://www.epa.gov/ocspp and select ``Test Methods and Guidelines.'' C. Can I File an Objection... dietary exposures and all other exposures for which there is reliable information.'' This includes...
A Practical Method for Identifying Significant Change Scores

ERIC Educational Resources Information Center

Cascio, Wayne F.; Kurtines, William M.

1977-01-01

A test of significance for identifying individuals who are most influenced by an experimental treatment as measured by pre-post test change score is presented. The technique requires true difference scores, the reliability of obtained differences, and their standard error of measurement. (Author/JKS)
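The quantities named in this record can be illustrated with standard classical test theory formulas: the reliability of a pre-post difference score and the standard error of measurement of that difference. The sketch below is an assumption-labelled illustration of those textbook formulas, with hypothetical function and parameter names; it is not claimed to reproduce the authors' exact procedure.

```python
import math

def difference_reliability(sd_pre, sd_post, r_pre, r_post, r_prepost):
    """Reliability of the difference score D = post - pre (classical test theory).

    sd_pre, sd_post -- standard deviations of the pre and post scores
    r_pre, r_post   -- reliabilities of the pre and post scores
    r_prepost       -- correlation between pre and post scores
    """
    covariance = r_prepost * sd_pre * sd_post
    observed_variance = sd_pre ** 2 + sd_post ** 2 - 2.0 * covariance
    if observed_variance <= 0:
        raise ValueError("difference scores have no variance")
    true_variance = sd_pre ** 2 * r_pre + sd_post ** 2 * r_post - 2.0 * covariance
    return true_variance / observed_variance

def sem_of_difference(sd_pre, sd_post, r_prepost, r_difference):
    """Standard error of measurement of the difference score."""
    sd_difference = math.sqrt(
        sd_pre ** 2 + sd_post ** 2 - 2.0 * r_prepost * sd_pre * sd_post
    )
    return sd_difference * math.sqrt(1.0 - r_difference)
```

An individual's change could then be compared with a multiple of this standard error to judge whether it exceeds what measurement error alone would produce; the multiple chosen is a separate decision not specified here.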
Validity and Reliability of Wii Fit Balance Board for the Assessment of Balance of Healthy Young Adults and the Elderly

PubMed Central

Chang, Wen-Dien; Chang, Wan-Yi; Lee, Chia-Lun; Feng, Chi-Yen

2013-01-01

[Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86–0.99) for the elderly people and positive correlations (r = 0.58–0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability. PMID:24259769
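The intraclass correlation reported in this record comes in several forms, and the abstract does not say which one was used. As a minimal sketch, the code below computes the simplest variant, the one-way random-effects ICC for single measurements, from a subjects-by-trials table. The choice of variant, the function name, and any data passed to it are assumptions for illustration.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single measurements, often written ICC(1,1).

    ratings -- list of subjects, each a list of k repeated measurements
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every subject needs the same k >= 2 measurements")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all measurements are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator
```

Two-way models, which separate systematic differences between trials or raters from random error, give different values on the same data, so the variant should always be stated alongside the coefficient.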
Reliability analysis of instrument design of noninvasive bone marrow disease detector

NASA Astrophysics Data System (ADS)

Su, Yu; Li, Ting; Sun, Yunlong

2016-02-01

Bone marrow is an important hematopoietic organ, and bone marrow lesions (BMLs) may cause a variety of complications with high death rate and short survival time. Early detection and follow-up care are particularly important. However, current diagnostic methods rely on bone marrow biopsy/puncture, which has significant limitations such as invasiveness, complex operation, high risk, and discontinuity. A non-invasive, safe, easily operated, and continuous monitoring technology is highly needed. We therefore proposed to design a device for detecting bone marrow lesions based on near-infrared spectrum technology. We then fully tested its reliability, including the sensitivity, specificity, signal-to-noise ratio (SNR), and stability. Here, we report this sequence of reliability test experiments, the experimental results, and the subsequent data analysis. The instrument was shown to be very sensitive, with a distinguishable concentration of less than 0.002, and with good linearity, stability, and high SNR. Finally, these reliability-test data support the promise of our novel instrument for clinical diagnosis and surgery guidance in the detection of BMLs.
