The Americleft Speech Project: A Training and Reliability Study.
Chapman, Kathy L; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie
2016-01-01
To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. The participants were speech-language pathologists from the Americleft Speech Project. In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function.
The Americleft Speech Project: A Training and Reliability Study
Chapman, Kathy L.; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie
2017-01-01
Objective: To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. Design: The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech–Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. Participants: The participants were speech-language pathologists from the Americleft Speech Project. Results: In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. Conclusion: The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function. PMID:25531738
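The kappa artifact described in this abstract can be illustrated with a small sketch. The ratings below are hypothetical and are not data from the Americleft study; the function names are illustrative only. The sketch computes unweighted Cohen's kappa and percent agreement for two rater pairs that agree on the same share of cases, once with balanced categories and once with very little variability in the sample.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of cases on which two raters give the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters on the same cases."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from the product of each rater's marginal proportions.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    if p_e == 1:
        return float("nan")  # kappa is undefined when there is no variability
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings (0 = normal, 1 = impaired), 20 cases each.
balanced_a = [0] * 10 + [1] * 10
balanced_b = [0] * 9 + [1] + [1] * 9 + [0]
skewed_a = [0] * 18 + [1] * 2
skewed_b = [0] * 17 + [1] + [1] + [0]

for name, a, b in [("balanced", balanced_a, balanced_b),
                   ("skewed", skewed_a, skewed_b)]:
    print(name, f"{percent_agreement(a, b):.2f}", f"{cohen_kappa(a, b):.2f}")
# balanced 0.90 0.80
# skewed 0.90 0.44
```

Both pairs disagree on 2 of 20 cases, yet kappa drops when almost all cases fall in one category, because chance agreement is then high. This is the kind of pattern that reporting percent agreement alongside kappa helps to expose.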
Reliability studies of diagnostic methods in Indian traditional Ayurveda medicine: An overview
Kurande, Vrinda Hitendra; Waagepetersen, Rasmus; Toft, Egon; Prasad, Ramjee
2013-01-01
Recently, a need to develop supportive new scientific evidence for contemporary Ayurveda has emerged. One of the research objectives is an assessment of the reliability of diagnoses and treatment. Reliability is a quantitative measure of consistency. It is a crucial issue in classification (such as prakriti classification), method development (pulse diagnosis), quality assurance for diagnosis and treatment and in the conduct of clinical studies. Several reliability studies are conducted in western medicine. The investigation of the reliability of traditional Chinese, Japanese and Sasang medicine diagnoses is in the formative stage. However, reliability studies in Ayurveda are in the preliminary stage. In this paper, examples are provided to illustrate relevant concepts of reliability studies of diagnostic methods and their implication in practice, education, and training. An introduction to reliability estimates and different study designs and statistical analysis is given for future studies in Ayurveda. PMID:23930037
The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL).
Lucas, Nicholas; Macaskill, Petra; Irwig, Les; Moran, Robert; Rickards, Luke; Turner, Robin; Bogduk, Nikolai
2013-09-09
The aim of this project was to investigate the reliability of a new 11-item quality appraisal tool for studies of diagnostic reliability (QAREL). The tool was tested on studies reporting the reliability of any physical examination procedure. The reliability of physical examination is a challenging area to study given the complex testing procedures, the range of tests, and lack of procedural standardisation. Three reviewers used QAREL to independently rate 29 articles, comprising 30 studies, published during 2007. The articles were identified from a search of relevant databases using the following string: "Reproducibility of results (MeSH) OR reliability (t.w.) AND Physical examination (MeSH) OR physical examination (t.w.)." A total of 415 articles were retrieved and screened for inclusion. The reviewers undertook an independent trial assessment prior to data collection, followed by a general discussion about how to score each item. At no time did the reviewers discuss individual papers. Reliability was assessed for each item using multi-rater kappa (κ). Multi-rater reliability estimates ranged from κ = 0.27 to 0.92 across all items. Six items were recorded with good reliability (κ > 0.60), three with moderate reliability (κ = 0.41 - 0.60), and two with fair reliability (κ = 0.21 - 0.40). Raters found it difficult to agree about the spectrum of patients included in a study (Item 1) and the correct application and interpretation of the test (Item 10). In this study, we found that QAREL was a reliable assessment tool for studies of diagnostic reliability when raters agreed upon criteria for the interpretation of each item. Nine out of 11 items had good or moderate reliability, and two items achieved fair reliability. The heterogeneity in the tests included in this study may have resulted in an underestimation of the reliability of these two items. We discuss these and other factors that could affect our results and make recommendations for the use of QAREL.
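The abstract does not say which multi-rater kappa was used. As a minimal sketch, assuming Fleiss' kappa (one common multi-rater statistic), agreement among a fixed number of raters could be computed as follows. The count table is hypothetical and is not QAREL data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Per-item agreement: share of rater pairs that agree on the item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 3 raters score 4 papers on one yes/no item.
# Columns are [yes, no].
item_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(item_counts), 2))  # 0.33
```

On the scale quoted in the abstract, this hypothetical value would fall in the "fair" band (κ = 0.21 to 0.40).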
Takasaki, Hiroshi; Okuyama, Kousuke; Rosedale, Richard
2017-02-01
Mechanical Diagnosis and Therapy (MDT) is used in the treatment of extremity problems. Classifying clinical problems is one method of providing effective treatment to a target population. Classification reliability is a key factor to determine the precise clinical problem and to direct an appropriate intervention. To explore inter-examiner reliability of the MDT classification for extremity problems in three reliability designs: 1) vignette reliability using surveys with patient vignettes, 2) concurrent reliability, where multiple assessors decide a classification by observing someone's assessment, 3) successive reliability, where multiple assessors independently assess the same patient at different times. Systematic review with data synthesis in a quantitative format. Agreement of MDT subgroups was examined using the Kappa value, with the operational definition of acceptable reliability set at ≥ 0.6. The level of evidence was determined considering the methodological quality of the studies. Six studies were included and all studies met the criteria for high quality. Kappa values for the vignette reliability design (five studies) were ≥ 0.7. There was data from two cohorts in one study for the concurrent reliability design and the Kappa values ranged from 0.45 to 1.0. Kappa values for the successive reliability design (data from three cohorts in one study) were < 0.6. The current review found strong evidence of acceptable inter-examiner reliability of MDT classification for extremity problems in the vignette reliability design, limited evidence of acceptable reliability in the concurrent reliability design and unacceptable reliability in the successive reliability design. Copyright © 2017 Elsevier Ltd. All rights reserved.
The reliability of the Glasgow Coma Scale: a systematic review.
Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R
2016-01-01
The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
Measuring eating disorder attitudes and behaviors: a reliability generalization study
2014-01-01
Background: Although score reliability is a sample-dependent characteristic, researchers often only report reliability estimates from previous studies as justification for employing particular questionnaires in their research. The present study followed reliability generalization procedures to determine the mean score reliability of the Eating Disorder Inventory and its most commonly employed subscales (Drive for Thinness, Bulimia, and Body Dissatisfaction) and the Eating Attitudes Test as a way to better identify those characteristics that might impact score reliability. Methods: Published studies that used these measures were coded based on their reporting of reliability information and additional study characteristics that might influence score reliability. Results: Score reliability estimates were included in 26.15% of studies using the EDI and 36.28% of studies using the EAT. Mean Cronbach’s alphas for the EDI (total score = .91; subscales = .75 to .89), EAT-40 (total score = .81) and EAT-26 (total score = .86; subscales = .56 to .80) suggested variability in estimated internal consistency. Whereas some EDI subscales exhibited higher score reliability in clinical eating disorder samples than in nonclinical samples, other subscales did not exhibit these differences. Score reliability information for the EAT was primarily reported for nonclinical samples, making it difficult to characterize the effect of type of sample on these measures. However, there was a tendency for mean score reliability to be higher in the adult (vs. adolescent) samples and in female (vs. male) samples. Conclusions: Overall, this study highlights the importance of assessing and reporting internal consistency during every test administration because reliability is affected by characteristics of the participants being examined. PMID:24764530
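Because Cronbach's alpha is computed from the responses of a particular sample, it has to be recalculated for each administration. A minimal sketch of the standard formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores), is given below. The response matrix is hypothetical and is not EDI or EAT data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # one tuple of responses per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical responses: 5 respondents, 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
]
print(round(cronbach_alpha(responses), 3))  # 0.923
```

Running the same function on a different sample of respondents will generally give a different alpha, which is the point the abstract makes about reporting reliability for the data in hand.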
Du, Han; Wang, Lijuan
2018-04-23
Intraindividual variability can be measured by the intraindividual standard deviation ([Formula: see text]), intraindividual variance ([Formula: see text]), estimated hth-order autocorrelation coefficient ([Formula: see text]), and mean square successive difference ([Formula: see text]). Unresolved issues exist in the research on reliabilities of intraindividual variability indicators: (1) previous research only studied conditions with 0 autocorrelations in the longitudinal responses; (2) the reliabilities of [Formula: see text] and [Formula: see text] have not been studied. The current study investigates reliabilities of [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and the intraindividual mean, with autocorrelated longitudinal data. Reliability estimates of the indicators were obtained through Monte Carlo simulations. The impact of influential factors on reliabilities of the intraindividual variability indicators is summarized, and the reliabilities are compared across the indicators. Generally, all the studied indicators of intraindividual variability were more reliable with a more reliable measurement scale and more assessments. The reliabilities of [Formula: see text] were generally lower than those of [Formula: see text] and [Formula: see text], the reliabilities of [Formula: see text] were usually between those of [Formula: see text] and [Formula: see text] unless the scale reliability was large and/or the interindividual standard deviation in autocorrelation coefficients was large, and the reliabilities of the intraindividual mean were generally the highest. An R function is provided for planning longitudinal studies to ensure sufficient reliabilities of the intraindividual indicators are achieved.
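The formulas in this record were lost in extraction. As a rough sketch, the four indicators named in the abstract can be computed for one person's time series using common textbook estimators; these are assumptions and may differ in detail from the estimators used in the paper. The function names are illustrative and are not the R function the authors provide.

```python
from statistics import mean, stdev, variance

def isd(x):
    """Intraindividual standard deviation (sample SD of one person's scores)."""
    return stdev(x)

def ivar(x):
    """Intraindividual variance (sample variance of one person's scores)."""
    return variance(x)

def autocorr(x, h=1):
    """Estimated lag-h autocorrelation of one person's scores."""
    m = mean(x)
    num = sum((x[t] - m) * (x[t + h] - m) for t in range(len(x) - h))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def mssd(x):
    """Mean square successive difference."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (len(x) - 1)

# Hypothetical series of five repeated assessments for one person.
series = [1, 2, 3, 4, 5]
print(ivar(series), autocorr(series, 1), mssd(series))  # 2.5 0.4 1.0
```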
ERIC Educational Resources Information Center
Lane, Ginny G.; White, Amy E.; Henson, Robin K.
2002-01-01
Conducted a reliability generalizability study on the Coopersmith Self-Esteem Inventory (CSEI; S. Coopersmith, 1967) to examine the variability of reliability estimates across studies and to identify study characteristics that may predict this variability. Results show that reliability for CSEI scores can vary considerably, especially at the…
Park, Ji Eun; Han, Kyunghwa; Sung, Yu Sub; Chung, Mi Sun; Koo, Hyun Jung; Yoon, Hee Mang; Choi, Young Jun; Lee, Seung Soo; Kim, Kyung Won; Shin, Youngbin; An, Suah; Cho, Hyo-Min
2017-01-01
Objective To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. Materials and Methods Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA), and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. Results Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies and regarding the model and assumptions of intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. Reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. Conclusion Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology is necessary. PMID:29089821
ERIC Educational Resources Information Center
Fazeli, Seyed Hossein
2010-01-01
The research described in this study addresses psychological reliability, its importance, and its application, and investigates the impact of the psychological reliability of a pilot-study population on the selection of a reliable multiple-choice item test for foreign language research. The population for subject…
The Balanced Inventory of Desirable Responding (BIDR): A Reliability Generalization Study
ERIC Educational Resources Information Center
Li, Andrew; Bagger, Jessica
2007-01-01
The Balanced Inventory of Desirable Responding (BIDR) is one of the most widely used social desirability scales. The authors conducted a reliability generalization study to examine the typical reliability coefficients of BIDR scores and explored factors that explained the variability of reliability estimates across studies. The results indicated…
Lange, Toni; Freiberg, Alice; Dröge, Patrik; Lützner, Jörg; Schmitt, Jochen; Kopkow, Christian
2015-06-01
Systematic literature review. Despite their frequent application in routine care, a systematic review on the reliability of clinical examination tests to evaluate the integrity of the ACL is missing. To summarize and evaluate intra- and interrater reliability research on physical examination tests used for the diagnosis of ACL tears. A comprehensive systematic literature search was conducted in MEDLINE, EMBASE and AMED until May 30th 2013. Studies were included if they assessed the intra- and/or interrater reliability of physical examination tests for the integrity of the ACL. Methodological quality was evaluated with the Quality Appraisal of Reliability Studies (QAREL) tool by two independent reviewers. The search yielded 110 hits, of which seven articles met the inclusion criteria. These studies examined the reliability of four physical examination tests. Intrarater reliability was assessed in three studies and ranged from fair to almost perfect (Cohen's k = 0.22-1.00). Interrater reliability was assessed in all included studies and ranged from slight to almost perfect (Cohen's k = 0.02-0.81). The Lachman test was the physical examination test with the highest intrarater reliability (Cohen's k = 1.00), and the Lachman test performed in prone position was the test with the highest interrater reliability (Cohen's k = 0.81). Included studies were partly of low methodological quality. A meta-analysis could not be performed due to the heterogeneity in study populations, reliability measures and methodological quality of included studies. Systematic investigations on the reliability of physical examination tests to assess the integrity of the ACL are scarce and of varying methodological quality. Copyright © 2014 Elsevier Ltd. All rights reserved.
General Aviation Aircraft Reliability Study
NASA Technical Reports Server (NTRS)
Pettit, Duane; Turnbull, Andrew; Roelant, Henk A. (Technical Monitor)
2001-01-01
This reliability study was performed in order to provide the aviation community with an estimate of Complex General Aviation (GA) Aircraft System reliability. To successfully improve the safety and reliability for the next generation of GA aircraft, a study of current GA aircraft attributes was prudent. This was accomplished by benchmarking the reliability of operational Complex GA Aircraft Systems. Specifically, Complex GA Aircraft System reliability was estimated using data obtained from the logbooks of a random sample of the Complex GA Aircraft population.
Gorgos, Kara S; Wasylyk, Nicole T; Van Lunen, Bonnie L; Hoch, Matthew C
2014-04-01
Joint mobilizations are commonly used by clinicians to decrease pain and restore joint arthrokinematics following musculoskeletal injury. The force applied during a joint mobilization treatment is left to the judgment of the individual clinician but may have an effect on patient outcomes. The purpose of this systematic review was to critically appraise and synthesize the studies which examined the reliability of clinicians' force application during joint mobilization. A systematic search of PubMed and EBSCO Host databases from inception to March 1, 2013 was conducted to identify studies assessing the reliability of force application during joint mobilizations. Two reviewers utilized the Quality Appraisal of Reliability Studies (QAREL) assessment tool to determine the quality of included studies. The relative reliability of the included studies was examined through intraclass correlation coefficients (ICC) to synthesize study findings. All results were collated qualitatively with a level of evidence approach. A total of seven studies met the eligibility criteria and were included. Five studies were included that assessed inter-clinician reliability, and six studies were included that assessed intra-clinician reliability. The overall level of evidence for inter-clinician reliability was strong for poor-to-moderate reliability (ICC = -0.04 to 0.70). The overall level of evidence for intra-clinician reliability was strong for good reliability (ICC = 0.75-0.99). This systematic review indicates there is variability in force application between clinicians but individual clinicians apply forces consistently. The results of this systematic review suggest innovative instructional methods are needed to improve consistency and validate the forces applied during joint mobilization treatments. This is particularly evident for improving the consistency of force application across clinicians. Copyright © 2014 Elsevier Ltd. All rights reserved.
Citronberg, Jessica S; Wilkens, Lynne R; Lim, Unhee; Hullar, Meredith A J; White, Emily; Newcomb, Polly A; Le Marchand, Loïc; Lampe, Johanna W
2016-09-01
Plasma lipopolysaccharide-binding protein (LBP), a measure of internal exposure to bacterial lipopolysaccharide, has been associated with several chronic conditions and may be a marker of chronic inflammation; however, no studies have examined the reliability of this biomarker in a healthy population. We examined the temporal reliability of LBP measured in archived samples from participants in two studies. In Study one, 60 healthy participants had blood drawn at two time points: baseline and follow-up (either three, six, or nine months). In Study two, 24 individuals had blood drawn three to four times over a seven-month period. We measured LBP in archived plasma by ELISA. Test-retest reliability was estimated by calculating the intraclass correlation coefficient (ICC). Plasma LBP concentrations showed moderate reliability in Study one (ICC 0.60, 95 % CI 0.43-0.75) and Study two (ICC 0.46, 95 % CI 0.26-0.69). Restricting the follow-up period improved reliability. In Study one, the reliability of LBP over a three-month period was 0.68 (95 % CI: 0.41-0.87). In Study two, the ICC of samples taken ≤seven days apart was 0.61 (95 % CI 0.29-0.86). Plasma LBP concentrations demonstrated moderate test-retest reliability in healthy individuals with reliability improving over a shorter follow-up period.
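The abstract does not state which ICC model was used. As a minimal sketch, assuming a one-way random-effects ICC with the same number of repeated measurements per participant, test-retest reliability can be estimated from between-subject and within-subject mean squares. The values below are hypothetical and are not LBP data.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1).

    data[i] holds the repeated measurements for participant i.
    All participants must have the same number of measurements.
    """
    n = len(data)            # participants
    k = len(data[0])         # measurements per participant
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(data, means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical baseline and follow-up values for 3 participants.
measurements = [[1, 2], [3, 4], [5, 6]]
print(round(icc_oneway(measurements), 3))  # 0.882
```

A design with unequal numbers of draws per participant, such as three to four draws, would need an estimator that handles unbalanced data, which this sketch does not.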
Reliability of intracerebral hemorrhage classification systems: A systematic review.
Rannikmäe, Kristiina; Woodfield, Rebecca; Anderson, Craig S; Charidimou, Andreas; Chiewvit, Pipat; Greenberg, Steven M; Jeng, Jiann-Shing; Meretoja, Atte; Palm, Frederic; Putaala, Jukka; Rinkel, Gabriel Je; Rosand, Jonathan; Rost, Natalia S; Strbian, Daniel; Tatlisumak, Turgut; Tsai, Chung-Fen; Wermer, Marieke Jh; Werring, David; Yeh, Shin-Joe; Al-Shahi Salman, Rustam; Sudlow, Cathie Lm
2016-08-01
Accurately distinguishing non-traumatic intracerebral hemorrhage (ICH) subtypes is important since they may have different risk factors, causal pathways, management, and prognosis. We systematically assessed the inter- and intra-rater reliability of ICH classification systems. We sought all available reliability assessments of anatomical and mechanistic ICH classification systems from electronic databases and personal contacts until October 2014. We assessed included studies' characteristics, reporting quality and potential for bias; summarized reliability with kappa value forest plots; and performed meta-analyses of the proportion of cases classified into each subtype. We included 8 of 2152 studies identified. Inter- and intra-rater reliabilities were substantial to perfect for anatomical and mechanistic systems (inter-rater kappa values: anatomical 0.78-0.97 [six studies, 518 cases], mechanistic 0.89-0.93 [three studies, 510 cases]; intra-rater kappas: anatomical 0.80-1 [three studies, 137 cases], mechanistic 0.92-0.93 [two studies, 368 cases]). Reporting quality varied but no study fulfilled all criteria and none was free from potential bias. All reliability studies were performed with experienced raters in specialist centers. Proportions of ICH subtypes were largely consistent with previous reports suggesting that included studies are appropriately representative. Reliability of existing classification systems appears excellent but is unknown outside specialist centers with experienced raters. Future reliability comparisons should be facilitated by studies following recently published reporting guidelines. © 2016 World Stroke Organization.
Field reliability of competency and sanity opinions: A systematic review and meta-analysis.
Guarnera, Lucy A; Murrie, Daniel C
2017-06-01
We know surprisingly little about the interrater reliability of forensic psychological opinions, even though courts and other authorities have long called for known error rates for scientific procedures admitted as courtroom testimony. This is particularly true for opinions produced during routine practice in the field, even for some of the most common types of forensic evaluations-evaluations of adjudicative competency and legal sanity. To address this gap, we used meta-analytic procedures and study space methodology to systematically review studies that examined the interrater reliability-particularly the field reliability-of competency and sanity opinions. Of 59 identified studies, 9 addressed the field reliability of competency opinions and 8 addressed the field reliability of sanity opinions. These studies presented a wide range of reliability estimates; pairwise percentage agreements ranged from 57% to 100% and kappas ranged from .28 to 1.0. Meta-analytic combinations of reliability estimates obtained by independent evaluators returned estimates of κ = .49 (95% CI: .40-.58) for competency opinions and κ = .41 (95% CI: .29-.53) for sanity opinions. This wide range of reliability estimates underscores the extent to which different evaluation contexts tend to produce different reliability rates. Unfortunately, our study space analysis illustrates that available field reliability studies typically provide little information about contextual variables crucial to understanding their findings. Given these concerns, we offer suggestions for improving research on the field reliability of competency and sanity opinions, as well as suggestions for improving reliability rates themselves. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Suzuki, T; Sato, Y; Sotome, S; Arai, H; Arai, A; Yoshida, H
2017-06-01
This study was designed to investigate the reliability and validity of measurements of finger diameters with a ring gauge. A reliability study enrolled two independent samples (50 participants and seven examiners in Study I; 26 participants and 26 examiners in Study II). The sizes of each participant's little fingers were measured twice with a ring gauge by each examiner. To investigate the validity of the measurements, five hand therapists compared the finger size and hand volume of 30 participants with the ring gauge and with a figure-of-eight technique (Study III). The intra-class correlation coefficient for intra-observer reliability ranged from 0.97 to 0.99 in Study I, and 0.90 to 0.97 in Study II. The intra-class correlation coefficient for inter-observer reliability was 0.95 in Study I and 0.94 in Study II. The validity study showed a Pearson product moment correlation coefficient of 0.75. The ring gauge showed high reliability and validity for measurement of finger size. Level of evidence: III, diagnostic.
Score Reliability: A Retrospective Look Back at 12 Years of Reliability Generalization Studies
ERIC Educational Resources Information Center
Vacha-Haase, Tammi; Thompson, Bruce
2011-01-01
The present study was conducted to characterize (a) the features of the thousands of primary reports synthesized in 47 reliability generalization (RG) measurement meta-analysis studies and (b) typical methodological practice within the RG literature to date. With respect to the treatment of score reliability in the literature, in an astounding…
A Reliability Generalization Study of the Marlowe-Crowne Social Desirability Scale.
ERIC Educational Resources Information Center
Beretvas, S. Natasha; Meyers, Jason L.; Leite, Walter L.
2002-01-01
Conducted a reliability generalization study of the Marlowe-Crowne Social Desirability Scale (D. Crowne and D. Marlowe, 1960). Analysis of 93 studies shows that the predicted score reliability for male adolescents was 0.53, and reliability for men's responses was lower than for women's. Discusses the need for further analysis of the scale. (SLD)
Koch, Michael S; DeSesso, John M; Williams, Amy Lavin; Michalek, Suzanne; Hammond, Bruce
2016-01-01
To determine the reliability of food safety studies carried out in rodents with genetically modified (GM) crops, a Food Safety Study Reliability Tool (FSSRTool) was adapted from the European Centre for the Validation of Alternative Methods' (ECVAM) ToxRTool. Reliability was defined as the inherent quality of the study with regard to use of standardized testing methodology, full documentation of experimental procedures and results, and the plausibility of the findings. Codex guidelines for GM crop safety evaluations indicate toxicology studies are not needed when comparability of the GM crop to its conventional counterpart has been demonstrated. This guidance notwithstanding, animal feeding studies have routinely been conducted with GM crops, but their conclusions on safety are not always consistent. To accurately evaluate potential risks from GM crops, risk assessors need clearly interpretable results from reliable studies. The development of the FSSRTool, which provides the user with a means of assessing the reliability of a toxicology study to inform risk assessment, is discussed. Its application to the body of literature on GM crop food safety studies demonstrates that reliable studies report no toxicologically relevant differences between rodents fed GM crops or their non-GM comparators.
Reliability of diagnosis and clinical efficacy of visceral osteopathy: a systematic review.
Guillaud, Albin; Darbois, Nelly; Monvoisin, Richard; Pinsault, Nicolas
2018-02-17
In 2010, the World Health Organization published benchmarks for training in osteopathy in which osteopathic visceral techniques are included. The purpose of this study was to identify and critically appraise the scientific literature concerning the reliability of diagnosis and the clinical efficacy of techniques used in visceral osteopathy. Databases MEDLINE, OSTMED.DR, the Cochrane Library, Osteopathic Research Web, Google Scholar, Journal of American Osteopathic Association (JAOA) website, International Journal of Osteopathic Medicine (IJOM) website, and the catalog of Académie d'ostéopathie de France website were searched through December 2017. Only inter-rater reliability studies including at least two raters or the intra-rater reliability studies including at least two assessments by the same rater were included. For efficacy studies, only randomized-controlled-trials (RCT) or crossover studies on unhealthy subjects (any condition, duration and outcome) were included. Risk of bias was determined using a modified version of the quality appraisal tool for studies of diagnostic reliability (QAREL) in reliability studies. For the efficacy studies, the Cochrane risk of bias tool was used to assess their methodological design. Two authors performed data extraction and analysis. Eight reliability studies and six efficacy studies were included. The analysis of reliability studies shows that the diagnostic techniques used in visceral osteopathy are unreliable. Regarding efficacy studies, the least biased study shows no significant difference for the main outcome. The main risks of bias found in the included studies were due to the absence of blinding of the examiners, an unsuitable statistical method or an absence of primary study outcome. The results of the systematic review lead us to conclude that well-conducted and sound evidence on the reliability and the efficacy of techniques in visceral osteopathy is absent. 
The review was registered with PROSPERO on 12 December 2016. The registration number is CRD4201605286.
Estimating Between-Person and Within-Person Subscore Reliability with Profile Analysis.
Bulut, Okan; Davison, Mark L; Rodriguez, Michael C
2017-01-01
Subscores are of increasing interest in educational and psychological testing due to their diagnostic function for evaluating examinees' strengths and weaknesses within particular domains of knowledge. Previous studies about the utility of subscores have mostly focused on the overall reliability of individual subscores and ignored the fact that subscores should be distinct and have added value over the total score. This study introduces a profile reliability approach that partitions the overall subscore reliability into within-person and between-person subscore reliability. The estimation of between-person reliability and within-person reliability coefficients is demonstrated using subscores from number-correct scoring, unidimensional and multidimensional item response theory scoring, and augmented scoring approaches via a simulation study and a real data study. The effects of various testing conditions, such as subtest length, correlations among subscores, and the number of subtests, are examined. Results indicate that there is a substantial trade-off between within-person and between-person reliability of subscores. Profile reliability coefficients can be useful in determining the extent to which subscores provide distinct and reliable information under various testing conditions.
Reliability generalization: a viable key for establishing validity generalization
NASA Technical Reports Server (NTRS)
Kennedy, R. S.; Turnage, J. J.
1991-01-01
Even with radical restriction of range, reliability coefficients from 10 studies gave an average interstudy value of .74, suggesting constancy of reliability over diverse experiments. A value from a new test can help index reliability of tests not previously studied.
Lucas, Nicholas; Macaskill, Petra; Irwig, Les; Moran, Robert; Bogduk, Nikolai
2009-01-01
Trigger points are promoted as an important cause of musculoskeletal pain. There is no accepted reference standard for the diagnosis of trigger points, and data on the reliability of physical examination for trigger points are conflicting. To systematically review the literature on the reliability of physical examination for the diagnosis of trigger points. MEDLINE, EMBASE, and other sources were searched for articles reporting the reliability of physical examination for trigger points. Included studies were evaluated for their quality and applicability, and reliability estimates were extracted and reported. Nine studies were eligible for inclusion. None satisfied all quality and applicability criteria. No study specifically reported reliability for the identification of the location of active trigger points in the muscles of symptomatic participants. Reliability estimates varied widely for each diagnostic sign, for each muscle, and across each study. Reliability estimates were generally higher for subjective signs such as tenderness (kappa range, 0.22-1.0) and pain reproduction (kappa range, 0.57-1.00), and lower for objective signs such as the taut band (kappa range, -0.08-0.75) and local twitch response (kappa range, -0.05-0.57). No study to date has reported the reliability of trigger point diagnosis according to the currently proposed criteria. On the basis of the limited number of studies available, and significant problems with their design, reporting, statistical integrity, and clinical applicability, physical examination cannot currently be recommended as a reliable test for the diagnosis of trigger points. The reliability of trigger point diagnosis needs to be further investigated with studies of high quality that use current diagnostic criteria in clinically relevant patients.
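The kappa ranges quoted above are chance-corrected agreement statistics. As a minimal sketch of how an unweighted Cohen's kappa is obtained for two examiners, the Python below computes observed and chance agreement from paired ratings. The function name `cohens_kappa` and the example ratings are hypothetical illustrations, not data or code from any of the reviewed studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings (1 = sign judged present, 0 = absent) for ten muscles.
examiner_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
examiner_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(examiner_1, examiner_2)
print(f"kappa = {kappa:.2f}")
```

Because chance agreement depends on each rater's marginal proportions, kappa can be low or even negative despite high raw agreement when one category dominates, which is one reason raw percent agreement is sometimes reported alongside it.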
NASA Technical Reports Server (NTRS)
White, Mark; Huang, Bing; Qin, Jin; Gur, Zvi; Talmor, Michael; Chen, Yuan; Heidecker, Jason; Nguyen, Duc; Bernstein, Joseph
2005-01-01
As microelectronics are scaled into the deep sub-micron regime, users of advanced technology CMOS, particularly in high-reliability applications, should reassess how scaling effects impact long-term reliability. An experimentally based reliability study of industrial grade SRAMs, consisting of three different technology nodes, is proposed to substantiate current acceleration models for temperature and voltage life-stress relationships. This reliability study utilizes step-stress techniques to evaluate memory technologies (0.25 μm, 0.15 μm, and 0.13 μm) embedded in many of today's high-reliability space/aerospace applications. Two acceleration modeling approaches are presented to relate experimental FIT calculations to the manufacturer's qualification data.
Salyers, M P; McHugo, G J; Cook, J A; Razzano, L A; Drake, R E; Mueser, K T
2001-09-01
Reliability of well-known instruments was examined in 202 people with severe mental illness participating in a multisite vocational study. We examined interrater reliability of the Positive and Negative Syndrome Scale (PANSS) and the internal consistency and test-retest reliability of the PANSS, the Rosenberg Self-Esteem Scale, the Medical Outcomes Study Short Form-36 (SF-36), and the Quality of Life Interview. Most scales had good levels of reliability, with intraclass correlation coefficients (ICCs) and coefficient alphas above .70. However, the SF-36 scales were generally less stable over time, particularly Social Functioning (ICC = .55). Test-retest reliability was lower among less educated respondents and among ethnic minorities. We recommend close monitoring of psychometric issues in future multisite studies.
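Coefficient alpha, the internal-consistency index referred to above, can be computed directly from item-level scores. The following is a minimal Python sketch using the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the five-respondent data set are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of respondent scores per item."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("each item needs the same number (>= 2) of respondents")
    # Total score per respondent, summing across items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    # Sample variances (n - 1 denominator) for each item and for the totals.
    sum_item_variances = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses: three items answered by five respondents.
items = [
    [3, 4, 5, 2, 4],
    [2, 4, 5, 1, 3],
    [3, 5, 4, 2, 4],
]
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}, meets .70 criterion: {alpha >= 0.70}")
```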
Duncan, Laura; Comeau, Jinette; Wang, Li; Vitoroulis, Irene; Boyle, Michael H; Bennett, Kathryn
2018-02-19
A better understanding of factors contributing to the observed variability in estimates of test-retest reliability in published studies on standardized diagnostic interviews (SDI) is needed. The objectives of this systematic review and meta-analysis were to estimate the pooled test-retest reliability for parent and youth assessments of seven common disorders, and to examine sources of between-study heterogeneity in reliability. Following a systematic review of the literature, multilevel random effects meta-analyses were used to analyse 202 reliability estimates (Cohen's kappa = κ) from 31 eligible studies and 5,369 assessments of 3,344 children and youth. Pooled reliability was moderate at κ = .58 (95% CI 0.53-0.63) and between-study heterogeneity was substantial (Q = 2,063 (df = 201), p < .001 and I² = 79%). In subgroup analysis, reliability varied across informants for specific types of psychiatric disorder (κ = .53-.69 for parent vs. κ = .39-.68 for youth) with estimates significantly higher for parents on attention deficit hyperactivity disorder, oppositional defiant disorder and the broad groupings of externalizing and any disorder. Reliability was also significantly higher in studies with indicators of poor or fair study methodology quality (sample size <50, retest interval <7 days). Our findings raise important questions about the meaningfulness of published evidence on the test-retest reliability of SDIs and the usefulness of these tools in both clinical and research contexts. Potential remedies include the introduction of standardized study and reporting requirements for reliability studies, and exploration of other approaches to assessing and classifying child and adolescent psychiatric disorder. © 2018 Association for Child and Adolescent Mental Health.
The development of a quality appraisal tool for studies of diagnostic reliability (QAREL).
Lucas, Nicholas P; Macaskill, Petra; Irwig, Les; Bogduk, Nikolai
2010-08-01
In systematic reviews of the reliability of diagnostic tests, no quality assessment tool has been used consistently. The aim of this study was to develop a specific quality appraisal tool for studies of diagnostic reliability. Key principles for the quality of studies of diagnostic reliability were identified with reference to epidemiologic principles, existing quality appraisal checklists, and the Standards for Reporting of Diagnostic Accuracy (STARD) and Quality Assessment of Diagnostic Accuracy Studies (QUADAS) resources. Specific items that encompassed each of the principles were developed. Experts in diagnostic research provided feedback on the items that were to form the appraisal tool. This process was iterative and continued until consensus among experts was reached. The Quality Appraisal of Reliability Studies (QAREL) checklist includes 11 items that explore seven principles. Items cover the spectrum of subjects, spectrum of examiners, examiner blinding, order effects of examination, suitability of the time interval among repeated measurements, appropriate test application and interpretation, and appropriate statistical analysis. QAREL has been developed as a specific quality appraisal tool for studies of diagnostic reliability. The reliability of this tool in different contexts needs to be evaluated. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Barrett, Eva; McCreesh, Karen; Lewis, Jeremy
2014-02-01
A wide array of instruments are available for non-invasive thoracic kyphosis measurement. Guidelines for selecting outcome measures for use in clinical and research practice recommend that properties such as validity and reliability are considered. This systematic review reports on the reliability and validity of non-invasive methods for measuring thoracic kyphosis. A systematic search of 11 electronic databases located studies assessing reliability and/or validity of non-invasive thoracic kyphosis measurement techniques. Two independent reviewers used a critical appraisal tool to assess the quality of retrieved studies. Data were extracted by the primary reviewer. The results were synthesized qualitatively using a level of evidence approach. 27 studies satisfied the eligibility criteria and were included in the review. Reliability alone, validity alone, and both reliability and validity were investigated by sixteen, two and nine studies respectively. 17/27 studies were deemed to be of high quality. In total, 15 methods of thoracic kyphosis measurement were evaluated in retrieved studies. All investigated methods showed high (ICC ≥ .7) to very high (ICC ≥ .9) levels of reliability. The validity of the methods ranged from low to very high. The strongest levels of evidence for reliability exist in support of the Debrunner kyphometer, Spinal Mouse and Flexicurve index, and the strongest evidence for validity supports the arcometer and Flexicurve index. Further reliability and validity studies are required to strengthen the level of evidence for the remaining methods of measurement. This should be addressed by future research. Copyright © 2013 Elsevier Ltd. All rights reserved.
Taghipour, Morteza; Mohseni-Bandpei, Mohammad Ali; Behtash, Hamid; Abdollahi, Iraj; Rajabzadeh, Fatemeh; Pourahmadi, Mohammad Reza; Emami, Mahnaz
2018-04-24
Rehabilitative ultrasound (US) imaging is one of the popular methods for investigating muscle morphologic characteristics and dimensions in recent years. The reliability of this method has been investigated in different studies. As studies have been performed with different designs and quality, reported values of rehabilitative US have a wide range. The objective of this study was to systematically review the literature conducted on the reliability of rehabilitative US imaging for the assessment of deep abdominal and lumbar trunk muscle dimensions. The PubMed/MEDLINE, Scopus, Google Scholar, Science Direct, Embase, Physiotherapy Evidence, Ovid, and CINAHL databases were searched to identify original research articles conducted on the reliability of rehabilitative US imaging published from June 2007 to August 2017. The articles were qualitatively assessed; reliability data were extracted; and the methodological quality was evaluated by 2 independent reviewers. Of the 26 included studies, 16 were considered of high methodological quality. Except for 2 studies, all high-quality studies reported intraclass correlation coefficients (ICCs) for intra-rater reliability of 0.70 or greater. Also, ICCs reported for inter-rater reliability in high-quality studies were generally greater than 0.70. Among low-quality studies, reported ICCs ranged from 0.26 to 0.99 and 0.68 to 0.97 for intra- and inter-rater reliability, respectively. Also, the reported standard error of measurement and minimal detectable change for rehabilitative US were generally in an acceptable range. Generally, the results of the reviewed studies indicate that rehabilitative US imaging has good levels of both inter- and intra-rater reliability. © 2018 by the American Institute of Ultrasound in Medicine.
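The standard error of measurement (SEM) and minimal detectable change (MDC) mentioned above are commonly derived from the ICC and the between-subject standard deviation. The Python sketch below uses the widely used conventions SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. Individual studies in the review may have used other variants, and the input values (a 0.5 mm SD and an ICC of 0.91) are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 gives MDC95)."""
    # sqrt(2) accounts for measurement error in both the test and the retest.
    return z * math.sqrt(2.0) * sem


# Hypothetical muscle-thickness data: SD = 0.5 mm, ICC = 0.91.
sem = standard_error_of_measurement(sd=0.5, icc=0.91)
mdc95 = minimal_detectable_change(sem)
print(f"SEM = {sem:.3f} mm, MDC95 = {mdc95:.3f} mm")
```

Under these assumptions, a change between two sessions smaller than the MDC95 cannot be distinguished from measurement error at the 95% level.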
Bottema-Beutel, Kristen; Lloyd, Blair; Carter, Erik W; Asmus, Jennifer M
2014-11-01
Attaining reliable estimates of observational measures can be challenging in school and classroom settings, as behavior can be influenced by multiple contextual factors. Generalizability (G) studies can enable researchers to estimate the reliability of observational data, and decision (D) studies can inform how many observation sessions are necessary to achieve a criterion level of reliability. We conducted G and D studies using observational data from a randomized control trial focusing on social and academic participation of students with severe disabilities in inclusive secondary classrooms. Results highlight the importance of anchoring observational decisions to reliability estimates from existing or pilot data sets. We outline steps for conducting G and D studies and address options when reliability estimates are lower than desired.
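To illustrate the D-study logic described above, the Python sketch below assumes the simplest one-facet design (persons crossed with observation sessions). The variance components from a G study are used to project the generalizability coefficient for n sessions, Eρ² = σ²(person) / (σ²(person) + σ²(residual) / n), and to find the smallest n that meets a criterion. The function names and variance components are hypothetical, and the authors' actual design may include additional facets.

```python
def g_coefficient(var_person, var_residual, n_sessions):
    """Projected generalizability coefficient when averaging over n_sessions."""
    if n_sessions < 1:
        raise ValueError("n_sessions must be at least 1")
    # Relative error variance shrinks in proportion to the number of sessions.
    return var_person / (var_person + var_residual / n_sessions)


def sessions_needed(var_person, var_residual, criterion=0.80, max_sessions=100):
    """Smallest number of sessions reaching the criterion, or None if not reached."""
    for n in range(1, max_sessions + 1):
        if g_coefficient(var_person, var_residual, n) >= criterion:
            return n
    return None


# Hypothetical G-study estimates: person variance 3.0, residual variance 7.0.
n_required = sessions_needed(var_person=3.0, var_residual=7.0, criterion=0.80)
print(f"sessions needed for a coefficient of .80: {n_required}")
```

With a single session these hypothetical components give a coefficient of only 0.30, which shows why anchoring the number of observation sessions to pilot variance estimates matters.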
Intersession reliability of fMRI activation for heat pain and motor tasks
Quiton, Raimi L.; Keaser, Michael L.; Zhuo, Jiachen; Gullapalli, Rao P.; Greenspan, Joel D.
2014-01-01
As the practice of conducting longitudinal fMRI studies to assess mechanisms of pain-reducing interventions becomes more common, there is a great need to assess the test–retest reliability of the pain-related BOLD fMRI signal across repeated sessions. This study quantitatively evaluated the reliability of heat pain-related BOLD fMRI brain responses in healthy volunteers across 3 sessions conducted on separate days using two measures: (1) intraclass correlation coefficients (ICC) calculated based on signal amplitude and (2) spatial overlap. The ICC analysis of pain-related BOLD fMRI responses showed fair-to-moderate intersession reliability in brain areas regarded as part of the cortical pain network. Areas with the highest intersession reliability based on the ICC analysis included the anterior midcingulate cortex, anterior insula, and second somatosensory cortex. Areas with the lowest intersession reliability based on the ICC analysis also showed low spatial reliability; these regions included pregenual anterior cingulate cortex, primary somatosensory cortex, and posterior insula. Thus, this study found regional differences in pain-related BOLD fMRI response reliability, which may provide useful information to guide longitudinal pain studies. A simple motor task (finger-thumb opposition) was performed by the same subjects in the same sessions as the painful heat stimuli were delivered. Intersession reliability of fMRI activation in cortical motor areas was comparable to previously published findings for both spatial overlap and ICC measures, providing support for the validity of the analytical approach used to assess intersession reliability of pain-related fMRI activation. 
A secondary finding of this study is that the use of standard ICC alone as a measure of reliability may not be sufficient, as the underlying variance structure of an fMRI dataset can result in inappropriately high ICC values; a method to eliminate these false positive results was used in this study and is recommended for future studies of test–retest reliability. PMID:25161897
Assessing reliability and validity measures in managed care studies.
Montoya, Isaac D
2003-01-01
To review the reliability and validity literature and develop an understanding of these concepts as applied to managed care studies. Reliability is a test of how well an instrument measures the same input at varying times and under varying conditions. Validity is a test of how accurately an instrument measures what one believes is being measured. A review of reliability and validity instructional material was conducted. Studies of managed care practices and programs abound. However, many of these studies utilize measurement instruments that were developed for other purposes or for a population other than the one being sampled. In other cases, instruments have been developed without any testing of the instrument's performance. The lack of reliability and validity information may limit the value of these studies. This is particularly true when data are collected for one purpose and used for another. The usefulness of certain studies without reliability and validity measures is questionable, especially in cases where the literature contradicts itself.
Assessing the reliability of ecotoxicological studies: An overview of current needs and approaches.
Moermond, Caroline; Beasley, Amy; Breton, Roger; Junghans, Marion; Laskowski, Ryszard; Solomon, Keith; Zahner, Holly
2017-07-01
In general, reliable studies are well designed and well performed, and enough details on study design and performance are reported to assess the study. For hazard and risk assessment in various legal frameworks, many different types of ecotoxicity studies need to be evaluated for reliability. These studies vary in study design, methodology, quality, and level of detail reported (e.g., reviews, peer-reviewed research papers, or industry-sponsored studies documented under Good Laboratory Practice [GLP] guidelines). Regulators have the responsibility to make sound and verifiable decisions and should evaluate each study for reliability in accordance with scientific principles regardless of whether they were conducted in accordance with GLP and/or standardized methods. Thus, a systematic and transparent approach is needed to evaluate studies for reliability. In this paper, 8 different methods for reliability assessment were compared using a number of attributes: categorical versus numerical scoring methods, use of exclusion and critical criteria, weighting of criteria, whether methods are tested with case studies, domain of applicability, bias toward GLP studies, incorporation of standard guidelines in the evaluation method, number of criteria used, type of criteria considered, and availability of guidance material. Finally, some considerations are given on how to choose a suitable method for assessing reliability of ecotoxicity studies. Integr Environ Assess Manag 2017;13:640-651. © 2016 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC).
Acute Respiratory Distress Syndrome Measurement Error. Potential Effect on Clinical Study Results
Cooke, Colin R.; Iwashyna, Theodore J.; Hofer, Timothy P.
2016-01-01
Rationale: Identifying patients with acute respiratory distress syndrome (ARDS) is a recognized challenge. Experts often have only moderate agreement when applying the clinical definition of ARDS to patients. However, no study has fully examined the implications of low reliability measurement of ARDS on clinical studies. Objectives: To investigate how the degree of variability in ARDS measurement commonly reported in clinical studies affects study power, the accuracy of treatment effect estimates, and the measured strength of risk factor associations. Methods: We examined the effect of ARDS measurement error in randomized clinical trials (RCTs) of ARDS-specific treatments and cohort studies using simulations. We varied the reliability of ARDS diagnosis, quantified as the interobserver reliability (κ-statistic) between two reviewers. In RCT simulations, patients identified as having ARDS were enrolled, and when measurement error was present, patients without ARDS could be enrolled. In cohort studies, risk factors as potential predictors were analyzed using reviewer-identified ARDS as the outcome variable. Measurements and Main Results: Lower reliability measurement of ARDS during patient enrollment in RCTs seriously degraded study power. Holding effect size constant, the sample size necessary to attain adequate statistical power increased by more than 50% as reliability declined, although the result was sensitive to ARDS prevalence. In a 1,400-patient clinical trial, the sample size necessary to maintain similar statistical power increased to over 1,900 when reliability declined from perfect to substantial (κ = 0.72). Lower reliability measurement diminished the apparent effectiveness of an ARDS-specific treatment from a 15.2% (95% confidence interval, 9.4–20.9%) absolute risk reduction in mortality to 10.9% (95% confidence interval, 4.7–16.2%) when reliability declined to moderate (κ = 0.51). In cohort studies, the effect on risk factor associations was similar. 
Conclusions: ARDS measurement error can seriously degrade statistical power and effect size estimates of clinical studies. The reliability of ARDS measurement warrants careful attention in future ARDS clinical studies. PMID:27159648
Clayson, Peter E; Miller, Gregory A
2017-01-01
Failing to consider psychometric issues related to reliability and validity, differential deficits, and statistical power potentially undermines the conclusions of a study. In research using event-related brain potentials (ERPs), numerous contextual factors (population sampled, task, data recording, analysis pipeline, etc.) can impact the reliability of ERP scores. The present review considers the contextual factors that influence ERP score reliability and the downstream effects that reliability has on statistical analyses. Given the context-dependent nature of ERPs, it is recommended that ERP score reliability be formally assessed on a study-by-study basis. Recommended guidelines for ERP studies include 1) reporting the threshold of acceptable reliability and reliability estimates for observed scores, 2) specifying the approach used to estimate reliability, and 3) justifying how trial-count minima were chosen. A reliability threshold for internal consistency of at least 0.70 is recommended, and a threshold of 0.80 is preferred. The review also advocates the use of generalizability theory for estimating score dependability (the generalizability theory analog to reliability) as an improvement on classical test theory reliability estimates, suggesting that the latter is less well suited to ERP research. To facilitate the calculation and reporting of dependability estimates, an open-source Matlab program, the ERP Reliability Analysis Toolbox, is presented. Copyright © 2016 Elsevier B.V. All rights reserved.
Which symptom assessments and approaches are uniquely appropriate for paediatric concussion?
Gioia, G A; Schneider, J C; Vaughan, C G; Isquith, P K
2009-05-01
To (a) identify post-concussion symptom scales appropriate for children and adolescents in sports; (b) review evidence for reliability and validity; and (c) recommend future directions for scale development. Quantitative and qualitative literature review of symptom rating scales appropriate for children and adolescents aged 5 to 22 years. Literature identified via search of Medline, Ovid-Medline and PsycInfo databases; review of reference lists in identified articles; querying sports concussion specialists. 29 articles met study inclusion criteria. 5 symptom scales examined in 11 studies for ages 5-12 years and in 25 studies for ages 13-22. 10 of 11 studies for 5-12-year-olds presented validity evidence for three scales; 7 studies provided reliability evidence for two scales; 7 studies used serial administrations but no reliable change metrics. Two scales included parent-reports and one included a teacher report. 24 of 25 studies for 13-22 year-olds presented validity evidence for five measures; seven studies provided reliability evidence for four measures with 18 studies including serial administrations and two examining Reliable Change. Psychometric evidence for symptom scales is stronger for adolescents than for younger children. Most scales provide evidence of concurrent validity, discriminating concussed and non-concussed groups. Few report reliability and evidence for validity is narrow. Two measures include parent/teacher reports. Few scales examine reliable change statistics, limiting interpretability of temporal changes. Future studies are needed to fully define symptom scale psychometric properties with the greatest need in younger student-athletes.
Boonstra, Anne M; Schiphorst Preuper, Henrica R; Reneman, Michiel F; Posthumus, Jitze B; Stewart, Roy E
2008-06-01
The objective of the study was to determine the reliability and concurrent validity of a visual analogue scale (VAS) for disability as a single-item instrument measuring disability in chronic pain patients. A test-retest design was used for the reliability study and a cross-sectional design for the validity study. The setting was a general rehabilitation centre and a university rehabilitation centre. The study population consisted of patients over 18 years of age, suffering from chronic musculoskeletal pain; 52 patients in the reliability study, 344 patients in the validity study. Main outcome measures were as follows. Reliability study: Spearman's correlation coefficients (rho values) of the test and retest data of the VAS for disability; validity study: rho values of the VAS disability scores with the scores on four domains of the Short-Form Health Survey (SF-36) and VAS pain scores, and with Roland-Morris Disability Questionnaire scores in chronic low back pain patients. Results were as follows: in the reliability study rho values varied from 0.60 to 0.77; and in the validity study rho values of VAS disability scores with SF-36 domain scores varied from 0.16 to 0.51, with Roland-Morris Disability Questionnaire scores from 0.38 to 0.43 and with VAS pain scores from 0.76 to 0.84. The conclusion of the study was that the reliability of the VAS for disability is moderate to good. Because of a weak correlation with other disability instruments and a strong correlation with the VAS for pain, however, its validity is questionable.
2016-03-01
A BOUNCE? A STUDY ON RESILIENCE AND HUMAN RELATIONS IN A HIGH RELIABILITY ORGANIZATION, by Robert D. Johns. Thesis, March 2016. This study analyzes the various resilience factors associated with a military high reliability organization (HRO). The data measuring...
Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor
2013-06-01
Reliability measures precision or the extent to which test results can be replicated. This is the first ever systematic review to identify statistical methods used to measure reliability of equipment measuring continuous variables. This study also aims to highlight inappropriate statistical methods used in reliability analysis and their implications for medical practice. In 2010, five electronic databases were searched for reliability studies published between 2007 and 2009. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria. The intra-class correlation coefficient (ICC) was the most popular method, used in 25 (60%) studies, followed by comparison of means (8 studies, or 19%). Out of 25 studies using the ICC, only 7 (28%) reported the confidence intervals and types of ICC used. Most studies (71%) also tested the agreement of instruments. This study finds that the intra-class correlation coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue, and be able to correctly perform analysis in reliability studies.
A systematic review of the factor structure and reliability of the Spence Children's Anxiety Scale.
Orgilés, Mireia; Fernández-Martínez, Iván; Guillén-Riquelme, Alejandro; Espada, José P; Essau, Cecilia A
2016-01-15
The Spence Children's Anxiety Scale (SCAS) is a widely used instrument for assessing symptoms of anxiety disorders among children and adolescents. Previous studies have demonstrated its good reliability for children and adolescents from different backgrounds. However, remarkable variability in the reliability of the SCAS across studies and inconsistent results regarding its factor structure have been found. The present study aims to examine the SCAS factor structure by means of a systematic review with narrative synthesis, the mean reliability of the SCAS by means of a meta-analysis, and the influence of the moderators on the SCAS reliability. Databases searched for studies published since 1997 included Google Scholar, PsycARTICLES, PsycINFO, Web of Science, and Scopus. Twenty-nine and 32 studies, which examined the factor structure and the internal consistency of the SCAS, respectively, were included. The SCAS was found to have strong internal consistency, influenced by different moderators. The systematic review demonstrated that the original six-factor model was supported by most studies. Factorial invariance studies (across age, gender, country) and test-retest reliability of the SCAS were not examined in this study. It is concluded that the SCAS is a reliable instrument for cross-cultural use, and it is suggested that the original six-factor model is appropriate for cross-cultural application. Copyright © 2015 Elsevier B.V. All rights reserved.
Hulteen, Ryan M; Lander, Natalie J; Morgan, Philip J; Barnett, Lisa M; Robertson, Samuel J; Lubans, David R
2015-10-01
It has been suggested that young people should develop competence in a variety of 'lifelong physical activities' to ensure that they can be active across the lifespan. The primary aim of this systematic review is to report the methodological properties, validity, reliability, and test duration of field-based measures that assess movement skill competency in lifelong physical activities. A secondary aim was to clearly define those characteristics unique to lifelong physical activities. A search of four electronic databases (Scopus, SPORTDiscus, ProQuest, and PubMed) was conducted between June 2014 and April 2015 with no date restrictions. Studies addressing the validity and/or reliability of lifelong physical activity tests were reviewed. Included articles were required to assess lifelong physical activities using process-oriented measures, as well as report either one type of validity or reliability. Assessment criteria for methodological quality were adapted from a checklist used in a previous review of sport skill outcome assessments. Movement skill assessments for eight different lifelong physical activities (badminton, cycling, dance, golf, racquetball, resistance training, swimming, and tennis) in 17 studies were identified for inclusion. Methodological quality, validity, reliability, and test duration (time to assess a single participant), for each article were assessed. Moderate to excellent reliability results were found in 16 of 17 studies, with 71% reporting inter-rater reliability and 41% reporting intra-rater reliability. Only four studies in this review reported test-retest reliability. Ten studies reported validity results; content validity was cited in 41% of these studies. Construct validity was reported in 24% of studies, while criterion validity was only reported in 12% of studies. Numerous assessments for lifelong physical activities may exist, yet only assessments for eight lifelong physical activities were included in this review. 
Generalizability of results may be more applicable if more heterogeneous samples are used in future research. Moderate to excellent levels of inter- and intra-rater reliability were reported in the majority of studies. However, future work should look to establish test-retest reliability. Validity was less commonly reported than reliability, and further types of validity other than content validity need to be established in future research. Specifically, predictive validity of 'lifelong physical activity' movement skill competency is needed to support the assertion that such activities provide the foundation for a lifetime of activity.
The reliability of the Australasian Triage Scale: a meta-analysis
Ebrahimi, Mohsen; Heydari, Abbas; Mazlom, Reza; Mirhaghi, Amir
2015-01-01
BACKGROUND: Although the Australasian Triage Scale (ATS) was developed two decades ago, its reliability has not been defined; therefore, we present a meta-analysis of the reliability of the ATS in order to reveal to what extent the ATS is reliable. DATA SOURCES: Electronic databases were searched to March 2014. The included studies were those that reported sample size, reliability coefficients, and an adequate description of the ATS reliability assessment. The guidelines for reporting reliability and agreement studies (GRRAS) were used. Two reviewers independently examined abstracts and extracted data. The effect size was obtained by the z-transformation of reliability coefficients. Data were pooled with random-effects models, and meta-regression was done based on the method-of-moments estimator. RESULTS: Six studies were ultimately included. The pooled coefficient for the ATS was substantial, at 0.428 (95%CI 0.340–0.509). The rate of mis-triage was less than fifty percent. Agreement on the adult version was higher than on the pediatric version. CONCLUSION: The ATS has shown an acceptable level of overall reliability in the emergency department, but it needs more development to reach an almost perfect agreement. PMID:26056538
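The pooling procedure named in the abstract above (z-transformation, random-effects model, method-of-moments estimator) can be sketched as follows. This is a minimal illustration rather than the authors' analysis: the function name `pool_reliability` is hypothetical, the between-study variance uses the DerSimonian-Laird form of the method-of-moments estimator, and the sampling variance 1/(n - 3) is the usual approximation for correlation-type coefficients, assumed here to carry over to the reliability coefficients being pooled.

```python
import math

def pool_reliability(coeffs, sample_sizes):
    """Pool reliability coefficients on Fisher's z scale (hypothetical helper).

    Returns the pooled coefficient and an approximate 95% confidence interval,
    both back-transformed to the original scale.
    """
    z = [math.atanh(r) for r in coeffs]            # Fisher z-transformation
    v = [1.0 / (n - 3) for n in sample_sizes]      # assumed sampling variance of z
    w = [1.0 / vi for vi in v]                     # fixed-effect weights

    # Fixed-effect mean and Cochran's Q, needed for the moment estimator.
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

    # Method-of-moments (DerSimonian-Laird) between-study variance, floored at 0.
    tau2 = max(0.0, (q - (len(z) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate, and standard error.
    w_star = [1.0 / (vi + tau2) for vi in v]
    z_pooled = sum(wi * zi for wi, zi in zip(w_star, z)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    low, high = z_pooled - 1.96 * se, z_pooled + 1.96 * se
    return math.tanh(z_pooled), (math.tanh(low), math.tanh(high))
```

Pooling on the z scale and back-transforming with `tanh` keeps the interval inside (-1, 1), which is why an interval such as the one reported above need not be symmetric around the pooled coefficient.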
McCreesh, Karen M; Crotty, James M; Lewis, Jeremy S
2015-03-01
Narrowing of the subacromial space has been noted as a common feature of rotator cuff (RC) tendinopathy. It has been implicated in the development of symptoms and forms the basis for some surgical and rehabilitation approaches. Various radiological methods have been used to measure the subacromial space, which is represented by a two-dimensional measurement of acromiohumeral distance (AHD). A reliable method of measurement could be used to assess the impact of rehabilitation or surgical interventions for RC tendinopathy; however, there are no published reviews assessing the reliability of AHD measurement. The aim of this review was to systematically assess the evidence for the intrarater and inter-rater reliability of radiological methods of measuring AHD, in order to identify the most reliable method for use in RC tendinopathy. An electronic literature search was carried out and studies describing the reliability of any radiological method of measuring AHD in either healthy or RC tendinopathy groups were included. Eighteen studies met the inclusion criteria and were appraised by two reviewers using the Quality Appraisal for reliability Studies checklist. Eight studies were deemed to be of high methodological quality. Study weaknesses included lack of tester blinding, inadequate description of tester experience, lack of inclusion of symptomatic populations, poor reporting of statistical methods and unclear diagnosis. There was strong evidence for the reliability of ultrasound for measuring AHD, with moderate evidence for MRI and CT measures and conflicting evidence for radiographic methods. Overall, there was lack of research in RC tendinopathy populations, with only six studies including participants with shoulder pain. The results support the reliability of ultrasound and CT or MRI for the measurement of AHD; however, more studies in symptomatic populations are required. 
The reliability of AHD measurement using radiographs has not been supported by the studies reviewed.
A study of fault prediction and reliability assessment in the SEL environment
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Patnaik, Debabrata
1986-01-01
An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
1975-04-01
to 2-16 Category 3: Fabrication Methods and Techniques 3-01 to 3-21 Category 4: Reliability Studies 4-01 to 4-15 Category 5: Computerized Analysis...RAC Microcircuit Thesaurus. The terms are arranged in alphabetical order with sub-term description following each main term. Cross-referencing is...Reliability aspects of microcircuit manufacturing. 4. Reliability Studies: Technical reports relating to formal reliability studies and investigations
[Study of the relationship between human quality and reliability].
Long, S; Wang, C; Wang, L i; Yuan, J; Liu, H; Jiao, X
1997-02-01
To clarify the relationship between human quality and reliability, 1925 experiments in 20 subjects were carried out to study the relationship between disposition character, digital memory, graphic memory, multi-reaction time and education level and simulated aircraft operation. Meanwhile, the effects of task difficulty and environmental factors on human reliability were also studied. The results showed that human quality can be predicted and evaluated through experimental methods. The better the human quality, the higher the human reliability.
Use of Internal Consistency Coefficients for Estimating Reliability of Experimental Tasks Scores
Green, Samuel B.; Yang, Yanyun; Alt, Mary; Brinkley, Shara; Gray, Shelley; Hogan, Tiffany; Cowan, Nelson
2017-01-01
Reliabilities of scores for experimental tasks are likely to differ from one study to another to the extent that the task stimuli change, the number of trials varies, the type of individuals taking the task changes, the administration conditions are altered, or the focal task variable differs. Given reliabilities vary as a function of the design of these tasks and the characteristics of the individuals taking them, making inferences about the reliability of scores in an ongoing study based on reliability estimates from prior studies is precarious. Thus, it would be advantageous to estimate reliability based on data from the ongoing study. We argue that internal consistency estimates of reliability are underutilized for experimental task data and in many applications could provide this information using a single administration of a task. We discuss different methods for computing internal consistency estimates with a generalized coefficient alpha and the conditions under which these estimates are accurate. We illustrate use of these coefficients using data for three different tasks. PMID:26546100
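As a point of reference for the internal consistency estimates discussed above, the standard (not the generalized) coefficient alpha can be computed from a single administration as sketched below. The function name `cronbach_alpha` and the data layout (one row per participant, one column per item or trial) are assumptions made for illustration; the generalized coefficient described by the authors is not reproduced here.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Standard coefficient alpha for a participants-by-items score matrix.

    scores: list of rows, one row per participant, one value per item or trial.
    """
    k = len(scores[0])                                   # number of items
    # Sample variance of each item across participants.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For example, `cronbach_alpha([[1, 1], [2, 2], [3, 3]])` evaluates to 1.0, because the two items rank the participants identically.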
Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard
2017-04-01
Purpose The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method A comprehensive systematic literature review was conducted using a number of electronic databases: PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL, covering publications between January 2000 and May 2015. Studies that examined the validity and the inter- and intra-rater reliabilities of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies, respectively. Results A total of 898 hits were retrieved, of which 11 articles meeting the inclusion criteria were reviewed. Nine studies explored the concurrent validity and the inter- and intra-rater reliabilities, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. Physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, orthopaedic special tests, neurodynamic tests and scar assessment.
Reliability of conditioned pain modulation: a systematic review
Kennedy, Donna L.; Kemp, Harriet I.; Ridout, Deborah; Yarnitsky, David; Rice, Andrew S.C.
2016-01-01
Abstract A systematic literature review was undertaken to determine if conditioned pain modulation (CPM) is reliable. Longitudinal, English language observational studies of the repeatability of a CPM test paradigm in adult humans were included. Two independent reviewers assessed the risk of bias in 6 domains (study participation, study attrition, prognostic factor measurement, outcome measurement, confounding, and analysis) using the Quality in Prognosis Studies (QUIPS) critical assessment tool. Intraclass correlation coefficients (ICCs) less than 0.4 were considered to be poor; 0.4 to 0.59 fair; 0.6 to 0.75 good; and greater than 0.75 excellent. Ten studies were included in the final review. Meta-analysis was not appropriate because of differences between studies. The intersession reliability of the CPM effect was investigated in 8 studies and reported as good (ICC = 0.6-0.75) in 3 studies and excellent (ICC > 0.75) in subgroups in 2 of those 3. The assessment of risk of bias demonstrated that reporting is not comprehensive for the description of sample demographics, recruitment strategy, and study attrition. The absence of blinding, a lack of control for confounding factors, and lack of standardisation in statistical analysis are common. Conditioned pain modulation is a reliable measure; however, the degree of reliability is heavily dependent on stimulation parameters and study methodology, and this warrants consideration by investigators. The validation of CPM as a robust prognostic factor in experimental and clinical pain studies may be facilitated by improvements in the reporting of CPM reliability studies. PMID:27559835
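The ICC interpretation bands stated in the abstract above translate directly into a small lookup. The function name `classify_icc` is hypothetical, and one boundary choice is an assumption: the abstract leaves values between 0.59 and 0.6 unassigned, so this sketch treats everything below 0.6 (and at least 0.4) as fair.

```python
def classify_icc(icc):
    """Map an ICC to the qualitative bands quoted in the abstract."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:        # assumption: the 0.59-0.6 gap is counted as fair
        return "fair"
    if icc <= 0.75:
        return "good"
    return "excellent"   # strictly greater than 0.75
```

Note that other abstracts in this listing apply different cut-offs, so the labels are only meaningful relative to the scheme a given study declares.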
Partnering to Establish and Study Simulation in International Nursing Education.
Garner, Shelby L; Killingsworth, Erin; Raj, Leena
The purpose of this article was to describe an international partnership to establish and study simulation in India. A pilot study was performed to determine interrater reliability among faculty new to simulation when evaluating nursing student competency performance. Interrater reliability was below the ideal agreement level. Findings in this study underscore the need to obtain baseline interrater reliability data before integrating competency evaluation into a simulation program.
A Note on the Score Reliability for the Satisfaction with Life Scale: An RG Study
ERIC Educational Resources Information Center
Vassar, Matt
2008-01-01
The purpose of the present study was to meta-analytically investigate the score reliability for the Satisfaction With Life Scale. Four-hundred and sixteen articles using the measure were located through electronic database searches and then separated to identify studies which had calculated reliability estimates from their own data. Sixty-two…
Factors Influencing the Reliability of the Glasgow Coma Scale: A Systematic Review.
Reith, Florence Cm; Synnot, Anneliese; van den Brande, Ruben; Gruen, Russell L; Maas, Andrew Ir
2017-06-01
The Glasgow Coma Scale (GCS) characterizes patients with diminished consciousness. In a recent systematic review, we found overall adequate reliability across different clinical settings, but reliability estimates varied considerably between studies, and methodological quality of studies was overall poor. Identifying and understanding factors that can affect its reliability is important, in order to promote high standards for clinical use of the GCS. The aim of this systematic review was to identify factors that influence reliability and to provide an evidence base for promoting consistent and reliable application of the GCS. A comprehensive literature search was undertaken in MEDLINE, EMBASE, and CINAHL from 1974 to July 2016. Studies assessing the reliability of the GCS in adults or describing any factor that influences reliability were included. Two reviewers independently screened citations, selected full texts, and undertook data extraction and critical appraisal. Methodological quality of studies was evaluated with the consensus-based standards for the selection of health measurement instruments checklist. Data were synthesized narratively and presented in tables. Forty-one studies were included for analysis. Factors identified that may influence reliability are education and training, the level of consciousness, and type of stimuli used. Conflicting results were found for experience of the observer, the pathology causing the reduced consciousness, and intubation/sedation. No clear influence was found for the professional background of observers. Reliability of the GCS is influenced by multiple factors and as such is context dependent. This review points to the potential for improvement from training and education and standardization of assessment methods, for which recommendations are presented. Copyright © 2017 by the Congress of Neurological Surgeons.
A reliability study of the new sensors for movement analysis (SHARIF-HMIS).
Abedi, Mohen; Manshadi, Farideh Dehghan; Zavieh, Minoo Khalkhali; Ashouri, Sajad; Azimi, Hadi; Parnanpour, Mohamad
2016-04-01
SHARIF-HMIS is a new inertial sensor designed for movement analysis. The aim of the present study was to assess the inter-tester and intra-tester reliability of some kinematic parameters in different lumbar motions making use of this sensor. 24 healthy persons and 28 patients with low back pain participated in the current reliability study. The test was performed in five different lumbar motions consisting of lumbar flexion in 0, 15, and 30° in the right and left directions. For measuring inter-tester reliability, all the tests were carried out twice on the same day separately by two physiotherapists. Intra-tester reliability was assessed by reproducing the tests after 3 days by the same physiotherapist. The present study revealed satisfactory inter- and intra-tester reliability indices in different positions. ICCs for intra-tester reliability ranged from 0.65 to 0.98 and 0.59 to 0.81 for healthy and patient participants, respectively. Also, ICCs for inter-tester reliability ranged from 0.65 to 0.92 for the healthy and 0.65 to 0.87 for patient participants. In general, it can be inferred from the results that measuring the kinematic parameters in lumbar movements using inertial sensors enjoys acceptable reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Chiang, Hsin-Yu; Lu, Wen-Shian; Yu, Wan-Hui; Hsueh, I-Ping; Hsieh, Ching-Lin
2018-04-11
To examine the interrater and intrarater reliability of the Balance Computerized Adaptive Test (Balance CAT) in patients with chronic stroke having a wide range of balance functions. Repeated assessments design (1wk apart). Seven teaching hospitals. A pooled sample (N=102) including 2 independent groups of outpatients (n=50 for the interrater reliability study; n=52 for the intrarater reliability study) with chronic stroke. Not applicable. Balance CAT. For the interrater reliability study, the values of intraclass correlation coefficient, minimal detectable change (MDC), and percentage of MDC (MDC%) for the Balance CAT were .84, 1.90, and 31.0%, respectively. For the intrarater reliability study, the values of intraclass correlation coefficient, MDC, and MDC% ranged from .89 to .91, from 1.14 to 1.26, and from 17.1% to 18.6%, respectively. The Balance CAT showed sufficient intrarater reliability in patients with chronic stroke having balance functions ranging from sitting with support to independent walking. Although the Balance CAT may have good interrater reliability, we found substantial random measurement error between different raters. Accordingly, if the Balance CAT is used as an outcome measure in clinical or research settings, same raters are suggested over different time points to ensure reliable assessments. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
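The abstract above reports ICC, MDC, and MDC% together but does not state how they were derived. A commonly used route, shown here as an assumption rather than as the authors' method, goes from the ICC to the standard error of measurement (SEM = SD × sqrt(1 − ICC)) and then to the 95% minimal detectable change (MDC = 1.96 × sqrt(2) × SEM). The function names and the `reference` argument (for example a mean or maximum score) are hypothetical.

```python
import math

def mdc95(sd, icc):
    """95% minimal detectable change from a score SD and an ICC (assumed formula)."""
    sem = sd * math.sqrt(1.0 - icc)          # standard error of measurement
    return 1.96 * math.sqrt(2.0) * sem       # sqrt(2): two measurements are compared

def mdc_percent(mdc, reference):
    """Express the MDC as a percentage of a reference score (assumed definition)."""
    return 100.0 * mdc / reference
```

Under this formulation a higher ICC shrinks the SEM and therefore the MDC, which is consistent with the pattern in the abstract, where the intrarater ICCs are higher and the intrarater MDC values are smaller than the interrater ones.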
Rabelo, Michelle; Nunes, Guilherme S; da Costa Amante, Natália Menezes; de Noronha, Marcos; Fachin-Martins, Emerson
2016-02-01
Muscle weakness is the main cause of motor impairment among stroke survivors and is associated with reduced peak muscle torque. To systematically investigate and organize the evidence of the reliability of muscle strength evaluation measures in post-stroke survivors with chronic hemiparesis. Two assessors independently searched four electronic databases in January 2014 (Medline, Scielo, CINAHL, Embase). Inclusion criteria comprised studies on the reliability of muscle strength assessment in adult post-stroke patients with chronic hemiparesis. We extracted reliability outcomes from the included studies, measured by the intraclass correlation coefficient (ICC) and/or similar. The meta-analyses were conducted only with isokinetic data. Of 450 articles, eight were included in this review. After quality analysis, two studies were considered of high quality. Five different joints were analyzed within the included studies (knee, hip, ankle, shoulder, and elbow). Their reliability results varied from low to very high reliability (ICCs from 0.48 to 0.99). Results of the meta-analysis varied from high to very high reliability for knee extension (pooled ICCs from 0.89 to 0.97) and for knee flexion (pooled ICCs from 0.84 to 0.91), and showed high reliability for ankle plantar flexion (pooled ICC = 0.85). Objective muscle strength assessment can be reliably used in lower and upper extremities in post-stroke patients with chronic hemiparesis.
Validity and Reliability of Turkish Male Breast Self-Examination Instrument.
Erkin, Özüm; Göl, İlknur
2018-04-01
This study aims to measure the validity and reliability of the Turkish male breast self-examination (MBSE) instrument. The methodological study was performed in 2016 at Ege University, Faculty of Nursing, İzmir, Turkey. The MBSE includes ten steps. For the validity studies, face validity, content validity, and construct validity (exploratory factor analysis) were assessed. For the reliability study, the Kuder-Richardson coefficient was calculated. The content validity index was found to be 0.94. Kendall W coefficient was 0.80 (p=0.551). The total variance explained by the two factors was found to be 63.24%. Kuder-Richardson 21 was used for the reliability study and was found to be 0.97 for the instrument. The final instrument included 10 steps and two stages. The Turkish version of the MBSE is a valid and reliable instrument for early diagnosis. The MBSE can be used in Turkish-speaking countries and cultures with two stages and 10 steps.
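The Kuder-Richardson 21 coefficient reported above needs only the number of dichotomously scored items and the mean and variance of the total scores. A minimal sketch follows; the function name `kr21` is hypothetical, and the use of the population variance is an assumption, since conventions differ and the abstract does not say which was used.

```python
from statistics import mean, pvariance

def kr21(total_scores, n_items):
    """Kuder-Richardson formula 21 from total scores on n_items 0/1 items.

    KR-21 = k/(k-1) * (1 - M*(k - M) / (k * Var)), with M the mean total score.
    """
    k = n_items
    m = mean(total_scores)
    var = pvariance(total_scores)            # assumption: population variance
    return k / (k - 1) * (1 - m * (k - m) / (k * var))
```

KR-21 assumes all items are equally difficult, so it can be computed without item-level data; when that assumption fails it tends to understate reliability relative to KR-20.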
NASA Astrophysics Data System (ADS)
Li, Lin; Zeng, Li; Lin, Zi-Jing; Cazzell, Mary; Liu, Hanli
2015-05-01
Test-retest reliability of neuroimaging measurements is an important concern in the investigation of cognitive functions in the human brain. To date, intraclass correlation coefficients (ICCs), originally used in inter-rater reliability studies in behavioral sciences, have become commonly used metrics in reliability studies on neuroimaging and functional near-infrared spectroscopy (fNIRS). However, as there are six popular forms of ICC, the adequateness of the comprehensive understanding of ICCs will affect how one may appropriately select, use, and interpret ICCs toward a reliability study. We first offer a brief review and tutorial on the statistical rationale of ICCs, including their underlying analysis of variance models and technical definitions, in the context of assessment on intertest reliability. Second, we provide general guidelines on the selection and interpretation of ICCs. Third, we illustrate the proposed approach by using an actual research study to assess intertest reliability of fNIRS-based, volumetric diffuse optical tomography of brain activities stimulated by a risk decision-making protocol. Last, special issues that may arise in reliability assessment using ICCs are discussed and solutions are suggested.
Reliability of 3D laser-based anthropometry and comparison with classical anthropometry.
Kuehnapfel, Andreas; Ahnert, Peter; Loeffler, Markus; Broda, Anja; Scholz, Markus
2016-05-26
Anthropometric quantities are widely used in epidemiologic research as possible confounders, risk factors, or outcomes. 3D laser-based body scans (BS) allow evaluation of dozens of quantities in short time with minimal physical contact between observers and probands. The aim of this study was to compare BS with classical manual anthropometric (CA) assessments with respect to feasibility, reliability, and validity. We performed a study on 108 individuals with multiple measurements of BS and CA to estimate intra- and inter-rater reliabilities for both. We suggested BS equivalents of CA measurements and determined validity of BS considering CA the gold standard. Throughout the study, the overall concordance correlation coefficient (OCCC) was chosen as indicator of agreement. BS was slightly more time consuming but better accepted than CA. For CA, OCCCs for intra- and inter-rater reliability were greater than 0.8 for all nine quantities studied. For BS, 9 of 154 quantities showed reliabilities below 0.7. BS proxies for CA measurements showed good agreement (minimum OCCC > 0.77) after offset correction. Thigh length showed higher reliability in BS while upper arm length showed higher reliability in CA. Except for these issues, reliabilities of CA measurements and their BS equivalents were comparable.
López-Pina, José Antonio; Sánchez-Meca, Julio; López-López, José Antonio; Marín-Martínez, Fulgencio; Núñez-Núñez, Rosa Ma; Rosa-Alcázar, Ana I; Gómez-Conesa, Antonia; Ferrer-Requena, Josefa
2015-01-01
The Yale-Brown Obsessive-Compulsive Scale for children and adolescents (CY-BOCS) is a frequently applied test to assess obsessive-compulsive symptoms. We conducted a reliability generalization meta-analysis on the CY-BOCS to estimate the average reliability, search for reliability moderators, and propose a predictive model that researchers and clinicians can use to estimate the expected reliability of the CY-BOCS scores. A total of 47 studies reporting a reliability coefficient with the data at hand were included in the meta-analysis. The results showed good reliability and a large variability associated to the standard deviation of total scores and sample size.
Test Reliability at the Individual Level
Hu, Yueqin; Nesselroade, John R.; Erbacher, Monica K.; Boker, Steven M.; Burt, S. Alexandra; Keel, Pamela K.; Neale, Michael C.; Sisk, Cheryl L.; Klump, Kelly
2016-01-01
Reliability has a long history as one of the key psychometric properties of a test. However, a given test might not measure people equally reliably. Test scores from some individuals may have considerably greater error than others. This study proposed two approaches using intraindividual variation to estimate test reliability for each person. A simulation study suggested that the parallel tests approach and the structural equation modeling approach recovered the simulated reliability coefficients. Then in an empirical study, where forty-five females were measured daily on the Positive and Negative Affect Schedule (PANAS) for 45 consecutive days, separate estimates of reliability were generated for each person. Results showed that reliability estimates of the PANAS varied substantially from person to person. The methods provided in this article apply to tests measuring changeable attributes and require repeated measures across time on each individual. This article also provides a set of parallel forms of PANAS. PMID:28936107
Reliability of infrared thermometric measurements of skin temperature in the hand.
Packham, Tara L; Fok, Diana; Frederiksen, Karen; Thabane, Lehana; Buckley, Norman
2012-01-01
Clinical measurement study. Skin temperature asymmetries (STAs) are used in the diagnosis of complex regional pain syndrome (CRPS), but little evidence exists for reliability of the equipment and methods. This study examined the reliability of an inexpensive infrared (IR) thermometer and measurement points in the hand for the study of STA. ST was measured three times at five points on both hands with an IR thermometer by two raters in 20 volunteers (12 normals and 8 CRPS). ST measurement results using IR thermometers support inter-rater reliability: intraclass correlation coefficient (ICC) estimate for single measures 0.80; all ST measurement points were also highly reliable (ICC single measures, 0.83-0.91). The equipment demonstrated excellent reliability, with little difference in the reliability of the five measurement sites. These preliminary findings support their use in future CRPS research. Not applicable. Copyright © 2012 Hanley & Belfus. Published by Elsevier Inc. All rights reserved.
Wei, Meifen; Russell, Daniel W; Mallinckrodt, Brent; Vogel, David L
2007-04-01
We developed a 12-item, short form of the Experiences in Close Relationship Scale (ECR; Brennan, Clark, & Shaver, 1998) across 6 studies. In Study 1, we examined the reliability and factor structure of the measure. In Studies 2 and 3, we cross-validated the reliability, factor structure, and validity of the short form measure; whereas in Study 4, we examined test-retest reliability over a 1-month period. In Studies 5 and 6, we further assessed the reliability, factor structure, and validity of the short version of the ECR when administered as a stand-alone instrument. Confirmatory factor analyses indicated that 2 factors, labeled Anxiety and Avoidance, provided a good fit to the data after removing the influence of response sets. We found validity to be equivalent for the short and the original versions of the ECR across studies. Finally, the results were comparable when we embedded the short form within the original version of the ECR and when we administered it as a stand-alone measure.
Study of complete interconnect reliability for a GaAs MMIC power amplifier
NASA Astrophysics Data System (ADS)
Lin, Qian; Wu, Haifeng; Chen, Shan-ji; Jia, Guoqing; Jiang, Wei; Chen, Chao
2018-05-01
By combining finite element analysis (FEA) and the artificial neural network (ANN) technique, this article achieves a complete prediction of interconnect reliability for a monolithic microwave integrated circuit (MMIC) power amplifier (PA) under both direct current (DC) and alternating current (AC) operating conditions. As an example, an MMIC PA is modelled to study the electromigration failure of its interconnect. This is the first study of the interconnect reliability of an MMIC PA under DC and AC operating conditions simultaneously. By training on the data from FEA, a high-accuracy ANN model for PA reliability is constructed. The reliability database obtained from the ANN model can then give important guidance for improving reliability design for ICs.
What to Do With "Moderate" Reliability and Validity Coefficients?
Post, Marcel W
2016-07-01
Clinimetric studies may use criteria for test-retest reliability and convergent validity such that correlation coefficients as low as .40 are supportive of reliability and validity. It can be argued that moderate (.40-.60) correlations should not be interpreted in this way and that reliability coefficients <.70 should be considered as indicative of unreliability. Convergent validity coefficients in the .40 to .60 or .40 to .70 range should be considered as indications of validity problems, or as inconclusive at best. Studies on reliability and convergent validity should be designed in such a way that it is realistic to expect high reliability and validity coefficients. Multitrait multimethod approaches are preferred to study construct (convergent-divergent) validity. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Rosa-Rizzotto, M; Visonà Dalla Pozza, L; Corlatti, A; Luparia, A; Marchi, A; Molteni, F; Facchin, P; Pagliano, E; Fedrizzi, E
2014-10-01
In hemiplegic children, the recognition of the activity limitation pattern and the possibility of grading its severity are relevant for clinicians when planning interventions, monitoring results, and predicting outcomes. The aim of the study was to examine the reliability and validity of the Besta Scale, an instrument used in hemiplegic children from 18 months to 12 years of age to measure both grasp on request (capacity) and spontaneous use of the upper limb (performance) in bimanual play activities and in ADL. Psychometric analysis of the reliability and validity of the Besta scale was performed. Outpatient study sample. Reliability study: A sample of 39 patients was enrolled. The administration of the Besta scale was video-recorded in a standardized manner. All videos were scored by 20 independent raters on subsequent viewing. Three raters randomly selected from the 20-rater group rescored the same videos two years later for intra-rater reliability. Intra- and inter-rater reliability were calculated using the Intraclass Correlation Coefficient (ICC) and Kendall's coefficient (K), respectively. Internal consistency reliability was assessed using Cronbach's alpha coefficient. Validity study: a sample of 105 children was assessed 5 times (at t0 and 2, 3, 6 and 12 months later) by 20 independent raters. Each patient underwent QUEST and Besta scale administration and assessment at the same time. Criterion validity was calculated using Pearson's correlation coefficient. Reliability study: The inter-rater reliability calculated with Kendall's coefficient was moderate (K=0.47). The intra-rater (or test-retest) reliability for 3 raters was excellent (ICC=0.927). Cronbach's alpha for internal consistency was 0.972. Validity study: The Besta scale showed good criterion validity compared to QUEST, increasing with age and severity of impairment. Pearson's correlation coefficient r was 0.81 (P<0.0001). Limitations. 
In infants, the Besta scale has difficulty distinguishing between mildly and moderately impaired hand function. The Besta scale scoring system is a valid and reliable tool, usable in a clinical setting to monitor the evolution of unimanual and bimanual manipulation and to distinguish the hand's capacity from its performance.
ERIC Educational Resources Information Center
Strauss, Gregory P.; Allen, Daniel N.; Jorgensen, Melinda L.; Cramer, Stacey L.
2005-01-01
Previous studies have examined the reliability of scores derived from various Stroop tasks. However, few studies have compared reliability of more recently developed Stroop variants such as emotional Stroop tasks to standard versions of the Stroop. The current study developed four different single-stimulus Stroop tasks and compared test-retest…
Cognitive Decline in Down Syndrome: A Validity/Reliability Study of the Test for Severe Impairment.
ERIC Educational Resources Information Center
Cosgrave, Mary P.; McCarron, Mary; Anderson, Mary; Tyrrell, Janette; Gill, Michael; Lawlor, Brian A.
1998-01-01
The utility of the Test for Severe Impairment was studied with 60 older persons who had Down Syndrome. Construct validity, test-retest reliability, and interrater reliability were established for the full study group and for subgroups based on degree of mental retardation and dementia status. Some possible applications and limitations of the test…
ERIC Educational Resources Information Center
Boonstra, Anne M.; Reneman, Michiel F.; Stewart, Roy E.; Balk, Gerlof A.
2012-01-01
The aim of this study was to determine the reliability and discriminant validity of the Dutch version of the life satisfaction questionnaire (Lisat-9 DV) to assess patients with an acquired brain injury. The reliability study used a test-retest design, and the validity study used a cross-sectional design. The setting was the general rehabilitation…
Validity and Reliability of the Academic Resilience Scale in Turkish High School
ERIC Educational Resources Information Center
Kapikiran, Sahin
2012-01-01
The present study aims to determine the validity and reliability of the academic resilience scale in Turkish high school. The participants of the study included 378 high school students in total (192 female and 186 male). A set of analyses was conducted in order to determine the validity and reliability of the study. Firstly, both exploratory…
Reliability-based structural optimization: A proposed analytical-experimental study
NASA Technical Reports Server (NTRS)
Stroud, W. Jefferson; Nikolaidis, Efstratios
1993-01-01
An analytical and experimental study for assessing the potential of reliability-based structural optimization is proposed and described. In the study, competing designs obtained by deterministic and reliability-based optimization are compared. The experimental portion of the study is practical because the structure selected is a modular, actively and passively controlled truss that consists of many identical members, and because the competing designs are compared in terms of their dynamic performance and are not destroyed if failure occurs. The analytical portion of this study is illustrated on a 10-bar truss example. In the illustrative example, it is shown that reliability-based optimization can yield a design that is superior to an alternative design obtained by deterministic optimization. These analytical results provide motivation for the proposed study, which is underway.
Test-Retest Reliability of Pediatric Heart Rate Variability: A Meta-Analysis.
Weiner, Oren M; McGrath, Jennifer J
2017-01-01
Heart rate variability (HRV), an established index of autonomic cardiovascular modulation, is associated with health outcomes (e.g., obesity, diabetes) and mortality risk. Time- and frequency-domain HRV measures are commonly reported in longitudinal adult and pediatric studies of health. While test-retest reliability has been established among adults, less is known about the psychometric properties of HRV among infants, children, and adolescents. The objective was to conduct a meta-analysis of the test-retest reliability of time- and frequency-domain HRV measures from infancy to adolescence. Electronic searches (PubMed, PsycINFO; January 1970-December 2014) identified studies with nonclinical samples aged ≤ 18 years; ≥ 2 baseline HRV recordings separated by ≥ 1 day; and sufficient data for effect size computation. Forty-nine studies (N = 5,170) met inclusion criteria. Methodological variables coded included factors relevant to study protocol, sample characteristics, electrocardiogram (ECG) signal acquisition and preprocessing, and HRV analytical decisions. Fisher's Z was derived as the common effect size. Analyses were age-stratified (infant/toddler < 5 years, n = 3,329; child/adolescent 5-18 years, n = 1,841) due to marked methodological differences across the pediatric literature. Meta-analytic results revealed HRV demonstrated moderate reliability; child/adolescent studies (Z = 0.62, r = 0.55) had significantly higher reliability than infant/toddler studies (Z = 0.42, r = 0.40). Relative to other reported measures, HF exhibited the highest reliability among infant/toddler studies (Z = 0.42, r = 0.40), while rMSSD exhibited the highest reliability among child/adolescent studies (Z = 1.00, r = 0.76).
Moderator analyses indicated greater reliability with shorter test-retest interval length, reported exclusion criteria based on medical illness/condition, lower proportion of males, prerecording acclimatization period, and longer recording duration; differences were noted across age groups. HRV is reliable among pediatric samples. Reliability is sensitive to pertinent methodological decisions that require careful consideration by the researcher. Limited methodological reporting precluded several a priori moderator analyses. Suggestions for future research, including standards specified by Task Force Guidelines, are discussed. PMID:29307951
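The abstract above reports each pooled effect both as Fisher's Z and as a correlation r. As a minimal sketch of how the two scales relate, assuming the standard Fisher transformation (Z = atanh(r), r = tanh(Z)), the following back-transforms the reported Z values; the function names are illustrative and not taken from the paper.

```python
import math

def z_to_r(z: float) -> float:
    """Back-transform Fisher's Z to a correlation coefficient r."""
    return math.tanh(z)

def r_to_z(r: float) -> float:
    """Fisher's Z transform of a correlation coefficient r (requires |r| < 1)."""
    return math.atanh(r)

# (Z, r) pairs as reported in the abstract above
reported = [(0.62, 0.55), (0.42, 0.40), (1.00, 0.76)]
for z, r in reported:
    print(f"Z = {z:.2f} -> r = {z_to_r(z):.2f} (reported r = {r:.2f})")
```

Under that assumption, the back-transformed values agree with the reported r values to two decimal places.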
Cutolo, Maurizio; Vanhaecke, Amber; Ruaro, Barbara; Deschepper, Ellen; Ickinger, Claudia; Melsens, Karin; Piette, Yves; Trombetta, Amelia Chiara; De Keyser, Filip; Smith, Vanessa
2018-06-06
A reliable tool to evaluate flow is paramount in systemic sclerosis (SSc). We describe herein a systematic literature review on the reliability of laser speckle contrast analysis (LASCA) to measure peripheral blood perfusion (PBP) in SSc, together with an additional pilot study investigating the intra- and inter-rater reliability of LASCA. A systematic search was performed in 3 electronic databases, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. In the pilot study, 30 SSc patients and 30 healthy subjects (HS) underwent LASCA assessment. Intra-rater reliability was assessed by having a first anchor rater perform the measurements at 2 time-points, and inter-rater reliability by having the anchor rater and a team of second raters perform the measurements in 15 SSc patients and 30 HS. The measurements were repeated with a second anchor rater in the other 15 SSc patients, as external validation. Only 1 of the 14 records of interest identified through the systematic search was included in the final analysis. In the additional pilot study, the intra-class correlation coefficient (ICC) for intra-rater reliability of the first anchor rater was 0.95 in SSc and 0.93 in HS, and the ICC for inter-rater reliability was 0.97 in SSc and 0.93 in HS. Intra- and inter-rater reliability of the second anchor rater was 0.78 and 0.87. The identified literature regarding the reliability of LASCA measurements reports good to excellent inter-rater agreement. This pilot study confirmed the reliability of LASCA measurements, with good to excellent inter-rater agreement, and additionally found good to excellent intra-rater reliability. Furthermore, similar results were found in the external validation. Copyright © 2018. Published by Elsevier B.V.
Seeking high reliability in primary care: Leadership, tools, and organization.
Weaver, Robert R
2015-01-01
Leaders in health care increasingly recognize that improving health care quality and safety requires developing an organizational culture that fosters high reliability and continuous process improvement. For various reasons, a reliability-seeking culture is lacking in most health care settings. Developing a reliability-seeking culture requires leaders' sustained commitment to reliability principles using key mechanisms to embed those principles widely in the organization. The aim of this study was to examine how key mechanisms used by a primary care practice (PCP) might foster a reliability-seeking, system-oriented organizational culture. A case study approach was used to investigate the PCP's reliability culture. The study examined four cultural artifacts used to embed reliability-seeking principles across the organization: leadership statements, decision support tools, and two organizational processes. To decipher their effects on reliability, the study relied on observations of work patterns and the tools' use, interactions during morning huddles and process improvement meetings, interviews with clinical and office staff, and a "collective mindfulness" questionnaire. The five reliability principles framed the data analysis. Leadership statements articulated principles that oriented the PCP toward a reliability-seeking culture of care. Reliability principles became embedded in the everyday discourse and actions through the use of "problem knowledge coupler" decision support tools and daily "huddles." Practitioners and staff were encouraged to report unexpected events or close calls that arose and which often initiated a formal "process change" used to adjust routines and prevent adverse events from recurring. Activities that foster reliable patient care became part of the taken-for-granted routine at the PCP. 
The analysis illustrates the role leadership, tools, and organizational processes play in developing and embedding a reliability-seeking culture across an organization. Progress toward a reliability-seeking, system-oriented approach to care remains ongoing, and movement in that direction requires deliberate and sustained effort by committed leaders in health care.
Fuller, Catherine J; Bladon, Bruce M; Driver, Adam J; Barr, Alistair R S
2006-03-01
The objective of this study was to assess the reliability of lameness scoring in horses. One veterinary surgeon examined nineteen lame horses on four occasions. Gait was recorded by camcorder, and scored from 0 to 10 ranging from sound to non-weight bearing lameness. A global score of overall change in lameness during the study was also determined for each horse. To measure intra-assessor reliability of the scoring systems, one veterinary surgeon scored videotapes of the horses' gaits on two occasions. To measure inter-assessor reliability, three veterinary surgeons viewed the videotapes, assigning individual lameness scores plus global scores to each horse. Reliability of individual lameness scoring was good intra-assessor, but only just within our acceptable limit inter-assessor. However, global scoring of change in lameness throughout the study was found to be reliable overall. Since clinician scoring is commonly used to assess lameness in horses, this is an important finding, fundamental to future clinical studies.
Reducing random measurement error in assessing postural load on the back in epidemiologic surveys.
Burdorf, A
1995-02-01
The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations have been presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.
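The abstract above does not reproduce its equations. As a hedged illustration of the trade-off it describes, the sketch below uses the standard variance-components result for the reliability of the mean of k repeated measurements per worker, R_k = var_between / (var_between + var_within / k). The variance values are hypothetical and the function name is an assumption, not the author's notation.

```python
def reliability_of_mean(var_between: float, var_within: float, k: int) -> float:
    """Reliability of a worker's mean exposure over k repeated measurements.

    var_between: variance of true exposure between workers
    var_within:  variance of repeated measurements within a worker
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return var_between / (var_between + var_within / k)

# Hypothetical variances: within-worker variability twice the between-worker variance
for k in (1, 2, 4, 8):
    print(f"k = {k}: reliability = {reliability_of_mean(1.0, 2.0, k):.3f}")
```

With these illustrative numbers, a single measurement per worker gives a reliability of about 0.33, while eight repeats raise it to 0.80, which is the kind of balance between repeats per worker and number of workers that the abstract refers to.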
Lee, Chin-Pang; Chiu, Yu-Wen; Chu, Chun-Lin; Chen, Yu; Jiang, Kun-Hao; Chen, Jiun-Liang; Chen, Ching-Yen
2016-12-01
The aging males' symptoms (AMS) scale is an instrument used to determine the health-related quality of life in adult and elderly men. The purpose of this study was to synthesize internal consistency (Cronbach's alpha) and test-retest reliability for the AMS scale and its three subscales. Of the 123 studies reviewed, 12 provided alpha coefficients which were then used in the meta-analyses of internal consistency. Seven of the 12 included studies provided test-retest coefficients, and these were used in the meta-analyses of test-retest reliability. The AMS scale had excellent internal consistency [α = 0.89 (95% CI 0.88-0.90)]; the mean alpha estimates across the AMS subscales ranged from 0.79 to 0.82. The AMS scale also had good test-retest reliability [r = 0.85 (95% CI 0.82-0.88)]; the test-retest reliability coefficients of the AMS subscales ranged from 0.76 to 0.83. There was significant heterogeneity among the included studies. The AMS scale and the three subscales had fairly good internal consistency and test-retest reliability. Future psychometric studies of the AMS scale should report important characteristics of the participants, details of item scores, and test-retest reliability.
Mai, Zhi-Ming; Lin, Jia-Huang; Chiang, Shing-Chun; Ngan, Roger Kai-Cheong; Kwong, Dora Lai-Wan; Ng, Wai-Tong; Ng, Alice Wan-Ying; Yuen, Kam-Tong; Ip, Kai-Ming; Chan, Yap-Hang; Lee, Anne Wing-Mui; Ho, Sai-Yin; Lung, Maria Li; Lam, Tai-Hing
2018-05-04
We evaluated the reliability of early life nasopharyngeal carcinoma (NPC) aetiology factors in the questionnaire of an NPC case-control study in Hong Kong during 2014-2017. 140 subjects aged 18+ completed the same computer-assisted questionnaire twice, separated by at least 2 weeks. The questionnaire included most known NPC aetiology factors and the present analysis focused on early life exposure. Test-retest reliability of all the 285 questionnaire items was assessed in all subjects and in 5 subgroups defined by cases/controls, sex, time between 1st and 2nd questionnaire (2-29/≥30 weeks), education (secondary or less/postsecondary), and age (25-44/45-59/60+ years) at the first questionnaire. The reliability of items on dietary habits, body figure, skin tone and sun exposure in early life periods (age 6-12 and 13-18) was moderate-to-almost perfect, and most other items had fair-to-substantial reliability in all life periods (age 6-12, 13-18 and 19-30, and 10 years ago). Differences in reliability by strata of the 5 subgroups were only observed in a few items. This study is the first to report the reliability of an NPC questionnaire, and to make the questionnaire available online. Overall, our questionnaire had acceptable reliability, suggesting that previous NPC study results on the same risk factors would have similar reliability.
How reliable are Functional Movement Screening scores? A systematic review of rater reliability.
Moran, Robert W; Schneiders, Anthony G; Major, Katherine M; Sullivan, S John
2016-05-01
Several physical assessment protocols to identify intrinsic risk factors for injury aetiology related to movement quality have been described. The Functional Movement Screen (FMS) is a standardised, field-expedient test battery intended to assess movement quality and has been used clinically in preparticipation screening and in sports injury research. To critically appraise and summarise research investigating the reliability of scores obtained using the FMS battery. Systematic literature review. Systematic search of Google Scholar, Scopus (including ScienceDirect and PubMed), EBSCO (including Academic Search Complete, AMED, CINAHL, Health Source: Nursing/Academic Edition), MEDLINE and SPORTDiscus. Studies meeting eligibility criteria were assessed by 2 reviewers for risk of bias using the Quality Appraisal of Reliability Studies checklist. Overall quality of evidence was determined using van Tulder's levels of evidence approach. 12 studies were appraised. Overall, there was a 'moderate' level of evidence in favour of 'acceptable' (intraclass correlation coefficient ≥0.6) inter-rater and intra-rater reliability for composite scores derived from live scoring. For inter-rater reliability of composite scores derived from video recordings there was 'conflicting' evidence, and 'limited' evidence for intra-rater reliability. For inter-rater reliability based on live scoring of individual subtests there was 'moderate' evidence of 'acceptable' reliability (κ≥0.4) for 4 subtests (Deep Squat, Shoulder Mobility, Active Straight-leg Raise, Trunk Stability Push-up) and 'conflicting' evidence for the remaining 3 (Hurdle Step, In-line Lunge, Rotary Stability). This review found 'moderate' evidence that raters can achieve acceptable levels of inter-rater and intra-rater reliability of composite FMS scores when using live ratings. 
Overall, there were few high-quality studies, and the quality of several studies was impacted by poor study reporting particularly in relation to rater blinding. Published by the BMJ Publishing Group Limited.
A study on reliability of power customer in distribution network
NASA Astrophysics Data System (ADS)
Liu, Liyuan; Ouyang, Sen; Chen, Danling; Ma, Shaohua; Wang, Xin
2017-05-01
The existing power supply reliability index system is oriented to the power system and does not consider actual electricity availability on the customer side. In addition, it is unable to reflect outages or customer equipment shutdowns caused by instantaneous interruptions and power quality problems. This paper therefore makes a systematic study of the reliability of power customers. By comparison with power supply reliability, the reliability of power customers is defined and its evaluation requirements are extracted. An index system, consisting of seven customer indexes and two contrast indexes, is designed to describe the reliability of power customers in terms of continuity and availability. In order to comprehensively and quantitatively evaluate the reliability of power customers in distribution networks, a reliability evaluation method is proposed based on an improved entropy method and the punishment weighting principle. Practical application has shown that the reliability index system and evaluation method for power customers are reasonable and effective.
Score Reliability of Adolescent Alcohol Screening Measures: A Meta-Analytic Inquiry
ERIC Educational Resources Information Center
Shields, Alan L.; Campfield, Delia C.; Miller, Christopher S.; Howell, Ryan T.; Wallace, Kimberly; Weiss, Roger D.
2008-01-01
This study describes the reliability reporting practices in empirical studies using eight adolescent alcohol screening tools and characterizes and explores variability in internal consistency estimates across samples. Of 119 observed administrations of these instruments, 40 (34%) reported usable reliability information. The Personal Experience…
Retest Reliability of the Rosenzweig Picture-Frustration Study and Similar Semiprojective Techniques
ERIC Educational Resources Information Center
Rosenzweig, Saul; And Others
1975-01-01
The research dealing with the reliability of the Rosenzweig Picture-Frustration Study is surveyed. Analyses of various split-half and retest procedures are reviewed and their relative effectiveness evaluated. Reliability measures as applied to projective techniques in general are discussed. (Author/DEP)
The 747 primary flight control systems reliability and maintenance study
NASA Technical Reports Server (NTRS)
1979-01-01
The major operational characteristics of the 747 Primary Flight Control Systems (PFCS) are described. Results of reliability analysis for separate control functions are presented. The analysis makes use of a NASA computer program which calculates reliability of redundant systems. Costs for maintaining the 747 PFCS in airline service are assessed. The reliabilities and cost will provide a baseline for use in trade studies of future flight control system design.
A Cross-Cultural Study of the Reliability of the Coopersmith Self-Esteem Inventory.
ERIC Educational Resources Information Center
Diaz, Joseph O.
1984-01-01
The purpose of this study was to determine the reliability of the Spanish translation of the Coopersmith Self-Esteem Inventory with a group of Puerto Rican students on the island and another on the mainland. It was found to be reliable for both groups. (Author/BW)
Federal Register 2010, 2011, 2012, 2013, 2014
2013-01-28
...: The Sacramento River Water Reliability Study (SRWRS) was a water supply plan consistent with the Water... supplies to meet growing water supply demands and reliability objectives in their respective service areas.../Environmental Impact Report on the Sacramento River Water Reliability Study, California AGENCY: Bureau of...
Wong, M T P; Ho, T P; Ho, M Y; Yu, C S; Wong, Y H; Lee, S Y
2002-05-01
The Geriatric Depression Scale (GDS) is a common screening tool for elderly depression in Hong Kong. This study aimed at (1) developing a standardized manual for the verbal administration and scoring of the GDS-SF, and (2) comparing the inter-rater reliability between the standardized and non-standardized verbal administration of GDS-SF. Two studies were reported. In Study 1, the process of developing the manual was described. In Study 2, we compared the inter-rater reliabilities of GDS-SF scores using the standardized verbal instructions and the traditional non-standardized administration. Results of Study 2 indicated that the standardized procedure in verbal administration and scoring improved the inter-rater reliabilities of GDS-SF. Copyright 2002 John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rogue, F.; Binnall, E.P.
1982-10-01
Reliable instrumentation will be needed to monitor the performance of future high-level waste repository sites. A study has been made to assess instrument reliability at Department of Energy (DOE) waste repository related experiments. Though the study covers a wide variety of instrumentation, this paper concentrates on experiences with geotechnical instrumentation in hostile repository-type environments. Manufacturers have made some changes to improve the reliability of instruments for repositories. This paper reviews the failure modes, rates, and mechanisms, along with manufacturer modifications and recommendations for additional improvements to enhance instrument performance. 4 tables.
INFLUENCES OF RESPONSE RATE AND DISTRIBUTION ON THE CALCULATION OF INTEROBSERVER RELIABILITY SCORES
Rolider, Natalie U.; Iwata, Brian A.; Bullock, Christopher E.
2012-01-01
We examined the effects of several variations in response rate on the calculation of total, interval, exact-agreement, and proportional reliability indices. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates during separate sessions so that reliability results based on the four calculations could be compared across a range of values. Total reliability was uniformly high, interval reliability was spuriously high for high-rate responding, proportional reliability was somewhat lower for high-rate responding, and exact-agreement reliability was the lowest of the measures, especially for high-rate responding. In Study 2, we examined the separate effects of response rate per se, bursting, and end-of-interval responding. Response rate and bursting had little effect on reliability scores; however, the distribution of some responses at the end of intervals decreased interval reliability somewhat, proportional reliability noticeably, and exact-agreement reliability markedly. PMID:23322930
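The four indices compared in the abstract above can be sketched as follows. This assumes the conventional interobserver-agreement definitions over per-interval response counts (total: smaller session total divided by larger; interval: agreement on occurrence versus nonoccurrence; exact: identical counts; proportional: mean of per-interval smaller/larger ratios) and may differ in detail from the authors' calculations. The data are hypothetical.

```python
from typing import Sequence

def _ratio(a: int, b: int) -> float:
    """Smaller count divided by larger; identical counts (including 0, 0) give 1.0."""
    return 1.0 if a == b else min(a, b) / max(a, b)

def _check(obs1: Sequence[int], obs2: Sequence[int]) -> None:
    if len(obs1) != len(obs2) or len(obs1) == 0:
        raise ValueError("observers must score the same non-empty set of intervals")

def total_agreement(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Compare session totals only, ignoring when responses occurred."""
    _check(obs1, obs2)
    return _ratio(sum(obs1), sum(obs2))

def interval_agreement(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Proportion of intervals where both observers agree on occurrence vs. nonoccurrence."""
    _check(obs1, obs2)
    return sum((a > 0) == (b > 0) for a, b in zip(obs1, obs2)) / len(obs1)

def exact_agreement(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Proportion of intervals where both observers recorded the same count."""
    _check(obs1, obs2)
    return sum(a == b for a, b in zip(obs1, obs2)) / len(obs1)

def proportional_agreement(obs1: Sequence[int], obs2: Sequence[int]) -> float:
    """Mean of the per-interval smaller/larger count ratios."""
    _check(obs1, obs2)
    return sum(_ratio(a, b) for a, b in zip(obs1, obs2)) / len(obs1)

# Hypothetical counts per interval for two observers
obs1 = [2, 0, 5, 3]
obs2 = [2, 1, 4, 3]
for name, fn in [("total", total_agreement), ("interval", interval_agreement),
                 ("exact", exact_agreement), ("proportional", proportional_agreement)]:
    print(f"{name}: {fn(obs1, obs2):.2f}")
```

In this toy example the session totals match, so total agreement is 1.00 even though the observers recorded identical counts in only half of the intervals, which illustrates why the total index can stay high while exact agreement is the lowest of the four.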
Lange, Toni; Struyf, Filip; Schmitt, Jochen; Lützner, Jörg; Kopkow, Christian
2017-07-01
Systematic review. The aim of this systematic review was to summarize and evaluate intra- and interrater reliability research of physical examination tests used for the assessment of scapular dyskinesis. Scapular dyskinesis, defined as alteration of normal scapular kinematics, is described as a non-specific response to different shoulder pathologies. A systematic literature search was conducted in MEDLINE, EMBASE, AMED and PEDro until March 20th, 2015. Methodological quality was assessed with the Quality Appraisal of Reliability Studies (QAREL) by two independent reviewers. The search strategy revealed 3259 articles, of which 15 met the inclusion criteria. These studies evaluated the reliability of 41 tests and test variations used for the assessment of scapular dyskinesis. This review identified a lack of high-quality studies evaluating intra- as well as interrater reliability of tests used for the assessment of scapular dyskinesis. In addition, reliability measures differed between included studies, hindering proper cross-study comparisons. The effect of manual correction of the scapula on shoulder symptoms was evaluated in only one study, which is striking, since symptom alteration tests are used in routine care to guide further treatment. Thus, there is a strong need for further research in this area. Diagnosis, level 3a. Copyright © 2016. Published by Elsevier Ltd.
Richler, Jennifer J.; Floyd, R. Jackie; Gauthier, Isabel
2014-01-01
Efforts to understand individual differences in high-level vision necessitate the development of measures that have sufficient reliability, which is generally not a concern in group studies. Holistic processing is central to research on face recognition and, more recently, to the study of individual differences in this area. However, recent work has shown that the most popular measure of holistic processing, the composite task, has low reliability. This is particularly problematic for the recent surge in interest in studying individual differences in face recognition. Here, we developed and validated a new measure of holistic face processing specifically for use in individual-differences studies. It avoids some of the pitfalls of the standard composite design and capitalizes on the idea that trial variability allows for better traction on reliability. Across four experiments, we refine this test and demonstrate its reliability. PMID:25228629
How reliable are clinical systems in the UK NHS? A study of seven NHS organisations
Franklin, Bryony Dean; Moorthy, Krishna; Cooke, Matthew W; Vincent, Charles
2012-01-01
Background It is well known that many healthcare systems have poor reliability; however, the size and pervasiveness of this problem and its impact has not been systematically established in the UK. The authors studied four clinical systems: clinical information in surgical outpatient clinics, prescribing for hospital inpatients, equipment in theatres, and insertion of peripheral intravenous lines. The aim was to describe the nature, extent and variation in reliability of these four systems in a sample of UK hospitals, and to explore the reasons for poor reliability. Methods Seven UK hospital organisations were involved; each system was studied in three of these. The authors took delivery of the systems' intended outputs to be a proxy for the reliability of the system as a whole. For example, for clinical information, 100% reliability was defined as all patients having an agreed list of clinical information available when needed during their appointment. Systems factors were explored using semi-structured interviews with key informants. Common themes across the systems were identified. Results Overall reliability was found to be between 81% and 87% for the systems studied, with significant variation between organisations for some systems: clinical information in outpatient clinics ranged from 73% to 96%; prescribing for hospital inpatients 82–88%; equipment availability in theatres 63–88%; and availability of equipment for insertion of peripheral intravenous lines 80–88%. One in five reliability failures were associated with perceived threats to patient safety. Common factors causing poor reliability included lack of feedback, lack of standardisation, and issues such as access to information out of working hours. Conclusions Reported reliability was low for the four systems studied, with some common factors behind each. 
However, this hides significant variation between organisations for some processes, suggesting that some organisations have managed to create more reliable systems. Standardisation of processes would be expected to have significant benefit. PMID:22495099
Examination of Anomalous World Experience: A Report on Reliability.
Conerty, Joseph; Skodlar, Borut; Pienkos, Elizabeth; Zadravek, Tina; Byrom, Greg; Sass, Louis
2017-01-01
The EAWE (Examination of Anomalous World Experience) is a newly developed, semi-structured interview that aims to capture anomalies of subjectivity, common in schizophrenia spectrum disorders, that pertain to experiences of the lived world, including space, time, people, language, atmosphere, and certain existential attitudes. By contrast, previous empirical studies of subjective experience in schizophrenia have focused largely on disturbances in self-experience. To assess the reliability of the EAWE, including internal consistency and interrater reliability. In the course of developing the EAWE, two distinct studies were conducted, one in the United States and the other in Slovenia. Thirteen patients diagnosed with schizophrenia spectrum or mood disorders were recruited for the US study. Fifteen such patients were recruited for the Slovenian study. Two live interviewers conducted the EAWE in the US. The Slovenian interviews were completed by one live interviewer with a second rater reviewing audiorecordings of the interview. Internal consistency and interrater reliability were calculated independently for each study, utilizing Cronbach's α, Spearman's ρ, and Cohen's κ. Each study yielded high internal consistency (Cronbach's α >0.82) and high interrater reliability for total EAWE scores (ρ > 0.83; average κ values were at least 0.78 for each study, with EAWE domain-specific κ not lower than 0.73). The EAWE, containing world-oriented inquiries into anomalies in subjective experience, has adequate reliability for use in a clinical or research setting. © 2017 S. Karger AG, Basel.
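The internal-consistency statistic named in the abstract above, Cronbach's α, can be sketched as follows. This is a minimal illustration using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores); the function name and the toy scores are hypothetical and are not EAWE data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    scores: list of rows, one row per respondent, one column per item.
    Uses sample variances throughout (an assumption; the abstract does
    not say which variance estimator was used).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Variance of each item (column) across respondents.
    item_variances = [variance(col) for col in zip(*scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: three items that move together perfectly.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(toy), 3))
```

With perfectly parallel items, as in the toy table, α reaches its maximum of 1; real instruments fall below that as items diverge.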
Feijen, Stef; Kuppens, Kevin; Tate, Angela; Baert, Isabel; Struyf, Thomas; Struyf, Filip
2018-04-17
Measuring thoracic spine mobility can be of interest to competitive swimmers as it has been associated with shoulder girdle function and scapular position in subjects with and without shoulder pain. At present, no reliability data of thoracic spine mobility measurements are available in the swimming population. This study aims to evaluate the within-session intra- and interrater reliability of the "lumbar-locked rotation test" for thoracic spine rotation in competitive swimmers aged 10 to 18 years. This reliability study is part of a larger prospective cohort study investigating potential risk factors for the development of shoulder pain in competitive swimmers. Within-session, intra- and inter-rater reliability. Competitive swimming clubs in Belgium. 21 competitive swimmers. Intra- and inter-rater reliability of the lumbar-locked thoracic rotation test. Intraclass correlation coefficients (ICCs) ranged from 0.91 (95% CI 0.78 to 0.96) to 0.96 (0.89-0.98) for intra-rater reliability. Inter-rater ICCs were 0.89 (0.72-0.95) and 0.86 (0.65-0.94) for right and left thoracic rotation, respectively. Results suggest good to excellent reliability of the lumbar-locked thoracic rotation test, indicating this test can be used reliably in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.
Interformat reliability of digital psychiatric self-report questionnaires: a systematic review.
Alfonsson, Sven; Maathz, Pernilla; Hursti, Timo
2014-12-03
Research on Internet-based interventions typically uses digital versions of pen and paper self-report symptom scales. However, adaptation into the digital format could affect the psychometric properties of established self-report scales. Several studies have investigated differences between digital and pen and paper versions of instruments, but no systematic review of the results has yet been done. This review aims to assess the interformat reliability of self-report symptom scales used in digital or online psychotherapy research. Three databases (MEDLINE, Embase, and PsycINFO) were systematically reviewed for studies investigating the reliability between digital and pen and paper versions of psychiatric symptom scales. From a total of 1504 publications, 33 were included in the review, and interformat reliability of 40 different symptom scales was assessed. Significant differences in mean total scores between formats were found in 10 of 62 analyses. These differences were found in just a few studies, which indicates that the results were due to study effects and sample effects rather than unreliable instruments. The interformat reliability ranged from r=.35 to r=.99; however, the majority of instruments showed a strong correlation between format scores. The quality of the included studies varied, and several studies had insufficient power to detect small differences between formats. When digital versions of self-report symptom scales are compared to pen and paper versions, most scales show high interformat reliability. This supports the reliability of results obtained in psychotherapy research on the Internet and the comparability of the results to traditional psychotherapy research. There are, however, some instruments that consistently show low interformat reliability, suggesting that these conclusions cannot be generalized to all questionnaires. Most studies had at least some methodological issues with insufficient statistical power being the most common issue. Future studies should preferably provide information about the transformation of the instrument into digital format and the procedure for data collection in more detail.
DiCesare, Christopher A.; Bates, Nathaniel A.; Barber Foss, Kim D.; Thomas, Staci M.; Wordeman, Samuel C.; Sugimoto, Dai; Roewer, Benjamin D.; Medina McKeon, Jennifer M.; Di Stasi, Stephanie; Noehren, Brian W.; Ford, Kevin R.; Kiefer, Adam W.; Hewett, Timothy E.; Myer, Gregory D.
2015-01-01
Background: Anterior cruciate ligament (ACL) injuries are physically and financially devastating but affect a relatively small percentage of the population. Prospective identification of risk factors for ACL injury necessitates a large sample size; therefore, study of this injury would benefit from a multicenter approach. Purpose: To determine the reliability of kinematic and kinetic measures of a single-leg cross drop task across 3 institutions. Study Design: Controlled laboratory study. Methods: Twenty-five female high school volleyball players participated in this study. Three-dimensional motion data of each participant performing the single-leg cross drop were collected at 3 institutions over a period of 4 weeks. Coefficients of multiple correlation were calculated to assess the reliability of kinematic and kinetic measures during the landing phase of the movement. Results: Between-centers reliability for kinematic waveforms in the frontal and sagittal planes was good, but moderate in the transverse plane. Between-centers reliability for kinetic waveforms was good in the sagittal, frontal, and transverse planes. Conclusion: Based on these findings, the single-leg cross drop task has moderate to good reliability of kinematic and kinetic measures across institutions after implementation of a standardized testing protocol. Clinical Relevance: Multicenter collaborations can increase study numbers and generalize results, which is beneficial for studies of relatively rare phenomena, such as ACL injury. An important step is to determine the reliability of risk assessments across institutions before a multicenter collaboration can be initiated. PMID:26779550
Theory of reliable systems. [systems analysis and design]
NASA Technical Reports Server (NTRS)
Meyer, J. F.
1973-01-01
The analysis and design of reliable systems are discussed. The attributes of system reliability studied are fault tolerance, diagnosability, and reconfigurability. Objectives of the study include: to determine properties of system structure that are conducive to a particular attribute; to determine methods for obtaining reliable realizations of a given system; and to determine how properties of system behavior relate to the complexity of fault tolerant realizations. A list of 34 references is included.
A Monte Carlo Simulation Study of the Reliability of Intraindividual Variability
Estabrook, Ryne; Grimm, Kevin J.; Bowles, Ryan P.
2012-01-01
Recent research has seen intraindividual variability (IIV) become a useful technique to incorporate trial-to-trial variability into many types of psychological studies. IIV as measured by individual standard deviations (ISDs) has shown unique prediction to several types of positive and negative outcomes (Ram, Rabbit, Stollery, & Nesselroade, 2005). One unanswered question regarding measuring intraindividual variability is its reliability and the conditions under which optimal reliability is achieved. Monte Carlo simulation studies were conducted to determine the reliability of the ISD compared to the intraindividual mean. The results indicate that ISDs generally have poor reliability and are sensitive to insufficient measurement occasions, poor test reliability, and unfavorable amounts and distributions of variability in the population. Secondary analysis of psychological data shows that use of individual standard deviations in unfavorable conditions leads to a marked reduction in statistical power, although careful adherence to underlying statistical assumptions allows their use as a basic research tool. PMID:22268793
Reliability reporting across studies using the Buss Durkee Hostility Inventory.
Vassar, Matt; Hale, William
2009-01-01
Empirical research on anger and hostility has pervaded the academic literature for more than 50 years. Accurate measurement of anger/hostility and subsequent interpretation of results requires that the instruments yield strong psychometric properties. For consistent measurement, reliability estimates must be calculated with each administration, because changes in sample characteristics may alter the scale's ability to generate reliable scores. Therefore, the present study was designed to address reliability reporting practices for a widely used anger assessment, the Buss Durkee Hostility Inventory (BDHI). Of the 250 published articles reviewed, 11.2% calculated and presented reliability estimates for the data at hand, 6.8% cited estimates from a previous study, and 77.1% made no mention of score reliability. Mean alpha estimates of scores for BDHI subscales generally fell below acceptable standards. Additionally, no detectable pattern was found between reporting practices and publication year or journal prestige. Areas for future research are also discussed.
Psychometrics Matter in Health Behavior: A Long-term Reliability Generalization Study.
Pickett, Andrew C; Valdez, Danny; Barry, Adam E
2017-09-01
Despite numerous calls for increased understanding and reporting of reliability estimates, social science research, including the field of health behavior, has been slow to respond and adopt such practices. Therefore, we offer a brief overview of reliability and common reporting errors; we then perform analyses to examine and demonstrate the variability of reliability estimates by sample and over time. Using meta-analytic reliability generalization, we examined the variability of coefficient alpha scores for a well-designed, consistent, nationwide health study, covering a span of nearly 40 years. For each year and sample, reliability varied. Furthermore, reliability was predicted by a sample characteristic that differed among age groups within each administration. We demonstrated that reliability is influenced by the methods and individuals from which a given sample is drawn. Our work echoes previous calls that psychometric properties, particularly reliability of scores, are important and must be considered and reported before drawing statistical conclusions.
Comprehensive Design Reliability Activities for Aerospace Propulsion Systems
NASA Technical Reports Server (NTRS)
Christenson, R. L.; Whitley, M. R.; Knight, K. C.
2000-01-01
This technical publication describes the methodology, model, software tool, input data, and analysis result that support aerospace design reliability studies. The focus of these activities is on propulsion systems mechanical design reliability. The goal of these activities is to support design from a reliability perspective. Paralleling performance analyses in schedule and method, this requires the proper use of metrics in a validated reliability model useful for design, sensitivity, and trade studies. Design reliability analysis in this view is one of several critical design functions. A design reliability method is detailed and two example analyses are provided-one qualitative and the other quantitative. The use of aerospace and commercial data sources for quantification is discussed and sources listed. A tool that was developed to support both types of analyses is presented. Finally, special topics discussed include the development of design criteria, issues of reliability quantification, quality control, and reliability verification.
Romli, Muhammad Hibatullah; Mackenzie, Lynette; Lovarini, Meryl; Tan, Maw Pin; Clemson, Lindy
2017-06-01
Falls can be a devastating issue for older people living in the community, including those living in Malaysia. Health professionals and community members have a responsibility to ensure that older people have a safe home environment to reduce the risk of falls. Using a standardised screening tool is beneficial to intervene early with this group. The Home Falls and Accidents Screening Tool (HOME FAST) should be considered for this purpose; however, its use in Malaysia has not been studied. Therefore, the aim of this study was to evaluate the interrater and test-retest reliability of the HOME FAST with multiple professionals in the Malaysian context. A cross-sectional design was used to evaluate interrater reliability where the HOME FAST was used simultaneously in the homes of older people by 2 raters and a prospective design was used to evaluate test-retest reliability with a separate group of older people at different times in their homes. Both studies took place in an urban area of Kuala Lumpur. Professionals from 9 professional backgrounds participated as raters in this study, and a group of 51 community older people were recruited for the interrater reliability study and another group of 30 for the test-retest reliability study. The overall agreement was moderate for interrater reliability and good for test-retest reliability. The HOME FAST was consistently rated by different professionals, and no bias was found among the multiple raters. The HOME FAST can be used with confidence by a variety of professionals across different settings. The HOME FAST can become a universal tool to screen for home hazards related to falls. © 2017 John Wiley & Sons, Ltd.
Empirical Recommendations for Improving the Stability of the Dot-Probe Task in Clinical Research
Price, Rebecca B.; Kuckertz, Jennie M.; Siegle, Greg J.; Ladouceur, Cecile D.; Silk, Jennifer S.; Ryan, Neal D.; Dahl, Ronald E.; Amir, Nader
2014-01-01
The dot-probe task has been widely used in research to produce an index of biased attention based on reaction times (RTs). Despite its popularity, very few published studies have examined psychometric properties of the task, including test-retest reliability, and no previous study has examined reliability in clinically anxious samples or systematically explored the effects of task design and analysis decisions on reliability. In the current analysis, we utilized dot-probe data from three studies where attention bias towards threat-related faces was assessed at multiple (≥5) timepoints. Two of the studies were similar (adults with Social Anxiety Disorder, similar design features) while one was much more disparate (pediatric healthy volunteers, distinct task design). We explored the effects of analysis choices (e.g., bias score calculation formula, methods for outlier handling) on reliability and searched for convergence of findings across the three studies. We found that, when considering the three studies concurrently, the most reliable RT bias index utilized data from dot-bottom trials, comparing congruent to incongruent trials, with rescaled outliers, particularly after averaging across more than one assessment point. Although reliability of RT bias indices was moderate to low under most circumstances, within-session variability in bias (attention bias variability; ABV), a recently proposed RT index, was more reliable across sessions. Several eyetracking-based indices of attention bias (available in the pediatric healthy sample only) showed reliability that matched the optimal RT index (ABV). On the basis of these findings, we make specific recommendations to researchers using the dot probe, particularly those wishing to investigate individual differences and/or single-patient applications. PMID:25419646
Ustün, B; Compton, W; Mager, D; Babor, T; Baiyewu, O; Chatterji, S; Cottler, L; Göğüş, A; Mavreas, V; Peters, L; Pull, C; Saunders, J; Smeets, R; Stipec, M R; Vrasti, R; Hasin, D; Room, R; Van den Brink, W; Regier, D; Blaine, J; Grant, B F; Sartorius, N
1997-09-25
The WHO Study on the reliability and validity of the alcohol and drug use disorder instruments is an international study which has taken place in centres in ten countries, aiming to test the reliability and validity of three diagnostic instruments for alcohol and drug use disorders: the Composite International Diagnostic Interview (CIDI), the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) and a special version of the Alcohol Use Disorder and Associated Disabilities Interview schedule-alcohol/drug-revised (AUDADIS-ADR). The purpose of the reliability and validity (R&V) study is to further develop the alcohol and drug sections of these instruments so that a range of substance-related diagnoses can be made in a systematic, consistent, and reliable way. The study focuses on new criteria proposed in the tenth revision of the International Classification of Diseases (ICD-10) and the fourth revision of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) for dependence, harmful use and abuse categories for alcohol and psychoactive substance use disorders. A systematic study including a scientifically rigorous measure of reliability (i.e. 1 week test-retest reliability) and validity (i.e. comparison between clinical and non-clinical measures) has been undertaken. Results have yielded useful information on reliability and validity of these instruments at diagnosis, criteria and question level. Overall the diagnostic concordance coefficients (kappa, κ) were very good for dependence disorders (0.7-0.9), but were somewhat lower for the abuse and harmful use categories. The comparisons among instruments and independent clinical evaluations and debriefing interviews gave important information about possible sources of unreliability, and provided useful clues on the applicability and consistency of nosological concepts across cultures.
Critically re-evaluating a common technique: Accuracy, reliability, and confirmation bias of EMG.
Narayanaswami, Pushpa; Geisbush, Thomas; Jones, Lyell; Weiss, Michael; Mozaffar, Tahseen; Gronseth, Gary; Rutkove, Seward B
2016-01-19
(1) To assess the diagnostic accuracy of EMG in radiculopathy. (2) To evaluate the intrarater reliability and interrater reliability of EMG in radiculopathy. (3) To assess the presence of confirmation bias in EMG. Three experienced academic electromyographers interpreted 3 compact discs with 20 EMG videos (10 normal, 10 radiculopathy) in a blinded, standardized fashion without information regarding the nature of the study. The EMGs were interpreted 3 times (discs A, B, C) 1 month apart. Clinical information was provided only with disc C. Intrarater reliability was calculated by comparing interpretations in discs A and B, interrater reliability by comparing interpretation between reviewers. Confirmation bias was estimated by the difference in correct interpretations when clinical information was provided. Sensitivity was similar to previous reports (77%, confidence interval [CI] 63%-90%); specificity was 71%, CI 56%-85%. Intrarater reliability was good (κ 0.61, 95% CI 0.41-0.81); interrater reliability was lower (κ 0.53, CI 0.35-0.71). There was no substantial confirmation bias when clinical information was provided (absolute difference in correct responses 2.2%, CI -13.3% to 17.7%); the study lacked precision to exclude moderate confirmation bias. This study supports that (1) serial EMG studies should be performed by the same electromyographer since intrarater reliability is better than interrater reliability; (2) knowledge of clinical information does not bias EMG interpretation substantially; (3) EMG has moderate diagnostic accuracy for radiculopathy with modest specificity and electromyographers should exercise caution interpreting mild abnormalities. This study provides Class III evidence that EMG has moderate diagnostic accuracy and specificity for radiculopathy. © 2015 American Academy of Neurology.
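The agreement statistic reported above (κ) corrects raw percent agreement for the agreement expected by chance. A minimal sketch of unweighted Cohen's κ for two raters follows; the function name and the example interpretations are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(
        count_a[c] * count_b[c] for c in set(count_a) | set(count_b)
    ) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: 20 recordings, 10 normal ("N") and 10 abnormal ("R").
rater_1 = ["N"] * 10 + ["R"] * 10
rater_2 = ["N"] * 8 + ["R"] * 2 + ["R"] * 8 + ["N"] * 2
print(round(cohen_kappa(rater_1, rater_2), 2))
```

In this toy case the raters agree on 16 of 20 items (80%), yet κ is lower than 0.80 because half of that agreement would be expected by chance with balanced categories.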
The reliability of knee joint position testing using electrogoniometry
Piriyaprasarth, Pagamas; Morris, Meg E; Winter, Adele; Bialocerkowski, Andrea E
2008-01-01
Background The current investigation examined the inter- and intra-tester reliability of knee joint angle measurements using a flexible Penny and Giles Biometric® electrogoniometer. The clinical utility of electrogoniometry was also addressed. Methods The first study examined the inter- and intra-tester reliability of measurements of knee joint angles in supine, sitting and standing in 35 healthy adults. The second study evaluated inter-tester and intra-tester reliability of knee joint angle measurements in standing and after walking 10 metres in 20 healthy adults, using an enhanced measurement protocol with a more detailed electrogoniometer attachment procedure. Both inter-tester reliability studies involved two testers. Results In the first study, inter-tester reliability (ICC[2,10]) ranged from 0.58–0.71 in supine, 0.68–0.79 in sitting and 0.57–0.80 in standing. The standard error of measurement between testers was less than 3.55° and the limits of agreement ranged from -12.51° to 12.21°. Reliability coefficients for intra-tester reliability (ICC[3,10]) ranged from 0.75–0.76 in supine, 0.86–0.87 in sitting and 0.87–0.88 in standing. The standard error of measurement for repeated measures by the same tester was less than 1.7° and the limits of agreement ranged from -8.13° to 7.90°. The second study showed that using a more detailed electrogoniometer attachment protocol reduced the error of measurement between testers to 0.5°. Conclusion Using a standardised protocol, reliable measures of knee joint angles can be gained in standing, supine and sitting by using a flexible goniometer. PMID:18211714
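The two error measures quoted in this abstract, the standard error of measurement and the limits of agreement, can be sketched as below. The abstract does not state the exact formulas used, so this assumes the common definitions SEM = SD · √(1 − ICC) and Bland-Altman limits = mean difference ± 1.96 · SD of the differences. Function names and the angle values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_measurement(sd, icc):
    """SEM from a score standard deviation and a reliability coefficient."""
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie in [0, 1]")
    return sd * sqrt(1 - icc)


def limits_of_agreement(measure_a, measure_b, z=1.96):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)        # systematic difference between testers
    spread = z * stdev(diffs)  # random disagreement around the bias
    return bias - spread, bias + spread


# Hypothetical knee angles (degrees) recorded by two testers.
tester_1 = [10.0, 12.0, 14.0, 16.0]
tester_2 = [9.0, 13.0, 13.0, 17.0]
low, high = limits_of_agreement(tester_1, tester_2)
print(round(low, 2), round(high, 2))
print(standard_error_of_measurement(sd=5.0, icc=0.75))
```

Reporting both is useful because the ICC is unitless, whereas the SEM and the limits of agreement are expressed in degrees and can be compared directly with a clinically meaningful change.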
Wang, X; Jiao, Y; Tang, T; Wang, H; Lu, Z
2013-12-19
Intrinsic connectivity networks (ICNs) are composed of spatial components and time courses. The spatial components of ICNs were discovered with moderate-to-high reliability. So far as we know, few studies have focused on the reliability of the temporal patterns of ICNs based on their individual time courses. The goals of this study were twofold: to investigate the test-retest reliability of temporal patterns for ICNs, and to analyze these informative univariate metrics. Additionally, a correlation analysis was performed to enhance interpretability. Our study included three datasets: (a) short- and long-term scans, (b) multi-band echo-planar imaging (mEPI), and (c) eyes open or closed. Using dual regression, we obtained the time courses of ICNs for each subject. To produce temporal patterns for ICNs, we applied two categories of univariate metrics: network-wise complexity and network-wise low-frequency oscillation. Furthermore, we validated the test-retest reliability for each metric. The network-wise temporal patterns for most ICNs (especially for default mode network, DMN) exhibited moderate-to-high reliability and reproducibility under different scan conditions. Network-wise complexity for DMN exhibited fair reliability (ICC<0.5) based on eyes-closed sessions. In particular, our results suggested that mEPI could be a useful method with high reliability and reproducibility. In addition, these temporal patterns had physiological meaning, and certain temporal patterns were correlated with the node strength of the corresponding ICN. Overall, network-wise temporal patterns of ICNs were reliable and informative and could be complementary to spatial patterns of ICNs for further study. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.
Study on the Validity and Reliability of Melbourne Decision Making Scale in Turkey
ERIC Educational Resources Information Center
Çolakkadioglu, Oguzhan; Deniz, M. Engin
2015-01-01
This study is to analyze the validity and reliability of Melbourne Decision Making Questionnaire (MDMQ). The sample consisted of 650 university students. The structural validity of the MDMQ, as well as correlations among its sub-scales, measure-bound validity, internal consistency, item total correlations and test-retest reliability coefficients…
Reliability and Validity Tests of Singelis's Self-Construal Scale (1994).
ERIC Educational Resources Information Center
Wang, Qi
Two studies focused on the reliability and validity of T.M. Singelis's 24-item Self-Construal Scale (SCS) (1994). In the first study, Cronbach alphas were calculated to assess the internal consistency of the reliability of the two subscales that were supposed to measure individuals' independent and interdependent self construals. The sample was…
Reliability of reports of childhood trauma in bipolar disorder: A test-retest study over 18 months.
Shannon, Ciaran; Hanna, Donncha; Tumelty, Leo; Waldron, Daniel; Maguire, Chrissie; Mowlds, William; Meenagh, Ciaran; Mulholland, Ciaran
2016-01-01
This study aimed to explore the reliability of self-reported trauma histories in a population with a diagnosis of bipolar disorder using the Childhood Trauma Questionnaire. Previous studies in other populations suggest high reliability of trauma histories over time, and it was postulated that a similar high reliability would be demonstrated in this population. A total of 39 patients with a confirmed diagnosis (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, criteria) were followed up and readministered the Childhood Trauma Questionnaire after 18 months. Cohen's kappa scores and intraclass correlations suggested reasonable test-retest reliability over the 18-month time period of the study for all types of childhood abuse, namely, emotional, physical, and sexual abuse and physical and emotional neglect. Intraclass correlations ranged from r = .50 (sexual abuse) to r = .96 (physical abuse). Cohen's kappas ranged from .44 (sexual abuse) to .76 (physical abuse). Retrospective reports of childhood trauma can be seen as reliable and are in keeping with results found with other mental health populations.
Rater methodology for stroboscopy: a systematic review.
Bonilha, Heather Shaw; Focht, Kendrea L; Martin-Harris, Bonnie
2015-01-01
Laryngeal endoscopy with stroboscopy (LES) remains the clinical gold standard for assessing vocal fold function. LES is used to evaluate the efficacy of voice treatments in research studies and clinical practice. LES as a voice treatment outcome tool is only as good as the clinician interpreting the recordings. Research using LES as a treatment outcome measure should be evaluated based on rater methodology and reliability. The purpose of this literature review was to evaluate the rater-related methodology from studies that use stroboscopic findings as voice treatment outcome measures. Systematic literature review. Computerized journal databases were searched for relevant articles using terms: stroboscopy and treatment. Eligible articles were categorized and evaluated for the use of rater-related methodology, reporting of number of raters, types of raters, blinding, and rater reliability. Of the 738 articles reviewed, 80 articles met inclusion criteria. More than one-third of the studies included in the review did not report the number of raters who participated in the study. Eleven studies reported results of rater reliability analysis with only two studies reporting good inter- and intrarater reliability. The comparability and use of results from treatment studies that use LES are limited by a lack of rigor in rater methodology and variable, mostly poor, inter- and intrarater reliability. To improve our ability to evaluate and use the findings from voice treatment studies that use LES features as outcome measures, greater consistency of reporting rater methodology characteristics across studies and improved rater reliability is needed. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Karns, James
1993-01-01
The objective of this study was to establish the initial quantitative reliability bounds for nuclear electric propulsion systems in a manned Mars mission required to ensure crew safety and mission success. Finding the reliability bounds involves balancing top-down (mission driven) requirements and bottom-up (technology driven) capabilities. In seeking this balance we hope to accomplish the following: (1) provide design insights into the achievability of the baseline design in terms of reliability requirements, given the existing technology base; (2) suggest alternative design approaches which might enhance reliability and crew safety; and (3) indicate what technology areas require significant research and development to achieve the reliability objectives.
Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T
2012-08-01
An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there was very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign presenting with fair to moderate agreement. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and the Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups is indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
Hughes, Michael; Tracey, Andrew; Bhushan, Monica; Chakravarty, Kuntal; Denton, Christopher P; Dubey, Shirish; Guiducci, Serena; Muir, Lindsay; Ong, Voon; Parker, Louise; Pauling, John D; Prabu, Athiveeraramapandian; Rogers, Christine; Roberts, Christopher; Herrick, Ariane L
2018-06-01
The reliability of clinician grading of systemic sclerosis-related digital ulcers has been reported to be poor to moderate at best, which has important implications for clinical trial design. The aim of this study was to examine the reliability of new proposed UK Scleroderma Study Group digital ulcer definitions among UK clinicians with an interest in systemic sclerosis. Raters graded (through a custom-built interface) 90 images (80 unique and 10 repeat) of a range of digital lesions collected from patients with systemic sclerosis. Lesions were graded on an ordinal scale of severity: 'no ulcer', 'healed ulcer' or 'digital ulcer'. A total of 23 clinicians - 18 rheumatologists, 3 dermatologists, 1 hand surgeon and 1 specialist rheumatology nurse - completed the study. A total of 2070 (1840 unique + 230 repeat) image gradings were obtained. For intra-rater reliability, across all images, the overall weighted kappa coefficient was high (0.71) and was moderate (0.55) when averaged across individual raters. Overall inter-rater reliability was poor (0.15). Although our proposed digital ulcer definitions had high intra-rater reliability, the overall inter-rater reliability was poor. Our study highlights the challenges of digital ulcer assessment by clinicians with an interest in systemic sclerosis and provides a number of useful insights for future clinical trial design. Further research is warranted to improve the reliability of digital ulcer definition/rating as an outcome measure in clinical trials, including examining the role for objective measurement techniques, and the development of digital ulcer patient-reported outcome measures.
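The weighted kappa reported above gives partial credit when two gradings on an ordinal scale differ by only one step. The following is a minimal sketch of Cohen's weighted kappa for two raters, assuming linear or quadratic disagreement weights; the abstract does not say which weighting scheme or multi-rater extension the authors used, and the function name and example ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the scale levels in order, e.g.
    ["no ulcer", "healed ulcer", "digital ulcer"].
    """
    k = len(categories)
    n = len(ratings_a)
    if k < 2 or n == 0 or n != len(ratings_b):
        raise ValueError("need >= 2 categories and equal-length, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint counts: rows are rater A, columns are rater B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k)) / n
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k)) / (n * n)
    if exp == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs / exp


# Hypothetical gradings of four images by two raters.
scale = ["no ulcer", "healed ulcer", "digital ulcer"]
rater_1 = ["no ulcer", "no ulcer", "healed ulcer", "digital ulcer"]
rater_2 = ["no ulcer", "healed ulcer", "healed ulcer", "digital ulcer"]
print(round(weighted_kappa(rater_1, rater_2, scale), 3))
```

Quadratic weights penalise distant disagreements more heavily than linear weights, so the two schemes can give noticeably different values on the same data.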
The reliability of WorkWell Systems Functional Capacity Evaluation: a systematic review
2014-01-01
Background Functional capacity evaluation (FCE) determines a person’s ability to perform work-related tasks and is a major component of the rehabilitation process. The WorkWell Systems (WWS) FCE (formerly known as Isernhagen Work Systems FCE) is currently the most commonly used FCE tool in German rehabilitation centres. Our systematic review investigated the inter-rater, intra-rater and test-retest reliability of the WWS FCE. Methods We performed a systematic literature search of studies on the reliability of the WWS FCE and extracted item-specific measures of inter-rater, intra-rater and test-retest reliability from the identified studies. Intraclass correlation coefficients ≥ 0.75, percentages of agreement ≥ 80%, and kappa coefficients ≥ 0.60 were categorised as acceptable, otherwise they were considered non-acceptable. The extracted values were summarised for the five performance categories of the WWS FCE, and the results were classified as either consistent or inconsistent. Results From 11 identified studies, 150 item-specific reliability measures were extracted. 89% of the extracted inter-rater reliability measures, all of the intra-rater reliability measures and 96% of the test-retest reliability measures of the weight handling and strength tests had an acceptable level of reliability, compared to only 67% of the test-retest reliability measures of the posture/mobility tests and 56% of the test-retest reliability measures of the locomotion tests. Both of the extracted test-retest reliability measures of the balance test were acceptable. Conclusions Weight handling and strength tests were found to have consistently acceptable reliability. Further research is needed to explore the reliability of the other tests as inconsistent findings or a lack of data prevented definitive conclusions. PMID:24674029
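The acceptability rule described in the Methods can be written as a small lookup. This is only an illustrative sketch: the thresholds come from the abstract, while the function names, the representation of agreement as a proportion, and the example values are assumptions.

```python
# Cut-offs quoted in the abstract (agreement expressed as a proportion).
THRESHOLDS = {"icc": 0.75, "agreement": 0.80, "kappa": 0.60}


def is_acceptable(measure, value):
    """Return True if a reliability value meets the review's cut-off."""
    return value >= THRESHOLDS[measure]


def share_acceptable(results):
    """Fraction of (measure, value) pairs classified as acceptable."""
    if not results:
        raise ValueError("no reliability measures supplied")
    hits = sum(is_acceptable(measure, value) for measure, value in results)
    return hits / len(results)


# Hypothetical item-level results, not taken from the review.
example = [("icc", 0.80), ("kappa", 0.50), ("agreement", 0.90), ("icc", 0.70)]
print(share_acceptable(example))
```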
Su, T A; Hoe, V C W
2008-12-01
Validity and reliability of the information relating to hand-transmitted vibration exposure and vibration-related health outcome are very important for case finding in hand-arm vibration syndrome (HAVS) studies. In a local HAVS study among a group of construction workers in Kuala Lumpur, Malaysia, a questionnaire translated into Malay was created based on the Hand-transmitted Vibration Health Surveillance--Initial Questionnaire and Clinical Assessment, from Vibration Injury Network. This study was conducted to determine the reliability of standardised questions in the questionnaire used in the study. Fifteen subjects were selected randomly from the sampling frame of the HAVS study. Test-retest reliability was assessed on all items contained in parts 1-6 of the questionnaire and clinical assessment form, with an interval of 13-14 days between the first and second administration. Kappa coefficient and percentage agreement were calculated for all standardised questions. The kappa coefficient and percentage agreement for all standardised questions varied from -0.174 to 1.000 and 66.7 to 100.0 percent, respectively. The kappa coefficients for important questions related to current vibratory tool usage, tingling, numbness and hand grip weakness were 0.714, 0.432, -0.077 and -0.120, respectively, while the percentage agreement for current vibratory tool usage, finger colour change, tingling, numbness and hand grip weakness were 85.7 percent, 92.8 percent, 79.5 percent, 85.7 percent and 71.4 percent, respectively. Intra-rater reliability on the extent of vibration exposure was good, with the intra-class correlation coefficient (95 percent confidence interval) ranging from 0.786 (0.334-0.931) to 0.975 (0.923-0.992). Critical questions on vascular, neurological and musculoskeletal symptoms of HAVS were found to be reliable. The history on the extent of vibration exposure revealed good reliability when explored by the investigator alone.
This questionnaire is considered reliable to be used in the study of HAVS among construction workers working in a construction site.
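The abstract reports items with high percentage agreement but kappa near or below zero. That pattern arises when almost all answers fall in one category, so chance-expected agreement is already very high. The sketch below uses two hypothetical 2x2 test-retest tables (not the study's data) with identical percentage agreement but different marginal distributions; the function name is illustrative.

```python
def agreement_and_kappa(both_yes, yes_no, no_yes, both_no):
    """Percentage agreement and Cohen's kappa for a 2x2 test-retest table.

    yes_no = 'yes' at first administration, 'no' at second; no_yes is the reverse.
    """
    n = both_yes + yes_no + no_yes + both_no
    if n == 0:
        raise ValueError("empty table")
    p_observed = (both_yes + both_no) / n
    p_yes_first = (both_yes + yes_no) / n
    p_yes_second = (both_yes + no_yes) / n
    # Agreement expected by chance from the marginal proportions.
    p_expected = (p_yes_first * p_yes_second
                  + (1 - p_yes_first) * (1 - p_yes_second))
    if p_expected == 1:
        raise ValueError("kappa is undefined when every answer is identical")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Skewed table: 12 of 14 subjects answer 'no' both times, 2 disagree.
skewed = agreement_and_kappa(both_yes=0, yes_no=1, no_yes=1, both_no=12)
# Balanced table: same 12 of 14 agreements, split evenly between 'yes' and 'no'.
balanced = agreement_and_kappa(both_yes=6, yes_no=1, no_yes=1, both_no=6)
print(skewed, balanced)
```

Both tables agree on 12 of 14 subjects (85.7 percent), yet kappa is about -0.08 for the skewed table and about 0.71 for the balanced one, which is why percentage agreement and kappa are usefully reported together.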
Reliability of Computerized Neurocognitive Tests for Concussion Assessment: A Meta-Analysis.
Farnsworth, James L; Dargo, Lucas; Ragan, Brian G; Kang, Minsoo
2017-09-01
Although widely used, computerized neurocognitive tests (CNTs) have been criticized because of low reliability and poor sensitivity. A systematic review was published summarizing the reliability of Immediate Post-Concussion Assessment and Cognitive Testing (ImPACT) scores; however, this was limited to a single CNT. Expansion of the previous review to include additional CNTs and a meta-analysis is needed. Therefore, our purpose was to analyze reliability data for CNTs using meta-analysis and examine moderating factors that may influence reliability. A systematic literature search (key terms: reliability, computerized neurocognitive test, concussion) of electronic databases (MEDLINE, PubMed, Google Scholar, and SPORTDiscus) was conducted to identify relevant studies. Studies were included if they met all of the following criteria: used a test-retest design, involved at least 1 CNT, provided sufficient statistical data to allow for effect-size calculation, and were published in English. Two independent reviewers investigated each article to assess inclusion criteria. Eighteen studies involving 2674 participants were retained. Intraclass correlation coefficients were extracted to calculate effect sizes and determine overall reliability. The Fisher Z transformation adjusted for sampling error associated with averaging correlations. Moderator analyses were conducted to evaluate the effects of the length of the test-retest interval, intraclass correlation coefficient model selection, participant demographics, and study design on reliability. Heterogeneity was evaluated using the Cochran Q statistic. The proportion of acceptable outcomes was greatest for the Axon Sports CogState Test (75%) and lowest for the ImPACT (25%). Moderator analyses indicated that the type of intraclass correlation coefficient model used significantly influenced effect-size estimates, accounting for 17% of the variation in reliability. 
The Axon Sports CogState Test, which has a higher proportion of acceptable outcomes and shorter test duration relative to other CNTs, may be a reliable option; however, future studies are needed to compare the diagnostic accuracy of these instruments.
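The Fisher Z transformation mentioned above is commonly used to average correlation-type coefficients on a scale where their sampling distribution is closer to normal. The sketch below shows the textbook version for Pearson correlations with weights of n - 3; this weighting is an assumption, since the abstract does not describe how the intraclass correlation coefficients were weighted, and the function name and inputs are hypothetical.

```python
import math


def pooled_correlation(correlations, sample_sizes):
    """Average correlations via Fisher's z, weighting each study by n - 3."""
    if len(correlations) != len(sample_sizes) or not correlations:
        raise ValueError("need one sample size per correlation")
    z_values = [math.atanh(r) for r in correlations]   # r -> z
    weights = [n - 3 for n in sample_sizes]            # inverse variance of z
    if any(w <= 0 for w in weights):
        raise ValueError("each study needs more than 3 participants")
    z_mean = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return math.tanh(z_mean)                           # z -> r


# Hypothetical test-retest coefficients from three studies.
print(round(pooled_correlation([0.60, 0.75, 0.85], [40, 120, 60]), 3))
```

Because the transformation stretches values near 1, the back-transformed mean is generally not equal to the simple arithmetic mean of the coefficients.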
An interrater reliability study of the Braden scale in two nursing homes.
Kottner, Jan; Dassen, Theo
2008-10-01
Adequate risk assessment is essential in pressure ulcer prevention. Assessment scales were designed to support practitioners in identifying persons at pressure ulcer risk. The Braden scale is one of the most extensively studied risk assessment instruments, although the majority of studies focused on validity rather than reliability. The first aim was to measure the interrater reliability of the Braden scale and its individual items. The second aim was to study different statistical approaches regarding interrater reliability estimation. An interrater reliability study was conducted in two German nursing homes. Residents (n = 152) from 8 units were assessed twice. The raters were trained nurses with a work experience ranging from 0.5 to 30 years. Data were analysed using an overall percentage of agreement, weighted and unweighted kappa and the intraclass correlation coefficient. Differences between nurses rating the overall Braden score ranged from 0 up to 9 points. Interrater reliability expressed by the intraclass correlation coefficient ranged from 0.73 (95% CI 0.26 - 0.91) to 0.95 (95% CI 0.87 - 0.98). Calculated intraclass correlation coefficients for individual items ranged from 0.06 (95% CI -0.31 to 0.48) to 0.97 (95% CI 0.93-0.99) with the lowest values being measured for the items "sensory perception" and "nutrition". There was no association between work experience and the level of interrater reliability. With two exceptions, simple kappa-values were always lower than weighted kappa-values and intraclass correlation coefficients. Although the calculated interrater reliability coefficients for the total Braden score were high in some cases, several clinically relevant differences occurred between the nurses. Due to interrater reliability being very low for the items "sensory perception" and "nutrition", it is doubtful if their assessment contributes to any valid results. 
Weighted kappa or intraclass correlation coefficients are the most appropriate interrater reliability estimates.
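For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes one common form, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater), from a subjects-by-raters table. The choice of this model is an assumption; the abstract does not state which ICC model was used. The example data are the six-target, four-judge table from Shrout and Fleiss (1979), not Braden scores.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per subject, one column per rater.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


shrout_fleiss = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(shrout_fleiss), 2))
```

Because this form treats systematic differences between raters as error, it is lower than a consistency-type ICC on the same data whenever raters differ in their average level.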
Seo, Hyun-Ju; Kim, Soo Young; Lee, Yoon Jae; Jang, Bo-Hyoung; Park, Ji-Eun; Sheen, Seung-Soo; Hahn, Seo Kyung
2016-02-01
To develop a study Design Algorithm for Medical Literature on Intervention (DAMI) and test its interrater reliability, construct validity, and ease of use. We developed and then revised the DAMI to include detailed instructions. To test the DAMI's reliability, we used a purposive sample of 134 primary, mainly nonrandomized studies. We then compared the study designs as classified by the original authors and through the DAMI. Unweighted kappa statistics were computed to test interrater reliability and construct validity based on the level of agreement between the original and DAMI classifications. Assessment time was also recorded to evaluate ease of use. The DAMI includes 13 study designs, including experimental and observational studies of interventions and exposure. Both the interrater reliability (unweighted kappa = 0.67; 95% CI [0.64-0.75]) and construct validity (unweighted kappa = 0.63, 95% CI [0.52-0.67]) were substantial. Mean classification time using the DAMI was 4.08 ± 2.44 minutes (range, 0.51-10.92). The DAMI showed substantial interrater reliability and construct validity. Furthermore, given its ease of use, it could be used to accurately classify medical literature for systematic reviews of interventions while minimizing disagreement between authors of such reviews. Copyright © 2016 Elsevier Inc. All rights reserved.
An In vitro evaluation of the reliability of QR code denture labeling technique.
Poovannan, Sindhu; Jain, Ashish R; Krishnan, Cakku Jalliah Venkata; Chandran, Chitraa R
2016-01-01
Positive identification of the dead after accidents and disasters through labeled dentures plays a key role in forensic scenarios. A number of denture labeling methods are available, and studies evaluating their reliability under drastic conditions are vital. This study was conducted to evaluate the reliability of QR (Quick Response) Code labeled at various depths in heat-cured acrylic blocks after acid treatment, heat treatment (burns), and fracture in forensics. It was an in vitro study. This study included 160 specimens of heat-cured acrylic blocks (1.8 cm × 1.8 cm) and these were divided into 4 groups (40 samples per group). QR Codes were incorporated in the samples using clear acrylic sheet and they were assessed for reliability under various depths, acid, heat, and fracture. Data were analyzed using the Chi-square test and the test of proportions. The QR Code inclusion technique was reliable under various depths of acrylic sheet, acid (sulfuric acid 99%, hydrochloric acid 40%) and heat (up to 370°C). Results were variable with fracture of QR Code labeled acrylic blocks. Within the limitations of the study, the results indicated that the QR Code technique was reliable under various depths of acrylic sheet, acid, and heat (370°C). Effectiveness varied in fracture and depended on the level of distortion. This study thus suggests that QR Code is an effective and simpler denture labeling method.
Relevance and reliability of experimental data in human health risk assessment of pesticides.
Kaltenhäuser, Johanna; Kneuer, Carsten; Marx-Stoelting, Philip; Niemann, Lars; Schubert, Jens; Stein, Bernd; Solecki, Roland
2017-08-01
Evaluation of data relevance, reliability and contribution to uncertainty is crucial in regulatory health risk assessment if robust conclusions are to be drawn. Whether a specific study is used as key study, as additional information or not accepted depends in part on the criteria according to which its relevance and reliability are judged. In addition to GLP-compliant regulatory studies following OECD Test Guidelines, data from peer-reviewed scientific literature have to be evaluated in regulatory risk assessment of pesticide active substances. Publications should be taken into account if they are of acceptable relevance and reliability. Their contribution to the overall weight of evidence is influenced by factors including test organism, study design and statistical methods, as well as test item identification, documentation and reporting of results. Various reports make recommendations for improving the quality of risk assessments and different criteria catalogues have been published to support evaluation of data relevance and reliability. Their intention was to guide transparent decision making on the integration of the respective information into the regulatory process. This article describes an approach to assess the relevance and reliability of experimental data from guideline-compliant studies as well as from non-guideline studies published in the scientific literature in the specific context of uncertainty and risk assessment of pesticides. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Stroke and aphasia quality-of-life scale-39: Reliability and validity of the Turkish version.
Noyan-Erbaş, Ayşin; Toğram, Bülent
2016-10-01
The aim of this study was to adapt the stroke and aphasia quality-of-life scale-39 (SAQoL-39) to the Turkish language and carry out a reliability and validity study of the instrument in a group of patients with aphasia. The study was a descriptive study and contained three phases: adaptation of the SAQoL-39 to the Turkish language, administration of the scale to 30 aphasia patients and reliability and validity studies of the scale. Internal consistency was assessed with Cronbach's alpha and test-re-test reliability was explored (n = 14). The adaptation process was completed based on inter-rater agreement on the translated items and within the scope of final editing by the authors of the study. The SAQoL-39 in Turkish exhibited high test-re-test reliability (ICC = 0.97) as well as acceptability with minimal missing data (0-1.4). This instrument exhibited high internal consistency (Cronbach's α = 0.70-0.97), domain-total correlations (r = 0.76-0.85) and inter-domain correlations (r = 0.40-0.68). The analysis shows that the Turkish version of SAQoL-39 is a scale that is highly acceptable, valid and reliable and can be easily used in evaluating the quality-of-life of Turkish people with aphasia.
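Cronbach's alpha, the internal consistency index used above, compares the sum of the item variances with the variance of the total score. The following is a minimal sketch of the standard formula; the function name and the small response matrix are hypothetical and unrelated to the SAQoL-39 data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    n_items = len(responses[0]) if responses else 0
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need >= 2 respondents and >= 2 items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent needs a score on every item")

    # Sample variance of each item and of the summed total score.
    item_variances = [variance([row[j] for row in responses])
                      for j in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores of four respondents on two items.
answers = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(answers), 3))
```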
VFS interjudge reliability using a free and directed search.
Bryant, Karen N; Finnegan, Eileen; Berbaum, Kevin
2012-03-01
Reports in the literature suggest that clinicians demonstrate poor reliability in rating videofluoroscopic swallow (VFS) variables. Contemporary perception theories suggest that the methods used in VFS reliability studies constrain subjects to make judgments in an abnormal way. The purpose of this study was to determine whether a directed search or a free search approach to rating swallow studies results in better interjudge reliability. Ten speech pathologists served as judges. Five clinical judges were assigned to the directed search group (use checklist) and five to the free search group (unguided observations). Clinical judges interpreted 20 VFS examinations of swallowing. Interjudge reliability of ratings of dysphagia severity, affected stage of swallow, dysphagia symptoms, and attributes identified by clinical judges using a directed search was compared with that using a free search approach. Interjudge reliability for rating the presence of aspiration and penetration was significantly better using a free search ("substantial" to "almost perfect" agreement) compared to a directed search ("moderate" agreement). Reliability of dysphagia severity ratings ranged from "moderate" to "almost perfect" agreement for both methods of search. Reliability for reporting all other symptoms and attributes of dysphagia was variable and was not significantly different between the groups.
Validity and Reliability of Accelerometers in Patients With COPD: A SYSTEMATIC REVIEW.
Gore, Shweta; Blackwood, Jennifer; Guyette, Mary; Alsalaheen, Bara
2018-05-01
Reduced physical activity is associated with poor prognosis in chronic obstructive pulmonary disease (COPD). Accelerometers have greatly improved quantification of physical activity by providing information on step counts, body positions, energy expenditure, and magnitude of force. The purpose of this systematic review was to compare the validity and reliability of accelerometers used in patients with COPD. An electronic database search of MEDLINE and CINAHL was performed. Study quality was assessed with the Strengthening the Reporting of Observational Studies in Epidemiology checklist while methodological quality was assessed using the modified Quality Appraisal Tool for Reliability Studies. The search yielded 5392 studies; 25 met inclusion criteria. The SenseWear Pro armband reported high criterion validity under controlled conditions (r = 0.75-0.93) and high reliability (ICC = 0.84-0.86) for step counts. The DynaPort MiniMod demonstrated highest concurrent validity for step count using both video and manual methods. Validity of the SenseWear Pro armband varied between studies especially in free-living conditions, slower walking speeds, and with addition of weights during gait. A high degree of variability was found in the outcomes used and statistical analyses performed between studies, indicating a need for further studies to measure reliability and validity of accelerometers in COPD. The SenseWear Pro armband is the most commonly used accelerometer in COPD, but measurement properties are limited by gait speed variability and assistive device use. DynaPort MiniMod and Stepwatch accelerometers demonstrated high validity in patients with COPD but lack reliability data.
Downs, Stephen; Marquez, Jodie; Chiarelli, Pauline
2013-06-01
What is the intra-rater and inter-rater relative reliability of the Berg Balance Scale? What is the absolute reliability of the Berg Balance Scale? Does the absolute reliability of the Berg Balance Scale vary across the scale? Systematic review with meta-analysis of reliability studies. Any clinical population that has undergone assessment with the Berg Balance Scale. Relative intra-rater reliability, relative inter-rater reliability, and absolute reliability. Eleven studies involving 668 participants were included in the review. The relative intrarater reliability of the Berg Balance Scale was high, with a pooled estimate of 0.98 (95% CI 0.97 to 0.99). Relative inter-rater reliability was also high, with a pooled estimate of 0.97 (95% CI 0.96 to 0.98). A ceiling effect of the Berg Balance Scale was evident for some participants. In the analysis of absolute reliability, all of the relevant studies had an average score of 20 or above on the 0 to 56 point Berg Balance Scale. The absolute reliability across this part of the scale, as measured by the minimal detectable change with 95% confidence, varied between 2.8 points and 6.6 points. The Berg Balance Scale has a higher absolute reliability when close to 56 points due to the ceiling effect. We identified no data that estimated the absolute reliability of the Berg Balance Scale among participants with a mean score below 20 out of 56. The Berg Balance Scale has acceptable reliability, although it might not detect modest, clinically important changes in balance in individual subjects. The review was only able to comment on the absolute reliability of the Berg Balance Scale among people with moderately poor to normal balance. Copyright © 2013 Australian Physiotherapy Association. Published by .. All rights reserved.
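The minimal detectable change with 95% confidence (MDC95) quoted above is usually derived from the standard error of measurement (SEM). The sketch below uses the common formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; whether the included studies computed it exactly this way is an assumption, and the input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and a reliability coefficient."""
    if sd < 0 or not 0 <= icc <= 1:
        raise ValueError("sd must be >= 0 and icc must lie in [0, 1]")
    return sd * math.sqrt(1 - icc)


def mdc95(sd, icc):
    """Smallest change exceeding measurement error with 95% confidence."""
    # sqrt(2) reflects error in both of the two measurements being compared.
    return 1.96 * math.sqrt(2) * standard_error_of_measurement(sd, icc)


# Hypothetical sample: SD of 10 points and ICC of 0.75.
print(round(mdc95(sd=10, icc=0.75), 2))
```

This shows why a scale can have a very high relative reliability coefficient and still need a change of several points before the change can be distinguished from measurement error.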
Moore, Amy Lawson; Miller, Terissa M
2018-01-01
The purpose of the current study is to evaluate the validity and reliability of the revised Gibson Test of Cognitive Skills, a computer-based battery of tests measuring short-term memory, long-term memory, processing speed, logic and reasoning, visual processing, as well as auditory processing and word attack skills. This study included 2,737 participants aged 5-85 years. A series of studies was conducted to examine the validity and reliability using the test performance of the entire norming group and several subgroups. The evaluation of the technical properties of the test battery included content validation by subject matter experts, item analysis and coefficient alpha, test-retest reliability, split-half reliability, and analysis of concurrent validity with the Woodcock Johnson III Tests of Cognitive Abilities and Tests of Achievement. Results indicated strong sources of evidence of validity and reliability for the test, including internal consistency reliability coefficients ranging from 0.87 to 0.98, test-retest reliability coefficients ranging from 0.69 to 0.91, split-half reliability coefficients ranging from 0.87 to 0.91, and concurrent validity coefficients ranging from 0.53 to 0.93. The Gibson Test of Cognitive Skills-2 is a reliable and valid tool for assessing cognition in the general population across the lifespan.
A Reliability Generalization Meta-Analysis of Coefficient Alpha for the Maslach Burnout Inventory
ERIC Educational Resources Information Center
Wheeler, Denna L.; Vassar, Matt; Worley, Jody A.; Barnes, Laura L. B.
2011-01-01
The purpose of this study was to synthesize internal consistency reliability for the subscale scores on the Maslach Burnout Inventory (MBI). The authors addressed three research questions: (a) What is the mean subscale score reliability for the MBI across studies? (b) What factors are associated with observed variance in MBI subscale score…
Reliability of Total Test Scores When Considered as Ordinal Measurements
ERIC Educational Resources Information Center
Biswas, Ajoy Kumar
2006-01-01
This article studies the ordinal reliability of (total) test scores. This study is based on a classical-type linear model of observed score (X), true score (T), and random error (E). Based on the idea of Kendall's tau-a coefficient, a measure of ordinal reliability for small-examinee populations is developed. This measure is extended to large…
ERIC Educational Resources Information Center
Uzun, N. Bilge; Aktas, Mehtap; Asiret, Semih; Yormaz, Seha
2018-01-01
The goal of this study is to determine the reliability of the performance points of dentistry students regarding communication skills and to examine the scoring reliability by generalizability theory in balanced random and fixed facet (mixed design) data, considering also the interactions of student, rater and duty. The study group of the research…
ERIC Educational Resources Information Center
Nordness, Philip D.; Epstein, Michael H.; Cullinan, Douglas; Pierce, Corey D.
2014-01-01
The Emotional and Behavioral Screener (EBS) is a universal screening instrument designed to identify students whose excessive problem behaviors put them at risk of the education disability category of emotional disturbance (ED). This article reports findings from three studies that address the reliability and validity of the EBS. Studies 1 and 2…
The Validity and Reliability of the Mobbing Scale (MS)
ERIC Educational Resources Information Center
Yaman, Erkan
2009-01-01
The aim of this research is to develop the Mobbing Scale and examine its validity and reliability. The sample of the study consisted of 515 persons from Sakarya and Bursa. In this study, construct validity, internal consistency, test-retest reliability, and item analysis of the scale were examined. As a result of factor analysis for construct…
Validity and Reliability of the Arabic Token Test for Children
ERIC Educational Resources Information Center
Alkhamra, Rana A.; Al-Jazi, Aya B.
2016-01-01
Background: The Token Test for Children (2nd edition) (TTFC) is a measure for assessing receptive language. In this study we describe the translation process, validity and reliability of the Arabic Token Test for Children (A-TTFC). Aims: The aim of this study is to translate, validate and establish the reliability of the Arabic Token Test for…
A Pilot Study Examining the Test-Retest and Internal Consistency Reliability of the ABLLS-R
ERIC Educational Resources Information Center
Partington, James W.; Bailey, Autumn; Partington, Scott W.
2018-01-01
The literature contains a variety of assessment tools for measuring the skills of individuals with autism or other developmental delays, but most lack adequate empirical evidence supporting their reliability and validity. The current pilot study sought to examine the reliability of scores obtained from the Assessment of Basic Language and Learning…
Test of Creative Imagination: Validity and Reliability Study
ERIC Educational Resources Information Center
Gundogan, Aysun; Ari, Meziyet; Gonen, Mubeccel
2013-01-01
The purpose of this study was to investigate the validity and reliability of the test of creative imagination. This study was conducted with the participation of 1000 children aged 9-14 who were studying in six primary schools in the city center of Denizli Province, chosen by cluster ratio sampling. In the study, it was revealed that the…
An Investigation of the Impact of Guessing on Coefficient α and Reliability
2014-01-01
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g., parallel test assumptions, homogeneous inter-item correlations, homogeneous item difficulty, and homogeneous guessing levels across items) to evaluate the relation between guessing and test reliability. Based on the item response theory (IRT) framework, this study investigated the extent of the impact of guessing on reliability under more realistic conditions where item difficulty, item discrimination, and guessing levels actually vary across items with three different test lengths (TL). By accommodating multiple item characteristics simultaneously, this study also focused on examining interaction effects between guessing and other variables entered in the simulation to be more realistic. The simulation of the more realistic conditions and calculations of reliability and classical test theory (CTT) item statistics were facilitated by expressing CTT item statistics, coefficient α, and reliability in terms of IRT model parameters. In addition to the general negative impact of guessing on reliability, results showed interaction effects between TL and guessing and between guessing and test difficulty.
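Item difficulty, discrimination and guessing level are the three parameters of the three-parameter logistic (3PL) IRT model. The abstract does not name the model, so treating it as 3PL is an assumption; the sketch below only illustrates how a guessing parameter raises the lower bound of the probability of a correct response, and the parameter values are hypothetical.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability, a: discrimination, b: difficulty,
    c: guessing level (lower asymptote).
    """
    if not 0 <= c < 1:
        raise ValueError("guessing parameter must lie in [0, 1)")
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# A four-option item with a guessing level of 0.25.
for ability in (-3.0, 0.0, 3.0):
    print(ability, round(p_correct_3pl(ability, a=1.0, b=0.0, c=0.25), 3))
```

With c > 0, even very low-ability examinees answer correctly with probability close to c, which adds score variance unrelated to ability.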
Milanović, Zoran; Pantelić, Saša; Trajković, Nebojša; Jorgić, Bojan; Sporiš, Goran; Bratić, Milovan
2014-01-01
The purpose of this study was to determine the test-retest reliability of the International Physical Activity Questionnaire (IPAQ) for older adults in Serbia. Six hundred and sixty older adults (352 men, 53%; 308 women, 47%; mean age 67.65±5.76 years) participated in the study. To examine test-retest reliability, the participants were asked to complete the IPAQ on two occasions 2 weeks apart. Moderate reliability was observed between the repeated IPAQ administrations, with intraclass correlation coefficients ranging from 0.53 to 0.91. Reliability was lowest for leisure time activity (0.53) and highest for the transport domain (0.91). Men and women had similar intraclass correlation coefficients for total physical activity (0.71 versus 0.74, respectively), while the biggest difference was observed for housework (0.68 in men versus 0.90 in women). Our study shows that the long version of the IPAQ is a reliable instrument for assessing physical activity levels in older adults and that it may be useful for generating internationally comparable data.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies
Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry
2017-01-01
Objectives To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Design Systematic review and narrative synthesis of reproducibility studies. Data sources Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Eligibility criteria Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. Results From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ = −0.10).
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) of studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies’ generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Conclusions Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. PMID:28122727
Apeldoorn, Adri T.; van Helvoirt, Hans; Ostelo, Raymond W.; Meihuizen, Hanneke; Kamper, Steven J.; van Tulder, Maurits W.; de Vet, Henrica C. W.
2016-01-01
Study design Observational inter-rater reliability study. Objectives To examine: (1) the inter-rater reliability of a modified version of Delitto et al.’s classification-based algorithm for patients with low back pain; (2) the influence of different levels of familiarity with the system; and (3) the inter-rater reliability of algorithm decisions in patients who clearly fit into a subgroup (clear classifications) and those who do not (unclear classifications). Methods Patients were examined twice on the same day by two of three participating physical therapists with different levels of familiarity with the system. Patients were classified into one of four classification groups. Raters were blind to the others’ classification decision. In order to quantify the inter-rater reliability, percentages of agreement and Cohen’s Kappa were calculated. Results A total of 36 patients were included (clear classification n = 23; unclear classification n = 13). The overall rate of agreement was 53% and the Kappa value was 0·34 [95% confidence interval (CI): 0·11–0·57], which indicated only fair inter-rater reliability. Inter-rater reliability for patients with a clear classification (agreement 52%, Kappa value 0·29) was not higher than for patients with an unclear classification (agreement 54%, Kappa value 0·33). Familiarity with the system (i.e. trained with written instructions and previous research experience with the algorithm) did not improve the inter-rater reliability. Conclusion Our pilot study challenges the inter-rater reliability of the classification procedure in clinical practice. Therefore, more knowledge is needed about factors that affect the inter-rater reliability, in order to improve the clinical applicability of the classification scheme. PMID:27559279
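The two agreement statistics named in this abstract, percentage agreement and Cohen's kappa, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names and the toy ratings are hypothetical, and the kappa shown is the plain unweighted two-rater form.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category either rater used.
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical classifications of four patients by two therapists.
    therapist_1 = ["A", "A", "B", "B"]
    therapist_2 = ["A", "B", "B", "B"]
    print(percent_agreement(therapist_1, therapist_2))
    print(cohens_kappa(therapist_1, therapist_2))
```

Because kappa subtracts the agreement expected by chance, it can sit well below the raw percentage agreement, which is consistent with the pairing of 53% agreement and a kappa of 0·34 reported above.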
Report: Studies Addressing EPA’s Organizational Structure
Report #2006-P-00029, August 16, 2006. The 13 studies, articles, publications, and reports we reviewed identified issues with cross-media management, regional offices, reliable information, and reliable science.
Brolin, Rosita; Rask, Mikael; Syrén, Susanne; Brunt, David Arthur
2013-10-01
The aim of this study was to investigate the reliability and validity of a questionnaire for studying satisfaction with housing and housing support for people with psychiatric disabilities. Most items were gathered from English language questionnaires. These were translated and adapted to a Swedish context and items concerning housing support were added. Two studies were conducted. The first, a test-retest reliability analysis, was performed in a pilot study with 53 participants; in the second study, which had 370 participants, a five factor solution with good internal consistency emerged. Further development of the questionnaire is discussed.
Lesage, A D; Cyr, M; Toupin, J; Cormier, H; Valiquette, C
1991-01-01
Interview questionnaires offer more validity than the self-administered format in exploring psychopathological or psychosocial phenomena of interest in psychiatric research. If they are used, special care must be paid to training interviewers and ensuring that they maintain their reliability. No widespread training standards exist, and each schedule may carry its own procedure. Our aims are to indicate how we trained interviewers with the French version of the Present State Examination (Wing, Cooper and Sartorius, 1974) and how we checked and maintained acceptable interrater reliability during one study. We will provide data on interrater reliability during the training and the study, as well as on test-retest reliability. These results will be used to support some guidelines for using this sort of psychiatric research questionnaire, in order to ensure comparability both within the study and between studies.
Reliability Stress-Strength Models for Dependent Observations with Applications in Clinical Trials
NASA Technical Reports Server (NTRS)
Kushary, Debashis; Kulkarni, Pandurang M.
1995-01-01
We consider the applications of stress-strength models in studies involving clinical trials. When studying the effects and side effects of certain procedures (treatments), it is often the case that observations are correlated due to subject effect, repeated measurements and observing many characteristics simultaneously. We develop maximum likelihood estimator (MLE) and uniform minimum variance unbiased estimator (UMVUE) of the reliability which in clinical trial studies could be considered as the chances of increased side effects due to a particular procedure compared to another. The results developed apply to both univariate and multivariate situations. Also, for the univariate situations we develop simple to use lower confidence bounds for the reliability. Further, we consider the cases when both stress and strength constitute time dependent processes. We define the future reliability and obtain methods of constructing lower confidence bounds for this reliability. Finally, we conduct simulation studies to evaluate all the procedures developed and also to compare the MLE and the UMVUE.
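As a rough illustration of what "reliability" means in a stress-strength model, the sketch below computes P(stress < strength) for a pair of correlated normal variables. This is an assumption-laden toy and not the MLE or UMVUE developed in the report: the bivariate normal model, the function name, and the parameter names are all hypothetical choices made here for illustration.

```python
import math
from statistics import NormalDist


def stress_strength_reliability(mu_strength, sd_strength,
                                mu_stress, sd_stress, rho=0.0):
    """P(stress < strength) when the pair is bivariate normal (assumed).

    rho is the correlation between stress and strength, which is how
    dependence between the two observations enters this toy model.
    """
    # Variance of the difference (strength - stress) under correlation rho.
    var_diff = (sd_strength ** 2 + sd_stress ** 2
                - 2.0 * rho * sd_strength * sd_stress)
    if var_diff <= 0:
        raise ValueError("difference has zero variance; reliability is degenerate")
    z = (mu_strength - mu_stress) / math.sqrt(var_diff)
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical parameters: positive correlation narrows the spread of
    # the difference and so pushes the reliability away from 0.5.
    print(stress_strength_reliability(1.0, 1.0, 0.0, 1.0, rho=0.0))
    print(stress_strength_reliability(1.0, 1.0, 0.0, 1.0, rho=0.5))
```

With known parameters this is just a normal tail probability; the estimation problem addressed in the report arises because the means, variances, and correlation must be estimated from dependent data.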
Spaan, Suzanne; Pronk, Anjoeka; Koch, Holger M; Jusko, Todd A; Jaddoe, Vincent W V; Shaw, Pamela A; Tiemeier, Henning M; Hofman, Albert; Pierik, Frank H; Longnecker, Matthew P
2015-05-01
The widespread use of organophosphate (OP) pesticides has resulted in ubiquitous exposure in humans, primarily through their diet. Exposure to OP pesticides may have adverse health effects, including neurobehavioral deficits in children. The optimal design of new studies requires data on the reliability of urinary measures of exposure. In the present study, urinary concentrations of six dialkyl phosphate (DAP) metabolites, the main urinary metabolites of OP pesticides, were determined in 120 pregnant women participating in the Generation R Study in Rotterdam. Intra-class correlation coefficients (ICCs) across serial urine specimens taken at <18, 18-25, and >25 weeks of pregnancy were determined to assess reliability. Geometric mean total DAP metabolite concentrations were 229 (GSD 2.2), 240 (GSD 2.1), and 224 (GSD 2.2) nmol/g creatinine across the three periods of gestation. Metabolite concentrations from the serial urine specimens in general correlated moderately. The ICCs for the six DAP metabolites ranged from 0.14 to 0.38 (0.30 for total DAPs), indicating weak to moderate reliability. Although the DAP metabolite levels observed in this study are slightly higher and slightly more correlated than in previous studies, the low to moderate reliability indicates a high degree of within-person variability, which presents challenges for designing well-powered epidemiological studies.
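An intraclass correlation across repeated specimens can be sketched with a one-way random-effects model, shown below. The abstract does not state which ICC form was used, so the one-way single-measurement form, the function name, and the toy data are assumptions made here for illustration only.

```python
def icc_oneway(data):
    """One-way random-effects ICC for single measurements, often written ICC(1,1).

    data: list of subjects, each a list of k repeated measurements.
    """
    n = len(data)
    k = len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least 2 subjects with the same k >= 2 measurements each")
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(data, subject_means)
                    for x in row) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all measurements are identical")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical concentrations for 4 women at 3 sampling periods.
    specimens = [
        [210.0, 260.0, 190.0],
        [300.0, 240.0, 330.0],
        [150.0, 220.0, 170.0],
        [280.0, 200.0, 250.0],
    ]
    print(icc_oneway(specimens))
```

A low value means that most of the variance lies within a person across time rather than between people, which is the within-person variability the abstract describes.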
Milner, Clare E; Brindle, Richard A
2016-01-01
There has been increased interest recently in measuring kinematics within the foot during gait. While several multisegment foot models have appeared in the literature, the Oxford foot model has been used frequently for both walking and running. Several studies have reported the reliability for the Oxford foot model, but most studies to date have reported reliability for barefoot walking. The purpose of this study was to determine between-day (intra-rater) and within-session (inter-trial) reliability of the modified Oxford foot model during shod walking and running and calculate minimum detectable difference for common variables of interest. Healthy adult male runners participated. Participants ran and walked in the gait laboratory for five trials of each. Three-dimensional gait analysis was conducted and foot and ankle joint angle time series data were calculated. Participants returned for a second gait analysis at least 5 days later. Intraclass correlation coefficients and minimum detectable difference were determined for walking and for running, to indicate both within-session and between-day reliability. Overall, relative variables were more reliable than absolute variables, and within-session reliability was greater than between-day reliability. Between-day intraclass correlation coefficients were comparable to those reported previously for adults walking barefoot. It is an extension in the use of the Oxford foot model to incorporate wearing a shoe while maintaining marker placement directly on the skin for each segment. These reliability data for walking and running will aid in the determination of meaningful differences in studies which use this model during shod gait. Copyright © 2015 Elsevier B.V. All rights reserved.
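One common way to turn an intraclass correlation coefficient into a minimum detectable difference is through the standard error of measurement. The sketch below uses that textbook relationship; the abstract does not state the formula the authors applied, so the formulas, function names, and example numbers are assumptions rather than the study's method.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), a widely used textbook form (assumed here)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this form expects an ICC between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_difference(sd, icc, z=1.96):
    """MDD at roughly 95% confidence: z * sqrt(2) * SEM.

    The sqrt(2) accounts for measurement error in both of the two
    sessions being compared.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


if __name__ == "__main__":
    # Hypothetical joint angle variable: between-subject SD of 2.0 degrees
    # and a between-day ICC of 0.75.
    print(standard_error_of_measurement(2.0, 0.75))
    print(minimum_detectable_difference(2.0, 0.75))
```

Under this relationship a lower ICC inflates the minimum detectable difference, so the less reliable between-day comparisons would need larger changes before a difference could be called real.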
Rossettini, Giacomo; Rondoni, Angie; Lovato, Tommaso; Strobe, Marco; Verzè, Elisa; Vicentini, Marco; Testa, Marco
2016-06-03
Passive Intervertebral Movements (PIVMs) are commonly used to assess and treat patients with nonspecific neck pain. Only very few studies have investigated 3D movements until now. This study assessed intra- and inter-rater reliability of three-dimensional (3D) cervical PIVMs performed by physical therapy students in patients with nonspecific neck pain. Thirty-one patients, mean age 47.2 ± 7.2 years, were independently evaluated by 2 physical therapy students. The raters (A and B) assessed mobility, end-feel and pain provocation by performing the 3D cervical segmental side-bending test (3D CSSB) bilaterally from levels C2-C3 to C6-C7. Percentage agreement (raw, positive and negative), Cohen's kappa (95% CI), prevalence index and bias index were calculated to estimate intra- and inter-rater reliability. Intra-rater reliability showed kappa values ranging between fair and substantial (k 0.29-0.80) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 90%. Inter-rater reliability presented kappa values ranging between fair and substantial (k 0.22-0.62) for pain provocation, mobility and end-feel, with percentage agreements between 61% and 80%. Intra-rater reliability of 3D PIVMs was superior to inter-rater reliability in patients with nonspecific neck pain. The most repeatable evaluation parameter was pain. However, the overall poor reliability suggests avoiding the use of these techniques alone to examine patients and measure their outcomes. Further studies are needed to investigate the reliability of PIVMs in combination with other assessment procedures in symptomatic patients.
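The agreement measures listed in this abstract (raw, positive and negative agreement, kappa, prevalence index and bias index) can all be derived from a single 2x2 table. The sketch below uses the standard definitions for a two-rater, two-category table; the function name, cell labels, and counts are hypothetical and are not taken from the study.

```python
def two_by_two_agreement(a, b, c, d):
    """Agreement statistics for a 2x2 table of two raters.

    a: both raters positive      b: rater A positive, rater B negative
    c: rater A negative, B positive   d: both raters negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    p_o = (a + d) / n
    # Chance agreement from the raters' marginal totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    nan = float("nan")
    return {
        "raw_agreement": p_o,
        "positive_agreement": 2 * a / (2 * a + b + c) if (2 * a + b + c) else nan,
        "negative_agreement": 2 * d / (2 * d + b + c) if (2 * d + b + c) else nan,
        "kappa": (p_o - p_e) / (1 - p_e) if p_e != 1 else nan,
        # Prevalence index: imbalance between the two agreement cells.
        "prevalence_index": abs(a - d) / n,
        # Bias index: imbalance between the two disagreement cells.
        "bias_index": abs(b - c) / n,
    }


if __name__ == "__main__":
    # Hypothetical counts for one segment level: pain provoked or not.
    for name, value in two_by_two_agreement(20, 5, 3, 2).items():
        print(name, round(value, 3))
```

A high prevalence index signals that one category dominates the table, a situation in which kappa can be low even when raw agreement is fairly high, so the indices help in reading kappa alongside percentage agreement.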
DiCesare, Christopher A; Bates, Nathaniel A; Barber Foss, Kim D; Thomas, Staci M; Wordeman, Samuel C; Sugimoto, Dai; Roewer, Benjamin D; Medina McKeon, Jennifer M; Di Stasi, Stephanie; Noehren, Brian W; Ford, Kevin R; Kiefer, Adam W; Hewett, Timothy E; Myer, Gregory D
2015-12-01
Anterior cruciate ligament (ACL) injuries are physically and financially devastating but affect a relatively small percentage of the population. Prospective identification of risk factors for ACL injury necessitates a large sample size; therefore, study of this injury would benefit from a multicenter approach. To determine the reliability of kinematic and kinetic measures of a single-leg cross drop task across 3 institutions. Controlled laboratory study. Twenty-five female high school volleyball players participated in this study. Three-dimensional motion data of each participant performing the single-leg cross drop were collected at 3 institutions over a period of 4 weeks. Coefficients of multiple correlation were calculated to assess the reliability of kinematic and kinetic measures during the landing phase of the movement. Between-centers reliability for kinematic waveforms in the frontal and sagittal planes was good, but moderate in the transverse plane. Between-centers reliability for kinetic waveforms was good in the sagittal, frontal, and transverse planes. Based on these findings, the single-leg cross drop task has moderate to good reliability of kinematic and kinetic measures across institutions after implementation of a standardized testing protocol. Multicenter collaborations can increase study numbers and generalize results, which is beneficial for studies of relatively rare phenomena, such as ACL injury. An important step is to determine the reliability of risk assessments across institutions before a multicenter collaboration can be initiated.
The reliability and validity of ultrasound to quantify muscles in older adults: a systematic review
Scafoglieri, Aldo; Jager‐Wittenaar, Harriët; Hobbelen, Johannes S.M.; van der Schans, Cees P.
2017-01-01
This review evaluates the reliability and validity of ultrasound to quantify muscles in older adults. The databases PubMed, Cochrane, and Cumulative Index to Nursing and Allied Health Literature were systematically searched for studies. In 17 studies, the reliability (n = 13) and validity (n = 8) of ultrasound to quantify muscles in community‐dwelling older adults (≥60 years) or a clinical population were evaluated. Four out of 13 reliability studies investigated both intra‐rater and inter‐rater reliability. Intraclass correlation coefficient (ICC) scores for reliability ranged from −0.26 to 1.00. The highest ICC scores were found for the vastus lateralis, rectus femoris, upper arm anterior, and the trunk (ICC = 0.72 to 1.000). All included validity studies found ICC scores ranging from 0.92 to 0.999. Two studies describing the validity of ultrasound to predict lean body mass showed good validity as compared with dual‐energy X‐ray absorptiometry (r² = 0.92 to 0.96). This systematic review shows that ultrasound is a reliable and valid tool for the assessment of muscle size in older adults. More high‐quality research is required to confirm these findings in both clinical and healthy populations. Furthermore, ultrasound assessment of small muscles needs further evaluation. Ultrasound to predict lean body mass is feasible; however, future research is required to validate prediction equations in older adults with varying function and health. PMID:28703496
Koontz, Alicia M; Lin, Yen-Sheng; Kankipati, Padmaja; Boninger, Michael L; Cooper, Rory A
2011-01-01
This study describes a new custom measurement system designed to investigate the biomechanics of sitting-pivot wheelchair transfers and assesses the reliability of selected biomechanical variables. Variables assessed include horizontal and vertical reaction forces underneath both hands and three-dimensional trunk, shoulder, and elbow range of motion. We examined the reliability of these measures between 5 consecutive transfer trials for 5 subjects with spinal cord injury and 12 nondisabled subjects while they performed a self-selected sitting pivot transfer from a wheelchair to a level bench. A majority of the biomechanical variables demonstrated moderate to excellent reliability (r > 0.6). The transfer measurement system recorded reliable and valid biomechanical data for future studies of sitting-pivot wheelchair transfers. We recommend a minimum of five transfer trials to obtain a reliable measure of transfer technique for future studies.
Scale for positive aspects of caregiving experience: development, reliability, and factor structure.
Kate, N; Grover, S; Kulhara, P; Nehra, R
2012-06-01
OBJECTIVE. To develop an instrument (Scale for Positive Aspects of Caregiving Experience [SPACE]) that evaluates positive caregiving experience and assess its psychometric properties. METHODS. Available scales which assess some aspects of positive caregiving experience were reviewed and a 50-item questionnaire with a 5-point rating was constructed. In all, 203 primary caregivers of patients with severe mental disorders were asked to complete the questionnaire. Internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity were evaluated. Principal component factor analysis was run to assess the factorial validity of the scale. RESULTS. The scale developed as part of the study was found to have good internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity. Principal component factor analysis yielded a 4-factor structure, which also had good test-retest reliability and cross-language reliability. There was a strong correlation between the 4 factors obtained. CONCLUSION. The SPACE developed as part of this study has good psychometric properties.
Human Rights Attitude Scale: A Validity and Reliability Study
ERIC Educational Resources Information Center
Ercan, Recep; Yaman, Tugba; Demir, Selcuk Besir
2015-01-01
The objective of this study is to develop a valid and reliable attitude scale having quality psychometric features that can measure secondary school students' attitudes towards human rights. The study group of the research is comprised by 710 6th, 7th and 8th grade students who study at 4 secondary schools in the centre of Sivas. The study group…
Uysal, Hilal; Ozcan, Şeyda
2011-06-01
In recent years, many new instruments have been developed to allow broader psychometric measurement in coronary artery disease, disease-specific health status measurement, and assessment of broader quality of life. This study was intended to determine whether, and to what extent, MIDAS is a valid and reliable measure for patients suffering from myocardial infarction for the first time in Turkey. The research was conducted with patients hospitalized and treated for myocardial infarction in the cardiology departments of 2 hospitals in Istanbul, Turkey, between 2007 and 2008. For the validity studies, the language validity, content validity, and construct validity of TR-MIDAS were examined. For the reliability studies, the tool's internal consistency reliability, Cronbach's alpha reliability coefficient, and test-retest reliability were assessed. The instrument's content validity index was 0.95. Principal component analysis revealed six factors with an eigenvalue >1.5. Cronbach's alpha was 0.89 for the total scale, which was an acceptable value. Test-retest reliability for the total scale was 0.51 (p<0.01). The data obtained support the Turkish Myocardial Infarction Dimensional Assessment Scale as a valid and reliable disease-specific instrument for assessing the quality of life of patients suffering from myocardial infarction in Turkey. Copyright © 2010 European Society of Cardiology. Published by Elsevier B.V. All rights reserved.
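Cronbach's alpha, the internal consistency coefficient reported in this abstract, can be sketched from a respondents-by-items score matrix as below. This is a generic illustration using the standard formula with sample variances; the function name and the toy scores are hypothetical and have no connection to the MIDAS data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 respondents and 2 items, same items per respondent")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical answers from 5 respondents to a 3-item subscale.
    answers = [
        [3, 4, 3],
        [2, 2, 3],
        [5, 4, 5],
        [1, 2, 1],
        [4, 5, 4],
    ]
    print(cronbach_alpha(answers))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as a measure of how consistently the items of a scale measure the same construct.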
The Reliability and Validity of Big Five Inventory Scores with African American College Students
ERIC Educational Resources Information Center
Worrell, Frank C.; Cross, William E., Jr.
2004-01-01
This article describes a study that examined the reliability and validity of scores on the Big Five Inventory (BFI; O. P. John, E. M. Donahue, & R. L. Kentle, 1991) in a sample of 336 African American college students. Results from the study indicated moderate reliability and structural validity for BFI scores. Additionally, BFI subscales had few…
ERIC Educational Resources Information Center
Ebuoh, Casmir N.; Ezeudu, S. A.
2015-01-01
The study investigated the effects of scoring by section, use of independent scorers and conventional patterns on scorer reliability in Biology essay tests. It was revealed from literature review that conventional pattern of scoring all items at a time in essay tests had been criticized for not being reliable. The study was true experimental study…
Assessing Reliability of Two Versions of Vocabulary Levels Tests in Iranian Context
ERIC Educational Resources Information Center
Bayazidi, Aso; Saeb, Fateme
2017-01-01
This study examined the equivalence and reliability of the two versions of the Vocabulary Levels Test in an Iranian context. This study was motivated by the fact that the Vocabulary Levels test is increasingly being used in Iran for both research and pedagogical purposes without having been checked for validity and reliability in this context. The…
Poulos, Natalie S; Pasch, Keryn E
2015-07-01
Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) were documented (range=8-229 per school). Overall inter-rater reliability of the developed tool ranged from 69-89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Poulos, Natalie S.; Pasch, Keryn E.
2015-01-01
Few studies of the food environment have collected primary data, and even fewer have reported reliability of the tool used. This study focused on the development of an innovative electronic data collection tool used to document outdoor food and beverage (FB) advertising and establishments near 43 middle and high schools in the Outdoor MEDIA Study. Tool development used GIS based mapping, an electronic data collection form on handheld devices, and an easily adaptable interface to efficiently collect primary data within the food environment. For the reliability study, two teams of data collectors documented all FB advertising and establishments within one half-mile of six middle schools. Inter-rater reliability was calculated overall and by advertisement or establishment category using percent agreement. A total of 824 advertisements (n=233), establishment advertisements (n=499), and establishments (n=92) were documented (range=8–229 per school). Overall inter-rater reliability of the developed tool ranged from 69–89% for advertisements and establishments. Results suggest that the developed tool is highly reliable and effective for documenting the outdoor FB environment. PMID:26022774
2014-01-01
Background Patient-reported outcome validation needs to achieve validity and reliability standards. Among reliability analysis parameters, test-retest reliability is an important psychometric property. Retested patients must be in a clinically stable condition. This is particularly problematic in palliative care (PC) settings because advanced cancer patients are prone to a faster rate of clinical deterioration. The aim of this study was to evaluate the methods by which multi-symptom and health-related qualities of life (HRQoL) based on patient-reported outcomes (PROs) have been validated in oncological PC settings with regard to test-retest reliability. Methods A systematic search of PubMed (1966 to June 2013), EMBASE (1980 to June 2013), PsychInfo (1806 to June 2013), CINAHL (1980 to June 2013), and SCIELO (1998 to June 2013), and specific PRO databases was performed. Studies were included if they described a set of validation studies for an instrument developed to measure multi-symptom or multidimensional HRQoL in advanced cancer patients under PC. The COSMIN checklist was used to rate the methodological quality of the study designs. Results We identified 89 validation studies from 746 potentially relevant articles. From those 89 articles, 31 measured test-retest reliability and were included in this review. Upon critical analysis of the overall quality of the criteria used to determine the test-retest reliability, 6 (19.4%), 17 (54.8%), and 8 (25.8%) of these articles were rated as good, fair, or poor, respectively, and no article was classified as excellent. Multi-symptom instruments were retested over a shortened interval when compared to the HRQoL instruments (median values 24 hours and 168 hours, respectively; p = 0.001).
Validation studies that included objective confirmation of clinical stability in their design yielded better results for the test-retest analysis with regard to both pain and global HRQoL scores (p < 0.05). The quality of the statistical analysis and its description were of great concern. Conclusion Test-retest reliability has been infrequently and poorly evaluated. The confirmation of clinical stability was an important factor in our analysis, and we suggest that special attention be focused on clinical stability when designing a PRO validation study that includes advanced cancer patients under PC. PMID:24447633
Mohseni Bandpei, Mohammad A; Rahmani, Nahid; Majdoleslam, Basir; Abdollahi, Iraj; Ali, Shabnam Shah; Ahmad, Ashfaq
2014-09-01
The purpose of this study was to review the literature to determine whether surface electromyography (EMG) is a reliable tool to assess paraspinal muscle fatigue in healthy subjects and in patients with low back pain (LBP). A literature search for the period of 2000 to 2012 was performed, using PubMed, ProQuest, Science Direct, EMBASE, OVID, CINAHL, and MEDLINE databases. Electromyography, reliability, median frequency, paraspinal muscle, endurance, low back pain, and muscle fatigue were used as keywords. The literature search yielded 178 studies using the above keywords. Twelve articles were selected according to the inclusion criteria of the study. In 7 of the 12 studies, surface EMG was applied only in healthy subjects, and in 5 studies, the reliability of surface EMG was investigated in patients with LBP or in comparison with a control group. In all of these studies, median frequency was shown to be a reliable EMG parameter to assess paraspinal muscle fatigue. There was a wide variation among studies in terms of methodology, surface EMG parameters, electrode location, procedure, and homogeneity of the study population. The results suggest that there is a convincing body of evidence to support the merit of surface EMG in the assessment of paraspinal muscle fatigue in healthy subjects and in patients with LBP. Copyright © 2014 National University of Health Sciences. Published by Elsevier Inc. All rights reserved.
de Albuquerque, Priscila Maria Nascimento Martins; de Alencar, Geisa Guimarães; de Oliveira, Daniela Araújo; de Siqueira, Gisela Rocha
2018-01-01
The aim of this study was to examine and interpret the concordance, accuracy, and reliability of photogrammetric protocols available in the literature for evaluating cervical lordosis in an adult population aged 18 to 59 years. A systematic search of 6 electronic databases (MEDLINE via PubMed, LILACS, CINAHL, Scopus, ScienceDirect, and Web of Science) located studies that assessed the reliability and/or concordance and/or accuracy of photogrammetric protocols for evaluating cervical lordosis, compared with radiography. Articles published through April 2016 were selected. Two independent reviewers used a critical appraisal tool (QUADAS and QAREL) to assess the quality of the selected studies. Two studies were included in the review and had high levels of reliability (intraclass correlation coefficient: 0.974-0.98). Only 1 study assessed the concordance between the methods, which was calculated using Pearson's correlation coefficient. To date, the accuracy of photogrammetry has not been investigated thoroughly. We encountered no study in the literature that investigated the accuracy of photogrammetry in diagnosing hyperlordosis of cervical spine. However, both current studies report high levels of intra- and interrater reliability. To increase the level of evidence of photogrammetry in the evaluation of cervical lordosis, it is necessary to conduct further studies using a larger sample to increase the external validity of the findings. Copyright © 2018. Published by Elsevier Inc.
Critically re-evaluating a common technique
Geisbush, Thomas; Jones, Lyell; Weiss, Michael; Mozaffar, Tahseen; Gronseth, Gary; Rutkove, Seward B.
2016-01-01
Objectives: (1) To assess the diagnostic accuracy of EMG in radiculopathy. (2) To evaluate the intrarater reliability and interrater reliability of EMG in radiculopathy. (3) To assess the presence of confirmation bias in EMG. Methods: Three experienced academic electromyographers interpreted 3 compact discs with 20 EMG videos (10 normal, 10 radiculopathy) in a blinded, standardized fashion without information regarding the nature of the study. The EMGs were interpreted 3 times (discs A, B, C) 1 month apart. Clinical information was provided only with disc C. Intrarater reliability was calculated by comparing interpretations in discs A and B, interrater reliability by comparing interpretation between reviewers. Confirmation bias was estimated by the difference in correct interpretations when clinical information was provided. Results: Sensitivity was similar to previous reports (77%, confidence interval [CI] 63%–90%); specificity was 71%, CI 56%–85%. Intrarater reliability was good (κ 0.61, 95% CI 0.41–0.81); interrater reliability was lower (κ 0.53, CI 0.35–0.71). There was no substantial confirmation bias when clinical information was provided (absolute difference in correct responses 2.2%, CI −13.3% to 17.7%); the study lacked precision to exclude moderate confirmation bias. Conclusions: This study supports that (1) serial EMG studies should be performed by the same electromyographer since intrarater reliability is better than interrater reliability; (2) knowledge of clinical information does not bias EMG interpretation substantially; (3) EMG has moderate diagnostic accuracy for radiculopathy with modest specificity and electromyographers should exercise caution interpreting mild abnormalities. Classification of evidence: This study provides Class III evidence that EMG has moderate diagnostic accuracy and specificity for radiculopathy. PMID:26701380
Muhamad, Zailani; Ramli, Ayiesah; Amat, Salleh
2015-05-01
The aim of this study was to determine the content validity, internal consistency, test-retest reliability and inter-rater reliability of the Clinical Competency Evaluation Instrument (CCEVI) in assessing the clinical performance of physiotherapy students. This study was carried out between June and September 2013 at Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur, Malaysia. A panel of 10 experts was identified to establish content validity by evaluating and rating each of the items used in the CCEVI with regard to their relevance in measuring students' clinical competency. A total of 50 UKM undergraduate physiotherapy students were assessed throughout their clinical placement to determine the construct validity of these items. The instrument's reliability was determined through a cross-sectional study involving a clinical performance assessment of 14 final-year undergraduate physiotherapy students. The content validity index of the entire CCEVI was 0.91, while the proportion of agreement on the content validity indices ranged from 0.83-1.00. The CCEVI construct validity was established with factor loading of ≥0.6, while internal consistency (Cronbach's alpha) overall was 0.97. Test-retest reliability of the CCEVI was confirmed with a Pearson's correlation range of 0.91-0.97 and an intraclass correlation coefficient range of 0.95-0.98. Inter-rater reliability of the CCEVI domains ranged from 0.59 to 0.97 on initial and subsequent assessments. This pilot study confirmed the content validity of the CCEVI. It showed high internal consistency, thereby providing evidence that the CCEVI has moderate to excellent inter-rater reliability. However, additional refinement in the wording of the CCEVI items, particularly in the domains of safety and documentation, is recommended to further improve the validity and reliability of the instrument.
O'Grady, Michael G; Dusing, Stacey C
2015-01-01
Play is vital for development. Infants and children learn through play. Traditional standardized developmental tests measure whether a child performs individual skills within controlled environments. Play-based assessments can measure skill performance during natural, child-driven play. The purpose of this study was to systematically review reliability, validity, and responsiveness of all play-based assessments that quantify motor and cognitive skills in children from birth to 36 months of age. Studies were identified from a literature search using PubMed, ERIC, CINAHL, and PsycINFO databases and the reference lists of included papers. Included studies investigated reliability, validity, or responsiveness of play-based assessments that measured motor and cognitive skills for children to 36 months of age. Two reviewers independently screened 40 studies for eligibility and inclusion. The reviewers independently extracted reliability, validity, and responsiveness data. They examined measurement properties and methodological quality of the included studies. Four current play-based assessment tools were identified in 8 included studies. Each play-based assessment tool measured motor and cognitive skills in a different way during play. Interrater reliability correlations ranged from .86 to .98 for motor development and from .23 to .90 for cognitive development. Test-retest reliability correlations ranged from .88 to .95 for motor development and from .45 to .91 for cognitive development. Structural validity correlations ranged from .62 to .90 for motor development and from .42 to .93 for cognitive development. One study assessed responsiveness to change in motor development. Most studies had small and poorly described samples. Lack of transparency in data management and statistical analysis was common. Play-based assessments have potential to be reliable and valid tools to assess cognitive and motor skills, but higher-quality research is needed. 
Psychometric properties should be considered for each play-based assessment before it is used in clinical and research practice. © 2015 American Physical Therapy Association.
Reliability of joint count assessment in rheumatoid arthritis: a systematic literature review.
Cheung, Peter P; Gossec, Laure; Mak, Anselm; March, Lyn
2014-06-01
Joint counts are central to the assessment of rheumatoid arthritis (RA) but reliability is an issue. To evaluate the reliability and agreement of joint counts (intra-observer and inter-observer) by health care professionals (physicians, nurses, and metrologists) and patients in RA, and the impact of training and standardization on joint count reliability through a systematic literature review. Articles reporting joint count reliability or agreement in RA in PubMed, EMBase, and the Cochrane library between 1960 and 2012 were selected. Data were extracted regarding tender joint counts (TJCs) and swollen joint counts (SJCs) derived by physicians, metrologists, or patients for intra-observer and inter-observer reliability. In addition, methods and effects of training or standardization were extracted. Statistics expressing reliability such as intraclass correlation coefficients (ICCs) were extracted. Data analysis was primarily descriptive due to high heterogeneity. Twenty-eight studies on health care professionals (HCP) and 20 studies on patients were included. Intra-observer reliability for TJCs and SJCs was good for HCPs and patients (range of ICC: 0.49-0.98). Inter-observer reliability between HCPs for TJCs was higher than for SJCs (range of ICC: 0.64-0.88 vs. 0.29-0.98). Patient inter-observer reliability with HCPs as comparators was better for TJCs (range of ICC: 0.31-0.91) compared to SJCs (0.16-0.64). Nine studies (7 with HCPs and 2 with patients) evaluated consensus or training, with improvement in reliability of TJCs but conflicting evidence for SJCs. Intra- and inter-observer reliability was high for TJCs for HCPs and patients: among all groups, reliability was better for TJCs than SJCs. Inter-observer reliability of SJCs was poorer for patients than HCPs. Data were inconclusive regarding the potential for training to improve SJC reliability. Overall, the results support further evaluation for patient-reported joint counts as an outcome measure. 
© 2013 Published by Elsevier Inc.
Martín-Rodríguez, Saúl; Loturco, Irineu; Hunter, Angus M; Rodríguez-Ruiz, David; Munguia-Izquierdo, Diego
2017-12-01
Martín-Rodríguez, S, Loturco, I, Hunter, AM, Rodríguez-Ruiz, D, and Munguia-Izquierdo, D. Reliability and measurement error of tensiomyography to assess mechanical muscle function: A systematic review. J Strength Cond Res 31(12): 3524-3536, 2017. Interest in studying mechanical skeletal muscle function through tensiomyography (TMG) has increased in recent years. This systematic review aimed to (a) report the reliability and measurement error of all TMG parameters (i.e., maximum radial displacement of the muscle belly [Dm], contraction time [Tc], delay time [Td], half-relaxation time [½ Tr], and sustained contraction time [Ts]) and (b) provide critical reflection on how to perform accurate and appropriate measurements for informing clinicians, exercise professionals, and researchers. A comprehensive literature search was performed of the Pubmed, Scopus, Science Direct, and Cochrane databases up to July 2017. Eight studies were included in this systematic review. Meta-analysis could not be performed because of the low quality of the evidence of some studies evaluated. Overall, the review of the 9 studies involving 158 participants revealed high relative reliability (intraclass correlation coefficient [ICC]) for Dm (0.91-0.99); moderate-to-high ICC for Ts (0.80-0.96), Tc (0.70-0.98), and ½ Tr (0.77-0.93); and low-to-high ICC for Td (0.60-0.98), independently of the evaluated muscles. In addition, absolute reliability (coefficient of variation [CV]) was low for all TMG parameters except for ½ Tr (CV > 20%), whereas measurement error indexes were high for this parameter. In conclusion, this study indicates that 3 of the TMG parameters (Dm, Td, and Tc) are highly reliable, whereas ½ Tr demonstrates insufficient reliability, and thus should not be used in future studies.
Patterson, P Daniel; Weaver, Matthew D; Fabio, Anthony; Teasley, Ellen M; Renn, Megan L; Curtis, Brett R; Matthews, Margaret E; Kroemer, Andrew J; Xun, Xiaoshuang; Bizhanova, Zhadyra; Weiss, Patricia M; Sequeira, Denisse J; Coppler, Patrick J; Lang, Eddy S; Higgins, J Stephen
2018-02-15
This study sought to systematically search the literature to identify reliable and valid survey instruments for fatigue measurement in the Emergency Medical Services (EMS) occupational setting. A systematic review study design was used, and six databases, including one website, were searched. The research question guiding the search was developed a priori and registered with the PROSPERO database of systematic reviews: "Are there reliable and valid instruments for measuring fatigue among EMS personnel?" (2016:CRD42016040097). The primary outcome of interest was criterion-related validity. Important outcomes of interest included reliability (e.g., internal consistency), and indicators of sensitivity and specificity. Members of the research team independently screened records from the databases. Full-text articles were evaluated by adapting the Bolster and Rourke system for categorizing findings of systematic reviews, and the data abstracted from the body of literature were rated as favorable, unfavorable, mixed/inconclusive, or no impact. The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) methodology was used to evaluate the quality of evidence. The search strategy yielded 1,257 unique records. Thirty-four unique experimental and non-experimental studies were determined to be relevant following full-text review. Nineteen studies reported on the reliability and/or validity of ten different fatigue survey instruments. Eighteen different studies evaluated the reliability and/or validity of four different sleepiness survey instruments. None of the retained studies reported sensitivity or specificity. Evidence quality was rated as very low across all outcomes. In this systematic review, limited evidence of the reliability and validity of 14 different survey instruments to assess the fatigue and/or sleepiness status of EMS personnel and related shift worker groups was identified.
Inter-rater agreement in evaluation of disability: systematic review of reproducibility studies.
Barth, Jürgen; de Boer, Wout E L; Busse, Jason W; Hoving, Jan L; Kedzia, Sarah; Couban, Rachel; Fischer, Katrin; von Allmen, David Y; Spanjer, Jerry; Kunz, Regina
2017-01-25
To explore agreement among healthcare professionals assessing eligibility for work disability benefits. Systematic review and narrative synthesis of reproducibility studies. Medline, Embase, and PsycINFO searched up to 16 March 2016, without language restrictions, and review of bibliographies of included studies. Observational studies investigating reproducibility among healthcare professionals performing disability evaluations using a global rating of working capacity and reporting inter-rater reliability by a statistical measure or descriptively. Studies could be conducted in insurance settings, where decisions on ability to work include normative judgments based on legal considerations, or in research settings, where decisions on ability to work disregard normative considerations. Teams of paired reviewers identified eligible studies, appraised their methodological quality and generalisability, and abstracted results with pretested forms. As heterogeneity of research designs and findings impeded a quantitative analysis, a descriptive synthesis stratified by setting (insurance or research) was performed. From 4562 references, 101 full text articles were reviewed. Of these, 16 studies conducted in an insurance setting and seven in a research setting, performed in 12 countries, met the inclusion criteria. Studies in the insurance setting were conducted with medical experts assessing claimants who were actual disability claimants or played by actors, hypothetical cases, or short written scenarios. Conditions were mental (n=6, 38%), musculoskeletal (n=4, 25%), or mixed (n=6, 38%). Applicability of findings from studies conducted in an insurance setting to real life evaluations ranged from generalisable (n=7, 44%) and probably generalisable (n=3, 19%) to probably not generalisable (n=6, 37%). Median inter-rater reliability among experts was 0.45 (range intraclass correlation coefficient 0.86 to κ −0.10). 
Inter-rater reliability was poor in six studies (37%) and excellent in only two (13%). This contrasts with studies conducted in the research setting, where the median inter-rater reliability was 0.76 (range 0.91-0.53), and 71% (5/7) studies achieved excellent inter-rater reliability. Reliability between assessing professionals was higher when the evaluation was guided by a standardised instrument (23 studies, P=0.006). No such association was detected for subjective or chronic health conditions or the studies' generalisability to real world evaluation of disability (P=0.46, 0.45, and 0.65, respectively). Despite their common use and far reaching consequences for workers claiming disabling injury or illness, research on the reliability of medical evaluations of disability for work is limited and indicates high variation in judgments among assessing professionals. Standardising the evaluation process could improve reliability. Development and testing of instruments and structured approaches to improve reliability in evaluation of disability are urgently needed. Published by the BMJ Publishing Group Limited.
Cuchna, Jennifer W; Hoch, Matthew C; Hoch, Johanna M
2016-05-01
To synthesize the literature and perform a meta-analysis for both the interrater and intrarater reliability of the FMS™. Academic Search Complete, CINAHL, Medline and SportsDiscus databases were systematically searched from inception to March 2015. Studies were included if the primary purpose was to determine the interrater or intrarater reliability of the FMS™, assessed and scored all 7 items using the standard scoring criteria, provided a composite score and employed intraclass correlation coefficients (ICCs). Studies were excluded if reliability was not the primary aim, participants were injured at data collection, or a modified FMS™ or scoring system was utilized. Seven papers were included; 6 assessing interrater and 6 assessing intrarater reliability. There was moderate evidence for good interrater reliability with a summary ICC of 0.843 (95% CI = 0.640, 0.936; Q7 = 84.915, p < 0.0001). There was moderate evidence for good intrarater reliability with a summary ICC of 0.869 (95% CI = 0.785, 0.921; Q12 = 60.763, p < 0.0001). There was moderate evidence for both forms of reliability. The sensitivity assessments revealed this interpretation is stable and not influenced by any one study. Overall, the FMS™ is a reliable tool for clinical practice. Copyright © 2015 Elsevier Ltd. All rights reserved.
Hanson, Lisa C; Taylor, Nicholas F; McBurney, Helen
2016-09-01
To determine the retest reliability of the 10m incremental shuttle walk test (ISWT) in a mixed cardiac rehabilitation population. Participants completed two 10m ISWTs in a single session in a repeated measures study. Ten participants completed a third 10m ISWT as part of a pilot study. Hospital physiotherapy department. 62 adults aged a mean of 68 years (SD 10) referred to a cardiac rehabilitation program. Retest reliability of the 10m ISWT expressed as relative reliability and measurement error. Relative reliability was expressed in a ratio in the form of an intraclass correlation coefficient (ICC) and measurement error in the form of the standard error of measurement (SEM) and 95% confidence intervals for the group and individual. There was a high level of relative reliability over the two walks with an ICC of .99. The SEMagreement was 17m, and a change of at least 23m for the group and 54m for the individual would be required to be 95% confident of exceeding measurement error. The 10m ISWT demonstrated good retest reliability and is sufficiently reliable to be applied in practice in this population without the use of a practice test. Copyright © 2015 Chartered Society of Physiotherapy. Published by Elsevier Ltd. All rights reserved.
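The relationship between relative reliability (ICC) and measurement error can be sketched with two widely used textbook formulas: SEM = SD × √(1 − ICC), and the minimal detectable change at 95% confidence for an individual, MDC95 = 1.96 × √2 × SEM. This is an assumption-laden illustration only. The abstract reports an agreement-type SEM, which is typically derived from variance components, and it does not state its exact formulas, so this sketch is not expected to reproduce the reported 17 m, 23 m, or 54 m. The input values and function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    if sd < 0:
        raise ValueError("sd must be non-negative")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem):
    """Smallest individual change exceeding measurement error with 95% confidence.

    The sqrt(2) factor reflects that a change score combines the error of
    two measurements (test and retest).
    """
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs: SD of 10 units across participants, ICC of 0.91.
sem = standard_error_of_measurement(sd=10.0, icc=0.91)
mdc95 = minimal_detectable_change_95(sem)
print("SEM:", round(sem, 2), "MDC95:", round(mdc95, 2))
```

Under these formulas a very high ICC can still coexist with a clinically meaningful SEM when the sample is heterogeneous, which is why the abstract reports both.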
Test-retest and between-site reliability in a multicenter fMRI study.
Friedman, Lee; Stern, Hal; Brown, Gregory G; Mathalon, Daniel H; Turner, Jessica; Glover, Gary H; Gollub, Randy L; Lauriello, John; Lim, Kelvin O; Cannon, Tyrone; Greve, Douglas N; Bockholt, Henry Jeremy; Belger, Aysenil; Mueller, Bryon; Doty, Michael J; He, Jianchun; Wells, William; Smyth, Padhraic; Pieper, Steve; Kim, Seyoung; Kubicki, Marek; Vangel, Mark; Potkin, Steven G
2008-08-01
In the present report, estimates of test-retest and between-site reliability of fMRI assessments were produced in the context of a multicenter fMRI reliability study (FBIRN Phase 1, www.nbirn.net). Five subjects were scanned on 10 MRI scanners on two occasions. The fMRI task was a simple block design sensorimotor task. The impulse response functions to the stimulation block were derived using an FIR-deconvolution analysis with FMRISTAT. Six functionally-derived ROIs covering the visual, auditory and motor cortices, created from a prior analysis, were used. Two dependent variables were compared: percent signal change and contrast-to-noise-ratio. Reliability was assessed with intraclass correlation coefficients derived from a variance components analysis. Test-retest reliability was high, but initially, between-site reliability was low, indicating a strong contribution from site and site-by-subject variance. However, a number of factors that can markedly improve between-site reliability were uncovered, including increasing the size of the ROIs, adjusting for smoothness differences, and inclusion of additional runs. By employing multiple steps, between-site reliability for 3T scanners was increased by 123%. Dropping one site at a time and assessing reliability can be a useful method of assessing the sensitivity of the results to particular sites. These findings should provide guidance to others on the best practices for future multicenter studies.
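How an intraclass correlation coefficient falls out of variance components can be sketched with the simplest case, a one-way random-effects ICC(1,1) computed from ANOVA mean squares. This is a deliberate simplification: the study above also modeled site and site-by-subject variance, which this sketch does not. The data and the function name are hypothetical.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1).

    `data` is a list of subjects, each a list of k repeated measurements.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-subject and within-subject mean squares.
    """
    n_subjects = len(data)
    k = len(data[0])
    if n_subjects < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 measurements each")
    if any(len(row) != k for row in data):
        raise ValueError("every subject needs the same number of measurements")
    grand_mean = sum(sum(row) for row in data) / (n_subjects * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject sum of squares, scaled by the k measurements per subject.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Within-subject sum of squares around each subject's own mean.
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    )
    ms_between = ss_between / (n_subjects - 1)
    ms_within = ss_within / (n_subjects * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when the data have no variance")
    return (ms_between - ms_within) / denominator


# Hypothetical data: 3 subjects, each measured on 2 occasions.
example_scans = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print("ICC(1,1):", round(icc_oneway(example_scans), 3))
```

In this one-way form, any systematic difference between occasions or sites is counted as within-subject error, which lowers the ICC; this is consistent with the abstract's point that site-related variance reduces between-site reliability.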
An In vitro evaluation of the reliability of QR code denture labeling technique
Poovannan, Sindhu; Jain, Ashish R.; Krishnan, Cakku Jalliah Venkata; Chandran, Chitraa R.
2016-01-01
Statement of Problem: Positive identification of the dead after accidents and disasters through labeled dentures plays a key role in forensic scenario. A number of denture labeling methods are available, and studies evaluating their reliability under drastic conditions are vital. Aim: This study was conducted to evaluate the reliability of QR (Quick Response) Code labeled at various depths in heat-cured acrylic blocks after acid treatment, heat treatment (burns), and fracture in forensics. It was an in vitro study. Materials and Methods: This study included 160 specimens of heat-cured acrylic blocks (1.8 cm × 1.8 cm) and these were divided into 4 groups (40 samples per group). QR Codes were incorporated in the samples using clear acrylic sheet and they were assessed for reliability under various depths, acid, heat, and fracture. Data were analyzed using Chi-square test, test of proportion. Results: The QR Code inclusion technique was reliable under various depths of acrylic sheet, acid (sulfuric acid 99%, hydrochloric acid 40%) and heat (up to 370°C). Results were variable with fracture of QR Code labeled acrylic blocks. Conclusion: Within the limitations of the study, by analyzing the results, it was clearly indicated that the QR Code technique was reliable under various depths of acrylic sheet, acid, and heat (370°C). Effectiveness varied in fracture and depended on the level of distortion. This study thus suggests that QR Code is an effective and simpler denture labeling method. PMID:28123284
Moore, Martha; Barker, Karen
2017-09-11
The four square step test (FSST) was first validated in healthy older adults to provide a measure of dynamic standing balance and mobility. The FSST has since been used in a variety of patient populations. The purpose of this systematic review is to determine the validity and reliability of the FSST in these different adult patient populations. The literature search was conducted to highlight all the studies that measured validity and reliability of the FSST. Six electronic databases were searched including AMED, CINAHL, MEDLINE, PEDro, Web of Science and Google Scholar. Grey literature was also searched for any documents relevant to the review. Two independent reviewers carried out study selection and quality assessment. The methodological quality was assessed using the QUADAS-2 tool, which is a validated tool for the quality assessment of diagnostic accuracy studies, and the COSMIN four-point checklist, which contains standards for evaluating reliability studies on the measurement properties of health instruments. Fifteen studies were reviewed studying community-dwelling older adults, Parkinson's disease, Huntington's disease, multiple sclerosis, vestibular disorders, post stroke, post unilateral transtibial amputation, knee pain and hip osteoarthritis. Three of the studies were of moderate methodological quality scoring low in risk of bias and applicability for all domains in the QUADAS-2 tool. Three studies scored "fair" on the COSMIN four-point checklist for the reliability components. The concurrent validity of the FSST was measured in nine of the studies with moderate to strong correlations being found. Excellent Intraclass Correlation Coefficients were found between physiotherapists carrying out the tests (ICC = .99) with good to excellent test-retest reliability shown in nine of the studies (ICC = .73-.98). The FSST may be an effective and valid tool for measuring dynamic balance and a participant's falls risk. 
It has been shown to have strong correlations with other measures of balance and mobility with good reliability shown in a number of populations. However, the quality of the papers reviewed was variable with key factors, such as sample size and test set up, needing to be addressed before the tool can be confidently used in these specified populations.
Reliability Generalization of the Alcohol Use Disorder Identification Test.
ERIC Educational Resources Information Center
Shields, Alan L.; Caruso, John C.
2002-01-01
Evaluated the reliability of scores from the Alcohol Use Disorders Identification Test (AUDIT; J. Saunders and others, 1993) in a reliability generalization study based on 17 empirical journal articles. Results show AUDIT scores to be generally reliable for basic assessment. (SLD)
The Attitude Determination Scale for Value Acquisition: A Validity and Reliability Study
ERIC Educational Resources Information Center
Cetin, Saban
2017-01-01
This study aims to develop a reliable measurement tool for determining secondary school students' attitudes toward values acquisition. The study was conducted with a total of 325 high school senior students (200 female and 125 male) in the spring semester of the 2014-2015 academic year. In the study, expert opinion…
The Chinese Version of the Self-Report Family Inventory: Reliability and Validity.
ERIC Educational Resources Information Center
Shek, Daniel T. L.; Lai, Kelly Y. C.
2001-01-01
Reliability and validity of Chinese Self-Report Family Inventory (C-SFI) were examined in three studies. Study 1 showed C-SFI was temporally stable and internally consistent. Study 2 indicated C-SFI could discriminate between clinical and nonclinical groups. Study 3 gave support for internal consistency, concurrent validity and construct validity.…
Berger, Aaron J; Momeni, Arash; Ladd, Amy L
2014-04-01
Trapeziometacarpal, or thumb carpometacarpal (CMC), arthritis is a common problem with a variety of treatment options. Although widely used, the Eaton radiographic staging system for CMC arthritis is of questionable clinical utility, as disease severity does not predictably correlate with symptoms or treatment recommendations. A possible reason for this is that the classification itself may not be reliable, but the literature on this has not, to our knowledge, been systematically reviewed. We therefore performed a systematic review to determine the intra- and interobserver reliability of the Eaton staging system. We systematically reviewed English-language studies published between 1973 and 2013 to assess the degree of intra- and interobserver reliability of the Eaton classification for determining the stage of trapeziometacarpal joint arthritis and pantrapezial arthritis based on plain radiographic imaging. Search engines included: PubMed, Scopus(®), and CINAHL. Four studies, which included a total of 163 patients, met our inclusion criteria and were evaluated. The level of evidence of the studies included in this analysis was determined using the Oxford Centre for Evidence Based Medicine Levels of Evidence Classification by two independent observers. A limited number of studies have been performed to assess intra- and interobserver reliability of the Eaton classification system. The four studies included were determined to be Level 3b. These studies collectively indicate that the Eaton classification demonstrates poor to fair interobserver reliability (kappa values: 0.11-0.56) and fair to moderate intraobserver reliability (kappa values: 0.54-0.657). Review of the literature demonstrates that radiographs assist in the assessment of CMC joint disease, but there is not a reliable system for classification of disease severity. 
Currently, diagnosis and treatment of thumb CMC arthritis are based on the surgeon's qualitative assessment combining history, physical examination, and radiographic evaluation. Inconsistent agreement using the current common radiographic classification system suggests a need for better radiographic tools to quantify disease severity.
Reliability of two social cognition tests: The combined stories test and the social knowledge test.
Thibaudeau, Élisabeth; Cellard, Caroline; Legendre, Maxime; Villeneuve, Karèle; Achim, Amélie M
2018-04-01
Deficits in social cognition are common in psychiatric disorders. Validated social cognition measures with good psychometric properties are necessary to assess and target social cognitive deficits. Two recent social cognition tests, the Combined Stories Test (COST) and the Social Knowledge Test (SKT), respectively assess theory of mind and social knowledge. Previous studies have shown good psychometric properties for these tests, but the test-retest reliability has never been documented. The aim of this study was to evaluate the test-retest reliability and the inter-rater reliability of the COST and the SKT. The COST and the SKT were administered twice to a group of forty-two healthy adults, with a delay of approximately four weeks between the assessments. Excellent test-retest reliability was observed for the COST, and a good test-retest reliability was observed for the SKT. There was no evidence of practice effect. Furthermore, an excellent inter-rater reliability was observed for both tests. This study shows a good reliability of the COST and the SKT that adds to the good validity previously reported for these two tests. These good psychometrics properties thus support that the COST and the SKT are adequate measures for the assessment of social cognition. Copyright © 2018. Published by Elsevier B.V.
Savage, Trevor Nicholas; McIntosh, Andrew Stuart
2017-03-01
It is important to understand factors contributing to and directly causing sports injuries to improve the effectiveness and safety of sports skills. The characteristics of injury events must be evaluated and described meaningfully and reliably. However, many complex skills cannot be effectively investigated quantitatively because of ethical, technological and validity considerations. Increasingly, qualitative methods are being used to investigate human movement for research purposes, but there are concerns about reliability and measurement bias of such methods. Using the tackle in Rugby union as an example, we outline a systematic approach for developing a skill analysis protocol with a focus on improving objectivity, validity and reliability. Characteristics for analysis were selected using qualitative analysis and biomechanical theoretical models and epidemiological and coaching literature. An expert panel comprising subject matter experts provided feedback and the inter-rater reliability of the protocol was assessed using ten trained raters. The inter-rater reliability results were reviewed by the expert panel and the protocol was revised and assessed in a second inter-rater reliability study. Mean agreement in the second study improved and was comparable (52-90% agreement and ICC between 0.6 and 0.9) with other studies that have reported inter-rater reliability of qualitative analysis of human movement.
Lindskog, Marcus; Winman, Anders; Juslin, Peter; Poom, Leo
2013-01-01
Two studies investigated the reliability and predictive validity of commonly used measures and models of Approximate Number System acuity (ANS). Study 1 investigated reliability by both an empirical approach and a simulation of maximum obtainable reliability under ideal conditions. Results showed that common measures of the Weber fraction (w) are reliable only when using a substantial number of trials, even under ideal conditions. Study 2 compared different purported measures of ANS acuity as for convergent and predictive validity in a within-subjects design and evaluated an adaptive test using the ZEST algorithm. Results showed that the adaptive measure can reduce the number of trials needed to reach acceptable reliability. Only direct tests with non-symbolic numerosity discriminations of stimuli presented simultaneously were related to arithmetic fluency. This correlation remained when controlling for general cognitive ability and perceptual speed. Further, the purported indirect measure of ANS acuity in terms of the Numeric Distance Effect (NDE) was not reliable and showed no sign of predictive validity. The non-symbolic NDE for reaction time was significantly related to direct w estimates in a direction contrary to the expected. Easier stimuli were found to be more reliable, but only harder (7:8 ratio) stimuli contributed to predictive validity. PMID:23964256
Janssen, Ellen M; Marshall, Deborah A; Hauber, A Brett; Bridges, John F P
2017-12-01
The recent endorsement of discrete-choice experiments (DCEs) and other stated-preference methods by regulatory and health technology assessment (HTA) agencies has placed a greater focus on demonstrating the validity and reliability of preference results. Areas covered: We present a practical overview of tests of validity and reliability that have been applied in the health DCE literature and explore other study qualities of DCEs. From the published literature, we identify a variety of methods to assess the validity and reliability of DCEs. We conceptualize these methods to create a conceptual model with four domains: measurement validity, measurement reliability, choice validity, and choice reliability. Each domain consists of three categories that can be assessed using one to four procedures (for a total of 24 tests). We present how these tests have been applied in the literature and direct readers to applications of these tests in the health DCE literature. Based on a stakeholder engagement exercise, we consider the importance of study characteristics beyond traditional concepts of validity and reliability. Expert commentary: We discuss study design considerations to assess the validity and reliability of a DCE, consider limitations to the current application of tests, and discuss future work to consider the quality of DCEs in healthcare.
Harris, Joshua D; Erickson, Brandon J; Cvetanovich, Gregory L; Abrams, Geoffrey D; McCormick, Frank M; Gupta, Anil K; Verma, Nikhil N; Bach, Bernard R; Cole, Brian J
2014-02-01
Condition-specific questionnaires are important components in evaluation of outcomes of surgical interventions. No condition-specific study methodological quality questionnaire exists for evaluation of outcomes of articular cartilage surgery in the knee. To develop a reliable and valid knee articular cartilage-specific study methodological quality questionnaire. Cross-sectional study. A stepwise, a priori-designed framework was created for development of a novel questionnaire. Relevant items to the topic were identified and extracted from a recent systematic review of 194 investigations of knee articular cartilage surgery. In addition, relevant items from existing generic study methodological quality questionnaires were identified. Items for a preliminary questionnaire were generated. Redundant and irrelevant items were eliminated, and acceptable items modified. The instrument was pretested and items weighed. The instrument, the MARK score (Methodological quality of ARticular cartilage studies of the Knee), was tested for validity (criterion validity) and reliability (inter- and intraobserver). A 19-item, 3-domain MARK score was developed. The 100-point scale score demonstrated face validity (focus group of 8 orthopaedic surgeons) and criterion validity (strong correlation to Cochrane Quality Assessment score and Modified Coleman Methodology Score). Interobserver reliability for the overall score was good (intraclass correlation coefficient [ICC], 0.842), and for all individual items of the MARK score, acceptable to perfect (ICC, 0.70-1.000). Intraobserver reliability ICC assessed over a 3-week interval was strong for 2 reviewers (≥0.90). The MARK score is a valid and reliable knee articular cartilage condition-specific study methodological quality instrument. This condition-specific questionnaire may be used to evaluate the quality of studies reporting outcomes of articular cartilage surgery in the knee.
Bravo, G; Bragança, S; Arezes, P M; Molenbroek, J F M; Castellucci, H I
2018-05-22
Despite offering many benefits, direct manual anthropometric measurement methods can be problematic because of their vulnerability to measurement error. The purpose of this literature review was to determine whether the currently published ergonomics-related anthropometric studies of school children mentioned or evaluated the precision, reliability or accuracy of the direct manual measurement method. Two bibliographic databases and the bibliographic references of all the selected papers were used to find relevant published papers in the fields considered in this study. Forty-six (46) studies met the criteria previously defined for this literature review. However, only ten (10) studies mentioned at least one of the analyzed variables, and none evaluated all of them. Only reliability was assessed, by three papers. Moreover, the reviewed papers differed widely with regard to the factors that affect precision, reliability and accuracy. This was particularly clear in the instruments used for the measurements, which were not consistent across the studies. Additionally, information was lacking on the evaluators' training and on the procedures for anthropometric data collection, which are assumed to be the most important issues affecting precision, reliability and accuracy. Based on the review of the literature, it was possible to conclude that the considered anthropometric studies had not focused on analyzing the precision, reliability and accuracy of manual measurement methods. Hence, to avoid measurement errors and misleading data, anthropometric studies should put more effort and care into testing measurement error and defining the procedures used to collect anthropometric data.
Reliability of anthropometric measurements in European preschool children: the ToyBox-study.
De Miguel-Etayo, P; Mesana, M I; Cardon, G; De Bourdeaudhuij, I; Góźdź, M; Socha, P; Lateva, M; Iotova, V; Koletzko, B V; Duvinage, K; Androutsos, O; Manios, Y; Moreno, L A
2014-08-01
The ToyBox-study aims to develop and test an innovative and evidence-based obesity prevention programme for preschoolers in six European countries: Belgium, Bulgaria, Germany, Greece, Poland and Spain. In multicentre studies, anthropometric measurements using standardized procedures that minimize errors in the data collection are essential to maximize reliability of measurements. The aim of this paper is to describe the standardization process and reliability (intra- and inter-observer) of height, weight and waist circumference (WC) measurements in preschoolers. All technical procedures and devices were standardized and centralized training was given to the fieldworkers. At least seven children per country participated in the intra- and inter-observer reliability testing. Intra-observer technical error ranged from 0.00 to 0.03 kg for weight and from 0.07 to 0.20 cm for height, with the overall reliability being above 99%. A second training was organized for WC due to low reliability observed in the first training. Intra-observer technical error for WC ranged from 0.12 to 0.71 cm during the first training and from 0.05 to 1.11 cm during the second training, and reliability above 92% was achieved. Epidemiological surveys need standardized procedures and training of researchers to reduce measurement error. In the ToyBox-study, very good intra- and inter-observer agreement was achieved for all anthropometric measurements performed.
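The technical error and percentage reliability quoted in the abstract above can be illustrated with a short sketch. It assumes the widely used definitions for two repeated measurements per child (TEM = sqrt(sum of squared differences / 2n), and reliability = 1 - TEM^2 / total variance); the abstract does not state which formulas the authors used, and the function names and heights below are hypothetical.

```python
import math
from statistics import variance


def technical_error(first, second):
    """Technical error of measurement (TEM) for two repeated measurement series."""
    n = len(first)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * n))


def reliability_percent(first, second):
    """Percentage reliability: 100 * (1 - TEM^2 / total variance).

    Assumption: the total variance is the sample variance of all
    measurements pooled across both series.
    """
    total_variance = variance(list(first) + list(second))
    return 100 * (1 - technical_error(first, second) ** 2 / total_variance)


# Hypothetical heights (cm) measured twice by the same observer.
heights_first = [100.0, 105.0, 110.0]
heights_second = [100.2, 104.8, 110.0]
print(technical_error(heights_first, heights_second))
print(reliability_percent(heights_first, heights_second))
```

With small differences between repeated measurements relative to the spread between children, the reliability percentage approaches 100, which is the pattern the abstract reports for weight and height.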
Aerts, Frank; Carrier, Kathy; Alwood, Becky
2016-01-01
Background: The assessment of clinical manifestation of muscle fatigue is an effective procedure in establishing therapeutic exercise dose. Few studies have evaluated physical therapist reliability in establishing muscle fatigue through detection of changes in quality of movement patterns in a live setting. Objective: The purpose of this study is to evaluate the inter-rater reliability of physical therapists’ ability to detect altered movement patterns due to muscle fatigue. Design: A reliability study in a live setting with multiple raters. Participants: Forty-four healthy individuals (ages 19-35) were evaluated by six physical therapists in a live setting. Methods: Participants were evaluated by physical therapists for altered movement patterns during resisted shoulder rotation. Each participant completed a total of four tests: right shoulder internal rotation, right shoulder external rotation, left shoulder internal rotation and left shoulder external rotation. Results: For all tests combined, the inter-rater reliability for a single rater scoring, ICC (2,1), was .65 (95% CI, .60 to .71). This corresponds to moderate inter-rater reliability between physical therapists. Limitations: The results of this study apply only to healthy participants and therefore cannot be generalized to a symptomatic population. Conclusion: Moderate inter-rater reliability was found between physical therapists in establishing muscle fatigue through the observation of sustained altered movement patterns during dynamic resistive shoulder internal and external rotation. PMID:27347241
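For readers unfamiliar with the ICC (2,1) statistic reported above, the following is a minimal sketch of the two-way random-effects, absolute-agreement, single-rater form, computed from ANOVA mean squares. The function name and the scores are hypothetical and are not data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Hypothetical scores: 4 participants (rows) rated by 3 raters (columns).
scores = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [5, 5, 4],
]
print(icc_2_1(scores))
```

Because the rater mean square enters the denominator, systematic differences between raters lower this coefficient, which is why it is read as absolute agreement rather than mere consistency.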
Koontz, Alicia M.; Lin, Yen-Sheng; Kankipati, Padmaja; Boninger, Michael L.; Cooper, Rory A.
2017-01-01
This study describes a new custom measurement system designed to investigate the biomechanics of sitting-pivot wheelchair transfers and assesses the reliability of selected biomechanical variables. Variables assessed include horizontal and vertical reaction forces underneath both hands and three-dimensional trunk, shoulder, and elbow range of motion. We examined the reliability of these measures between 5 consecutive transfer trials for 5 subjects with spinal cord injury and 12 non-disabled subjects while they performed a self-selected sitting pivot transfer from a wheelchair to a level bench. A majority of the biomechanical variables demonstrated moderate to excellent reliability (r > 0.6). The transfer measurement system recorded reliable and valid biomechanical data for future studies of sitting-pivot wheelchair transfers. We recommend a minimum of five transfer trials to obtain a reliable measure of transfer technique for future studies. PMID:22068376
Reliability studies of Integrated Modular Engine system designs
NASA Technical Reports Server (NTRS)
Hardy, Terry L.; Rapp, Douglas C.
1993-01-01
A study was performed to evaluate the reliability of Integrated Modular Engine (IME) concepts. Comparisons were made between networked IME systems and non-networked discrete systems using expander cycle configurations. Both redundant and non-redundant systems were analyzed. Binomial approximation and Markov analysis techniques were employed to evaluate total system reliability. In addition, Failure Modes and Effects Analyses (FMEA), Preliminary Hazard Analyses (PHA), and Fault Tree Analysis (FTA) were performed to allow detailed evaluation of the IME concept. A discussion of these system reliability concepts is also presented.
Reliability studies of integrated modular engine system designs
NASA Astrophysics Data System (ADS)
Hardy, Terry L.; Rapp, Douglas C.
1993-06-01
A study was performed to evaluate the reliability of Integrated Modular Engine (IME) concepts. Comparisons were made between networked IME systems and non-networked discrete systems using expander cycle configurations. Both redundant and non-redundant systems were analyzed. Binomial approximation and Markov analysis techniques were employed to evaluate total system reliability. In addition, Failure Modes and Effects Analyses (FMEA), Preliminary Hazard Analyses (PHA), and Fault Tree Analysis (FTA) were performed to allow detailed evaluation of the IME concept. A discussion of these system reliability concepts is also presented.
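As a generic illustration of the binomial approach to redundant versus non-redundant systems mentioned in this abstract, the sketch below computes the reliability of a k-out-of-n system of identical, independent components. It is not the model used in the report; the function name, component count, and component reliability are assumptions chosen for illustration.

```python
from math import comb


def k_of_n_reliability(k, n, component_reliability):
    """Probability that at least k of n independent, identical components work."""
    r = component_reliability
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


# Hypothetical comparison with a per-component reliability of 0.95:
# a non-redundant system needs all 4 components, a redundant one tolerates one failure.
print(k_of_n_reliability(4, 4, 0.95))  # all four must work
print(k_of_n_reliability(3, 4, 0.95))  # any three of four suffice
```

The binomial form assumes independent failures with no repair; Markov analysis, also named in the abstract, is the usual choice when those assumptions do not hold.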
NASA Astrophysics Data System (ADS)
Miao, Yongchun; Kang, Rongxue; Chen, Xuefeng
2017-12-01
In recent years, with the gradual extension of reliability research, the study of production system reliability has become a hot topic in various industries. A man-machine-environment system is a complex system composed of human factors, machinery and equipment, and the environment. The reliability of each individual factor must be analyzed before moving on to the study of three-factor reliability. Meanwhile, the dynamic relationships among man, machine and environment should be considered in order to establish an effective fuzzy evaluation mechanism that analyzes the reliability of such systems accurately. In this paper, based on systems engineering, fuzzy theory, reliability theory, and theories of human error, environmental impact and machinery equipment failure, the reliabilities of the human factor, machinery equipment and environment of a chemical production system were studied by the method of fuzzy evaluation. Finally, the reliability of the man-machine-environment system was calculated as a weighted result, which indicated that the reliability value of this chemical production system was 86.29. According to the given evaluation domain, the reliability of the integrated man-machine-environment system is in good status, and effective measures for further improvement were proposed according to the fuzzy calculation results.
ERIC Educational Resources Information Center
Gelisli, Yücel; Beisenbayeva, Lyazzat
2017-01-01
The purpose of the current study is to develop a reliable scale to be used to determine the scientific inquiry competency perception of post-graduate students engaged in post-graduate studies in the field of educational sciences and teacher education in Kazakhstan. The study employed the descriptive method. Within the context of the study, a scale…
Lange, Toni; Matthijs, Omer; Jain, Nitin B; Schmitt, Jochen; Lützner, Jörg; Kopkow, Christian
2017-03-01
Shoulder pain is common in the general population, and history taking, motion and muscle testing, and physical examination tests are usually performed to identify its aetiology. The aim of this systematic review was to summarise and evaluate intrarater and inter-rater reliability of physical examination tests in the diagnosis of shoulder pathologies. A comprehensive systematic literature search was conducted using MEDLINE, EMBASE, Allied and Complementary Medicine Database (AMED) and Physiotherapy Evidence Database (PEDro) through 20 March 2015. Methodological quality was assessed using the Quality Appraisal of Reliability Studies (QAREL) tool by 2 independent reviewers. The search strategy revealed 3259 articles, of which 18 finally met the inclusion criteria. These studies evaluated the reliability of 62 tests and test variations used for the specific physical examination tests for the diagnosis of shoulder pathologies. Methodological quality ranged from 2 to 7 positive criteria of the 11 items of the QAREL tool. This review identified a lack of high-quality studies evaluating inter-rater as well as intrarater reliability of specific physical examination tests for the diagnosis of shoulder pathologies. In addition, reliability measures differed between included studies, hindering proper cross-study comparisons. PROSPERO CRD42014009018. Published by the BMJ Publishing Group Limited.
Do you see what I see? Mobile eye-tracker contextual analysis and inter-rater reliability.
Stuart, S; Hunt, D; Nell, J; Godfrey, A; Hausdorff, J M; Rochester, L; Alcock, L
2018-02-01
Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson's disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye-movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation location was manually identified by two raters (DH, JN), who classified the locations. Cohen's kappa correlation coefficients determined the inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. The inter-rater reliability for classifying the fixation location was high for both PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), which indicated a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.
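The abstract above reports both Cohen's kappa and raw percent agreement for two raters. A minimal sketch of how the two statistics relate is given below; the function names and the fixation-location labels are hypothetical and are not the study's data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters chose the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when both raters use a single identical
    category for every item, because chance agreement is then 1.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical fixation-location labels assigned by two raters.
rater_1 = ["floor", "floor", "wall", "door", "floor", "wall"]
rater_2 = ["floor", "floor", "wall", "wall", "floor", "wall"]
print(percent_agreement(rater_1, rater_2))
print(cohens_kappa(rater_1, rater_2))
```

Kappa is always at or below percent agreement when chance agreement is positive, which is why the abstract can report the same kappa alongside different agreement percentages for its two groups.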
ERIC Educational Resources Information Center
Ghazali, Nor Hasnida Md
2016-01-01
A valid, reliable and practical instrument is needed to evaluate the implementation of the school-based assessment (SBA) system. The aim of this study is to develop and assess the validity and reliability of an instrument to measure the perception of teachers towards the SBA implementation in schools. The instrument is developed based on a…
Big data analytics for the Future Circular Collider reliability and availability studies
NASA Astrophysics Data System (ADS)
Begy, Volodimir; Apollonio, Andrea; Gutleber, Johannes; Martin-Marquez, Manuel; Niemi, Arto; Penttinen, Jussi-Pekka; Rogova, Elena; Romero-Marin, Antonio; Sollander, Peter
2017-10-01
Responding to the European Strategy for Particle Physics update 2013, the Future Circular Collider study explores scenarios of circular frontier colliders for the post-LHC era. One branch of the study assesses industrial approaches to model and simulate the reliability and availability of the entire particle collider complex based on the continuous monitoring of CERN’s accelerator complex operation. The modelling is based on an in-depth study of the CERN injector chain and LHC, and is carried out as a cooperative effort with the HL-LHC project. The work so far has revealed that a major challenge is obtaining accelerator monitoring and operational data with sufficient quality, to automate the data quality annotation and calculation of reliability distribution functions for systems, subsystems and components where needed. A flexible data management and analytics environment that permits integrating the heterogeneous data sources, the domain-specific data quality management algorithms and the reliability modelling and simulation suite is a key enabler to complete this accelerator operation study. This paper describes the Big Data infrastructure and analytics ecosystem that has been put in operation at CERN, serving as the foundation on which reliability and availability analysis and simulations can be built. This contribution focuses on data infrastructure and data management aspects and presents case studies chosen for its validation.
Hartling, Lisa; Bond, Kenneth; Santaguida, P Lina; Viswanathan, Meera; Dryden, Donna M
2011-08-01
To develop and test a study design classification tool. We contacted relevant organizations and individuals to identify tools used to classify study designs and ranked these using predefined criteria. The highest ranked tool was a design algorithm developed, but no longer advocated, by the Cochrane Non-Randomized Studies Methods Group; this was modified to include additional study designs and decision points. We developed a reference classification for 30 studies; 6 testers applied the tool to these studies. Interrater reliability (Fleiss' κ) and accuracy against the reference classification were assessed. The tool was further revised and retested. Initial reliability was fair among the testers (κ=0.26) and the reference standard raters (κ=0.33). Testing after revisions showed improved reliability (κ=0.45, moderate agreement) with improved, but still low, accuracy. The most common disagreements were whether the study design was experimental (5 of 15 studies), and whether there was a comparison of any kind (4 of 15 studies). Agreement was higher among testers who had completed graduate level training versus those who had not. The moderate reliability and low accuracy may be because of lack of clarity and comprehensiveness of the tool, inadequate reporting of the studies, and variability in tester characteristics. The results may not be generalizable to all published studies, as the test studies were selected because they had posed challenges for previous reviewers with respect to their design classification. Application of such a tool should be accompanied by training, pilot testing, and context-specific decision rules.
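Fleiss' κ, the interrater statistic named in this abstract, extends chance-corrected agreement to more than two raters. The sketch below assumes every item is classified by the same number of raters; the function name and the design labels are hypothetical and do not reproduce the study's classifications.

```python
from collections import Counter


def fleiss_kappa(classifications):
    """Fleiss' kappa for a list of items, each a list of category labels (one per rater).

    Undefined (division by zero) when every rating falls in a single category.
    """
    n_items = len(classifications)
    n_raters = len(classifications[0])
    overall = Counter()
    per_item_agreement = []
    for labels in classifications:
        counts = Counter(labels)
        overall.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))
    observed = sum(per_item_agreement) / n_items
    # Chance agreement from the overall category proportions.
    expected = sum((c / (n_items * n_raters)) ** 2 for c in overall.values())
    return (observed - expected) / (1 - expected)


# Hypothetical design labels given to 3 studies by 4 testers.
labels = [
    ["RCT", "RCT", "RCT", "cohort"],
    ["cohort", "cohort", "case-control", "cohort"],
    ["RCT", "cohort", "case-control", "case-control"],
]
print(fleiss_kappa(labels))
```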
Rahman, Azriani Ab; Mohamad, Norsarwany; Imran, Musa Kamarul; Ibrahim, Wan Pauzi Wan; Othman, Azizah; Aziz, Aniza Abd; Harith, Sakinah; Ibrahim, Mohd Ismail; Ariffin, Nor Hashimah; Van Rostenberghe, Hans
2011-01-01
Background: No previous study has assessed the impact of childhood disability on parents and family in the context of Malaysia, and no instrument to measure this impact has previously been available. The objective of this cross-sectional study was to determine the reliability of a Malay version of the PedsQL™ Family Impact Module that measures the impact of children with disabilities (CWD) on their parents and family in a Malaysian context. Methods: The study was conducted in 2009. The questionnaire was translated forward and backward before it was administered to 44 caregivers of CWD to determine the internal consistency reliability. The test for Cronbach’s alpha was performed. Results: The internal consistency reliability was good. The Cronbach’s alpha for all domains was above 0.7, ranging from 0.73 to 0.895. Conclusion: The Malay version of the PedsQL™ Family Impact Module showed evidence of good internal consistency reliability. However, future studies with a larger sample size are necessary before the module can be recommended as a tool to measure the impact of disability on Malay-speaking Malaysian families. PMID:22589674
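Cronbach's alpha, the internal consistency statistic reported above, can be computed from item variances and the variance of the total score. The following is a minimal sketch with hypothetical questionnaire responses; the function name and data are illustrative and unrelated to the PedsQL data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 caregivers answering a 3-item domain on a 0-4 scale.
domain_scores = [
    [3, 4, 3],
    [1, 2, 1],
    [4, 4, 4],
    [2, 2, 3],
    [0, 1, 1],
]
print(cronbach_alpha(domain_scores))
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, which matches the threshold the abstract uses.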
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Lai, Cheng-Fei; Alonzo, Julie; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the fifth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Lai, Cheng-Fei; Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the second-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the fourth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Lai, Cheng-Fei; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the sixth-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Irvin, P. Shawn; Alonzo, Julie; Lai, Cheng-Fei; Park, Bitnara Jasmine; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the seventh-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
ERIC Educational Resources Information Center
Lai, Cheng-Fei; Irvin, P. Shawn; Park, Bitnara Jasmine; Alonzo, Julie; Tindal, Gerald
2012-01-01
In this technical report, we present the results of a reliability study of the third-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
Scheduling for energy and reliability management on multiprocessor real-time systems
NASA Astrophysics Data System (ADS)
Qi, Xuan
Scheduling algorithms for multiprocessor real-time systems have been studied for years, and many well-recognized algorithms have been proposed. However, it is still an evolving research area and many problems remain open due to their intrinsic complexities. With the emergence of multicore processors, it is necessary to re-investigate the scheduling problems and design and develop efficient algorithms for better system utilization, low scheduling overhead, high energy efficiency, and better system reliability. Focusing on cluster scheduling with optimal global schedulers, we study the utilization bound and scheduling overhead for a class of cluster-optimal schedulers. Then, taking energy and power consumption into consideration, we develop energy-efficient scheduling algorithms for real-time systems, especially for the proliferating embedded systems with limited energy budgets. As commonly deployed energy-saving techniques (e.g., dynamic voltage and frequency scaling (DVFS)) significantly affect system reliability, we study schedulers that have intelligent mechanisms to recuperate system reliability to satisfy the quality assurance requirements. Extensive simulation is conducted to evaluate the performance of the proposed algorithms in terms of reduction of scheduling overhead, energy saving, and reliability improvement. The simulation results show that the proposed reliability-aware power management schemes can preserve system reliability while still achieving substantial energy savings.
RELIABILITY AND VALIDITY OF SUBJECTIVE ASSESSMENT OF LUMBAR LORDOSIS IN CONVENTIONAL RADIOGRAPHY.
Ruhinda, E; Byanyima, R K; Mugerwa, H
2014-10-01
Reliability and validity studies of different lumbar curvature analysis and measurement techniques have been documented; however, there is limited literature on the reliability and validity of subjective visual analysis. Radiological assessment of the lumbar lordotic curve aids in early diagnosis of conditions even before neurologic changes set in. To ascertain the level of reliability and validity of subjective assessment of lumbar lordosis in conventional radiography. A blinded, repeated-measures diagnostic test was carried out on lumbar spine x-ray radiographs. Radiology Department at Joint Clinical Research Centre (JCRC), Mengo-Kampala-Uganda. Seventy (70) lateral lumbar x-ray films were used for this study and were obtained from the archive of the JCRC radiology department at Butikiro house, Mengo-Kampala. Poor observer agreement, both inter- and intra-observer, with kappa values of 0.16, was found. Inter-observer agreement was poorer than intra-observer agreement. Kappa values rose significantly when the lumbar lordosis was clustered into four categories without grading each abnormality. The results confirm that subjective assessment of lumbar lordosis has low reliability and validity. Film quality has limited influence on observer reliability. This study further shows that fewer scale categories of lordosis abnormalities produce better observer reliability.
Integrating Reliability Analysis with a Performance Tool
NASA Technical Reports Server (NTRS)
Nicol, David M.; Palumbo, Daniel L.; Ulrey, Michael
1995-01-01
A large number of commercial simulation tools support performance oriented studies of complex computer and communication systems. Reliability of these systems, when desired, must be obtained by remodeling the system in a different tool. This has obvious drawbacks: (1) substantial extra effort is required to create the reliability model; (2) through modeling error the reliability model may not reflect precisely the same system as the performance model; (3) as the performance model evolves one must continuously reevaluate the validity of assumptions made in that model. In this paper we describe an approach, and a tool that implements this approach, for integrating a reliability analysis engine into a production quality simulation based performance modeling tool, and for modeling within such an integrated tool. The integrated tool allows one to use the same modeling formalisms to conduct both performance and reliability studies. We describe how the reliability analysis engine is integrated into the performance tool, describe the extensions made to the performance tool to support the reliability analysis, and consider the tool's performance.
[Reliability and validity of the Braden Scale for predicting pressure sore risk].
Boes, C
2000-12-01
For more accurate and objective pressure sore risk assessment, various risk assessment tools have been developed, mainly in the USA and Great Britain. The Braden Scale for Predicting Pressure Sore Risk is one such example. By means of a literature analysis of German and English texts referring to the Braden Scale, the scientific quality criteria of reliability and validity are examined and the consequences for application of the scale in Germany are shown. Analysis of 4 reliability studies shows an exclusive focus on interrater reliability. Further, even though the 19 validity studies were conducted in many different settings, they are limited to the criteria of sensitivity and specificity (accuracy). Sensitivity and specificity range from 35% to 100%. The recommended cut-off points range from 10 to 19 points. The studies prove not to be comparable with each other. Furthermore, distortions that affect the accuracy of the scale can be found in these studies. The results of the analysis presented here show insufficient proof of reliability and validity in the American studies. In Germany, the Braden Scale has not yet been tested under scientific criteria. Such testing is needed before the scale is used in different German settings. In such testing, the construction and study procedures of the American studies can be used as a basis, as can the problems identified in the analysis presented here.
Spaan, Suzanne; Pronk, Anjoeka; Koch, Holger M.; Jusko, Todd A.; Jaddoe, Vincent W.V.; Shaw, Pamela A.; Tiemeier, Henning M.; Hofman, Albert; Pierik, Frank H.; Longnecker, Matthew P.
2014-01-01
The widespread use of organophosphate (OP) pesticides has resulted in ubiquitous exposure in humans, primarily through their diet. Exposure to OP pesticides may have adverse health effects, including neurobehavioral deficits in children. The optimal design of new studies requires data on the reliability of urinary measures of exposure. In the present study, urinary concentrations of six dialkyl phosphate (DAP) metabolites, the main urinary metabolites of OP pesticides, were determined in 120 pregnant women participating in the Generation R Study in Rotterdam. Intra-class correlation coefficients (ICCs) across serial urine specimens taken at <18, 18–25, and >25 weeks of pregnancy were determined to assess reliability. Geometric mean total DAP metabolite concentrations were 229 (GSD 2.2), 240 (GSD 2.1), and 224 (GSD 2.2) nmol/g creatinine across the three periods of gestation. Metabolite concentrations from the serial urine specimens in general correlated moderately. The ICCs for the six DAP metabolites ranged from 0.14 to 0.38 (0.30 for total DAPs), indicating weak to moderate reliability. Although the DAP metabolite levels observed in this study are slightly higher and slightly more correlated than in previous studies, the low to moderate reliability indicates a high degree of within-person variability, which presents challenges for designing well-powered epidemiologic studies. PMID:25515376
Flight control electronics reliability/maintenance study
NASA Technical Reports Server (NTRS)
Dade, W. W.; Edwards, R. H.; Katt, G. T.; Mcclellan, K. L.; Shomber, H. A.
1977-01-01
The collection and analysis of data concerning the reliability and maintenance experience of flight control system electronics currently in use on passenger-carrying jet aircraft are reported. Two airlines' B-747 airplane fleets were analyzed to assess the component reliability, system functional reliability, and achieved availability of the CAT II configuration flight control system. Also assessed were the costs generated by this system in the categories of spare equipment, schedule irregularity, and line and shop maintenance. The results indicate that although there is a marked difference in the geographic location and route pattern between the airlines studied, there is a close similarity in the reliability and the maintenance costs associated with the flight control electronics.
The interrater reliability of DSM III in children.
Werry, J S; Methven, R J; Fitzpatrick, J; Dixon, H
1983-09-01
A total of 195 admissions to a child psychiatric inpatient unit were diagnosed independently by two to four clinicians on the basis of case presentations at the first ward-round after admission. The DSM III as a whole and the major categories were of high or acceptable reliability, though a few were clearly unreliable. The results are generally consistent with other studies. Unlike other studies, the subcategories were examined and found to vary widely in reliability both as a whole across the system and within parent major categories, throwing considerable doubt upon their utility. The results indicate the need both for improved diagnostic data-gathering techniques in child psychiatry and for more and better-designed studies of reliability and, most necessarily, of validity.
2nd Generation Reusable Launch Vehicle (2G RLV). Revised
NASA Technical Reports Server (NTRS)
Matlock, Steve; Sides, Steve; Kmiec, Tom; Arbogast, Tim; Mayers, Tom; Doehnert, Bill
2001-01-01
This is a revised final report and addresses all of the work performed on this program. Specifically, it covers vehicle architecture background, definition of six baseline engine cycles, reliability baseline (space shuttle main engine QRAS), and component level reliability/performance/cost for the six baseline cycles, and selection of 3 cycles for further study. This report further addresses technology improvement selection and component level reliability/performance/cost for the three cycles selected for further study, as well as risk reduction plans, and recommendation for future studies.
NASA Applications and Lessons Learned in Reliability Engineering
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Fuller, Raymond P.
2011-01-01
Since the Shuttle Challenger accident in 1986, communities across NASA have been developing and extensively using quantitative reliability and risk assessment methods in their decision making process. This paper discusses several reliability engineering applications that NASA has used over the years to support the design, development, and operation of critical space flight hardware. Specifically, the paper discusses several reliability engineering applications used by NASA in areas such as risk management, inspection policies, components upgrades, reliability growth, integrated failure analysis, and physics based probabilistic engineering analysis. In each of these areas, the paper provides a brief discussion of a case study to demonstrate the value added and the criticality of reliability engineering in supporting NASA project and program decisions to fly safely. Examples of these case studies discussed are reliability based life limit extension of Space Shuttle Main Engine (SSME) hardware, reliability based inspection policies for Auxiliary Power Unit (APU) turbine disc, probabilistic structural engineering analysis for reliability prediction of the SSME alternate turbo-pump development, impact of ET foam reliability on the Space Shuttle System risk, and reliability based Space Shuttle upgrade for safety. Special attention is given in this paper to the physics based probabilistic engineering analysis applications and their critical role in evaluating the reliability of NASA development hardware including their potential use in a research and technology development environment.
Gilkison, C R; Fenton, M V; Lester, J W
1992-05-01
This study was designed to establish the reliability of a health history questionnaire used as a screening tool for incoming university students. The authors used a test-retest design, with a test interval of 6 months, on a sample of medical and nursing students. The analysis focused on overall reliability of the questionnaire and reproducibility of specific items, based on question format. Questionnaire items of specific interest were those with dichotomous yes/no response options versus open-ended format questions, those using the words frequently or recently, or those that asked multiple questions. Demographic characteristics of the subjects were considered in the evaluation of reliability. Overall reliability of the questionnaire (93.6%) was above the anticipated level of 90%, and subject sex or program of study did not show any significant differences in reproducibility of responses. Although wording of questions did not affect item reliability, dichotomous format questions demonstrated a higher degree of reliability (96.4%) than the overall reliability of the questionnaire. Recommendations for enhancing the reliability of the questionnaire are based on item analysis and information gathered from interviews with subjects.
Reliability of the Test of Integrated Language and Literacy Skills (TILLS).
Mailend, Marja-Liisa; Plante, Elena; Anderson, Michele A; Applegate, E Brooks; Nelson, Nickola W
2016-07-01
As new standardized tests become commercially available, it is critical that clinicians have access to the information about a test's psychometric properties, including aspects of reliability. The purpose of the three studies reported in this article was to investigate the reliability of a new test, the Test of Integrated Language and Literacy Skills (TILLS), with consideration of both internal and external sources of measurement error. The TILLS was administered to children aged 6;0-18;11 years. The participants varied in terms of their language and literacy skills and included children with typical language development as well as those diagnosed with language or learning disability. The sample of children also varied in terms of their racial and socioeconomic backgrounds. Study 1 (N = 1056) assessed the internal consistency of TILLS calculating the coefficient omega for each subtest. Study 2 (N = 103) and Study 3 (N = 39) used the intra-class correlation coefficients to report on test-retest and inter-rater reliability respectively. The results indicate strong internal consistency and inter-rater reliability for all subtests of TILLS. The test-retest reliability was strong for all but one subtest, for which the intra-class correlation coefficient was in the acceptable range. This article provides clinicians with essential scientific information that supports the internal and external reliability of a new test of oral and written language skills, the TILLS. Information about reliability is critical for guiding the selection of an appropriate diagnostic tool amongst a number of options. © 2016 Royal College of Speech and Language Therapists.
The Typical General Aviation Aircraft
NASA Technical Reports Server (NTRS)
Turnbull, Andrew
1999-01-01
The reliability of General Aviation aircraft is unknown. In order to "assist the development of future GA reliability and safety requirements", a reliability study needs to be performed. Before any studies on General Aviation aircraft reliability begin, a typical aircraft that encompasses most of the general aviation characteristics needs to be defined. In this report, not only is the typical general aviation aircraft defined for the purpose of the follow-on reliability study, but it is also separated, or "sifted", into several different categories where individual analysis can be performed on the reasonably independent systems. In this study, the typical General Aviation aircraft is a four-place, single engine piston, all aluminum fixed-wing certified aircraft with a fixed tricycle landing gear and a cable operated flight control system. The system breakdown of a GA aircraft "sifts" the aircraft systems and components into five categories: Powerplant, Airframe, Aircraft Control Systems, Cockpit Instrumentation Systems, and the Electrical Systems. This breakdown was performed along the lines of a failure of the system. Any component that caused a system to fail was considered a part of that system.
2011-01-01
Background The aim of this study was to develop a child-specific classification system for long bone fractures and to examine its reliability and validity on the basis of a prospective multicentre study. Methods Using the sequentially developed classification system, three samples of between 30 and 185 paediatric limb fractures from a pool of 2308 fractures documented in two multicenter studies were analysed in a blinded fashion by eight orthopaedic surgeons, on a total of 5 occasions. Intra- and interobserver reliability and accuracy were calculated. Results The reliability improved with successive simplification of the classification. The final version resulted in an overall interobserver agreement of κ = 0.71 with no significant difference between experienced and less experienced raters. Conclusions In conclusion, the evaluation of the newly proposed classification system resulted in a reliable and routinely applicable system, for which training in its proper use may further improve the reliability. It can be recommended as a useful tool for clinical practice and offers the option for developing treatment recommendations and outcome predictions in the future. PMID:21548939
Dontje, Manon L; Dall, Philippa M; Skelton, Dawn A; Gill, Jason M R; Chastin, Sebastien F M
2018-01-01
Prolonged sedentary behaviour (SB) is associated with poor health. It is unclear which SB measure is most appropriate for interventions and population surveillance to measure and interpret change in behaviour in older adults. The aims of this study: to examine the relative and absolute reliability, Minimal Detectable Change (MDC) and responsiveness to change of subjective and objective methods of measuring SB in older adults and give recommendations of use for different study designs. SB of 18 older adults (aged 71 (IQR 7) years) was assessed using a systematic set of six subjective tools, derived from the TAxonomy of Self report Sedentary behaviour Tools (TASST), and one objective tool (activPAL3c), over 14 days. Relative reliability (Intra Class Correlation coefficients-ICC), absolute reliability (SEM), MDC, and the relative responsiveness (Cohen's d effect size (ES) and Guyatt's Responsiveness coefficient (GR)) were calculated for each of the different tools and ranked for different study designs. ICC ranged from 0.414 to 0.946, SEM from 36.03 to 137.01 min, MDC from 1.66 to 8.42 hours, ES from 0.017 to 0.259 and GR from 0.024 to 0.485. Objective average day per week measurement ranked as most responsive in a clinical practice setting, whereas a one day measurement ranked highest in quasi-experimental, longitudinal and controlled trial study designs. TV viewing-Previous Week Recall (PWR) ranked as most responsive subjective measure in all study designs. The reliability, Minimal Detectable Change and responsiveness to change of subjective and objective methods of measuring SB is context dependent. Although TV viewing-PWR is the more reliable and responsive subjective method in most situations, it may have limitations as a reliable measure of total SB. Results of this study can be used to guide choice of tools for detecting change in sedentary behaviour in older adults in the contexts of population surveillance, intervention evaluation and individual care.
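The abstract above reports SEM and MDC values without giving formulas. A minimal sketch, assuming the conventional definitions SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM (the function names are hypothetical, and the study may have used a different variant), is:

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a between-subject SD and an ICC.

    Assumes the conventional definition SEM = SD * sqrt(1 - ICC).
    """
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% level.

    Assumes the conventional definition MDC95 = 1.96 * sqrt(2) * SEM.
    """
    return 1.96 * math.sqrt(2.0) * sem


# Smallest SEM reported in the abstract, in minutes, converted to hours.
print(round(mdc95(36.03) / 60.0, 2))  # 1.66
```

Under these assumed definitions, the smallest reported SEM (36.03 min) corresponds to an MDC of about 1.66 hours, in line with the lower end of the reported MDC range. This is only a consistency check and does not confirm the exact procedure the authors followed.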
Astronomy Teaching Self-Efficacy Belief Scale: The Validity and Reliability Study
ERIC Educational Resources Information Center
Demirci, Filiz; Ozyurek, Cengiz
2018-01-01
The purpose of this study is to develop a reliable and safe scale for determining the self-efficacy levels of science teachers in the teaching of astronomy subjects. The study used a survey approach, which is a qualitative research method. The study was conducted with a total of 106 science teachers working in the secondary schools of Ordu city…
Developing the Irrational Beliefs in Mathematics Scale (IBIMS): A Validity and Reliability Study
ERIC Educational Resources Information Center
Kaya, Deniz
2017-01-01
The purpose of this study is to develop a valid and reliable scale intended to determine the irrational beliefs of students in mathematics. The study was conducted with a study group consisting of 700 students in the 2015-2016 academic year. Expert opinions were received for the content and face validity of the scale, and the Exploratory Factor…
Comparison of fMRI paradigms assessing visuospatial processing: Robustness and reproducibility
Herholz, Peer; Zimmermann, Kristin M.; Westermann, Stefan; Frässle, Stefan; Jansen, Andreas
2017-01-01
The development of brain imaging techniques, in particular functional magnetic resonance imaging (fMRI), made it possible to non-invasively study the hemispheric lateralization of cognitive brain functions in large cohorts. Comprehensive models of hemispheric lateralization are, however, still missing and should not only account for the hemispheric specialization of individual brain functions, but also for the interactions among different lateralized cognitive processes (e.g., language and visuospatial processing). This calls for robust and reliable paradigms to study hemispheric lateralization for various cognitive functions. While numerous reliable imaging paradigms have been developed for language, which represents the most prominent left-lateralized brain function, the reliability of imaging paradigms investigating typically right-lateralized brain functions, such as visuospatial processing, has received comparatively less attention. In the present study, we aimed to establish an fMRI paradigm that robustly and reliably identifies right-hemispheric activation evoked by visuospatial processing in individual subjects. In a first study, we therefore compared three frequently used paradigms for assessing visuospatial processing and evaluated their utility to robustly detect right-lateralized brain activity on a single-subject level. In a second study, we then assessed the test-retest reliability of the so-called Landmark task–the paradigm that yielded the most robust results in study 1. At the single-voxel level, we found poor reliability of the brain activation underlying visuospatial attention. This suggests that poor signal-to-noise ratios can become a limiting factor for test-retest reliability. 
This represents a common detriment of fMRI paradigms investigating visuospatial attention in general and therefore highlights the need for careful considerations of both the possibilities and limitations of the respective fMRI paradigm–in particular, when being interested in effects at the single-voxel level. Notably, however, when focusing on the reliability of measures of hemispheric lateralization (which was the main goal of study 2), we show that hemispheric dominance (quantified by the lateralization index, LI, with |LI| >0.4) of the evoked activation could be robustly determined in more than 62% and, if considering only two categories (i.e., left, right), in more than 93% of our subjects. Furthermore, the reliability of the lateralization strength (LI) was “fair” to “good”. In conclusion, our results suggest that the degree of right-hemispheric dominance during visuospatial processing can be reliably determined using the Landmark task, both at the group and single-subject level, while at the same time stressing the need for future refinements of experimental paradigms and more sophisticated fMRI data acquisition techniques. PMID:29059201
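The abstract above quantifies hemispheric dominance with a lateralization index and a cut-off of |LI| > 0.4, but does not spell out the formula. As a hedged illustration, the sketch below assumes the commonly used ratio LI = (L − R) / (L + R), with positive values meaning left dominance. The function names, the sign convention, and the activation numbers are assumptions, not details from the paper.

```python
def lateralization_index(left, right):
    """LI = (L - R) / (L + R) for non-negative activation measures.

    left, right: e.g. suprathreshold voxel counts or summed activation
    in homologous left- and right-hemispheric regions (assumed inputs).
    """
    total = left + right
    if total == 0:
        raise ValueError("no activation in either hemisphere")
    return (left - right) / total


def classify_dominance(li, threshold=0.4):
    """Three-category classification using the |LI| > 0.4 cut-off."""
    if li > threshold:
        return "left"
    if li < -threshold:
        return "right"
    return "bilateral"


# Hypothetical subject with stronger right-hemispheric activation.
li = lateralization_index(left=20, right=80)
print(li, classify_dominance(li))
```

Collapsing to two categories (left or right), as the abstract also describes, would amount to classifying on the sign of LI alone.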
Swaen, Gerard M H; Carmichael, Neil; Doe, John
2011-05-01
To evaluate the need for the creation of a system in which observational epidemiology studies are registered; an Observational Studies Register (OSR). The current scientific process for observational epidemiology studies is described. Next, a parallel is made with the clinical trials area, where the creation of clinical trial registers has greatly restored and improved their credibility and reliability. Next, the advantages and disadvantages of an OSR are compared. The advantages of an OSR outweigh its disadvantages. The creation of an OSR, similar to the existing Clinical Trials Registers, will improve the assessment of publication bias and will provide an opportunity to compare the original study protocol with the results reported in the publication. Reliability, credibility, and transparency of observational epidemiology studies are strengthened by the creation of an OSR. We propose a structured, collaborative, and coordinated approach for observational epidemiology studies that can provide solutions for existing weaknesses and will strengthen credibility and reliability, similar to the approach currently used in clinical trials, where Clinical Trials Registers have played a key role in strengthening their scientific value. Copyright © 2011 Elsevier Inc. All rights reserved.
Understanding a Widely Misunderstood Statistic: Cronbach's "Alpha"
ERIC Educational Resources Information Center
Ritter, Nicola L.
2010-01-01
It is important to explore score reliability in virtually all studies, because tests are not reliable. The present paper explains the most frequently used reliability estimate, coefficient alpha, so that the coefficient's conceptual underpinnings will be understood. Researchers need to understand score reliability because of the possible impact…
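For readers who want to see what coefficient alpha computes, here is a minimal sketch based on the standard formula alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the toy scores are illustrative assumptions and are not taken from the paper.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a respondents-by-items score table.

    scores: list of rows, one row per respondent, each row holding that
    respondent's scores on the k items (assumed layout).
    """
    k = len(scores[0])
    item_columns = list(zip(*scores))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Toy data: three respondents, two items.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # items move together
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # items agree less
```

Because alpha is computed from the scores of a particular sample, it describes those scores rather than the instrument itself, which is the distinction the abstract draws.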
Reliability Analysis for AFTI-F16 SRFCS Using ASSIST and SURE
NASA Technical Reports Server (NTRS)
Wu, N. Eva
2001-01-01
This paper reports the results of a study on reliability analysis of an AFTI-F16 Self-Repairing Flight Control System (SRFCS) using software tools SURE (Semi-Markov Unreliability Range Evaluator) and ASSIST (Abstract Semi-Markov Specification Interface to the SURE Tool). The purpose of the study is to investigate the potential utility of the software tools in the ongoing effort of the NASA Aviation Safety Program, where the class of systems must be extended beyond the originally intended serving class of electronic digital processors. The study concludes that SURE and ASSIST are applicable to reliability analysis of flight control systems. They are especially efficient for sensitivity analysis that quantifies the dependence of system reliability on model parameters. The study also confirms an earlier finding on the dominant role of a parameter called failure coverage. The paper will remark on issues related to the improvement of coverage and the optimization of redundancy level.
Alzyoud, Sukaina; Veeranki, Sreenivas P.; Kheirallah, Khalid A.; Shotar, Ali M.; Pbert, Lori
2016-01-01
Introduction: Waterpipe use among adolescents has been increasing progressively. Yet no studies have been reported that assess the validity and reliability of a nicotine dependence scale. The current study aims to assess the validity and reliability of an Arabic version of the modified Waterpipe Tolerance Questionnaire (WTQ) among school-going adolescent waterpipe users. Methods: In a cross-sectional study conducted in Jordan, information on waterpipe use among 333 school-going adolescents aged 11-18 years was obtained using the Arabic version of the WTQ. An exploratory factor analysis and correlation matrices were conducted to assess validity and reliability of the WTQ. Results: The WTQ had a 0.73 alpha of internal consistency, indicating a moderate level of reliability. The scale showed multidimensionality with items loading on two factors, namely waterpipe consumption and morning smoking. Conclusion: This study reports nicotine dependence levels among school-going adolescents who identify themselves as waterpipe users using the WTQ. PMID:26383198
The Bahasa Melayu version of the Nursing Stress Scale among nurses: a reliability study in Malaysia.
Rosnawati, Muhamad Robat; Moe, Htay; Masilamani, Retneswari; Darus, A
2010-10-01
The Nursing Stress Scale (NSS) has been shown to be a valid and reliable instrument to assess occupational stressors among nurses. The NSS, which was previously used in the English version, was translated and back-translated into Bahasa Melayu. This study was conducted to assess the reliability of the Bahasa Melayu version of the NSS among nurses for future studies in this country. The reliability of the NSS was assessed after its readministration to 30 nurses with a 2-week interval. The Spearman coefficient was calculated to assess its stability. The internal consistency was measured through 4 measures: Cronbach's α, Spearman-Brown, Guttman split-half, and standardized item α coefficients. The total response rate was 70%. Test-retest reliability showed remarkable stability (Spearman's ρ exceeded .70). All 4 measures of internal consistency among items indicated a satisfactory level (coefficients in the range of .68 to .87). In conclusion, the Bahasa Melayu version of the NSS is a reliable and useful instrument for measuring the possible stressors at the workplace among nurses.
NASA Astrophysics Data System (ADS)
Huang, Yuxia; Mao, Mengchai; Zhang, Zong; Zhou, Hui; Zhao, Yang; Duan, Lian; Kreplin, Ute; Xiao, Xiang; Zhu, Chaozhe
2017-01-01
Functional near-infrared spectroscopy (fNIRS) is being increasingly applied to affective and social neuroscience research; however, the reliability of this method is still unclear. This study aimed to evaluate the test-retest reliability of the fNIRS-based prefrontal response to emotional stimuli. Twenty-six participants viewed unpleasant and neutral pictures, and were simultaneously scanned by fNIRS in two sessions three weeks apart. The reproducibility of the prefrontal activation map was evaluated at three spatial scales (mapwise, clusterwise, and channelwise) at both the group and individual levels. The influence of the time interval was also explored and comparisons were made between longer (intersession) and shorter (intrasession) time intervals. The reliabilities of the activation map at the group level for the mapwise (up to 0.88, the highest value appeared in the intersession assessment) and clusterwise scales (up to 0.91, the highest appeared in the intrasession assessment) were acceptable, indicating that fNIRS may be a reliable tool for emotion studies, especially for a group analysis and under larger spatial scales. However, it should be noted that the individual-level and the channelwise fNIRS prefrontal responses were not sufficiently stable. Future studies should investigate which factors influence reliability, as well as the validity of fNIRS used in emotion studies.
Cobb, Stephen C; Joshi, Mukta N; Pomeroy, Robin L
2016-12-01
In-vitro and invasive in-vivo studies have reported relatively independent motion in the medial and lateral forefoot segments during gait. However, most current surface-based models have not defined medial and lateral forefoot or midfoot segments. The purpose of the current study was to determine the reliability of a 7-segment foot model that includes medial and lateral midfoot and forefoot segments during walking gait. Three-dimensional positions of marker clusters located on the leg and 6 foot segments were tracked as 10 participants completed 5 walking trials. To examine the reliability of the foot model, coefficients of multiple correlation (CMC) were calculated across the trials for each participant. Three-dimensional stance time series and range of motion (ROM) during stance were also calculated for each functional articulation. CMCs for all of the functional articulations were ≥ 0.80. Overall, the rearfoot complex (leg-calcaneus segments) was the most reliable articulation and the medial midfoot complex (calcaneus-navicular segments) was the least reliable. With respect to ROM, reliability was greatest for plantarflexion/dorsiflexion and least for abduction/adduction. Further, the stance ROM and time-series patterns results between the current study and previous invasive in-vivo studies that have assessed actual bone motion were generally consistent.
Álvarez-Gallardo, Inmaculada C; Soriano-Maldonado, Alberto; Segura-Jiménez, Víctor; Carbonell-Baeza, Ana; Estévez-López, Fernando; McVeigh, Joseph G; Delgado-Fernández, Manuel; Ortega, Francisco B
2016-03-01
To examine the construct validity of the International FItness Scale (IFIS) (ie, self-reported fitness) against objectively measured physical fitness in women with fibromyalgia and in healthy women; and to study the test-retest reliability of the IFIS in women with fibromyalgia. Cross-sectional study. Fibromyalgia patient support groups. Women with fibromyalgia (n=413) and healthy women (controls) (n=195) for validity purposes and women with fibromyalgia (n=101) for the reliability study. The total sample was N=709. Not applicable. Fitness level was both self-reported (IFIS) and measured using performance-based fitness tests. For the reliability study the IFIS was completed on 2 occasions, 1 week apart. Women with fibromyalgia who reported average fitness had better measured fitness than those reporting very poor fitness (all P<.001, except 6-minute walk test where P<.05), with similar trends observed in healthy control women. The test-retest reliability of the IFIS, as measured by the average weighted κ, was .45. The IFIS was able to identify women with fibromyalgia who had very low fitness and distinguish them from those with higher fitness levels. Furthermore, the IFIS was moderately reliable in women with fibromyalgia. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Best Practices for Reliable and Robust Spacecraft Structures
NASA Technical Reports Server (NTRS)
Raju, Ivatury S.; Murthy, P. L. N.; Patel, Naresh R.; Bonacuse, Peter J.; Elliott, Kenny B.; Gordon, S. A.; Gyekenyesi, J. P.; Daso, E. O.; Aggarwal, P.; Tillman, R. F.
2007-01-01
A study was undertaken to capture the best practices for the development of reliable and robust spacecraft structures for NASA's next generation cargo and crewed launch vehicles. In this study, the NASA heritage programs such as Mercury, Gemini, Apollo, and the Space Shuttle program were examined. A series of lessons learned during the NASA and DoD heritage programs are captured. The processes that "make the right structural system" are examined along with the processes to "make the structural system right". The impact of technology advancements in materials and analysis and testing methods on reliability and robustness of spacecraft structures is studied. The best practices and lessons learned are extracted from these studies. Since the first human space flight, the best practices for reliable and robust spacecraft structures appear to be well established, understood, and articulated by each generation of designers and engineers. However, these best practices apparently have not always been followed. When the best practices are ignored or short cuts are taken, risks accumulate, and reliability suffers. Thus program managers need to be vigilant of circumstances and situations that tend to violate best practices. Adherence to the best practices may help develop spacecraft systems with high reliability and robustness against certain anomalies and unforeseen events.
Reliability assessments in qualitative health promotion research.
Cook, Kay E
2012-03-01
This article contributes to the debate about the use of reliability assessments in qualitative research in general, and health promotion research in particular. In this article, I examine the use of reliability assessments in qualitative health promotion research in response to health promotion researchers' commonly held misconception that reliability assessments improve the rigor of qualitative research. All qualitative articles published in the journal Health Promotion International from 2003 to 2009 employing reliability assessments were examined. In total, 31.3% (20/64) articles employed some form of reliability assessment. The use of reliability assessments increased over the study period, ranging from <20% in 2003/2004 to 50% and above in 2008/2009, while at the same time the total number of qualitative articles decreased. The articles were then classified into four types of reliability assessments, including the verification of thematic codes, the use of inter-rater reliability statistics, congruence in team coding and congruence in coding across sites. The merits of each type were discussed, with the subsequent discussion focusing on the deductive nature of reliable thematic coding, the limited depth of immediately verifiable data and the usefulness of such studies to health promotion and the advancement of the qualitative paradigm.
Clinical assessment of scapular positioning in musicians: an intertester reliability study.
Struyf, Filip; Nijs, Jo; De Coninck, Kris; Giunta, Marco; Mottram, Sarah; Meeusen, Romain
2009-01-01
The reliability of the measurement of the distance between the posterior border of the acromion and the wall and the reliability of the modified lateral scapular slide test have not been studied. Overall, the reliability of the clinical tools used to assess scapular positioning has not been studied in musicians. To examine the intertester reliability of scapular observation and 2 clinical tests for the assessment of scapular positioning in musicians. Intertester reliability study. University research laboratory. Thirty healthy student musicians at a single university. Two assessors performed a standardized observation protocol, the measurement of the distance between the posterior border of the acromion and the wall, and the modified lateral scapular slide test. Each assessor was blinded to the other's findings. The intertester reliability coefficients (kappa) for the observation in relaxed position, during unloaded movement, and during loaded movement were 0.41, 0.63, and 0.36, respectively. The kappa values for the observation of tilting and winging at rest were 0.48 and 0.42, respectively; during unloaded movement, the kappa values were 0.52 and 0.78, respectively; and with a 1-kg load, the kappa values were 0.24 and 0.50, respectively. The intraclass correlation coefficient (ICC) of the measurement of the acromial distance was 0.72 in relaxed position and 0.75 with the participant actively retracting both shoulders. The ICCs for the modified lateral scapular slide test varied between 0.63 and 0.58. Our results demonstrated that the modified lateral scapular slide test was not a reliable tool to assess scapular positioning in these participants. Our data indicated that scapular observation in the relaxed position and during unloaded abduction in the frontal plane was a reliable assessment tool. The reliability of the measurement of the distance between the posterior border of the acromion and the wall in healthy musicians was moderate.
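The kappa coefficients above summarize agreement between two assessors after correcting for chance. As a minimal sketch, assuming unweighted Cohen's kappa for two raters on categorical judgments (the abstract does not say whether a weighted variant was used, and the function name and ratings below are hypothetical), the statistic can be computed as:

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e the agreement expected by chance from the raters' marginals.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: 1 = abnormal scapular position observed, 0 = not.
assessor_a = [1, 1, 0, 0]
assessor_b = [1, 0, 0, 0]
print(cohen_kappa(assessor_a, assessor_b))  # 0.5
```

In this toy case the raters agree on 3 of 4 cases (75%), yet kappa is only 0.5 because half of that agreement would be expected by chance alone. Kappa values such as those reported above can therefore look modest even when raw agreement is fairly high.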
Clinical Assessment of Scapular Positioning in Musicians: An Intertester Reliability Study
Struyf, Filip; Nijs, Jo; De Coninck, Kris; Giunta, Marco; Mottram, Sarah; Meeusen, Romain
2009-01-01
Abstract Context: The reliability of the measurement of the distance between the posterior border of the acromion and the wall and the reliability of the modified lateral scapular slide test have not been studied. Overall, the reliability of the clinical tools used to assess scapular positioning has not been studied in musicians. Objective: To examine the intertester reliability of scapular observation and 2 clinical tests for the assessment of scapular positioning in musicians. Design: Intertester reliability study. Setting: University research laboratory. Patients or Other Participants: Thirty healthy student musicians at a single university. Main Outcome Measure(s): Two assessors performed a standardized observation protocol, the measurement of the distance between the posterior border of the acromion and the wall, and the modified lateral scapular slide test. Each assessor was blinded to the other's findings. Results: The intertester reliability coefficients (κ) for the observation in relaxed position, during unloaded movement, and during loaded movement were 0.41, 0.63, and 0.36, respectively. The κ values for the observation of tilting and winging at rest were 0.48 and 0.42, respectively; during unloaded movement, the κ values were 0.52 and 0.78, respectively; and with a 1-kg load, the κ values were 0.24 and 0.50, respectively. The intraclass correlation coefficient (ICC) of the measurement of the acromial distance was 0.72 in relaxed position and 0.75 with the participant actively retracting both shoulders. The ICCs for the modified lateral scapular slide test varied between 0.63 and 0.58. Conclusions: Our results demonstrated that the modified lateral scapular slide test was not a reliable tool to assess scapular positioning in these participants. Our data indicated that scapular observation in the relaxed position and during unloaded abduction in the frontal plane was a reliable assessment tool. 
The reliability of the measurement of the distance between the posterior border of the acromion and the wall in healthy musicians was moderate. PMID:19771291
Tabard-Fougère, Anne; Bonnefoy-Mazure, Alice; Hanquinet, Sylviane; Lascombes, Pierre; Armand, Stéphane; Dayer, Romain
2017-01-15
Test-retest study. This study aimed to evaluate the validity and reliability of rasterstereography in patients with adolescent idiopathic scoliosis (AIS) with a major curve Cobb angle (CA) between 10° and 40° for frontal, sagittal, and transverse parameters. Previous studies evaluating the validity and reliability of rasterstereography concluded that this technique had good accuracy compared with radiographs and a high intra- and interday reliability in healthy volunteers. To the best of our knowledge, the validity and reliability have not been assessed in AIS patients. Thirty-five adolescents with AIS (male = 13) aged 13.1 ± 2.0 years were included. To evaluate the validity of the scoliosis angle (SA) provided by rasterstereography, a comparison (t test, Pearson correlation) was performed with the CA obtained using 2D EOS® radiography (XR). Three rasterstereographic repeated measurements were independently performed by two operators on the same day (interrater reliability) and again by the first operator 1 week later (intrarater reliability). The variables of interest were the SA, lumbar lordosis, and thoracic kyphosis angle, trunk length, pelvic obliquity, and maximum, root mean square and amplitude of vertebral rotations. The data analyses used intraclass correlation coefficients (ICCs). The CA and SA were strongly correlated (R = 0.70) and were nonsignificantly different (P = 0.60). The intrarater reliability (same day: ICC [1, 1], n = 35; 1 week later: ICC [1, 3], n = 28) and interrater reliability (ICC [3, 3], n = 16) were globally excellent (ICC > 0.75) except for the assessment of pelvic obliquity. This study showed that the rasterstereographic system allows for the evaluation of AIS patients with a good validity compared with XR with an overall excellent intra- and interrater reliability. 
Based on these results, this automatic, fast, and noninvasive system can be used for monitoring the evolution of AIS in growing patients instead of repetitive radiographs, thereby reducing radiation exposure and decreasing costs. Level of Evidence: 4.
Reliability of an fMRI Paradigm for Emotional Processing in a Multisite Longitudinal Study
Gee, Dylan G.; McEwen, Sarah C.; Forsyth, Jennifer K.; Haut, Kristen M.; Bearden, Carrie E.; Addington, Jean; Goodyear, Bradley; Cadenhead, Kristin S.; Mirzakhanian, Heline; Cornblatt, Barbara A.; Olvet, Doreen; Mathalon, Daniel H.; McGlashan, Thomas H.; Perkins, Diana O.; Belger, Aysenil; Seidman, Larry J.; Thermenos, Heidi; Tsuang, Ming T.; van Erp, Theo G.M.; Walker, Elaine F.; Hamann, Stephan; Woods, Scott W.; Constable, Todd; Cannon, Tyrone D.
2015-01-01
Multisite neuroimaging studies can facilitate the investigation of brain-related changes in many contexts, including patient groups that are relatively rare in the general population. Though multisite studies have characterized the reliability of brain activation during working memory and motor functional magnetic resonance imaging tasks, emotion processing tasks, pertinent to many clinical populations, remain less explored. A traveling participants study was conducted with eight healthy volunteers scanned twice on consecutive days at each of the eight North American Longitudinal Prodrome Study sites. Tests derived from generalizability theory showed excellent reliability in the amygdala (Eρ² = 0.82), inferior frontal gyrus (IFG; Eρ² = 0.83), anterior cingulate cortex (ACC; Eρ² = 0.76), insula (Eρ² = 0.85), and fusiform gyrus (Eρ² = 0.91) for maximum activation and fair to excellent reliability in the amygdala (Eρ² = 0.44), IFG (Eρ² = 0.48), ACC (Eρ² = 0.55), insula (Eρ² = 0.42), and fusiform gyrus (Eρ² = 0.83) for mean activation across sites and test days. For the amygdala, habituation (Eρ² = 0.71) was more stable than mean activation. In a second investigation, data from 111 healthy individuals across sites were aggregated in a voxelwise, quantitative meta-analysis. When compared with a mixed effects model controlling for site, both approaches identified robust activation in regions consistent with expected results based on prior single-site research. Overall, regions central to emotion processing showed strong reliability in the traveling participants study and robust activation in the aggregation study. These results support the reliability of blood oxygen level-dependent signal in emotion processing areas across different sites and scanners and may inform future efforts to increase efficiency and enhance knowledge of rare conditions in the population through multisite neuroimaging paradigms. PMID:25821147
Salamon, Sarah; Santelmann, Hanno; Franklin, Jeremy; Baethge, Christopher
2018-04-01
Reliability of schizoaffective disorder (SAD) diagnoses is low in adults but unclear in children and adolescents (CAD). We estimate the test-retest reliability of SAD and its key differential diagnoses (schizophrenia, bipolar disorder, and unipolar depression). We conducted a systematic literature search of Medline, Embase, and PsycInfo for studies on test-retest reliability of SAD in CAD. Cohen's kappa was extracted from studies. We performed meta-analysis for kappa, including subgroup and sensitivity analysis (PROSPERO protocol: CRD42013006713). Out of > 4000 records screened, seven studies were included. We estimated kappa values of 0.27 [95%-CI: 0.07; 0.47] for SAD, 0.56 [0.29; 0.83] for schizophrenia, 0.64 [0.55; 0.74] for bipolar disorder, and 0.66 [0.52; 0.81] for unipolar depression. In 5/7 studies kappa of SAD was lower than that of schizophrenia; similar trends emerged for bipolar disorder (4/5) and unipolar depression (2/3). Estimates of positive agreement of SAD diagnoses supported these results. The number of studies and patients included is low. The point estimate of the test-retest reliability of schizoaffective disorder is only fair, and lower than that of its main differential diagnoses. All kappa values under study were lower in child and adolescent samples than those reported for adults. Clinically, schizoaffective disorder should be diagnosed in strict adherence to the operationalized criteria and ought to be re-evaluated regularly. Should larger studies confirm the insufficient reliability of schizoaffective disorder in children and adolescents, the clinical value of the diagnosis is highly doubtful. Copyright © 2017. Published by Elsevier B.V.
Sarig Bahat, Hilla; Sprecher, Elliot; Sela, Itamar; Treleaven, Julia
2016-07-01
Virtual reality (VR) has previously been used for assessment and intervention of neck pain and has been shown to be reliable for cervical range of motion measures. Neck VR enables analysis of task-oriented neck movement by stimulating responsive movements to external stimuli. Therefore, the purpose of this study was to establish inter-tester reliability of neck kinematic measures so that the system can be used as a reliable assessment and treatment tool between clinicians. This reliability study included 46 asymptomatic participants, who were assessed using the neck VR system, which displayed an interactive VR scenario via a head-mounted device, controlled by neck movements. The objective of the interactive assessment was to hit 16 targets, randomly appearing in four directions, as fast as possible. Each participant was tested twice by two different testers. Good reliability was found for neck motion kinematic measures in flexion, extension, and rotation (intraclass correlation coefficients of 0.64-0.93). High reliability was shown for peak velocity globally (0.93), in left rotation (0.9), right rotation and extension (0.88), and flexion (0.86). Mean velocity had good global reliability (0.84), except for left rotation directed movement, which had moderate reliability (0.68). Minimal detectable change for peak velocity ranged from 41 to 53 °/s, while that for mean velocity ranged from 20 to 25 °/s. The results suggest high reliability for peak and mean velocity as measured by the interactive Neck VR assessment of neck motion kinematics. VR appears to provide a reliable and more ecologically valid method of cervical motion evaluation than previous conventional methodologies.
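The abstract reports minimal detectable change (MDC) values alongside reliability coefficients but does not say how they were derived. A common convention, assumed here, is to compute the standard error of measurement as SEM = SD × √(1 − ICC) and then MDC95 = 1.96 × √2 × SEM. The sketch below uses hypothetical function names and hypothetical input values, not figures from the study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC at the confidence level implied by z (1.96 for 95%).

    The sqrt(2) factor reflects measurement error in both the first
    and the second measurement.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: SD of 10 degrees/s and an ICC of 0.75.
print(round(minimal_detectable_change(sd=10.0, icc=0.75), 2))
```

Under this convention a higher ICC shrinks the SEM, so a smaller observed change can be distinguished from measurement noise.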
Rathi, Sangeeta; Taylor, Nicholas F; Gee, Jamie; Green, Rodney A
2016-12-01
Ultrasonography is an economical and non-invasive method for measuring real-time joint movements. Although physiotherapists are increasingly using ultrasound imaging for rotator cuff disorders, there is a lack of evidence on their reliability in using ultrasonography to measure glenohumeral translation. The aim of this study was to evaluate the reliability of a physiotherapist in measuring anterior and posterior glenohumeral joint translation with ultrasound. Study design: within day reliability. Anterior and posterior glenohumeral translations were measured at rest, in response to passive accessory motion testing force, and with isometric internal and external rotation in 12 young healthy adults. All the measurements were made in real time by a physiotherapist and an experienced sonographer in two positions (neutral and abducted) and in two views (anterior and posterior). Intra-rater and inter-rater reliability were expressed using intraclass correlation coefficients (ICC) and measurement error (mm). Intra-rater reliability was good for both raters (ICC_P: 0.86-0.98; ICC_S: 0.85-0.96). The inter-rater reliability between the physiotherapist and sonographer was moderate to good for posterior measurements (ICC 0.50-0.75) and poor to moderate for anterior measurements (ICC 0.31-0.53). For both intra-rater and inter-rater measurements, posterior translation was more reliable than the anterior translation with smaller measurement errors (posterior: 0.1-0.2 mm, anterior: 0.2-0.3 mm). A physiotherapist with minimal training was reliable in measuring glenohumeral joint translations. The ultrasound method was reliable for repeated measurement of both anterior and posterior glenohumeral translations with posterior measurements being more reliable than anterior. This method is recommended for future research to investigate the stabilising role of rotator cuff muscles. Copyright © 2016 Elsevier Ltd. All rights reserved.
Reliability of videotaped observational gait analysis in patients with orthopedic impairments
Brunnekreef, Jaap J; van Uden, Caro JT; van Moorsel, Steven; Kooloos, Jan GM
2005-01-01
Background In clinical practice, visual gait observation is often used to determine gait disorders and to evaluate treatment. Several reliability studies on observational gait analysis have been described in the literature and generally showed moderate reliability. However, patients with orthopedic disorders have received little attention. The objective of this study is to determine the reliability levels of visual observation of gait in patients with orthopedic disorders. Methods The gait of thirty patients referred to a physical therapist for gait treatment was videotaped. Ten raters, 4 experienced, 4 inexperienced and 2 experts, individually evaluated these videotaped gait patterns of the patients twice, by using a structured gait analysis form. Reliability levels were established by calculating the Intraclass Correlation Coefficient (ICC), using a two-way random design and based on absolute agreement. Results The inter-rater reliability among experienced raters (ICC = 0.42; 95%CI: 0.38–0.46) was comparable to that of the inexperienced raters (ICC = 0.40; 95%CI: 0.36–0.44). The expert raters reached a higher inter-rater reliability level (ICC = 0.54; 95%CI: 0.48–0.60). The average intra-rater reliability of the experienced raters was 0.63 (ICCs ranging from 0.57 to 0.70). The inexperienced raters reached an average intra-rater reliability of 0.57 (ICCs ranging from 0.52 to 0.62). The two expert raters attained ICC values of 0.70 and 0.74 respectively. Conclusion Structured visual gait observation by use of a gait analysis form as described in this study was found to be moderately reliable. Clinical experience appears to increase the reliability of visual gait analysis. PMID:15774012
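The Methods above describe an ICC from a two-way random design based on absolute agreement. The abstract does not say whether the single-rater or the average-rater form was used; the sketch below assumes the single-rater form, often written ICC(2,1), computed from two-way ANOVA mean squares. The function name `icc_2_1` and the rating table are illustrative only and are not data from the study.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per subject and one column per rater.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative table: 6 subjects (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))
```

Because absolute agreement is required, systematic differences between raters (the column mean square) lower this coefficient even when raters rank the subjects similarly.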
Marawar, Satyajit V; Madom, Ian A; Palumbo, Mark; Tallarico, Richard A; Ordway, Nathaniel R; Metkar, Umesh; Wang, Dongliang; Green, Adam; Lavelle, William F
2017-01-01
The treating surgeon's visual assessment of axial MRI images to ascertain the degree of stenosis has a critical impact on surgical decision-making. The purpose of this study was to prospectively analyze the impact of surgeon experience on inter-observer and intra-observer reliability of assessing severity of spinal stenosis on MRIs by spine surgeons directly involved in surgical decision-making. Seven fellowship-trained spine surgeons reviewed MRI studies of 30 symptomatic patients with lumbar stenosis and graded the stenosis in the central canal, the lateral recess and the foramen at T12-L1 to L5-S1 as none, mild, moderate or severe. No specific instructions were provided as to what constituted mild, moderate, or severe stenosis. Two surgeons were "senior" (> fifteen years of practice experience), two were "intermediate" (> four years of practice experience), and three were "junior" (< one year of practice experience). The concordance correlation coefficient (CCC) was calculated to assess inter-observer reliability. Seven MRI studies were duplicated and randomly re-read to evaluate intra-observer reliability. Surgeon experience was found to be a strong predictor of inter-observer reliability. Inter-observer reliability in the senior group was significantly higher than in the junior group for central (p<0.001), foraminal (p=0.005), and lateral (p=0.001) stenosis. The senior group also showed significantly higher inter-observer reliability than the intermediate group in assessing foraminal stenosis (p=0.036). For intra-observer reliability, the results were contrary to those found for inter-observer reliability. Inter-observer reliability of assessing stenosis on MRIs increases with surgeon experience. Lower intra-observer reliability values among the senior group, although not clearly explained, may be due to the small number of MRIs evaluated and the quality of MRI images. Level of evidence: Level 3.
ERIC Educational Resources Information Center
Bottema-Beutel, Kristen; Lloyd, Blair; Carter, Erik W.; Asmus, Jennifer M.
2014-01-01
Attaining reliable estimates of observational measures can be challenging in school and classroom settings, as behavior can be influenced by multiple contextual factors. Generalizability (G) studies can enable researchers to estimate the reliability of observational data, and decision (D) studies can inform how many observation sessions are…
Reliable Digit Span: A Systematic Review and Cross-Validation Study
ERIC Educational Resources Information Center
Schroeder, Ryan W.; Twumasi-Ankrah, Philip; Baade, Lyle E.; Marshall, Paul S.
2012-01-01
Reliable Digit Span (RDS) is a heavily researched symptom validity test with a recent literature review yielding more than 20 studies ranging in dates from 1994 to 2011. Unfortunately, limitations within some of the research minimize clinical generalizability. This systematic review and cross-validation study was conducted to address these…
ERIC Educational Resources Information Center
Scofield, Jason; Gilpin, Ansley Tullos; Pierucci, Jillian; Morgan, Reed
2013-01-01
Studies show that children trust previously reliable sources over previously unreliable ones (e.g., Koenig, Clement, & Harris, 2004). However, it is unclear from these studies whether children rely on accuracy or conventionality to determine the reliability and, ultimately, the trustworthiness of a particular source. In the current study, 3- and…
Social Media Addiction Scale-Student Form: The Reliability and Validity Study
ERIC Educational Resources Information Center
Sahin, Cengiz
2018-01-01
The purpose of this study is to develop a valid and reliable measurement tool to determine the social media addictions of secondary school, high school and university students. 998 students participated in the study. 476 students from secondary schools, high schools and universities participated in the first application during which the…
NREL's Energy Storage and REopt Teams Awarded $525k from TCF to Study Commercial Viability of Optimal, Reliable Building-Integrated Energy Storage | News | NREL
November 14, 2017
Turkish Adaptation of the Mentorship Effectiveness Scale: A Validity and Reliability Study
ERIC Educational Resources Information Center
Yirci, Ramazan; Karakose, Turgut; Uygun, Harun; Ozdemir, Tuncay Yavuz
2016-01-01
The purpose of this study is to adapt the Mentoring Relationship Effectiveness Scale to Turkish, and to conduct validity and reliability tests regarding the scale. The study group consisted of 156 university science students receiving graduate education. Construct validity and factor structure of the scale was analyzed first through exploratory…
Venkatraman, Vijay K; Gonzalez, Christopher E.; Landman, Bennett; Goh, Joshua; Reiter, David A.; An, Yang; Resnick, Susan M.
2017-01-01
Diffusion tensor imaging (DTI) measures are commonly used as imaging markers to investigate individual differences in relation to behavioral and health-related characteristics. However, the ability to detect reliable associations in cross-sectional or longitudinal studies is limited by the reliability of the diffusion measures. Several studies have examined reliability of diffusion measures within (i.e. intra-site) and across (i.e. inter-site) scanners with mixed results. Our study compares the test-retest reliability of diffusion measures within and across scanners and field strengths in cognitively normal older adults with a follow-up interval less than 2.25 years. Intra-class correlation (ICC) and coefficient of variation (CoV) of fractional anisotropy (FA) and mean diffusivity (MD) were evaluated in sixteen white matter and twenty-six gray matter bilateral regions. The ICC for intra-site reliability (0.32 to 0.96 for FA and 0.18 to 0.95 for MD in white matter regions; 0.27 to 0.89 for MD and 0.03 to 0.79 for FA in gray matter regions) and inter-site reliability (0.28 to 0.95 for FA in white matter regions, 0.02 to 0.86 for MD in gray matter regions) with longer follow-up intervals were similar to earlier studies using shorter follow-up intervals. The reliability of comparisons across field strengths was lower than intra- and inter-site reliability. Within and across scanner comparisons showed that diffusion measures were more stable in larger white matter regions (> 1500 mm³). For gray matter regions, the MD measure showed stability in specific regions and was not dependent on region size. A linear correction factor estimated from cross-sectional or longitudinal data improved the reliability across field strengths. 
Our findings indicate that investigations relating diffusion measures to external variables must consider variable reliability across the distinct regions of interest and that correction factors can be used to improve consistency of measurement across field strengths. An important result of this work is that inter-scanner and field strength effects can be partially mitigated with linear correction factors specific to regions of interest. These data-driven linear correction techniques can be applied in cross-sectional or longitudinal studies. PMID:26146196
Piqueras, Jose A; Martín-Vivar, María; Sandin, Bonifacio; San Luis, Concepción; Pineda, David
2017-08-15
Anxiety and depression are among the most common mental disorders during childhood and adolescence. Among the instruments for the brief screening assessment of symptoms of anxiety and depression, the Revised Child Anxiety and Depression Scale (RCADS) is one of the more widely used. Previous studies have demonstrated the reliability of the RCADS for different assessment settings and different versions. The aims of this study were to examine the mean reliability of the RCADS and the influence of the moderators on the RCADS reliability. We searched in EBSCO, PsycINFO, Google Scholar, Web of Science, and NCBI databases and other articles manually from lists of references of extracted articles. A total of 146 studies were included in our meta-analysis. The RCADS showed robust internal consistency reliability in different assessment settings, countries, and languages. We only found that reliability of the RCADS was significantly moderated by the version of RCADS. However, these differences in reliability between different versions of the RCADS were slight and can be due to the number of items. We did not examine factor structure, factorial invariance across gender, age, or country, and test-retest reliability of the RCADS. The RCADS is a reliable instrument for cross-cultural use, with the advantage of providing more information with a low number of items in the assessment of both anxiety and depression symptoms in children and adolescents. Copyright © 2017. Published by Elsevier B.V.
Cook, David A; Reed, Darcy A
2015-08-01
The Medical Education Research Study Quality Instrument (MERSQI) and the Newcastle-Ottawa Scale-Education (NOS-E) were developed to appraise methodological quality in medical education research. The study objective was to evaluate the interrater reliability, normative scores, and between-instrument correlation for these two instruments. In 2014, the authors searched PubMed and Google for articles using the MERSQI or NOS-E. They obtained or extracted data for interrater reliability (using the intraclass correlation coefficient, ICC) and normative scores. They calculated between-scale correlation using Spearman rho. Each instrument contains items concerning sampling, controlling for confounders, and integrity of outcomes. Interrater reliability for overall scores ranged from 0.68 to 0.95. Interrater reliability was "substantial" or better (ICC > 0.60) for nearly all domain-specific items on both instruments. Most instances of low interrater reliability were associated with restriction of range, and raw agreement was usually good. Across 26 studies evaluating published research, the median overall MERSQI score was 11.3 (range 8.9-15.1, of possible 18). Across six studies, the median overall NOS-E score was 3.22 (range 2.08-3.82, of possible 6). Overall MERSQI and NOS-E scores correlated reasonably well (rho 0.49-0.72). The MERSQI and NOS-E are useful, reliable, complementary tools for appraising methodological quality of medical education research. Interpretation and use of their scores should focus on item-specific codes rather than overall scores. Normative scores should be used for relative rather than absolute judgments because different research questions require different study designs.
Ratter, Julia; Radlinger, Lorenz; Lucas, Cees
2014-09-01
Are submaximal and maximal exercise tests reliable, valid and acceptable in people with chronic pain, fibromyalgia and fatigue disorders? Systematic review of studies of the psychometric properties of exercise tests. People older than 18 years with chronic pain, fibromyalgia and chronic fatigue disorders. Studies of the measurement properties of tests of physical capacity in people with chronic pain, fibromyalgia or chronic fatigue disorders were included. Studies were required to report: reliability coefficients (intraclass correlation coefficient, alpha reliability coefficient, limits of agreement and Bland-Altman plots); validity coefficients (intraclass correlation coefficient, Spearman's correlation, Kendall's tau coefficient, Pearson's correlation); or dropout rates. Fourteen studies were eligible: none had low risk of bias, 10 had unclear risk of bias and four had high risk of bias. The included studies evaluated: Åstrand test; modified Åstrand test; lean body mass-based Åstrand test; submaximal bicycle ergometer test following a protocol other than the Åstrand test; 2-km walk test; 5-minute, 6-minute and 10-minute walk tests; shuttle walk test; and modified symptom-limited Bruce treadmill test. None of the studies assessed maximal exercise tests. Where they had been tested, reliability and validity were generally high. Dropout rates were generally acceptable. The 2-km walk test was not recommended in fibromyalgia. Moderate evidence was found for reliability, validity and acceptability of submaximal exercise tests in patients with chronic pain, fibromyalgia or chronic fatigue. There is no evidence about maximal exercise tests in patients with chronic pain, fibromyalgia and chronic fatigue. Copyright © 2014. Published by Elsevier B.V.
Reliability Generalization of Scores on the Spielberger State-Trait Anxiety Inventory.
ERIC Educational Resources Information Center
Barnes, Laura L. B.; Harp, Diane; Jung, Woo Sik
2002-01-01
Conducted a reliability generalization study for the State-Trait Anxiety Inventory (C. Spielberger, 1983) by reviewing and classifying 816 research articles. Average reliability coefficients were acceptable for both internal consistency and test-retest reliability, but variation was present among the estimates. Other differences are discussed.…
Meta-Analysis of Scale Reliability Using Latent Variable Modeling
ERIC Educational Resources Information Center
Raykov, Tenko; Marcoulides, George A.
2013-01-01
A latent variable modeling approach is outlined that can be used for meta-analysis of reliability coefficients of multicomponent measuring instruments. Important limitations of efforts to combine composite reliability findings across multiple studies are initially pointed out. A reliability synthesis procedure is discussed that is based on…
Large-scale-system effectiveness analysis. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patton, A.D.; Ayoub, A.K.; Foster, J.W.
1979-11-01
Objective of the research project has been the investigation and development of methods for calculating system reliability indices which have absolute, and measurable, significance to consumers. Such indices are a necessary prerequisite to any scheme for system optimization which includes the economic consequences of consumer service interruptions. A further area of investigation has been joint consideration of generation and transmission in reliability studies. Methods for finding or estimating the probability distributions of some measures of reliability performance have been developed. The application of modern Monte Carlo simulation methods to compute reliability indices in generating systems has been studied.
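The report mentions Monte Carlo simulation for computing reliability indices in generating systems but gives no algorithmic detail in this abstract. As a minimal sketch of the general idea only, the code below estimates one commonly used index, the loss-of-load probability, by sampling independent unit outages. The choice of index, the function name `estimate_lolp`, and all numeric inputs are assumptions for illustration and are not taken from the report.

```python
import random


def estimate_lolp(unit_capacities_mw, forced_outage_rate, load_mw, trials, seed=0):
    """Monte Carlo estimate of the probability that available capacity < load.

    Each unit is assumed to fail independently with the same forced outage rate.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    shortfalls = 0
    for _ in range(trials):
        # A unit contributes its capacity only if it is not on forced outage.
        available = sum(
            cap for cap in unit_capacities_mw if rng.random() >= forced_outage_rate
        )
        if available < load_mw:
            shortfalls += 1
    return shortfalls / trials


# Hypothetical system: three 50 MW units, 10% outage rate, 100 MW load.
estimate = estimate_lolp([50.0, 50.0, 50.0], 0.10, 100.0, trials=200_000)
print(estimate)
```

For this toy system the exact answer is available in closed form (load is lost only when at least two units are out, probability 3 × 0.1² × 0.9 + 0.1³ = 0.028), which gives a convenient check on the sampled estimate.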
NASA Astrophysics Data System (ADS)
Gilmanshin, I. R.; Kirpichnikov, A. P.
2017-09-01
A study of the operating algorithm of the module for early detection of excessive losses shows that the algorithm can be modeled with absorbing Markov chains. Of particular interest are the probabilistic characteristics of the algorithm, which are studied in order to relate the reliability indicators of individual elements, or the probabilities of certain events, to the likelihood that reliable information is transmitted. The relations identified in the analysis make it possible to set threshold values for the reliability characteristics of the system components.
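The abstract does not give the states or transition probabilities of the module's Markov model. As a generic sketch of how an absorbing Markov chain yields the probability of ending in a particular outcome, the code below solves (I − Q)B = R, where Q holds transitions among transient states and R holds transitions from transient to absorbing states. The state labels and all probabilities are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b for x (b may have several columns) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def absorption_probabilities(q, r):
    """Return B, where B[i][j] is the probability of ending in absorbing
    state j when starting from transient state i."""
    n = len(q)
    i_minus_q = [
        [(1.0 if i == j else 0.0) - q[i][j] for j in range(n)] for i in range(n)
    ]
    return solve_linear(i_minus_q, r)


# Hypothetical model. Transient states: 0 = measuring, 1 = transmitting.
# Absorbing states: 0 = reliable information delivered, 1 = information lost.
q = [[0.0, 0.9], [0.1, 0.0]]
r = [[0.0, 0.1], [0.8, 0.1]]
b = absorption_probabilities(q, r)
print([round(p, 3) for p in b[0]])
```

Varying one entry of `q` or `r` and recomputing `b` is a simple way to see how the reliability of a single element shifts the probability of a reliable final outcome, which is the kind of relation the abstract refers to.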
A study of the longevity and operational reliability of Goddard Spacecraft, 1960-1980
NASA Technical Reports Server (NTRS)
Shockey, E. F.
1981-01-01
Compiled data regarding the design lives and lifetimes actually achieved by 104 orbiting satellites launched by the Goddard Space Flight Center between the years 1960 and 1980 are analyzed. Historical trends over the entire 21-year period are reviewed, and the more recent data are subjected to an examination of several key parameters. An empirical reliability function is derived and compared with various mathematical models. Data from related studies are also discussed. The results provide insight into the reliability history of Goddard spacecraft and guidance for estimating the reliability of future programs.
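The abstract does not describe how the empirical reliability function was constructed or which mathematical models it was compared with. The sketch below shows one simple possibility under stated assumptions: the empirical reliability at time t is the fraction of spacecraft whose achieved lifetime exceeds t, and the comparison model is a single-parameter exponential. It ignores censoring (spacecraft still operating when the data were compiled), and the lifetimes are hypothetical, not values from the report.

```python
import math


def empirical_reliability(lifetimes, t):
    """Fraction of units whose achieved lifetime exceeds t (no censoring)."""
    if not lifetimes:
        raise ValueError("need at least one lifetime")
    return sum(1 for life in lifetimes if life > t) / len(lifetimes)


def exponential_reliability(mean_life, t):
    """Reliability at time t under a constant-failure-rate (exponential) model."""
    return math.exp(-t / mean_life)


# Hypothetical achieved lifetimes in years.
lifetimes = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0]
mean_life = sum(lifetimes) / len(lifetimes)

for t in (1.5, 2.5, 5.0):
    print(t, empirical_reliability(lifetimes, t),
          round(exponential_reliability(mean_life, t), 3))
```

Tabulating both columns over a range of t values shows where a candidate model over- or under-predicts survival relative to the observed fleet.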
The Nordic concept of reactive psychosis--a multicenter reliability study.
Hansen, H; Dahl, A A; Bertelsen, A; Birket-Smith, M; von Knorring, L; Ottosson, J O; Pakaslahti, A; Retterstøl, N; Salvesen, C; Thorsteinsson, G
1992-07-01
Reactive psychosis is a common diagnosis in the Nordic countries (Norway, Sweden, Denmark, Finland and Iceland) and in several other parts of the world. In ICD-9 and DSM-III-R, the concept is defined more narrowly than in the Nordic tradition. In this study we examined the interrater reliability of the Nordic concept by the case-summary method between clinicians from 9 university departments in the Nordic countries. The results show that Nordic psychiatrists have a reasonably reliable concept of reactive psychosis, and that this psychosis can be diagnosed as reliably as schizophrenia and affective psychosis.
ERIC Educational Resources Information Center
Çapri, Burhan; Gündüz, Bülent; Akbay, Sinem Evin
2017-01-01
The primary goal of this study is to complete the adaptation, validity and reliability studies of the long (17 items) and short (9 items) forms of UWES-SF. The secondary goal of this study is to study the mediating role of work engagement between academic procrastination and academic responsibility in high school students. The study group consists…
Santelmann, Hanno; Franklin, Jeremy; Bußhoff, Jana; Baethge, Christopher
2015-11-01
Schizoaffective disorder is a frequent diagnosis, and its reliability is subject to ongoing discussion. We compared the diagnostic reliability of schizoaffective disorder with its main differential diagnoses. We systematically searched Medline, Embase, and PsycInfo for all studies on the test-retest reliability of the diagnosis of schizoaffective disorder as compared with schizophrenia, bipolar disorder, and unipolar depression. We used meta-analytic methods to describe and compare Cohen's kappa as well as positive and negative agreement. In addition, multiple pre-specified and post hoc subgroup and sensitivity analyses were carried out. Out of 4,415 studies screened, 49 studies were included. Test-retest reliability of schizoaffective disorder was consistently lower than that of schizophrenia (in 39 out of 42 studies), bipolar disorder (27/33), and unipolar depression (29/35). The mean difference in kappa between schizoaffective disorder and the other diagnoses was approximately 0.2, and mean Cohen's kappa for schizoaffective disorder was 0.50 (95% confidence interval: 0.40-0.59). While findings were unequivocal and homogeneous for schizoaffective disorder's diagnostic reliability relative to its three main differential diagnoses (dichotomous: smaller versus larger), heterogeneity was substantial for continuous measures, even after subgroup and sensitivity analyses. In clinical practice and research, schizoaffective disorder's comparatively low diagnostic reliability should lead to increased efforts to correctly diagnose the disorder. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Partido, Brian B; Jones, Archie A; English, Dana L; Nguyen, Carol A; Jacks, Mary E
2015-02-01
Dental and dental hygiene faculty members often do not provide consistent instruction in the clinical environment, especially in tasks requiring clinical judgment. From previous efforts to calibrate faculty members in calculus detection using typodonts, researchers have suggested using human subjects and emerging technology to improve consistency in clinical instruction. The purpose of this pilot study was to determine if a dental endoscopy-assisted training program would improve intra- and interrater reliability of dental hygiene faculty members in calculus detection. Training included an ODU 11/12 explorer, typodonts, and dental endoscopy. A convenience sample of six participants was recruited from the dental hygiene faculty at a California community college, and a two-group randomized experimental design was utilized. Intra- and interrater reliability was measured before and after calibration training. Pretest and posttest Kappa averages of all participants were compared using repeated measures (split-plot) ANOVA to determine the effectiveness of the calibration training on intra- and interrater reliability. The results showed that both kinds of reliability significantly improved for all participants and the training group improved significantly in interrater reliability from pretest to posttest. Calibration training was beneficial to these dental hygiene faculty members, especially those beginning with less than full agreement. This study suggests that calculus detection calibration training utilizing dental endoscopy can effectively improve interrater reliability of dental and dental hygiene clinical educators. Future studies should include human subjects, involve more participants at multiple locations, and determine whether improved rater reliability can be sustained over time.
Yapali, Gökmen; Günel, Mintaze Kerem; Karahan, Sevilay
2012-05-15
The study design was a cross-cultural adaptation and investigation of the reliability and validity of the Copenhagen Neck Functional Disability Scale (CNFDS). The aim of this study was to translate the CNFDS into Turkish and assess its reliability and validity among patients with neck pain in the Turkish population. The CNFDS is a reliable and valid evaluation instrument for disability, but there is no published Turkish version of the CNFDS. One hundred one subjects who had chronic neck pain were included in this study. The CNFDS, Neck Pain and Disability Scale, and visual analogue scale were administered to all subjects. To investigate test-retest reliability, the CNFDS was applied twice at a 1-week interval; the intraclass correlation coefficient for test-retest reliability was 0.86 (95% confidence interval = 0.679-0.935). There was no difference between test-retest scores (P < 0.001). For concurrent validity, the correlation between the total score of the CNFDS and the mean visual analogue scale was r = 0.73 (P < 0.001). Concurrent validity of the CNFDS was very good. For construct validity, the correlation between the total score of the CNFDS and the Neck Pain and Disability Scale was r = 0.78 (P < 0.001). Construct validity of the CNFDS was also very good. Our results suggest that the Turkish version of the CNFDS is a reliable and valid instrument for Turkish people.
Van Oyen, Herman; Bogaert, Petronille; Yokota, Renata T C; Berger, Nicolas
2018-01-01
GALI, or the Global Activity Limitation Indicator, is a global survey instrument measuring participation restriction. GALI is the measure underlying the European indicator Healthy Life Years (HLY). GALI has substantial policy use within the EU and its Member States. The objective of the current paper is to bring together what is known from published manuscripts on the validity and the reliability of GALI. Following the PRISMA guidelines, two search strategies (PUBMED, Google Scholar) were combined to identify manuscripts published in English with a publication date of 2000 or later. Articles were classified as reliability studies, or as concurrent or predictive validity studies, in national or international populations. Four cross-sectional studies (of which 2 were international) studied how GALI relates to other health measures (concurrent validity). A dose-response effect by GALI severity level on the association with the other health status measures was observed in the national studies. The 2 international studies (SHARE, EHIS) concluded that the odds of reporting participation restriction were higher in subjects with self-reported or observed functional limitations. In SHARE, the size of the odds ratios (ORs) in the different countries was homogeneous, while in EHIS the size of the ORs varied more strongly. For the predictive validity, subjects were followed over time (4 studies, of which one was international). GALI proved, both in national and international data, to be a consistent predictor of future health outcomes in terms of both mortality and health care expenditure. As predictors of mortality, the two distinct health concepts, self-rated health and GALI, acted independently of and complementary to each other. The one reliability study identified reported sufficient reliability of GALI. As an inclusive one-question instrument, GALI fits all conceptual characteristics specified for a global measure of participation restriction.
None of the studies included in the review showed evidence of failing validity. The review shows that GALI has good and sufficient concurrent and predictive validity, and reliability.
Curriculum Design Orientations Preference Scale of Teachers: Validity and Reliability Study
ERIC Educational Resources Information Center
Bas, Gokhan
2013-01-01
The purpose of this study was to develop a valid and reliable scale for preferences of teachers in regard of their curriculum design orientations. Because there was no scale development study similar to this one in Turkey, it was considered as an urgent need to develop such a scale in the study. The sample of the research consisted of 300…
Reliability and Validity of the Sexual Pressure Scale for Women-Revised
Jones, Rachel; Gulick, Elsie
2008-01-01
Sexual pressure among young urban women represents adherence to gender stereotypical expectations to engage in sex. Revision of the original 5-factor Sexual Pressure Scale was undertaken in two studies to improve reliabilities in two of the five factors. In Study 1 the reliability of the Sexual Pressure Scale for Women-Revised (SPSW-R) was tested, and principal components analysis was performed in a sample of 325 young, urban women. A parsimonious 18-item, 4-factor model explained 61% of the variance. In Study 2 the theory underlying sexual pressure was supported by confirmatory factor analysis using structural equation modeling in a sample of 181 women. Reliabilities of the SPSW-R total and subscales were very satisfactory, suggesting it may be used in intervention research. PMID:18666222
The validation of Huffaz Intelligence Test (HIT)
NASA Astrophysics Data System (ADS)
Rahim, Mohd Azrin Mohammad; Ahmad, Tahir; Awang, Siti Rahmah; Safar, Ajmain
2017-08-01
In general, a hafiz who can memorize the Quran has many specialties, especially with respect to academic performance. In this study, the theory of multiple intelligences introduced by Howard Gardner is embedded in a newly developed psychometric instrument, the Huffaz Intelligence Test (HIT). This paper presents the validation and reliability of the HIT among tahfiz students in Malaysian Islamic schools. A pilot study was conducted involving 87 huffaz who were randomly selected to answer the items in the HIT. The analysis method used includes Partial Least Squares (PLS) analysis of reliability and of convergent and discriminant validity. The study has validated nine intelligences. The findings also indicated that the composite reliabilities for the nine types of intelligences are greater than 0.8. Thus, the HIT is a valid and reliable instrument to measure the multiple intelligences among huffaz.
The Children's Play Therapy Instrument (CPTI). Description, development, and reliability studies.
Kernberg, P F; Chazan, S E; Normandin, L
1998-01-01
The Children's Play Therapy Instrument (CPTI), its development, and reliability studies are described. The CPTI is a new instrument to examine a child's play activity in individual psychotherapy. Three independent raters used the CPTI to rate eight videotaped play therapy vignettes. Results were compared with the authors' consensual scores from a preliminary study. Generally good to excellent levels of interrater reliability were obtained for the independent raters on intraclass correlation coefficients for ordinal categories of the CPTI. Likewise, kappa levels were acceptable to excellent for nominal categories of the scale. The CPTI holds promise to become a reliable measure of play activity in child psychotherapy. Further research is needed to assess discriminant validity of the CPTI for use as a diagnostic tool and as a measure of process and outcome.
Agreement, the F-Measure, and Reliability in Information Retrieval
Hripcsak, George; Rothschild, Adam S.
2005-01-01
Information retrieval studies that involve searching the Internet or marking phrases usually lack a well-defined number of negative cases. This prevents the use of traditional interrater reliability metrics like the κ statistic to assess the quality of expert-generated gold standards. Such studies often quantify system performance as precision, recall, and F-measure, or as agreement. It can be shown that the average F-measure among pairs of experts is numerically identical to the average positive specific agreement among experts and that κ approaches these measures as the number of negative cases grows large. Positive specific agreement—or the equivalent F-measure—may be an appropriate way to quantify interrater reliability and therefore to assess the reliability of a gold standard in these studies. PMID:15684123
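The two claims in this abstract can be checked numerically: the pairwise F-measure equals positive specific agreement, and kappa approaches both as the number of negative cases grows. The sketch below uses hypothetical counts (`a` = items marked by both experts, `b` and `c` = items marked by only one of them, `d` = items marked by neither); the function names are illustrative.

```python
def f_measure(a, b, c):
    """F-measure of expert 1 scored against expert 2 as the reference."""
    precision = a / (a + b)
    recall = a / (a + c)
    return 2 * precision * recall / (precision + recall)


def positive_specific_agreement(a, b, c):
    return 2 * a / (2 * a + b + c)


def cohen_kappa(a, b, c, d):
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for one pair of experts.
a, b, c = 40, 10, 5
f = f_measure(a, b, c)
psa = positive_specific_agreement(a, b, c)

# Kappa needs a count of negative cases d; let it grow and compare.
kappas = {d: cohen_kappa(a, b, c, d) for d in (10, 1_000, 1_000_000)}

print(f"F-measure = {f:.6f}, positive specific agreement = {psa:.6f}")
for d, k in kappas.items():
    print(f"d = {d:>9}: kappa = {k:.6f}")
```

Because the F-measure never uses `d`, it remains computable when the number of negative cases is undefined, which is the situation the abstract describes.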
Schiffman, Eric L; Truelove, Edmond L; Ohrbach, Richard; Anderson, Gary C; John, Mike T; List, Thomas; Look, John O
2010-01-01
The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. The aim of this article is to provide an overview of the project's methodology, descriptive statistics, and data for the study participant sample. This article also details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. The Axis I reference standards were based on the consensus of two criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion examination reliability was also assessed within study sites. Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas > or = 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion examiner agreement with reference standards was excellent (k > or = 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods.
Rubio-Ochoa, J; Benítez-Martínez, J; Lluch, E; Santacruz-Zaragozá, S; Gómez-Contreras, P; Cook, C E
2016-02-01
It has been suggested that differential diagnosis of headaches should consist of a robust subjective examination and a detailed physical examination of the cervical spine. Cervicogenic headache (CGH) is a form of headache that involves referred pain from the neck. To our knowledge, no studies have summarized the reliability and diagnostic accuracy of physical examination tests for CGH. The aim of this study was to summarize the reliability and diagnostic accuracy of physical examination tests used to diagnose CGH. A systematic review following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines was performed in four electronic databases (MEDLINE, Web of Science, Embase and Scopus). Full text reports concerning physical tests for the diagnosis of CGH that reported the clinimetric properties for assessment of CGH were included and screened for methodological quality. Quality Appraisal for Reliability Studies (QAREL) and Quality Assessment of Studies of Diagnostic Accuracy (QUADAS-2) scores were completed to assess article quality. Eight articles were retrieved for quality assessment and data extraction. Studies investigating diagnostic reliability of physical examination tests for CGH scored poorer on methodological quality (higher risk of bias) than those of diagnostic accuracy. There is sufficient evidence showing high levels of reliability and diagnostic accuracy of the selected physical examination tests for the diagnosis of CGH. The cervical flexion-rotation test (CFRT) exhibited both the highest reliability and the strongest diagnostic accuracy for the diagnosis of CGH. Copyright © 2015 Elsevier Ltd. All rights reserved.
Lagarde, Marloes L J; Kamalski, Digna M A; van den Engel-Hoek, Lenie
2016-02-01
To systematically review the available evidence for the reliability and validity of cervical auscultation in diagnosing the several aspects of dysphagia in adults and children suffering from dysphagia. Medline (PubMed), Embase and the Cochrane Library databases. The systematic review was carried out applying the steps of the PRISMA-statement. The methodological quality of the included studies was evaluated using the Dutch 'Cochrane checklist for diagnostic accuracy studies'. A total of 90 articles were identified through the search strategy, and after applying the inclusion and exclusion criteria, six articles were included in this review. In the six studies, 197 patients were assessed with cervical auscultation. Two of the six articles were considered to be of 'good' quality and three studies were of 'moderate' quality. One article was excluded because of 'poor' methodological quality. Sensitivity ranged from 23% to 94% and specificity ranged from 50% to 74%. Inter-rater reliability was 'poor' or 'fair' in all studies. The intra-rater reliability showed a wide variance among speech language therapists. In this systematic review, conflicting evidence is found for the validity of cervical auscultation. The reliability of cervical auscultation is insufficient when used as a stand-alone tool in the diagnosis of dysphagia in adults. There is no available evidence for the validity and reliability of cervical auscultation in children. Cervical auscultation should not be used as a stand-alone instrument to diagnose dysphagia. © The Author(s) 2015.
Impact of data source on travel time reliability assessment.
DOT National Transportation Integrated Search
2014-08-01
Travel time reliability measures are becoming an increasingly important input to mobility and congestion management studies. In the case of the Maryland State Highway Administration, reliability measures are key elements in the agency's Annual ...
Test-retest reliability of the proposed DSM-5 eating disorder diagnostic criteria
Sysko, Robyn; Roberto, Christina A.; Barnes, Rachel D.; Grilo, Carlos M.; Attia, Evelyn; Walsh, B. Timothy
2012-01-01
The proposed DSM-5 classification scheme for eating disorders includes both major and minor changes to the existing DSM-IV diagnostic criteria. It is not known what effect these modifications will have on the ability to make reliable diagnoses. Two studies were conducted to evaluate the short-term test-retest reliability of the proposed DSM-5 eating disorder diagnoses: anorexia nervosa, bulimia nervosa, binge eating disorder, and feeding and eating conditions not elsewhere classified. Participants completed two independent telephone interviews with research assessors (n=70 Study 1; n=55 Study 2). Fair to substantial agreements (κ= 0.80 and 0.54) were observed across eating disorder diagnoses in Study 1 and Study 2, respectively. Acceptable rates of agreement were identified for the individual eating disorder diagnoses, including DSM-5 anorexia nervosa (κ’s of 0.81 to 0.97), bulimia nervosa (κ=0.84), binge eating disorder (κ’s of 0.75 and 0.61), and feeding and eating disorders not elsewhere classified (κ’s of 0.70 and 0.46). Further, improved short-term test-retest reliability was noted when using the DSM-5, in comparison to DSM-IV, criteria for binge eating disorder. Thus, these studies found that trained interviewers can reliably diagnose eating disorders using the proposed DSM-5 criteria; however, additional data from general practice settings and community samples are needed. PMID:22401974
Chiwaridzo, Matthew; Chikasha, Tafadzwa Nicole; Naidoo, Nirmala; Dambi, Jermaine Matewu; Tadyanemhandu, Cathrine; Munambah, Nyaradzai; Chizanga, Precious Trish
2017-01-01
In Zimbabwe, a recent increase in the volume of research on recurrent non-specific low back pain (NSLBP) has revealed that adolescents are commonly affected. This is alarming to health professionals and parents and calls for serious primary preventative strategies to be developed and implemented forthwith. Early identification initiatives should be prioritised in order to curtail the condition and its progression. In an attempt to be proactive in minimising the prevalence of recurrent NSLBP, this study was conducted to evaluate the content validity and test-retest reliability of a survey questionnaire with the aim of proffering a valid and reliable questionnaire which can be used in non-clinical settings to identify adolescents with recurrent NSLBP in Harare, Zimbabwe and determine the possible factors associated with the condition. The study was conducted in two parts. The first part assessed content validity of the questionnaire using four experts derived from academia and clinical practice. The second part evaluated the reliability of the questionnaire among 125 high school children aged between 13 and 19 years in a test-retest study. Twenty-six (26) out of thirty questions in the questionnaire had an Item Content Validity Index of 1.00, demonstrating complete agreement among content experts. Overall, the Scale Content Validity Index for the questionnaire was 0.97. Item completion for the reliability study was satisfactory. The questionnaire items had kappa values ranging from 0.17 (slight agreement) to 1 (perfect agreement). High levels of reliability were found for the questions on school bag use (k = 0.94), sports participation (k = 0.97), and lifetime prevalence (k = 0.89). Excellent content validity and slight to perfect test-retest reliability were found for the Low Back Pain (LBP) questionnaire. These results are comparable to findings of other studies evaluating the psychometric properties of LBP questionnaires.
Cognisant of the limitations of the study, the results of this study suggest that the LBP questionnaire could be used in local studies investigating LBP among adolescents although questions enquiring on functional limitations and sciatica may need further consideration.
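The Item and Scale Content Validity Indices reported in this abstract are simple proportions. The sketch below assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant", and assumes the averaging form of the scale index (the mean of the item indices); the abstract states neither choice, and the ratings are hypothetical, built only to mirror the reported design of 30 items and four experts.

```python
def item_cvi(ratings, relevant_threshold=3):
    """Proportion of experts who rated the item as relevant."""
    return sum(r >= relevant_threshold for r in ratings) / len(ratings)


def scale_cvi_average(all_ratings):
    """Scale-level index computed as the mean of the item-level indices."""
    item_scores = [item_cvi(ratings) for ratings in all_ratings]
    return sum(item_scores) / len(item_scores)


# Hypothetical ratings: 26 items rated relevant by all four experts,
# 4 items rated relevant by three of the four experts.
ratings = [[4, 4, 4, 4]] * 26 + [[4, 4, 4, 2]] * 4

item_scores = [item_cvi(r) for r in ratings]
full_agreement_items = sum(score == 1.0 for score in item_scores)
s_cvi = scale_cvi_average(ratings)

print(full_agreement_items, round(s_cvi, 2))
```

With four experts, a single dissenting rating lowers an item index to 0.75, so small panels produce coarse item-level values.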
Reliability and Probabilistic Risk Assessment - How They Play Together
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Stutts, Richard G.; Zhaofeng, Huang
2015-01-01
PRA methodology is one of the probabilistic analysis methods that NASA brought from the nuclear industry to assess the risk of LOM, LOV and LOC for launch vehicles. PRA is a system scenario based risk assessment that uses a combination of fault trees, event trees, event sequence diagrams, and probability and statistical data to analyze the risk of a system, a process, or an activity. It is a process designed to answer three basic questions: What can go wrong? How likely is it? What is the severity of the degradation? Since 1986, NASA, along with industry partners, has conducted a number of PRA studies to predict the overall launch vehicle risks. Planning Research Corporation conducted the first of these studies in 1988. In 1995, Science Applications International Corporation (SAIC) conducted a comprehensive PRA study. In July 1996, NASA conducted a two-year study (October 1996 - September 1998) to develop a model that provided the overall Space Shuttle risk and estimates of risk changes due to proposed Space Shuttle upgrades. After the Columbia accident, NASA conducted a PRA on the Shuttle External Tank (ET) foam. This study was the most focused and extensive risk assessment that NASA has conducted in recent years. It used a dynamic, physics-based, integrated system analysis approach to understand the integrated system risk due to ET foam loss in flight. Most recently, a PRA for the Ares I launch vehicle has been performed in support of the Constellation program. Reliability, on the other hand, addresses the loss of functions. In a broader sense, reliability engineering is a discipline that involves the application of engineering principles to the design and processing of products, both hardware and software, for meeting product reliability requirements or goals. It is a very broad design-support discipline. It has important interfaces with many other engineering disciplines. Reliability as a figure of merit (i.e., the metric) is the probability that an item will perform its intended function(s) for a specified mission profile. In general, the reliability metric can be calculated through analyses using reliability demonstration and reliability prediction methodologies. Reliability analysis is critical for understanding component failure mechanisms and for identifying reliability-critical design and process drivers. The following sections discuss the PRA process and reliability engineering in detail and provide an application where reliability analysis and PRA were jointly used in a complementary manner to support a Space Shuttle flight risk assessment.
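As a small illustration of how a fault tree turns basic-event probabilities into a top-event probability, and how that relates to reliability as a probability of success, the sketch below evaluates one OR gate over an AND gate. It assumes statistically independent basic events; the event names and probabilities are hypothetical and are not drawn from any NASA study.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if every input event occurs (independent inputs)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities for one mission.
p_primary_valve_fails = 0.01
p_backup_valve_fails = 0.02
p_controller_fails = 0.001

# Top event: both redundant valves fail, OR the controller fails.
p_both_valves_fail = and_gate([p_primary_valve_fails, p_backup_valve_fails])
p_top_event = or_gate([p_both_valves_fail, p_controller_fails])

# Reliability as a figure of merit: probability the function is performed.
reliability = 1.0 - p_top_event
print(f"P(top event) = {p_top_event:.7f}, reliability = {reliability:.7f}")
```

A full PRA would combine many such trees with event trees and uncertainty distributions; this fragment covers only the gate arithmetic.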
NASA Technical Reports Server (NTRS)
Matlock, Steve
2001-01-01
This is the final report and addresses all of the work performed on this program. Specifically, it covers vehicle architecture background, definition of six baseline engine cycles, reliability baseline (space shuttle main engine QRAS), and component level reliability/performance/cost for the six baseline cycles, and selection of 3 cycles for further study. This report further addresses technology improvement selection and component level reliability/performance/cost for the three cycles selected for further study, as well as risk reduction plans, and recommendation for future studies.
Quinn, Amity E; Rosen, Rochelle K; McGeary, John E; Amoa, Francine; Kranzler, Henry R; Francazio, Sarah; McGarvey, Stephen T; Swift, Robert M
2014-01-01
The aims of this study were to develop a bilingual version of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA) in English and Samoan and determine the reliability of assessments of alcohol dependence in American Samoa. The study consisted of development and reliability-testing phases. In the development phase, the SSADDA alcohol module was translated and the translation was evaluated through cognitive interviews. In the reliability-testing phase, the bilingual SSADDA was administered to 40 ethnic Samoans, including a sub-sample of 26 individuals who were retested. Cognitive interviews indicated the initial translation was culturally and linguistically appropriate except items pertaining to alcohol tolerance, which were modified to reflect Samoan concepts. SSADDA reliability testing indicated diagnoses of DSM-III-R and DSM-IV alcohol dependence were reliable. Reliability varied by language of administration. The English/Samoan version of the SSADDA is appropriate for the diagnosis of DSM-III-R alcohol dependence, which may be useful in advancing research and public health efforts to address alcohol problems in American Samoa and the Western Pacific. The translation methods may inform researchers translating diagnostic and assessment tools into different languages and cultures. © The Author 2014. Medical Council on Alcohol and Oxford University Press. All rights reserved.
Multidisciplinary System Reliability Analysis
NASA Technical Reports Server (NTRS)
Mahadevan, Sankaran; Han, Song; Chamis, Christos C. (Technical Monitor)
2001-01-01
The objective of this study is to develop a new methodology for estimating the reliability of engineering systems that encompass multiple disciplines. The methodology is formulated in the context of the NESSUS probabilistic structural analysis code, developed under the leadership of NASA Glenn Research Center. The NESSUS code has been successfully applied to the reliability estimation of a variety of structural engineering systems. This study examines whether the features of NESSUS could be used to investigate the reliability of systems in other disciplines such as heat transfer, fluid mechanics, electrical circuits, etc., without considerable programming effort specific to each discipline. In this study, the mechanical equivalence between system behavior models in different disciplines is investigated to achieve this objective. A new methodology is presented for the analysis of heat transfer, fluid flow, and electrical circuit problems using the structural analysis routines within NESSUS, by utilizing the equivalence between the computational quantities in different disciplines. This technique is integrated with the fast probability integration and system reliability techniques within the NESSUS code, to successfully compute the system reliability of multidisciplinary systems. Traditional as well as progressive failure analysis methods for system reliability estimation are demonstrated, through a numerical example of a heat exchanger system involving failure modes in structural, heat transfer and fluid flow disciplines.
Alberta infant motor scale: reliability and validity when used on preterm infants in Taiwan.
Jeng, S F; Yau, K I; Chen, L C; Hsiao, S F
2000-02-01
The goal of this study was to examine the reliability and validity of measurements obtained with the Alberta Infant Motor Scale (AIMS) for evaluation of preterm infants in Taiwan. Two independent groups of preterm infants were used to investigate the reliability (n=45) and validity (n=41) for the AIMS. In the reliability study, the AIMS was administered to the infants by a physical therapist, and infant performance was videotaped. The performance was then rescored by the same therapist and by 2 other therapists to examine the intrarater and interrater reliability. In the validity study, the AIMS and the Bayley Motor Scale were administered to the infants at 6 and 12 months of age to examine criterion-related validity. Intraclass correlation coefficients (ICCs) for intrarater and interrater reliability of measurements obtained with the AIMS were high (ICC=.97-.99). The AIMS scores correlated with the Bayley Motor Scale scores at 6 and 12 months (r=.78 and .90), although the AIMS scores at 6 months were only moderately predictive of the motor function at 12 months (r=.56). The results suggest that measurements obtained with the AIMS have acceptable reliability and concurrent validity but limited predictive value for evaluating preterm Taiwanese infants.
Multi-Disciplinary System Reliability Analysis
NASA Technical Reports Server (NTRS)
Mahadevan, Sankaran; Han, Song
1997-01-01
The objective of this study is to develop a new methodology for estimating the reliability of engineering systems that encompass multiple disciplines. The methodology is formulated in the context of the NESSUS probabilistic structural analysis code developed under the leadership of NASA Lewis Research Center. The NESSUS code has been successfully applied to the reliability estimation of a variety of structural engineering systems. This study examines whether the features of NESSUS could be used to investigate the reliability of systems in other disciplines such as heat transfer, fluid mechanics, electrical circuits, etc., without considerable programming effort specific to each discipline. In this study, the mechanical equivalence between system behavior models in different disciplines is investigated to achieve this objective. A new methodology is presented for the analysis of heat transfer, fluid flow, and electrical circuit problems using the structural analysis routines within NESSUS, by utilizing the equivalence between the computational quantities in different disciplines. This technique is integrated with the fast probability integration and system reliability techniques within the NESSUS code, to successfully compute the system reliability of multi-disciplinary systems. Traditional as well as progressive failure analysis methods for system reliability estimation are demonstrated, through a numerical example of a heat exchanger system involving failure modes in structural, heat transfer and fluid flow disciplines.
Porter, Anna K; Wen, Fang; Herring, Amy H; Rodríguez, Daniel A; Messer, Lynne C; Laraia, Barbara A; Evenson, Kelly R
2018-06-01
Reliable and stable environmental audit instruments are needed to successfully identify the physical and social attributes that may influence physical activity. This study described the reliability and stability of the PIN3 environmental audit instrument in both urban and rural neighborhoods. Four randomly sampled road segments in and around a one-quarter mile buffer of participants' residences from the Pregnancy, Infection, and Nutrition (PIN3) study were rated twice, approximately 2 weeks apart. One year later, 253 of the year 1 sampled roads were re-audited. The instrument included 43 measures that resulted in 73 item scores for calculation of percent overall agreement, kappa statistics, and log-linear models. For same-day reliability, 81% of items had moderate to outstanding kappa statistics (kappas ≥ 0.4). Two-week reliability was slightly lower, with 77% of items having moderate to outstanding agreement using kappa statistics. One-year stability had 68% of items showing moderate to outstanding agreement using kappa statistics. The reliability of the audit measures was largely consistent when comparing urban to rural locations, with only 8% of items exhibiting significant differences (α < 0.05) by urbanicity. The PIN3 instrument is a reliable and stable audit tool for studies assessing neighborhood attributes in urban and rural environments.
NDE reliability and probability of detection (POD) evolution and paradigm shift
NASA Astrophysics Data System (ADS)
Singh, Surendra
2014-02-01
The subject of NDE Reliability and POD has gone through multiple phases since its humble beginning in the late 1960s. This was followed by several programs including the important one nicknamed "Have Cracks - Will Travel" or in short "Have Cracks" by Lockheed Georgia Company for US Air Force during 1974-1978. This and other studies ultimately led to a series of developments in the field of reliability and POD starting from the introduction of fracture mechanics and Damage Tolerant Design (DTD) to the statistical framework by Berens and Hovey in 1981 for POD estimation to MIL-STD HDBK 1823 (1999) and 1823A (2009). During the last decade, various groups and researchers have further studied the reliability and POD using Model Assisted POD (MAPOD), Simulation Assisted POD (SAPOD), and applying Bayesian Statistics. Each of these developments had one objective, i.e., improving the accuracy of life prediction in components, which to a large extent depends on the reliability and capability of NDE methods. Therefore, it is essential to have reliable detection and sizing of large flaws in components. Currently, POD is used for studying reliability and capability of NDE methods, though POD data offer no absolute truth regarding NDE reliability, i.e., system capability, effects of flaw morphology, and quantifying the human factors. Furthermore, reliability and POD have been reported alike in meaning but POD is not NDE reliability. POD is a subset of the reliability that consists of six phases: 1) sample selection using DOE, 2) NDE equipment setup and calibration, 3) System Measurement Evaluation (SME) including Gage Repeatability & Reproducibility (Gage R&R) and Analysis Of Variance (ANOVA), 4) NDE system capability and electronic and physical saturation, 5) acquiring and fitting data to a model, and data analysis, and 6) POD estimation.
This paper provides an overview of all major POD milestones for the last several decades and discusses the rationale for using Integrated Computational Materials Engineering (ICME), MAPOD, SAPOD, and Bayesian statistics for studying controllable and non-controllable variables including human factors for estimating POD. Another objective is to list gaps between "hoped for" versus validated or fielded failed hardware.
Molander, Linda; Hanberg, Annika; Rudén, Christina; Ågerstrand, Marlene; Beronius, Anna
2017-03-01
Different tools have been developed that facilitate systematic and transparent evaluation and handling of toxicity data in the risk assessment process. The present paper sets out to explore the combined use of two web-based tools for study evaluation and identification of reliable data relevant to health risk assessment. For this purpose, a case study was performed using in vivo toxicity studies investigating low-dose effects of bisphenol A on mammary gland development. The reliability of the mammary gland studies was evaluated using the Science in Risk Assessment and Policy (SciRAP) criteria for toxicity studies. The Health Assessment Workspace Collaborative (HAWC) was used for characterizing and visualizing the mammary gland data in terms of type of effects investigated and reported, and the distribution of these effects within the dose interval. It was then investigated whether there was any relationship between study reliability and the type of effects reported and/or their distribution in the dose interval. The combination of the SciRAP and HAWC tools allowed for transparent evaluation and visualization of the studies investigating developmental effects of BPA on the mammary gland. The use of these tools showed that there were no apparent differences in the type of effects and their distribution in the dose interval between the five studies assessed as most reliable and the whole data set. Combining the SciRAP and HAWC tools was found to be a useful approach for evaluating in vivo toxicity studies and identifying reliable and sensitive information relevant to regulatory risk assessment of chemicals. Copyright © 2016 John Wiley & Sons, Ltd.
Towards early software reliability prediction for computer forensic tools (case study).
Abu Talib, Manar
2016-01-01
Versatility, flexibility and robustness are essential requirements for software forensic tools. Researchers and practitioners need to put more effort into assessing this type of tool. A Markov model is a robust means for analyzing and anticipating the functioning of an advanced component based system. It is used, for instance, to analyze the reliability of the state machines of real time reactive systems. This research extends the architecture-based software reliability prediction model for computer forensic tools, which is based on Markov chains and COSMIC-FFP. Basically, every part of the computer forensic tool is linked to a discrete time Markov chain. If this can be done, then a probabilistic analysis by Markov chains can be performed to analyze the reliability of the components and of the whole tool. The purposes of the proposed reliability assessment method are to evaluate the tool's reliability in the early phases of its development, to improve the reliability assessment process for large computer forensic tools over time, and to compare alternative tool designs. The reliability analysis can assist designers in choosing the most reliable topology for the components, which can maximize the reliability of the tool and meet the expected reliability level specified by the end-user. The approach of assessing component-based tool reliability in the COSMIC-FFP context is illustrated with the Forensic Toolkit Imager case study.
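The idea of linking each component to a state of a discrete-time Markov chain can be illustrated with a minimal sketch. It uses a generic architecture-based formulation and is not the COSMIC-FFP-based model of the paper. The component reliabilities, transition probabilities, and the function name `system_reliability` are hypothetical, chosen only for illustration.

```python
def system_reliability(reliabilities, transitions, final, start=0,
                       iterations=10_000, tol=1e-12):
    """Estimate the probability that a run starting in `start` completes
    successfully in the `final` component.

    reliabilities[i] : probability that component i executes without failure
    transitions[i]   : dict {j: P(control passes from i to j)}
    The values x[i] satisfy x[i] = R_i * sum_j P_ij * x[j], with
    x[final] = R_final; they are found here by fixed-point iteration.
    """
    n = len(reliabilities)
    x = [0.0] * n
    for _ in range(iterations):
        new_x = []
        for i in range(n):
            if i == final:
                new_x.append(reliabilities[i])
            else:
                onward = sum(p * x[j] for j, p in transitions.get(i, {}).items())
                new_x.append(reliabilities[i] * onward)
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x[start]


# Hypothetical three-component tool: component 0 hands control either to
# component 1 or directly to the final component 2.
example = system_reliability(
    reliabilities=[0.9, 0.8, 0.95],
    transitions={0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}},
    final=2,
)
```

Comparing the value returned for alternative transition structures is one simple way to rank candidate topologies, which is the kind of design comparison the abstract describes.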
Dreessen, L; Arntz, A
1998-01-01
The short-interval test-retest interrater reliability of the Structured Clinical Interview for DSM-III-R personality disorders (SCID-II) was studied in a psychotherapy outpatient group whose main complaint was mostly an Axis I anxiety disorder. Using a test-retest approach to assess interrater reliability, three sources of variance were taken into account (rater variance in the elicitation and interpretation of information and patient variance across interviews). Base rate requirements were established before calculating reliability coefficients. On the whole, interrater agreement on the SCID-II was found to be satisfactory, except for the histrionic personality traits. This is the first study that has estimated short-interval test-retest interrater reliability of the SCID-II in outpatients, and also the first that has studied single SCID-II traits and dimensional diagnoses. The results found support the use of the SCID-II as a diagnostic instrument for clinical and research purposes.
Validation of a new classification system for skin tears.
LeBlanc, Kimberly; Baranoski, Sharon; Holloway, Samantha; Langemo, Diane
2013-06-01
The aim of this study was to validate and establish reliability of the International Skin Tear classification system. A consensus panel of 12 internationally recognized key opinion leaders convened in 2011 to establish consensus statements on the prevention, prediction, assessment, and treatment of skin tears. Subsequently, a new skin tear classification system was proposed. The system was then tested for interrater and intrarater reliability between the experts before being tested more widely on a sample of 327 individuals from the United States, Canada, and Europe. The results of the study indicated a substantial level of agreement for the expert panel (Fleiss κ = 0.619; 2-month follow-up = 0.653). Intrarater reliability was high (Cohen κ = 0.877). Interrater reliability was moderate (Fleiss κ = 0.555) for healthcare professionals (n = 303) and fair for non-health professionals (Fleiss κ = 0.338; n = 24). This international study established the reliability and validity of a new classification system for skin tears.
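For readers unfamiliar with the Fleiss κ reported above, the following is a minimal sketch of the standard computation for several raters who each assign every item to one category. The input layout (a table of per-item category counts) and the function name are assumptions made for illustration; the study's own data are not reproduced here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item must be rated by the same
    number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must have the same number of ratings")
    n_categories = len(counts[0])

    # Overall proportion of all assignments that fell in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    # Per-item agreement: proportion of rater pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)
```

A value of 1 means complete agreement, and values at or below 0 mean agreement no better than chance.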
Jung, Kyoung-Sim; Jung, Jin-Hwa; In, Tae-Sung; Cho, Hwi-Young
2016-09-01
[Purpose] The purpose of this study was to establish the reliability and validity of the Short Musculoskeletal Function Assessment questionnaire, which was translated into Korean, for patients with musculoskeletal disorder. [Subjects and Methods] Fifty-five subjects (26 males and 29 females) with musculoskeletal diseases participated in the study. The Short Musculoskeletal Function Assessment questionnaire focuses on a limited range of physical functions and includes a dysfunction index and a bother index. Reliability was determined using the intraclass correlation coefficient, and validity was examined by correlating short musculoskeletal function assessment scores with the 36-item Short-Form Health Survey (SF-36) score. [Results] The reliability was 0.97 for the dysfunction index and 0.94 for the bother index. Validity was established by comparison with Korean version of the SF-36. [Conclusion] This study demonstrated that the Korean version of the Short Musculoskeletal Function Assessment questionnaire is a reliable and valid instrument for the assessment of musculoskeletal disorders.
Test-retest reliability of resting-state magnetoencephalography power in sensor and source space.
Martín-Buro, María Carmen; Garcés, Pilar; Maestú, Fernando
2016-01-01
Several studies have reported changes in spontaneous brain rhythms that could be used as clinical biomarkers or in the evaluation of neuropsychological and drug treatments in longitudinal studies using magnetoencephalography (MEG). There is an increasing necessity to use these measures in early diagnosis and pathology progression; however, there is a lack of studies addressing how reliable they are. Here, we provide the first test-retest reliability estimate of MEG power in resting-state at sensor and source space. In this study, we recorded 3 sessions of resting-state MEG activity from 24 healthy subjects with an interval of a week between each session. Power values were estimated at sensor and source space with beamforming for classical frequency bands: delta (2-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), low beta (13-20 Hz), high beta (20-30 Hz), and gamma (30-45 Hz). Then, test-retest reliability was evaluated using the intraclass correlation coefficient (ICC). We also evaluated the relation between source power and the within-subject variability. In general, ICC of theta, alpha, and low beta power was fairly high (ICC > 0.6) while in delta and gamma power was lower. In source space, fronto-posterior alpha, frontal beta, and medial temporal theta showed the most reliable profiles. Signal-to-noise ratio could be partially responsible for reliability as low signal intensity resulted in high within-subject variability, but also the inherent nature of some brain rhythms in resting-state might be driving these reliability patterns. In conclusion, our results described the reliability of MEG power estimates in each frequency band, which could be considered in disease characterization or clinical trials. © 2015 Wiley Periodicals, Inc.
Research on Novel Algorithms for Smart Grid Reliability Assessment and Economic Dispatch
NASA Astrophysics Data System (ADS)
Luo, Wenjin
In this dissertation, several studies of electric power system reliability and economy assessment methods are presented. To be more precise, several algorithms for evaluating power system reliability and economy are studied. Furthermore, two novel algorithms are applied to this field and their simulation results are compared with conventional results. As the electrical power system develops towards extra-high voltage, long-distance transmission, large capacity and regional networking, a number of new technologies and equipment have been applied, the electricity market system has been gradually established, and the consequences of power cuts have become more and more serious. The electrical power system needs the highest possible reliability because of its complexity and security requirements. In this dissertation the Boolean logic Driven Markov Process (BDMP) method is studied and applied to evaluate power system reliability. This approach has several benefits. It allows complex dynamic models to be defined while remaining as readable as conventional methods. This method has been applied to evaluate the IEEE reliability test system. The simulation results obtained are close to the IEEE experimental data, which means that it could be used for future study of system reliability. Besides reliability, a modern power system is expected to be more economical. This dissertation presents a novel evolutionary algorithm named the quantum evolutionary membrane algorithm (QEPS), which combines the concept and theory of the quantum-inspired evolutionary algorithm and membrane computation, to solve the economic dispatch problem in a renewable power system with onshore and offshore wind farms. A case derived from real data is used for simulation tests. Another conventional evolutionary algorithm is also used to solve the same problem for comparison.
The experimental results show that the proposed method quickly and accurately obtains the optimal solution, which is the minimum cost for electricity supplied by the wind farm system.
The Yale-Brown Obsessive Compulsive Scale: A Reliability Generalization Meta-Analysis.
López-Pina, José Antonio; Sánchez-Meca, Julio; López-López, José Antonio; Marín-Martínez, Fulgencio; Núñez-Núñez, Rosa Maria; Rosa-Alcázar, Ana I; Gómez-Conesa, Antonia; Ferrer-Requena, Josefa
2015-10-01
The Yale-Brown Obsessive Compulsive Scale (Y-BOCS) is the most frequently applied test to assess obsessive compulsive symptoms. We conducted a reliability generalization meta-analysis on the Y-BOCS to estimate the average reliability, examine the variability among the reliability estimates, search for moderators, and propose a predictive model that researchers and clinicians can use to estimate the expected reliability of the Y-BOCS. We included studies where the Y-BOCS was applied to a sample of adults and reliability estimate was reported. Out of the 11,490 references located, 144 studies met the selection criteria. For the total scale, the mean reliability was 0.866 for coefficients alpha, 0.848 for test-retest correlations, and 0.922 for intraclass correlations. The moderator analyses led to a predictive model where the standard deviation of the total test and the target population (clinical vs. nonclinical) explained 38.6% of the total variability among coefficients alpha. Finally, clinical implications of the results are discussed. © The Author(s) 2014.
Larson, Tomas; Kerekes, Nóra; Selinus, Eva Norén; Lichtenstein, Paul; Gumpert, Clara Hellner; Anckarsäter, Henrik; Nilsson, Thomas; Lundström, Sebastian
2014-02-01
The Autism-Tics, AD/HD, and other Comorbidities (A-TAC) inventory is used in epidemiological research to assess neurodevelopmental problems and coexisting conditions. Although the A-TAC has been applied in various populations, data on retest reliability are limited. The objective of the present study was to present additional reliability data. The A-TAC was administered by lay assessors and was completed on two occasions by parents of 400 individual twins, with an average interval of 70 days between test sessions. Intra- and inter-rater reliability were analysed with intraclass correlations and Cohen's kappa. A-TAC showed excellent test-retest intraclass correlations for both autism spectrum disorder and attention deficit hyperactivity disorder (each at .84). Most modules in the A-TAC had intra- and inter-rater reliability intraclass correlation coefficients of ≥ .60. Cohen's kappa indicated acceptable reliability. The current study provides statistical evidence that the A-TAC yields good test-retest reliability in a population-based cohort of children.
A Study of Reliability of Marking and Absolute Grading in Secondary Schools
ERIC Educational Resources Information Center
Abdul Gafoor, K.; Jisha, P.
2014-01-01
Using a non-experimental comparative group design in a sample consisting of 100 English teachers randomly selected from 30 secondary schools of a district of Kerala and assigning fifty teachers to groups for marking and grading, this study compares inter and intra-individual reliability in marking and absolute grading. Studying (1) the in marking…
Water Awareness Scale for Pre-Service Science Teachers: Validity and Reliability Study
ERIC Educational Resources Information Center
Filik Iscen, Cansu
2015-01-01
The role of teachers in the formation of environmentally sensitive behaviors in students is quite high. Thus, the water awareness of teachers, who represent role models for students, is rather important. The main purpose of this study is to identify the reliability and validity study outcomes of the Water Awareness Scale, which was developed to…
Developing a Scale for Innovation Management at Schools: A Study of Validity and Reliability
ERIC Educational Resources Information Center
Bulbul, Tuncer
2012-01-01
The purpose of this study is to develop a valid and reliable assessment tool for use in determining the competency beliefs of school administrators about innovation management. The scale was applied to a study group of 216 school administrators, after work centered on assessing intelligibility and expert opinion. Exploratory and confirmatory…
ERIC Educational Resources Information Center
Higgs, Philip; Keevy, James
2007-01-01
This article reflects on the reliability of the evidence contained in the National Qualifications Framework Impact Study, a longitudinal comparative study conducted by the South African Qualifications Authority since 2002. In so doing, the veracity of evidence-based research in determining the impact of the South African Qualifications Framework…
ERIC Educational Resources Information Center
Sayin, Ayfer; Sahin, Mustafa Yasar
2017-01-01
The present study aimed to provide a Turkish adaptation of the Organizational Justice in Sport Scale and perform reliability and validity studies. Answers provided by 260 participants who work as football, male basketball and female basketball coaches in National Collegiate Athletic Association (NCAA) were analysed using the original scale that…
Evaluation of Reading Habits of Teacher Candidates: Study of Scale Development
ERIC Educational Resources Information Center
Erkan, Senem Seda Sahenk; Dagal, Asude Balaban; Tezcan, Özlem
2016-01-01
The main purpose of this study was to develop a valid and reliable scale for printed and digital competencies ("The Printed and Digital Reading Habits Scale"). The problem statement of this research can be expressed as: "The Printed and Digital Reading Habits Scale: is a valid and reliable scale?" In this study, the scale…
The Depression Anxiety and Stress Scale (DASS): The Study of Validity and Reliability
ERIC Educational Resources Information Center
Akin, Ahmet; Cetin, Bayram
2007-01-01
This study investigated the validity and reliability of the Turkish version of the Depression Anxiety Stress Scale (DASS). The sample of the study consisted of 590 university students, 121 English teachers and 136 emotionally disturbed individuals who sought treatment in various clinics and counseling centers. Factor loadings of the scale ranged…
ERIC Educational Resources Information Center
Sprenger-Charolles, Liliane; Cole, Pascale; Kipffer-Piquard, Agnes; Pinton, Florence; Billard, Catherine
2009-01-01
In the present study, conducted with French-speaking children, we examined the reliability (group study) and the prevalence (multiple-case study) of dyslexics' phonological deficits in reading and reading-related skills in comparison with Reading Level (RL) controls. All dyslexics with no comorbidity problem schooled in a special institution for…
Basic School Skills Inventory-3: Validity and Reliability Study
ERIC Educational Resources Information Center
Yildiz, F. Ülkü; Çagdas, Aysel; Kayili, Gökhan
2017-01-01
The purpose of this study is to perform the validity-reliability analysis of the three subtests of Basic School Skills Inventory 3--Mathematics, Classroom Behavior and Daily Life skills--and do its adaptation for four to six year-old Turkish children. The sample of the study included 595 four to six year-old Turkish children attending public and…
ERIC Educational Resources Information Center
Wei, Meifen; Alvarez, Alvin N.; Ku, Tsun-Yao; Russell, Daniel W.; Bonett, Douglas G.
2010-01-01
Four studies were conducted to develop and validate the Coping With Discrimination Scale (CDS). In Study 1, an exploratory factor analysis (N = 328) identified 5 factors: Education/Advocacy, Internalization, Drug and Alcohol Use, Resistance, and Detachment, with internal consistency reliability estimates ranging from 0.72 to 0.90. In Study 2, a…
Developing Valid and Reliable Map Literacy Scale
ERIC Educational Resources Information Center
Koç, Hakan; Demir, Selçuk Besir
2014-01-01
The purpose of the present study is to develop a valid and reliable map literacy scale that is able to determine map literacy of individuals, especially that of high school and university students. The study sample was composed of 518 students studying at various faculties at Cumhuriyet University and high schools in Sivas and its counties. With…
Reliability Generalization of the Psychopathy Checklist Applied in Youthful Samples
ERIC Educational Resources Information Center
Campbell, Justin S.; Pulos, Steven; Hogan, Mike; Murry, Francie
2005-01-01
This study examines the average reliability of Hare Psychopathy Checklists (PCLs) adapted for use in samples of youthful offenders (aged 12 to 21 years). Two forms of reliability are examined: 18 alpha estimates of internal consistency and 18 intraclass correlation (two or more raters) estimates of interrater reliability. The results, an average…
Performance Evaluation of Reliable Multicast Protocol for Checkout and Launch Control Systems
NASA Technical Reports Server (NTRS)
Shu, Wei Wennie; Porter, John
2000-01-01
The overall objective of this project is to study reliability and performance of Real Time Critical Network (RTCN) for checkout and launch control systems (CLCS). The major tasks include reliability and performance evaluation of Reliable Multicast (RM) package and fault tolerance analysis and design of dual redundant network architecture.
ERIC Educational Resources Information Center
Morgan, Grant B.; Zhu, Min; Johnson, Robert L.; Hodge, Kari J.
2014-01-01
Common estimators of interrater reliability include Pearson product-moment correlation coefficients, Spearman rank-order correlations, and the generalizability coefficient. The purpose of this study was to examine the accuracy of estimators of interrater reliability when varying the true reliability, number of scale categories, and number of…
Testing the Difference between Reliability Coefficients Alpha and Omega
ERIC Educational Resources Information Center
Deng, Lifang; Chan, Wai
2017-01-01
Reliable measurements are key to social science research. Multiple measures of reliability of the total score have been developed, including coefficient alpha, coefficient omega, the greatest lower bound reliability, and others. Among these, the coefficient alpha has been most widely used, and it is reported in nearly every study involving the…
The Reliability of Difference Scores in Populations and Samples
ERIC Educational Resources Information Center
Zimmerman, Donald W.
2009-01-01
This study was an investigation of the relation between the reliability of difference scores, considered as a parameter characterizing a population of examinees, and the reliability estimates obtained from random samples from the population. The parameters in familiar equations for the reliability of difference scores were redefined in such a way…
Cerin, Ester; Sit, Cindy H P; Huang, Ya-Jun; Barnett, Anthony; Macfarlane, Duncan J; Wong, Stephen S H
2014-06-06
Physical activity and sedentary behaviour are important contributors to adolescents' health. These behaviours may be affected by the school and neighbourhood built environments. However, current evidence on such effects is mainly limited to Western countries. The International Physical Activity and the Environment Network (IPEN)-Adolescent study aims to examine associations of the built environment with adolescent physical activity and sedentary behaviour across five continents. We report on the repeatability of measures of in-school and out-of-school physical activity, plus measures of out-of-school sedentary and travel behaviours adopted by the IPEN-Adolescent study and adapted for Chinese-speaking Hong Kong adolescents participating in the international Healthy environments and active living in teenagers-(Hong Kong) [iHealt(H)] study, which is part of IPEN-Adolescent. Items gauging in-school physical activity and out-of-school physical activity, and out-of-school sedentary and travel behaviours developed for the IPEN-Adolescent study were translated from English into Chinese, adapted, and pilot tested. Sixty-eight Chinese-speaking 12-17 year old secondary school students (36 boys; 32 girls) residing in areas of Hong Kong differing in transport-related walkability were recruited. They self-completed the survey items twice, 8-16 days apart. Test-retest reliability was assessed for the whole sample and by gender using one-way random effects intra-class correlation coefficients (ICC). Test-retest reliability of items with restricted variability was assessed using percentage agreement. Overall test-retest reliability of items and scales was moderate to excellent (ICC = 0.47-0.92). Items with restricted variability in responses had a high percentage agreement (92%-100%).
Test-retest reliability was similar in girls and boys, with the exception of daily hours of homework (reliability higher in girls) and number of school-based sports teams or after-school physical activity classes (reliability higher in boys). The translated and adapted self-report measures of physical activity, sedentary and travel behaviours used in the iHealt(H) study are sufficiently reliable. Levels of reliability are comparable or slightly higher than those observed for the original measures.
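The one-way random effects ICC named in this abstract can be sketched as follows. This is the textbook single-measurement form computed from a one-way analysis of variance, with rows as participants and columns as repeated administrations. The function name and the data layout are assumptions for illustration, and the study's own software and data are not known from the abstract.

```python
def icc_oneway(scores):
    """Single-measure, one-way random effects ICC.

    scores[i] holds the k repeated measurements for participant i.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-participant and within-participant mean squares.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(scores, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

When almost all participants give the same answer, the between-participant mean square shrinks and the coefficient becomes unstable, which is why percentage agreement is a reasonable fallback for items with restricted variability.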
Relating design and environmental variables to reliability
NASA Astrophysics Data System (ADS)
Kolarik, William J.; Landers, Thomas L.
The combination of space application and nuclear power source demands high reliability hardware. The possibilities of failure, either an inability to provide power or a catastrophic accident, must be minimized. Nuclear power experiences on the ground have led to highly sophisticated probabilistic risk assessment procedures, most of which require quantitative information to adequately assess such risks. In the area of hardware risk analysis, reliability information plays a key role. One of the lessons learned from the Three Mile Island experience is that thorough analyses of critical components are essential. Nuclear grade equipment shows some reliability advantages over commercial. However, no statistically significant difference has been found. A recent study pertaining to spacecraft electronics reliability, examined some 2500 malfunctions on more than 300 aircraft. The study classified the equipment failures into seven general categories. Design deficiencies and lack of environmental protection accounted for about half of all failures. Within each class, limited reliability modeling was performed using a Weibull failure model.
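As background for the Weibull failure model mentioned above, here is a minimal sketch of the two-parameter Weibull reliability and hazard functions. The shape and scale values a reader would plug in are assumptions; the abstract does not report the fitted parameters.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability of surviving beyond time t: R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at time t > 0:
    h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)
```

A shape parameter below 1 gives a decreasing failure rate (early-life failures), a value of 1 gives a constant rate, and a value above 1 gives an increasing rate (wear-out).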
Time-to-pregnancy and pregnancy outcomes in a South African population
2010-01-01
Background Time-to-pregnancy (TTP) has never been studied in an African setting and there are no data on the rates of adverse pregnancy outcomes in South Africa. The study objectives were to measure TTP and the rates of adverse pregnancy outcomes in South Africa, and to determine the reliability of the questionnaire tool. Methods The study was cross-sectional and applied systematic stratified sampling to obtain a representative sample of reproductive age women for a South African population. Data on socio-demographic, work, health and reproductive variables were collected on 1121 women using a standardized questionnaire. A small number (n = 73) of randomly selected questionnaires was repeated to determine reliability of the questionnaire. Data were described using simple summary statistics while Kappa and intra-class correlation statistics were calculated for reliability. Results Of the 1121 women, 47 (4.2%) had never been pregnant. Mean gravidity was 2.3 while mean parity was 2.0. There were a total of 2467 pregnancies; most (87%) resulted in live births, 9.5% in spontaneous abortion and 2.2% in still births. The proportion of planned pregnancies was 39% and the median TTP was 6 months. The reliability of the questionnaire for TTP data was good; 63% for all participants and 97% when censored at 14 months. Overall reliability of reporting adverse pregnancy outcomes was very high, ranging from 90 - 98% for most outcomes. Conclusion This is the first comprehensive population-based reproductive health study in South Africa, to describe the biologic fertility of the population, and provides rates for planned pregnancies and adverse pregnancy outcomes. The reliability of the study questionnaire was substantial, with most outcomes within 70 - 100% reliability index. The study provides important public information for health practitioners and researchers in reproductive health.
It also highlights the need for public health intervention programmes and epidemiological research on biologic fertility and adverse pregnancy outcomes in the population. PMID:20858279
Method for the Study of Category III Airborne Procedure Reliability
DOT National Transportation Integrated Search
1973-03-01
A method for the study of Category 3 airborne-procedure reliability is presented. The method, based on PERT concepts, is considered to have utility at the outset of a procedure-design cycle and during the early accumulation of actual performance data...
Inter-rater and intra-rater reliability of a movement control test in shoulder.
Rajasekar, S; Bangera, Rakshith K; Sekaran, Padmanaban
2017-07-01
Movement faults are commonly observed in patients with musculoskeletal pain. The Kinetic Medial Rotation Test (KMRT) is a movement control test used to identify movement faults of the scapula and gleno-humeral joints during arm movement. Objective tests such as the KMRT need to be reliable and valid for the results to be applied across different clinical settings and patient populations. The primary objective of the present study was to determine the intra-rater and inter-rater reliability of KMRT in subjects with and without shoulder pain. Sixty subjects were included in this study based on specific inclusion and exclusion criteria. Two musculoskeletal physiotherapists with different levels of clinical experience performed the tests. The intra-rater reliability was tested in twenty asymptomatic subjects by a single assessor at two week intervals. An equal number of subjects with and without shoulder pain were tested by both the assessors to determine the inter-rater reliability. Both components of the KMRT, the Gleno-Humeral Anterior Translation (GHAT) and the Scapular Forward Tilt (SCFT) were tested. The Kappa values for inter-rater reliability of the GHAT and SCFT were K = 0.68 & K = 0.65 respectively in subjects with shoulder pain. In asymptomatic subjects, the inter-rater reliability of GHAT was K = 0.61 and SCFT was K = 0.85. Intra-rater reliability ranged from K = 0.66 for GHAT to K = 0.87 for SCFT. Our study found substantial agreement in inter-rater reliability of KMRT in subjects with shoulder pain, whereas substantial to near perfect agreement was found in intra-rater and inter-rater reliability of KMRT in subjects without shoulder pain. Copyright © 2017 Elsevier Ltd. All rights reserved.
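The kappa values and verbal labels in this abstract can be made concrete with a short sketch: Cohen's kappa for two raters, followed by the widely used Landis and Koch bands. That the authors used exactly these bands is an assumption; it is inferred from their wording ("substantial", "near perfect"). The ratings passed in would be any two equal-length lists of categorical judgements.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each rater's marginal totals.
    Undefined (division by zero) if both raters use one identical category
    for every item.
    """
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

Under these bands, the reported values of 0.61 to 0.68 fall in the "substantial" range and 0.85 to 0.87 in the "almost perfect" range, matching the abstract's conclusion.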
Evaluating information skills training in health libraries: a systematic review.
Brettle, Alison
2007-12-01
Systematic reviews have shown that there is limited evidence to demonstrate that the information literacy training health librarians provide is effective in improving clinicians' information skills or has an impact on patient care. Studies lack measures which demonstrate validity and reliability in evaluating the impact of training. To determine what measures have been used; the extent to which they are valid and reliable; to provide guidance for health librarians who wish to evaluate the impact of their information skills training. Systematic review methodology involved searching seven databases, and personal files. Studies were included if they were about information skills training, used an objective measure to assess outcomes, and occurred in a health setting. Fifty-four studies were included in the review. Most outcome measures used in the studies were not tested for the key criteria of validity and reliability. Three tested for validity and reliability are described in more detail. Selecting an appropriate measure to evaluate the impact of training is a key factor in carrying out any evaluation. This systematic review provides guidance to health librarians by highlighting measures used in various circumstances, and those that demonstrate validity and reliability.
Fatehi, Zahra; Baradaran, Hamid Reza; Asadpour, Mohamad; Rezaeian, Mohsen
2017-01-01
Background: Individuals' listening styles differ based on their characters, professions and situations. This study aimed to assess the validity and reliability of the Listening Styles Profile-Revised (LSP-R) in Iranian students. Methods: After translation into Persian, the LSP-R was employed in a sample of 240 Persian-speaking medical and nursing students in Iran. Statistical analysis was performed to test the reliability and validity of the LSP-R. Results: The study revealed high internal consistency and good test-retest reliability for the Persian version of the questionnaire. The Cronbach's alpha coefficient was 0.72 and the intra-class correlation coefficient was 0.87. The means for the content validity index and the content validity ratio (CVR) were 0.90 and 0.83, respectively. Exploratory factor analysis (EFA) yielded a four-factor solution that accounted for 60.8% of the observed variance. The majority of medical students (73%) as well as the majority of nursing students (70%) stated that their listening styles were task-oriented. Conclusion: In general, the study findings suggest that the Persian version of the LSP-R is a valid and reliable instrument for assessing listening styles profile in the studied sample.
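Cronbach's alpha, the internal-consistency coefficient reported here, can be computed from a respondents-by-items score table as in the minimal sketch below. The data layout and function name are assumptions for illustration, and the study's responses are not reproduced.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for responses[i][j] = score of respondent i on item j.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Alpha approaches 1 when items rise and fall together across respondents, and drops as items become less correlated with one another.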
Yang, Nan; Waddington, Gordon; Adams, Roger; Han, Jia
2018-05-01
Quantitative assessments of handedness and footedness are often required in studies of human cognition and behaviour, yet no reliable Chinese versions of commonly used handedness and footedness questionnaires are available. Accordingly, the objective of the present study was to translate the Edinburgh Handedness Inventory (EHI) and the Waterloo Footedness Questionnaire-Revised (WFQ-R) into Mandarin Chinese and to evaluate the reliability and validity of these translated versions in healthy Chinese people. In the first stage of the study, Chinese versions of the EHI and WFQ-R were produced from a process of translation, back translation and examination, with necessary cultural adaptations. The second stage involved determining the reliability and validity of the translated EHI and WFQ-R for the Chinese population. One hundred and ten Chinese participants were tested online, and the results showed that the Cronbach's alpha coefficient of internal consistency was 0.877 for the translated EHI and 0.855 for the translated WFQ-R. Another 170 Chinese participants were tested and re-tested after a 30-day interval. The intra-class correlation coefficients showed high reliability, 0.898 for the translated EHI and 0.869 for the translated WFQ-R. This preliminary validation study found the translated versions to be reliable and valid tools for assessing handedness and footedness in this population.
Validity and Reliability Study of the Korean Tinetti Mobility Test for Parkinson's Disease.
Park, Jinse; Koh, Seong-Beom; Kim, Hee Jin; Oh, Eungseok; Kim, Joong-Seok; Yun, Ji Young; Kwon, Do-Young; Kim, Younsoo; Kim, Ji Seon; Kwon, Kyum-Yil; Park, Jeong-Ho; Youn, Jinyoung; Jang, Wooyoung
2018-01-01
Postural instability and gait disturbance are the cardinal symptoms associated with falling among patients with Parkinson's disease (PD). The Tinetti mobility test (TMT) is a well-established measurement tool used to predict falls among elderly people. However, the TMT has not been established or widely used among PD patients in Korea. The purpose of this study was to evaluate the reliability and validity of the Korean version of the TMT for PD patients. Twenty-four patients diagnosed with PD were enrolled in this study. For the interrater reliability test, thirteen clinicians scored the TMT after watching a video clip. We also used the test-retest method to determine intrarater reliability. For concurrent validation, the unified Parkinson's disease rating scale, Hoehn and Yahr staging, Berg Balance Scale, Timed-Up and Go test, 10-m walk test, and gait analysis by three-dimensional motion capture were also used. We analyzed the receiver operating characteristic curve to predict falling. The interrater reliability and intrarater reliability of the Korean Tinetti balance scale were 0.97 and 0.98, respectively. The interrater reliability and intrarater reliability of the Korean Tinetti gait scale were 0.94 and 0.96, respectively. The Korean TMT scores were significantly correlated with the other clinical scales and three-dimensional motion capture. The cutoff values for predicting falling were 14 points (balance subscale) and 10 points (gait subscale). We found that the Korean version of the TMT showed excellent validity and reliability for gait and balance and had high sensitivity and specificity for predicting falls among patients with PD.
Lemeunier, Nadège; da Silva-Oolup, S; Chow, N; Southerst, D; Carroll, L; Wong, J J; Shearer, H; Mastragostino, P; Cox, J; Côté, E; Murnaghan, K; Sutton, D; Côté, P
2017-09-01
To determine the reliability and validity of clinical tests to assess the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders. We updated the systematic review of the 2000-2010 Bone and Joint Decade Task Force on Neck Pain and its Associated Disorders. We also searched the literature to identify studies on the reliability and validity of Doppler velocimetry for the evaluation of cervical arteries. Two independent reviewers screened and critically appraised studies. We conducted a best evidence synthesis of low risk of bias studies and ranked the phases of investigations using the classification proposed by Sackett and Haynes. We screened 9022 articles and critically appraised 8 studies; all 8 studies had low risk of bias (three reliability and five validity Phase II-III studies). Preliminary evidence suggests that the extension-rotation test may be reliable and has adequate validity to rule out pain arising from facet joints. The evidence suggests variable reliability and preliminary validity for the evaluation of cervical radiculopathy including neurological examination (manual motor testing, dermatomal sensory testing, deep tendon reflexes, and pathological reflex testing), Spurling's and the upper limb neurodynamic tests. No evidence was found for Doppler velocimetry. Little evidence exists to support the use of clinical tests to evaluate the anatomical integrity of the cervical spine in adults with neck pain and its associated disorders. We found preliminary evidence to support the use of the extension-rotation test, neurological examination, Spurling's and the upper limb neurodynamic tests.
RELIABILITY AND VALIDITY OF A BIOMECHANICALLY BASED ANALYSIS METHOD FOR THE TENNIS SERVE
Kibler, W. Ben; Lamborn, Leah; Smith, Belinda J.; English, Tony; Jacobs, Cale; Uhl, Tim L.
2017-01-01
Background: An observational tennis serve analysis (OTSA) tool was developed using previously established body positions from three-dimensional kinematic motion analysis studies. These positions, defined as nodes, have been associated with efficient force production and minimal joint loading. However, the tool has yet to be examined scientifically. Purpose: The primary purpose of this investigation was to determine the inter-observer reliability for each node between the two health care professionals (HCPs) who developed the OTSA, and secondarily to investigate the validity of the OTSA. Methods: Two separate studies were performed to meet these objectives. An inter-observer reliability study, which preceded the validity study, examined 28 videos of players serving. Two HCPs graded each video and scored whether each node was achieved. Discriminant validity was determined in 33 tennis players using videotaped records of three first serves. Serve mechanics were graded using the OTSA, and players were categorized into those with good (≥ 5) and poor (≤ 4) mechanics. Participants performed a series of field tests to evaluate trunk flexibility, lower extremity and trunk power, and dynamic balance. Results: The group with good mechanics demonstrated greater backward trunk flexibility (p=0.02), greater rotational power (p=0.02), and higher single leg countermovement jump (p=0.05). Reliability of the OTSA ranged from K = 0.36-1.0, with the majority of the nodes displaying substantial reliability (K>0.61). Conclusion: This study provides HCPs with a valid and reliable field tool for assessing serve mechanics. Physical characteristics of trunk mobility and power appear to discriminate serve mechanics between players. Future intervention studies are needed to determine whether improvements in physical function contribute to improved serve mechanics. Level of Evidence: 3. PMID:28593098
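The kappa values reported above correct raw agreement between two observers for the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa for two raters scoring the presence (1) or absence (0) of a node is given below; the ratings are invented for illustration and are not data from the study.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Proportion of cases on which both raters give the same score."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1,
    i.e. when both raters use a single identical category throughout.
    """
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from the product of each rater's marginal proportions.
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical node scores for eight serves (1 = node achieved).
hcp_a = [1, 1, 1, 0, 0, 1, 0, 1]
hcp_b = [1, 1, 0, 0, 0, 1, 0, 1]
print(percent_agreement(hcp_a, hcp_b), cohen_kappa(hcp_a, hcp_b))
```

Reporting percent agreement alongside kappa is useful because kappa can be low despite high raw agreement when one category dominates the sample.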
NASA Technical Reports Server (NTRS)
Feldstein, J. F.
1977-01-01
Failure data from 16 commercial spacecraft were analyzed to evaluate failure trends, reliability growth, and effectiveness of tests. It was shown that the test programs were highly effective in ensuring a high level of in-orbit reliability. There was only a single catastrophic problem in 44 years of in-orbit operation on 12 spacecraft. The results also indicate that in-orbit failure rates are highly correlated with unit and systems test failure rates. The data suggest that test effectiveness estimates can be used to guide the content of a test program to ensure that in-orbit reliability goals are achieved.
Reliability analysis of interdependent lattices
NASA Astrophysics Data System (ADS)
Limiao, Zhang; Daqing, Li; Pengju, Qin; Bowen, Fu; Yinan, Jiang; Zio, Enrico; Rui, Kang
2016-06-01
Network reliability analysis has drawn much attention recently due to the risks of catastrophic damage in networked infrastructures. These infrastructures are dependent on each other as a result of various interactions. However, most reliability analyses of these interdependent networks do not consider spatial constraints, which have been found to be important for the robustness of infrastructures including power grids and transport systems. Here we study the reliability properties of interdependent lattices with different ranges of spatial constraints. Our study shows that interdependent lattices with strong spatial constraints are more resilient than interdependent Erdős-Rényi networks. There exists an intermediate range of spatial constraints at which the interdependent lattices have minimal resilience.
Study of Fuze Structure and Reliability Design Based on the Direct Search Method
NASA Astrophysics Data System (ADS)
Lin, Zhang; Ning, Wang
2017-03-01
Redundant design is one of the important methods for improving system reliability, but it often involves mutual coupling among multiple design factors. In this study, the Direct Search Method is introduced into optimum redundancy configuration for design optimization, so that reliability, cost, structural weight and other factors can be taken into account simultaneously, and the redundancy allocation and reliability design of an aircraft critical system are computed. The results show that the method is convenient and workable and that, with appropriate modifications, it is applicable to redundancy configuration and optimization in various designs. The method therefore has good practical value.
Hu, Zhi-Jun; He, Jian; Zhao, Feng-Dong; Fang, Xiang-Qian; Zhou, Li-Na; Fan, Shun-Wu
2011-06-01
A reliability study was conducted to estimate the intra- and intermeasurement errors in the measurements of functional cross-sectional area (FCSA), density, and T2 signal intensity of paraspinal muscles using computed tomography (CT) and magnetic resonance imaging (MRI). CT and MRI have been used widely to measure the cross-sectional area and degeneration of the back muscles in spine and muscle research, but no systematic study has analyzed the reliability of these measurements. This study measured the FCSA and fatty infiltration (density on CT and T2 signal intensity on MRI) of the paraspinal muscles at L3-L4, L4-L5, and L5-S1 in 29 patients with chronic low back pain. Two experienced musculoskeletal radiologists and one senior spine surgeon traced the region of interest twice within 3 weeks for measurement of the intra- and interobserver reliability. The intraclass correlation coefficients (ICCs) for intraobserver reliability ranged from fair to excellent for FCSA, and from good to excellent for fatty infiltration. The ICCs for interobserver reliability ranged from fair to excellent for FCSA, and from good to excellent for fatty infiltration. There were no significant differences between CT and MRI in reliability results, except in the relative standard error of the fatty infiltration measurement. The ICCs of the FCSA measurement between CT and MRI ranged from poor to good. The reliabilities of CT and MRI for measuring the FCSA and fatty infiltration of the atrophied lumbar paraspinal muscles were acceptable. A uniform one-image method was reliable for a study evaluating a single paraspinal muscle. The authors recommend MRI rather than CT for measuring the FCSA and fatty infiltration of paraspinal muscles.
Clinical instruments: reliability and validity critical appraisal.
Brink, Yolandi; Louw, Quinette A
2012-12-01
RATIONALE, AIM AND OBJECTIVES: There is a lack of health care practitioners using objective clinical tools with sound psychometric properties. There is also a need for researchers to improve their reporting of the validity and reliability results of these clinical tools. Therefore, to promote the use of valid and reliable tools or tests for clinical evaluation, this paper reports on the development of a critical appraisal tool to assess the psychometric properties of objective clinical tools. A five-step process was followed to develop the new critical appraisal tool: (1) preliminary conceptual decisions; (2) defining key concepts; (3) item generation; (4) assessment of face validity; and (5) formulation of the final tool. The new critical appraisal tool consists of 13 items, of which five items relate to both validity and reliability studies, four items to validity studies only and four items to reliability studies. The 13 items could be scored as 'yes', 'no' or 'not applicable'. This critical appraisal tool will aid both the health care practitioner to critically appraise the relevant literature and researchers to improve the quality of reporting of the validity and reliability of objective clinical tools. © 2011 Blackwell Publishing Ltd.
Reliability of temporal summation and diffuse noxious inhibitory control
Cathcart, Stuart; Winefield, Anthony H; Rolan, Paul; Lushington, Kurt
2009-01-01
BACKGROUND: The test-retest reliability of temporal summation (TS) and diffuse noxious inhibitory control (DNIC) has not been reported to date. Establishing such reliability would support the possibility of future experimental studies examining factors affecting TS and DNIC. Similarly, the use of manual algometry to induce TS, or an occlusion cuff to induce DNIC of TS to mechanical stimuli, has not been reported to date. Such devices may offer a simpler method than current techniques for inducing TS and DNIC, affording assessment at more anatomical locations and in more varied research settings. METHOD: The present study assessed the test-retest reliability of TS and DNIC using the above techniques. Sex differences on these measures were also investigated. RESULTS: Repeated measures ANOVA indicated successful induction of TS and DNIC, with no significant differences across test-retest occasions. Sex effects were not significant for any measure or interaction. Intraclass correlations indicated high test-retest reliability for all measures; however, there was large interindividual variation between test and retest measurements. CONCLUSION: The present results indicate acceptable within-session test-retest reliability of TS and DNIC. The results support the possibility of future experimental studies examining factors affecting TS and DNIC. PMID:20011713
Nazary-Moghadam, Salman; Zeinalzadeh, Afsaneh; Salavati, Mahyar; Almasi, Simin; Negahban, Hossein
2017-01-01
The aim of the present study was to culturally adapt the Health Assessment Questionnaire-Disability Index (HAQ-DI) and to evaluate its reliability and validity in Iranian patients with rheumatoid arthritis (RA). A total of 234 patients with RA took part in the validation study and 86 participants in the reliability study. Test-retest relative reliability and internal consistency of the Persian version of the HAQ-DI (PHAQ-DI) were examined by intraclass correlation coefficient (ICC) and Cronbach's alpha, respectively. Additionally, construct validity (Spearman's correlation) was examined using the Persian version of the Short-Form 36 Health Survey (SF-36) and activity and severity parameters. The PHAQ-DI total score showed excellent test-retest reliability (ICC = 0.98) and internal consistency (Cronbach's alpha = 0.95). Spearman's correlations between the total PHAQ-DI score and the activity and severity parameters were above 0.55. The correlation between the PHAQ-DI and SF-36 Physical Health was higher than that with SF-36 Mental Health. The Persian version of the HAQ-DI is a reliable and valid culturally adapted instrument for measuring functional limitations in Iranian people with RA. Copyright © 2016 Elsevier Ltd. All rights reserved.
Bosakova, Lucia; Kolarcik, Peter; Bobakova, Daniela; Sulcova, Martina; Van Dijk, Jitse P; Reijneveld, Sijmen A; Geckova, Andrea Madarasova
2016-04-01
Participation in organized activities is associated with a range of positive outcomes, but the way such participation is measured has not been scrutinized. Test-retest reliability, an important indicator of a scale's reliability, has rarely been assessed, and for "The scale of participation in organized activities" it is lacking completely. This test-retest study is based on the Health Behaviour in School-aged Children study and is consistent with its methodology. We obtained data from 353 Czech (51.9 % boys) and 227 Slovak (52.9 % boys) primary school pupils, grades five and nine, who participated in this study in 2013. We used Cohen's kappa statistic and single measures of the intraclass correlation coefficient to estimate the test-retest reliability of all selected items in the sample, stratified by gender, age and country. We mostly observed a large correlation between the test and retest in all of the examined variables (κ ranged from 0.46 to 0.68). Test-retest reliability of the sum score of individual items showed substantial agreement (ICC = 0.64). The scale of participation in organized activities has an acceptable level of agreement, indicating good reliability.
Ertuğ, Nurcan
2018-06-01
The aim of this study was to determine the validity and reliability of the Turkish version of the V-scale, which measures nurses' attitudes towards vital signs monitoring in the detection of clinical deterioration. This validity and reliability study was conducted at a tertiary hospital in Ankara, Turkey, in 2016. A total of 169 ward nurses participated in the study. Exploratory factor analysis, Cronbach's alpha coefficient, and the intraclass correlation coefficient were used to determine the validity and reliability of the scale. A 5-factor, 16-item scale explained 60.823% of the total variance according to the validity analysis. Our version matched the original scale in terms of the number of items and factor structure. Cronbach's alpha coefficient of the Turkish version of the V-scale was 0.764. The test-retest reliability results were 0.855 for the overall intraclass correlation coefficient, and the t-test result was P > 0.05. The V-scale is a reliable and valid instrument to measure Turkish nurses' attitudes towards vital signs monitoring in the detection of clinical deterioration. © 2018 John Wiley & Sons Australia, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roach, Mack, E-mail: mroach@radonc.ucsf.edu; Ceron Lizarraga, Tania L.; Lazar, Ann A.
Purpose: The optimal treatment of clinically localized prostate cancer is controversial. Most studies focus on biochemical (PSA) failure when comparing radical prostatectomy (RP) with radiation therapy (RT), but this endpoint has not been validated as predictive of overall survival (OS) or cause-specific survival (CSS). We analyzed the available literature to determine whether reliable conclusions could be made concerning the effectiveness of RP compared with RT with or without androgen deprivation therapy (ADT), assuming current treatment standards. Methods: Articles published between February 29, 2004, and March 1, 2015, that compared OS and CSS after RP or RT with or without ADT were included. Because the GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) system emphasis is on randomized controlled clinical trials, a reliability score (RS) was explored to further understand the issues associated with the study quality of observational studies, including appropriateness of treatment, source of data, clinical characteristics, and comorbidity. Lower RS values indicated lower reliability. Results: Fourteen studies were identified, and 13 were completely evaluable. Thirteen of the 14 studies (93%) were observational studies with low-quality evidence. The median RS was 12 (range, 5-18); the median difference in 10-year OS and CSS favored RP over RT: 10% and 4%, respectively. In studies with a RS ≤12 (average RS 9) the 10-year OS and CSS median differences were 17% and 6%, respectively. For studies with a RS >12 (average RS 15.5), the 10-year OS and CSS median differences were 5.5% and 1%, respectively. Thus, we observed an association between low RS and a higher percentage difference in OS and CSS. Conclusions: Reliable evidence that RP provides a superior CSS to RT with ADT is lacking. The most reliable studies suggest that the differences in 10-year CSS between RP and RT are small, possibly <1%.
Sample size requirements for the design of reliability studies: precision consideration.
Shieh, Gwowen
2014-09-01
In multilevel modeling, the intraclass correlation coefficient based on the one-way random-effects model is routinely employed to measure the reliability or degree of resemblance among group members. To facilitate the advocated practice of reporting confidence intervals in future reliability studies, this article presents exact sample size procedures for precise interval estimation of the intraclass correlation coefficient under various allocation and cost structures. Although the suggested approaches do not admit explicit sample size formulas and require special algorithms for carrying out iterative computations, they are more accurate than the closed-form formulas constructed from large-sample approximations with respect to the expected width and assurance probability criteria. This investigation notes the deficiency of existing methods and expands the sample size methodology for the design of reliability studies that have not previously been discussed in the literature.
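For readers unfamiliar with the quantity whose interval estimate is being planned, the point estimate of the one-way random-effects intraclass correlation can be sketched as follows. This is only the standard ANOVA estimator, ICC = (MSB - MSW) / (MSB + (k - 1) MSW), for n groups of equal size k; it is not the article's exact sample size procedure, which the abstract notes requires iterative algorithms. The data are invented.

```python
def icc_oneway(groups):
    """ICC(1) from a one-way random-effects ANOVA with equal group sizes.

    groups: list of n groups (or subjects), each a list of k measurements.
    """
    n, k = len(groups), len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]
    # Between-group and within-group mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical data: four subjects, each measured three times.
print(icc_oneway([[9, 10, 11], [4, 5, 6], [7, 7, 8], [2, 3, 2]]))
```

The estimate equals 1 when all measurements within each group coincide and can be negative when within-group variation exceeds between-group variation.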
Hadi, Azlihanis Abdul; Naing, Nyi Nyi; Daud, Aziah; Nordin, Rusli
2006-11-01
This study was conducted to assess the reliability and construct validity of the Malay version of the Job Content Questionnaire (JCQ) among secondary school teachers in Kota Bharu, Kelantan. A total of 68 teachers consented to participate in the study and were administered the Malay version of the JCQ. Reliability was determined using Cronbach's alpha for internal consistency, whilst construct validity was assessed using factor analysis. Cronbach's alpha coefficients were 0.75 for decision latitude, 0.50 for psychological job demand and 0.84 for social support. Factor analysis showed three meaningful common factors that could explain the construct of Karasek's demand-control-social support model. The study suggests the JCQ scales are reliable and valid tools for assessing job stress in school teachers.
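The internal consistency figures above are Cronbach's alpha values computed per scale. A minimal sketch of the usual formula, alpha = k/(k-1) × (1 - Σ item variances / variance of total score), is shown below; the responses are invented and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of respondents, each a list of k item scores.
    Uses sample variances; undefined if the total scores have zero variance.
    """
    k = len(responses[0])
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to a three-item scale.
answers = [[3, 4, 3], [2, 2, 3], [4, 4, 4], [1, 2, 1], [3, 3, 4]]
print(round(cronbach_alpha(answers), 3))
```

Alpha approaches 1 when items vary together across respondents and drops when item scores are only weakly related, which is one possible reading of the lower value reported for psychological job demand.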
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamachi La Commare, Kristina
Metrics for reliability, such as the frequency and duration of power interruptions, have been reported by electric utilities for many years. This study examines current utility practices for collecting and reporting electricity reliability information and discusses challenges that arise in assessing reliability because of differences among these practices. The study is based on reliability information for year 2006 reported by 123 utilities in 37 states representing over 60 percent of total U.S. electricity sales. We quantify the effects that inconsistencies among current utility reporting practices have on comparisons of System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Index (SAIFI) reported by utilities. We recommend immediate adoption of IEEE Std. 1366-2003 as a consistent method for measuring and reporting reliability statistics.
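The two indices named above can be sketched from their basic definitions: SAIFI is the total number of customer interruptions divided by the number of customers served, and SAIDI is the total customer-minutes of interruption divided by the number of customers served. The event list and function names below are hypothetical, and the sketch ignores the reporting choices (for example, which events are included) that the study identifies as a source of inconsistency.

```python
def saifi(events, customers_served):
    """Average number of interruptions per customer served.

    events: list of (customers_interrupted, duration_minutes) tuples.
    """
    return sum(customers for customers, _ in events) / customers_served


def saidi(events, customers_served):
    """Average interruption duration (minutes) per customer served."""
    return sum(customers * minutes for customers, minutes in events) / customers_served


# Hypothetical year for a utility serving 1,000 customers.
outages = [(100, 60), (50, 30)]
print(saifi(outages, 1000), saidi(outages, 1000))
```

Because both indices depend on which interruptions are counted, two utilities with identical service can report different values if their inclusion rules differ.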
An, Hyeong Su; Moon, Won-Jin; Ryu, Jae-Kyun; Park, Ju Yeon; Yun, Won Sung; Choi, Jin Woo; Jahng, Geon-Ho; Park, Jang-Yeon
2017-12-01
This prospective multi-center study aimed to evaluate the inter-vendor and test-retest reliabilities of resting-state functional magnetic resonance imaging (RS-fMRI) by assessing the temporal signal-to-noise ratio (tSNR) and functional connectivity. The study included 10 healthy subjects, and each subject was scanned using three 3T MR scanners (GE Signa HDxt, Siemens Skyra, and Philips Achieva) in two sessions. The tSNR was calculated from the time course data. Inter-vendor and test-retest reliabilities were assessed with intra-class correlation coefficients (ICCs) derived from variance component analysis. Independent component analysis was performed to identify the connectivity of the default-mode network (DMN). The tSNR for the DMN was not significantly different among the GE, Philips, and Siemens scanners (P=0.638). In terms of vendor differences, the inter-vendor reliability was good (ICC=0.774). Regarding the test-retest reliability, the GE scanner showed excellent correlation (ICC=0.961), while the Philips (ICC=0.671) and Siemens (ICC=0.726) scanners showed relatively good correlation. The DMN patterns of the subjects showed identical functional connectivity between the two sessions for each scanner and across the three scanners. The inter-vendor and test-retest reliabilities of RS-fMRI using different 3T MR scanners are good. Thus, we suggest that RS-fMRI could be used in multicenter imaging studies as a reliable imaging marker. Copyright © 2017 Elsevier Inc. All rights reserved.
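The tSNR mentioned above is commonly defined, per voxel or region, as the mean of the signal time course divided by its standard deviation over time. A minimal sketch under that assumption follows; the abstract does not specify preprocessing or whether a sample or population standard deviation was used, and the signal values are invented.

```python
from statistics import mean, stdev


def tsnr(time_course):
    """Temporal SNR of one time course: temporal mean / temporal SD.

    Uses the sample standard deviation (an assumption); undefined for a
    perfectly constant signal, where the standard deviation is zero.
    """
    return mean(time_course) / stdev(time_course)


# Hypothetical signal intensities of a single voxel over four volumes.
print(tsnr([100, 102, 98, 100]))
```

A region-level value, such as one for the default-mode network, could then be obtained by averaging the voxel-wise results over the region.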
López-Pascual, Juan; Cáceres, Magda Liliana; De Rosario, Helios; Page, Álvaro
2016-02-08
The reliability of joint rotation measurements is an issue of major interest, especially in clinical applications. The effect of instrumental errors and soft tissue artifacts on the variability of human motion measures is well known, but the influence of the representation of joint motion has not yet been studied. The aim of the study was to compare the within-subject reliability of three rotation formalisms for the calculation of the shoulder elevation joint angles. Five repetitions of humeral elevation in the scapular plane of 27 healthy subjects were recorded using a stereophotogrammetry system. The humerothoracic joint angles were calculated using the YX'Y" and XZ'Y" Euler angle sequences and the attitude vector. A within-subject repeatability study was performed for the three representations. ICC, SEM and CV were the indices used to estimate the error in the calculation of the angle amplitudes and the angular waveforms with each method. Excellent results were obtained in all representations for the main angle (elevation), but there were remarkable differences for axial rotation and plane of elevation. The YX'Y" sequence generally had the poorest reliability in the secondary angles. The XZ'Y" sequence proved to be the most reliable representation of axial rotation, whereas the attitude vector had the highest reliability in the plane of elevation. These results highlight the importance of selecting the method used to describe the joint motion when within-subject reliability is an important issue in the experiment. This may be of particular importance when the secondary angles of motion are being studied. Copyright © 2016 Elsevier Ltd. All rights reserved.
Becker, Anne E.; Roberts, Andrea L.; Perloe, Alexandra; Bainivualiku, Asenaca; Richards, Lauren K.; Gilman, Stephen E.; Striegel-Moore, Ruth H.
2010-01-01
Objective The Global School-based Student Health Survey (GSHS) is an assessment for adolescent health risk behaviors and exposures, supported by the World Health Organization. Although already widely implemented—and intended for youth assessment across diverse ethnic and national contexts—no reliability data have yet been reported for GSHS-based assessment in any ethnicity or country-specific population. This study reports test-retest reliability for GSHS content adapted for a female adolescent ethnic Fijian study sample in Fiji. Design We adapted and translated GSHS content to assess health risk behaviors as part of a larger study investigating the impact of social transition on ethnic Fijian secondary schoolgirls in Fiji. In order to evaluate the performance of this measure for our ethnic Fijian study sample (n=523), we examined its test-retest reliability with kappa coefficients, % agreement, and prevalence estimates in a sub-sample (n=81). Reliability among strata defined by topic, age, and language was also examined. Results Average agreement between test and retest was 77%, and average Cohen's kappa was 0.47. Mean kappas for questions from core modules about alcohol use, tobacco use, and sexual behavior were substantial, and higher than those for modules relating to other risk behaviors. Conclusions Although test-retest reliability of responses within this country-specific version of GSHS content was substantial in several topical domains for this ethnic Fijian sample, only fair reliability for the module assessing dietary behaviors and other individual items suggests that population-specific psychometric evaluation is essential to interpreting language and country-specific GSHS data. PMID:20234961
Helmerhorst, Hendrik J F; Brage, Søren; Warren, Janet; Besson, Herve; Ekelund, Ulf
2012-08-31
Physical inactivity is one of the four leading risk factors for global mortality. Accurate measurement of physical activity (PA) and in particular by physical activity questionnaires (PAQs) remains a challenge. The aim of this paper is to provide an updated systematic review of the reliability and validity characteristics of existing and more recently developed PAQs and to quantitatively compare the performance between existing and newly developed PAQs. A literature search of electronic databases was performed for studies assessing reliability and validity data of PAQs using an objective criterion measurement of PA between January 1997 and December 2011. Articles meeting the inclusion criteria were screened and data were extracted to provide a systematic overview of measurement properties. Due to differences in reported outcomes and criterion methods a quantitative meta-analysis was not possible. In total, 31 studies testing 34 newly developed PAQs, and 65 studies examining 96 existing PAQs were included. Very few PAQs showed good results on both reliability and validity. Median reliability correlation coefficients were 0.62-0.71 for existing, and 0.74-0.76 for new PAQs. Median validity coefficients ranged from 0.30-0.39 for existing, and from 0.25-0.41 for new PAQs. Although the majority of PAQs appear to have acceptable reliability, the validity is moderate at best. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity. Future PAQ studies should include measures of absolute validity and the error structure of the instrument.
Chen, Hong-Lin; Cao, Ying-Juan; Zhang, Wei; Wang, Jing; Huai, Bao-Sha
2017-02-01
The inter-rater reliability of the Braden Scale is limited. We modified the scale by defining the nutrition subscale based on serum albumin, yielding the Braden(ALB) scale, and then assessed its validity and reliability in hospital patients. We designed a retrospective study for validity analysis and a prospective study for reliability analysis. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate predictive validity. The intra-class correlation coefficient (ICC) was used to investigate inter-rater reliability. A total of 2525 patients were included in the validity analysis, of whom 76 (3.0%) developed a pressure ulcer. A positive correlation was found between serum albumin and the nutrition score in the Braden scale (Spearman's coefficient 0.2203, P<0.0001). The AUCs for the Braden scale and the Braden(ALB) scale in predicting pressure ulcer risk were 0.813 (95% CI 0.797-0.828; P<0.0001) and 0.859 (95% CI 0.845-0.872; P<0.0001), respectively. The Braden(ALB) scale appeared more valid than the Braden scale, although the difference did not reach statistical significance (z=1.860, P=0.0628). In the different age subgroups, the Braden(ALB) scale also seemed more valid than the original Braden scale, but no statistically significant differences were found (P>0.05). The inter-rater reliability study showed that the ICC increased by 45.9% for the nutrition subscale and by 4.3% for the total score. The Braden(ALB) scale has similar validity to the original Braden scale for in-hospital patients. However, the inter-rater reliability was significantly increased. Copyright © 2016 Elsevier Inc. All rights reserved.
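The AUC used above to summarize predictive validity can be read as the probability that a randomly chosen patient who developed a pressure ulcer receives a higher risk score than a randomly chosen patient who did not, with ties counted as one half. A minimal sketch of that pairwise definition is given below; the scores are invented, and it assumes that a higher score means higher predicted risk, so a scale on which lower totals indicate higher risk would need its scores negated first.

```python
def auc(scores_with_event, scores_without_event):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for pos in scores_with_event:
        for neg in scores_without_event:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties contribute one half
    return wins / (len(scores_with_event) * len(scores_without_event))


# Hypothetical risk scores (higher = greater predicted risk).
ulcer = [7, 9, 6]
no_ulcer = [3, 6, 5, 2]
print(auc(ulcer, no_ulcer))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.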
2012-01-01
Physical inactivity is one of the four leading risk factors for global mortality. Accurate measurement of physical activity (PA) and in particular by physical activity questionnaires (PAQs) remains a challenge. The aim of this paper is to provide an updated systematic review of the reliability and validity characteristics of existing and more recently developed PAQs and to quantitatively compare the performance between existing and newly developed PAQs. A literature search of electronic databases was performed for studies assessing reliability and validity data of PAQs using an objective criterion measurement of PA between January 1997 and December 2011. Articles meeting the inclusion criteria were screened and data were extracted to provide a systematic overview of measurement properties. Due to differences in reported outcomes and criterion methods a quantitative meta-analysis was not possible. In total, 31 studies testing 34 newly developed PAQs, and 65 studies examining 96 existing PAQs were included. Very few PAQs showed good results on both reliability and validity. Median reliability correlation coefficients were 0.62–0.71 for existing, and 0.74–0.76 for new PAQs. Median validity coefficients ranged from 0.30–0.39 for existing, and from 0.25–0.41 for new PAQs. Although the majority of PAQs appear to have acceptable reliability, the validity is moderate at best. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity. Future PAQ studies should include measures of absolute validity and the error structure of the instrument. PMID:22938557
DOT National Transportation Integrated Search
2003-12-01
This study explores the on-time reliability benefits to potential users of a personalized advanced traveler information system (ATIS) providing real-time pre-trip roadway information for the Seattle morning peak period through the application of Heur...
Meta-Analysis of Coefficient Alpha
ERIC Educational Resources Information Center
Rodriguez, Michael C.; Maeda, Yukiko
2006-01-01
The meta-analysis of coefficient alpha across many studies is becoming more common in psychology by a methodology labeled reliability generalization. Existing reliability generalization studies have not used the sampling distribution of coefficient alpha for precision weighting and other common meta-analytic procedures. A framework is provided for…
ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda
2015-07-01
The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscle activity in supine lying and during two isometric endurance tests in subjects with and without low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements in supine lying and during the two isometric endurance tests was assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscle activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
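The abstract does not state which formulas were used for the SEM and MDC. As a hedged illustration, the sketch below uses the conventional definitions, SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The function names and the example values are assumptions, not figures from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical values: SD of muscle thickness = 2.0 mm, ICC = 0.75.
sem = sem_from_icc(2.0, 0.75)
print(sem, round(mdc95(sem), 3))
```

Under these definitions a higher ICC shrinks the SEM, and with it the smallest change that can be distinguished from measurement noise.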
Santelmann, Hanno; Franklin, Jeremy; Bußhoff, Jana; Baethge, Christopher
2016-10-01
Schizoaffective disorder is a common diagnosis in clinical practice but its nosological status has been subject to debate ever since it was conceptualized. Although it is key that diagnostic reliability is sufficient, schizoaffective disorder has been reported to have low interrater reliability. Evidence based on systematic review and meta-analysis methods, however, is lacking. Using a highly sensitive literature search in Medline, Embase, and PsycInfo we identified studies measuring the interrater reliability of schizoaffective disorder in comparison to schizophrenia, bipolar disorder, and unipolar disorder. Out of 4126 records screened we included 25 studies reporting on 7912 patients diagnosed by different raters. The interrater reliability of schizoaffective disorder was moderate (meta-analytic estimate of Cohen's kappa 0.57 [95% CI: 0.41-0.73]), and substantially lower than that of its main differential diagnoses (difference in kappa between 0.22 and 0.19). Although there was considerable heterogeneity, analyses revealed that the interrater reliability of schizoaffective disorder was consistently lower in the overwhelming majority of studies. The results remained robust in subgroup and sensitivity analyses (e.g., diagnostic manual used) as well as in meta-regressions (e.g., publication year) and analyses of publication bias. Clinically, the results highlight the particular importance of diagnostic re-evaluation in patients diagnosed with schizoaffective disorder. They also quantify a widely held clinical impression of lower interrater reliability and agree with earlier meta-analysis reporting low test-retest reliability. Copyright © 2016. Published by Elsevier B.V.
Reliability of TMS metrics in patients with chronic incomplete spinal cord injury.
Potter-Baker, K A; Janini, D P; Frost, F S; Chabra, P; Varnerin, N; Cunningham, D A; Sankarasubramanian, V; Plow, E B
2016-11-01
Test-retest reliability analysis in individuals with chronic incomplete spinal cord injury (iSCI). The purpose of this study was to examine the reliability of neurophysiological metrics acquired with transcranial magnetic stimulation (TMS) in individuals with chronic incomplete tetraplegia. Cleveland Clinic Foundation, Cleveland, Ohio, USA. TMS metrics of corticospinal excitability, output, inhibition and motor map distribution were collected in muscles with a higher MRC grade and muscles with a lower MRC grade on the more affected side of the body. Metrics denoting upper limb function were also collected. All metrics were collected at two sessions separated by a minimum of two weeks. Reliability between sessions was determined using Spearman's correlation coefficients and concordance correlation coefficients (CCCs). We found that TMS metrics that were acquired in higher MRC grade muscles were approximately two times more reliable than those collected in lower MRC grade muscles. TMS metrics of motor map output, however, demonstrated poor reliability regardless of muscle choice (P=0.34; CCC=0.51). Correlation analysis indicated that patients with more baseline impairment and/or those in a more chronic phase of iSCI demonstrated greater variability of metrics. In iSCI, reliability of TMS metrics varies depending on the muscle grade of the tested muscle. Variability is also influenced by factors such as baseline motor function and time post SCI. Future studies that use TMS metrics in longitudinal study designs to understand functional recovery should be cautious as choice of muscle and clinical characteristics can influence reliability.
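The study reports concordance correlation coefficients (CCCs) alongside Spearman's correlations. A minimal sketch of Lin's CCC follows, assuming the usual definition with population (1/n) moments; the function name and the session values are hypothetical. It shows why a CCC can be well below 1 even when the two sessions are perfectly correlated: a systematic shift between sessions is penalized.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical session 1 and session 2 values with a constant offset of 1.
session1 = [1.0, 2.0, 3.0]
session2 = [2.0, 3.0, 4.0]
print(round(concordance_cc(session1, session2), 4))
```

The function is undefined (division by zero) when both sessions are constant and have equal means.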
Reliability of movement control tests in the lumbar spine
Luomajoki, Hannu; Kool, Jan; de Bruin, Eling D; Airaksinen, Olavi
2007-01-01
Background Movement control dysfunction [MCD] reduces active control of movements. Patients with MCD might form an important subgroup among patients with non specific low back pain. The diagnosis is based on the observation of active movements. Although widely used clinically, only a few studies have been performed to determine the test reliability. The aim of this study was to determine the inter- and intra-observer reliability of movement control dysfunction tests of the lumbar spine. Methods We videoed patients performing a standardized test battery consisting of 10 active movement tests for motor control in 27 patients with non specific low back pain and 13 patients with other diagnoses but without back pain. Four physiotherapists independently rated test performances as correct or incorrect per observation, blinded to all other patient information and to each other. The study was conducted in a private physiotherapy outpatient practice in Reinach, Switzerland. Kappa coefficients, percentage agreements and confidence intervals for inter- and intra-rater results were calculated. Results The kappa values for inter-tester reliability ranged between 0.24 – 0.71. Six tests out of ten showed a substantial reliability [k > 0.6]. Intra-tester reliability was between 0.51 – 0.96, all tests but one showed substantial reliability [k > 0.6]. Conclusion Physiotherapists were able to reliably rate most of the tests in this series of motor control tasks as being performed correctly or not, by viewing films of patients with and without back pain performing the task. PMID:17850669
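Because the study reports both kappa coefficients and percentage agreement for correct/incorrect ratings, a small worked example may help show how the two differ. This is a sketch of unweighted Cohen's kappa for two raters, with made-up ratings; it is not the authors' code, and the study itself involved four raters.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of cases on which the two raters give the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal rating frequencies.
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical ratings: 1 = test performed correctly, 0 = incorrectly.
rater1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(percent_agreement(rater1, rater2), round(cohen_kappa(rater1, rater2), 3))
```

Here 80% raw agreement corresponds to a kappa of about 0.6, since half of the agreement would be expected by chance. Kappa is undefined when both raters use a single identical category for every case, because the chance agreement is then 1.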
Intraday and Interday Reliability of Ultra-Short-Term Heart Rate Variability in Rugby Union Players.
Nakamura, Fábio Y; Pereira, Lucas A; Esco, Michael R; Flatt, Andrew A; Moraes, José E; Cal Abad, Cesar C; Loturco, Irineu
2017-02-01
Nakamura, FY, Pereira, LA, Esco, MR, Flatt, AA, Moraes, JE, Cal Abad, CC, and Loturco, I. Intraday and interday reliability of ultra-short-term heart rate variability in rugby union players. J Strength Cond Res 31(2): 548-551, 2017. The aim of this study was to examine the intraday and interday reliability of ultra-short-term vagal-related heart rate variability (HRV) in elite rugby union players. Forty players from the Brazilian National Rugby Team volunteered to participate in this study. The natural log of the root mean square of successive RR interval differences (lnRMSSD) assessments were performed on 4 different days. The HRV was assessed twice (intraday reliability) on the first day and once per day on the following 3 days (interday reliability). The RR interval recordings were obtained from 2-minute recordings using a portable heart rate monitor. The relative reliability of intraday and interday lnRMSSD measures was analyzed using the intraclass correlation coefficient (ICC). The typical error of measurement (absolute reliability) of intraday and interday lnRMSSD assessments was analyzed using the coefficient of variation (CV). Both intraday (ICC = 0.96; CV = 3.99%) and interday (ICC = 0.90; CV = 7.65%) measures were highly reliable. The ultra-short-term lnRMSSD is a consistent measure for evaluating elite rugby union players, in both intraday and interday settings. This study provides further validity to using this shortened method in practical field conditions with highly trained team sports athletes.
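The lnRMSSD index is defined in the abstract as the natural log of the root mean square of successive RR interval differences. The sketch below computes it from a list of RR intervals in milliseconds and adds a simple SD/mean coefficient of variation. The function names and values are hypothetical, and the CV shown is one common choice that may differ from the typical-error calculation the authors used.

```python
import math


def ln_rmssd(rr_ms):
    """Natural log of the RMS of successive RR interval differences."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return math.log(rmssd)


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 100.0 * sd / mean


# Hypothetical RR intervals (ms) and repeated lnRMSSD readings.
rr = [800, 810, 790, 805]
print(round(ln_rmssd(rr), 3))
print(round(cv_percent([4.0, 4.2, 3.8]), 2))
```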
Reliability analysis of a sensitive and independent stabilometry parameter set
Nagymáté, Gergely; Orlovits, Zsanett
2018-01-01
Recent studies have suggested reduced independent and sensitive parameter sets for stabilometry measurements based on correlation and variance analyses. However, the reliability of these recommended parameter sets has not been studied in the literature or not in every stance type used in stabilometry assessments, for example, single leg stances. The goal of this study is to evaluate the test-retest reliability of different time-based and frequency-based parameters that are calculated from the center of pressure (CoP) during bipedal and single leg stance for 30- and 60-second measurement intervals. Thirty healthy subjects performed repeated standing trials in a bipedal stance with eyes open and eyes closed conditions and in a single leg stance with eyes open for 60 seconds. A force distribution measuring plate was used to record the CoP. The reliability of the CoP parameters was characterized by using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), minimal detectable change (MDC), coefficient of variation (CV) and CV compliance rate (CVCR). Based on the ICC, SEM and MDC results, many parameters yielded fair to good reliability values, while the CoP path length yielded the highest reliability (smallest ICC > 0.67 (0.54–0.79), largest SEM% = 19.2%). Usually, frequency type parameters and extreme value parameters yielded poor reliability values. There were differences in the reliability of the maximum CoP velocity (better with 30 seconds) and mean power frequency (better with 60 seconds) parameters between the different sampling intervals. PMID:29664938
Reliability analysis of a sensitive and independent stabilometry parameter set.
Nagymáté, Gergely; Orlovits, Zsanett; Kiss, Rita M
2018-01-01
Recent studies have suggested reduced independent and sensitive parameter sets for stabilometry measurements based on correlation and variance analyses. However, the reliability of these recommended parameter sets has not been studied in the literature or not in every stance type used in stabilometry assessments, for example, single leg stances. The goal of this study is to evaluate the test-retest reliability of different time-based and frequency-based parameters that are calculated from the center of pressure (CoP) during bipedal and single leg stance for 30- and 60-second measurement intervals. Thirty healthy subjects performed repeated standing trials in a bipedal stance with eyes open and eyes closed conditions and in a single leg stance with eyes open for 60 seconds. A force distribution measuring plate was used to record the CoP. The reliability of the CoP parameters was characterized by using the intraclass correlation coefficient (ICC), standard error of measurement (SEM), minimal detectable change (MDC), coefficient of variation (CV) and CV compliance rate (CVCR). Based on the ICC, SEM and MDC results, many parameters yielded fair to good reliability values, while the CoP path length yielded the highest reliability (smallest ICC > 0.67 (0.54-0.79), largest SEM% = 19.2%). Usually, frequency type parameters and extreme value parameters yielded poor reliability values. There were differences in the reliability of the maximum CoP velocity (better with 30 seconds) and mean power frequency (better with 60 seconds) parameters between the different sampling intervals.
Development of Internet-Based Tasks for the Executive Function Performance Test.
Rand, Debbie; Lee Ben-Haim, Keren; Malka, Rachel; Portnoy, Sigal
The Executive Function Performance Test (EFPT) is a reliable and valid performance-based tool to assess executive functions (EFs). This study's objective was to develop and verify two Internet-based tasks for the EFPT. A cross-sectional study assessed the alternate-form reliability of the Internet-based bill-paying and telephone-use tasks in healthy adults and people with subacute stroke (Study 1). It also sought to establish the tasks' criterion reliability for assessing EF deficits by correlating performance with that on the Trail Making Test in five groups: healthy young adults, healthy older adults, people with subacute stroke, people with chronic stroke, and young adults with attention deficit hyperactivity disorder (Study 2). The alternative-form reliability and initial construct validity for the Internet-based bill-paying task were verified. Criterion validity was established for both tasks. The Internet-based tasks are comparable to the original EFPT tasks and can be used for assessment of EF deficits. Copyright © 2018 by the American Occupational Therapy Association, Inc.
Development and psychometric testing of the Mariani Nursing Career Satisfaction Scale.
Mariani, Bette; Allen, Lois Ryan
2014-01-01
The Mariani Nursing Career Satisfaction Scale (MNCSS) was developed to explore the influence of mentoring on career satisfaction of registered nurses (RNs). A review of the literature revealed no contemporary valid and reliable measure of career satisfaction. The MNCSS is a semantic differential of 16 opposite adjective pairs on which participants rate feelings about their nursing career. The MNCSS was used in a pilot study and three major studies exploring career satisfaction of RNs. Validity, reliability, and exploratory factor analysis (FA) were computed to explore the internal structure of the instrument. The newly developed instrument had a content validity index (CVI) of .84 and Cronbach's alpha internal consistency reliabilities of .93-.96 across three major studies. Exploratory FA (N = 496) revealed a univocal instrument with one factor that explains 57.8% of the variance in career satisfaction scores. The MNCSS is a valid and reliable instrument for measuring career satisfaction. FA of the combined data from three studies yielded one factor that measures the concept of career satisfaction.
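Cronbach's alpha, reported above as the internal consistency estimate, compares the sum of the item variances with the variance of the total score. A minimal sketch under the standard formula follows; the function names and the two-item toy data are assumptions and bear no relation to the 16-item MNCSS data.

```python
def sample_variance(xs):
    """Unbiased sample variance (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(sample_variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_var_sum / sample_variance(totals))


# Hypothetical scores for two items across four respondents.
items = [[1, 2, 3, 4],
         [2, 1, 4, 3]]
print(round(cronbach_alpha(items), 3))
```

Alpha reaches 1 only when the items rank respondents identically with equal spread; it is undefined if every respondent has the same total score.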
ERIC Educational Resources Information Center
Dorn, Lorah D.; Sontag-Padilla, Lisa M.; Pabst, Stephanie; Tissot, Abbigail; Susman, Elizabeth J.
2013-01-01
Age at menarche is critical in research and clinical settings, yet there is a dearth of studies examining its reliability in adolescents. We examined age at menarche during adolescence, specifically, (a) average method reliability across 3 years, (b) test-retest reliability between time points and methods, (c) intraindividual variability of…
ERIC Educational Resources Information Center
Vacha-Haase, Tammi; Kogan, Lori R.; Tani, Crystal R.; Woodall, Renee A.
2001-01-01
Used reliability generalization to explore the variance of scores on 10 Minnesota Multiphasic Personality Inventory (MMPI) clinical scales drawing on 1,972 articles in the literature on the MMPI. Results highlight the premise that scores, not tests, are reliable or unreliable, and they show that study characteristics do influence scores on the…
ERIC Educational Resources Information Center
Pantzare, Anna Lind
2015-01-01
In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examines if teachers' ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality…
Psychometric Inferences from a Meta-Analysis of Reliability and Internal Consistency Coefficients
ERIC Educational Resources Information Center
Botella, Juan; Suero, Manuel; Gambara, Hilda
2010-01-01
A meta-analysis of the reliability of the scores from a specific test, also called reliability generalization, allows the quantitative synthesis of its properties from a set of studies. It is usually assumed that part of the variation in the reliability coefficients is due to some unknown and implicit mechanism that restricts and biases the…
ERIC Educational Resources Information Center
Jones, Corinne A.; Hoffman, Matthew R.; Geng, Zhixian; Abdelhalim, Suzan M.; Jiang, Jack J.; McCulloch, Timothy M.
2014-01-01
Purpose: The purpose of this study was to investigate inter- and intrarater reliability among expert users, novice users, and speech-language pathologists with a semiautomated high-resolution manometry analysis program. We hypothesized that all users would have high intrarater reliability and high interrater reliability. Method: Three expert…
ERIC Educational Resources Information Center
Lung, For-Wey; Chiang, Tung-Liang; Lin, Shio-Jean; Feng, Jui-Ying; Chen, Po-Fei; Shu, Bih-Ching
2011-01-01
The parental report instrument is the most efficient developmental detection method and has shown high validity with professional assessment instruments. The reliability and validity of the Taiwan Birth Cohort Study (TBCS) 6-, 18- and 36-month scales have already been established. In this study, the reliability and validity of the 60-month scale…
ERIC Educational Resources Information Center
Nalbantoglu Yilmaz, Funda
2017-01-01
This study aims to determine the reliability of scores obtained from self-, peer-, and teacher-assessments in terms of teaching materials prepared by teacher candidates. The study group of this research consists of 56 teacher candidates. Within the scope of the research, teacher candidates were asked to develop teaching material related to their study.…
Peer Bullying among High School Students: Turkish Version of Bullying Scale
ERIC Educational Resources Information Center
Arslan, Nihan
2017-01-01
The aim of the study was to conduct the reliability and validity studies of the Turkish version of The Forms of Bullying Scale (FBS; Shaw et al., 2013). The Turkish form of the scale was administered to 357 high school students. The scale was examined by reliability analysis and confirmatory factor analysis within the scope of the adaptation study.…
A Validity and Reliability Study of the Motivated Strategies for Learning Questionnaire
ERIC Educational Resources Information Center
Erturan Ilker, Gökçe; Arslan, Yunus; Demirhan, Giyasettin
2014-01-01
The aim of this study is to determine the validity and reliability of the Motivated Strategies for Learning Questionnaire (MSLQ) for high school students. In total, 1605 students (829 girls, 776 boys, average age = 15.67 ± 1.19) from three different high schools in the central district of Ankara voluntarily participated in the study. The MSLQ was…
ERIC Educational Resources Information Center
Dogan, C. Deha; Uluman, Müge
2017-01-01
The aim of this study was to determine the extent to which graded-category rating scales and rubrics contribute to inter-rater reliability. The research was designed as a correlational study. The study group consisted of 82 students attending sixth grade and three writing course teachers in a private elementary school. A performance task was…
Cyber Victim and Bullying Scale: A Study of Validity and Reliability
ERIC Educational Resources Information Center
Cetin, Bayram; Yaman, Erkan; Peker, Adem
2011-01-01
The purpose of this study is to develop a reliable and valid scale that determines the cyber victimization and bullying behaviors of high school students. The research group consisted of 404 students (250 male, 154 male) in Sakarya in the 2009-2010 academic year. In the study sample, the mean age is 16.68. Content validity and face validity of the scale was…
Development of Social Media Addiction Test (SMAT17)
ERIC Educational Resources Information Center
Esgi, Necmi
2016-01-01
The aim of this study was to develop a test for assessing individuals' social media addiction and to conduct a reliability and validity study of this scale. The sample for this study was composed of 285 college students between the ages of 18 and 25. The Cronbach's alpha reliability coefficient was 0.94 and the Spearman-Brown value was 0.91 for our…
Depression, Anxiety and Stress Scale (DASS): The Study of Validity and Reliability
ERIC Educational Resources Information Center
Basha, Ertan; Kaya, Mehmet
2016-01-01
The purpose of this study is to examine the validity and reliability of the Albanian version of the Depression, Anxiety and Stress Scale (DASS), which was developed by Lovibond and Lovibond (1995). The sample of this study consisted of 555 subjects who were living in Kosovo. The results of confirmatory factor analysis indicated 42 items loaded on…
ERIC Educational Resources Information Center
Özenç, Emine Gül; Dogan, M. Cihangir
2014-01-01
This study aims to perform a validity-reliability test by developing the Functional Literacy Experience Scale based upon Ecological Theory (FLESBUET) for primary education students. The study group includes 209 fifth grade students at Sabri Taskin Primary School in the Kartal District of Istanbul, Turkey during the 2010-2011 academic year.…
A Study of Validity and Reliability on the Instructional Capacity Scale
ERIC Educational Resources Information Center
Yalçin, Mehmet Tufan; Eres, Figen
2018-01-01
The aim of this study is to develop a valid and reliable measurement tool that can determine the instructional capacity, according to teacher opinions. In the academic year of 2016-2017, 1011 teachers working in the public high schools and vocational technical schools in Ankara participated in the study. The total number of items on the scale was…
Development of Creative Behavior Observation Form: A Study on Validity and Reliability
ERIC Educational Resources Information Center
Dere, Zeynep; Ömeroglu, Esra
2018-01-01
In this study, the Creative Behavior Observation Form was developed to assess the creativity of children. For the reliability and validity study of the form, a total of 257 children aged 5-6 were sampled using the stratified sampling method. Content Validity Index (CVI) and…
ERIC Educational Resources Information Center
Henson, Robin K.; Thompson, Bruce
Given the potential value of reliability generalization (RG) studies in the development of cumulative psychometric knowledge, the purpose of this paper is to provide a tutorial on how to conduct such studies and to serve as a guide for researchers wishing to use this methodology. After some brief comments on classical test theory, the paper…
Park, Myung Sook; Kang, Kyung Ja; Jang, Sun Joo; Lee, Joo Yun; Chang, Sun Ju
2018-03-01
This study aimed to evaluate the components of test-retest reliability including time interval, sample size, and statistical methods used in patient-reported outcome measures in older people and to provide suggestions on the methodology for calculating test-retest reliability for patient-reported outcomes in older people. This was a systematic literature review. MEDLINE, Embase, CINAHL, and PsycINFO were searched from January 1, 2000 to August 10, 2017 by an information specialist. This systematic review was guided by both the Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist and the guideline for systematic review published by the National Evidence-based Healthcare Collaborating Agency in Korea. The methodological quality was assessed by the Consensus-based Standards for the selection of health Measurement Instruments checklist box B. Ninety-five out of 12,641 studies were selected for the analysis. The median time interval for test-retest reliability was 14 days, and the ratio of sample size for test-retest reliability to the number of items in each measure ranged from 1:1 to 1:4. The most frequently used statistical method for continuous scores was the intraclass correlation coefficient (ICC). Among the 63 studies that used ICCs, 21 studies presented models for ICC calculations and 30 studies reported 95% confidence intervals of the ICCs. Additional analyses using 17 studies that reported a strong ICC (>0.09) showed that the mean time interval was 12.88 days and the mean ratio of the number of items to sample size was 1:5.37. When researchers plan to assess the test-retest reliability of patient-reported outcome measures for older people, they need to consider an adequate time interval of approximately 13 days and a sample size of about 5 times the number of items.
Particularly, statistical methods should not only be selected based on the types of scores of the patient-reported outcome measures, but should also be described clearly in the studies that report the results of test-retest reliability. Copyright © 2017 Elsevier Ltd. All rights reserved.
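The review notes that only a minority of studies stated which ICC model they used. To illustrate why the model matters, here is a hedged sketch of one specific form, the two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1), computed from ANOVA mean squares. The function name and the toy test-retest scores are assumptions; other ICC forms would give different values for the same data.

```python
def icc_2_1(data):
    """ICC(2,1): rows are subjects, columns are sessions (or raters)."""
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical test-retest scores: the retest is uniformly 1 point higher.
scores = [[1.0, 2.0],
          [2.0, 3.0],
          [3.0, 4.0]]
print(round(icc_2_1(scores), 4))
```

In this toy data the subjects are ranked identically at both sessions, yet the absolute-agreement ICC is about 0.67 because the systematic 1-point shift counts as disagreement; a consistency-type ICC would ignore that shift.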
Schiffman, Eric L.; Truelove, Edmond L.; Ohrbach, Richard; Anderson, Gary C.; John, Mike T.; List, Thomas; Look, John O.
2011-01-01
AIMS The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. An overview is presented, including Axis I and II methodology and descriptive statistics for the study participant sample. This paper details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. Validity testing for the Axis II biobehavioral instruments was based on previously validated reference standards. METHODS The Axis I reference standards were based on the consensus of 2 criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion exam reliability was also assessed within study sites. RESULTS Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion exam agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). CONCLUSION The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods. PMID:20213028
Sawchuk, Dena; Currie, Kris; Vich, Manuel Lagravere; Palomo, Juan Martin
2016-01-01
Objective To evaluate the accuracy and reliability of the diagnostic tools available for assessing maxillary transverse deficiencies. Methods An electronic search of three databases was performed from their date of establishment to April 2015, with manual searching of reference lists of relevant articles. Articles were considered for inclusion if they reported the accuracy or reliability of a diagnostic method or evaluation technique for maxillary transverse dimensions in mixed or permanent dentitions. Risk of bias was assessed in the included articles, using the Quality Assessment of Diagnostic Accuracy Studies tool-2. Results Nine articles were selected. The studies were heterogeneous, with moderate to low methodological quality, and all had a high risk of bias. Four suggested that the use of arch width prediction indices with dental cast measurements is unreliable for use in diagnosis. Frontal cephalograms derived from cone-beam computed tomography (CBCT) images were reportedly more reliable for assessing intermaxillary transverse discrepancies than posteroanterior cephalograms. Two studies proposed new three-dimensional transverse analyses with CBCT images that were reportedly reliable, but have not been validated for clinical sensitivity or specificity. No studies reported sensitivity, specificity, positive or negative predictive values or likelihood ratios, or ROC curves of the methods for the diagnosis of transverse deficiencies. Conclusions Current evidence does not enable solid conclusions to be drawn, owing to a lack of reliable high quality diagnostic studies evaluating maxillary transverse deficiencies. CBCT images are reportedly more reliable for diagnosis, but further validation is required to confirm CBCT's accuracy and diagnostic superiority. PMID:27668196
Hales, M; Biros, E; Reznik, J E
2015-01-01
Since 1982, the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) has been used to classify sensation of spinal cord injury (SCI) through pinprick and light touch scores. The absence of proprioception, pain, and temperature within this scale creates questions about its validity and accuracy. To assess whether the sensory component of the ISNCSCI represents a reliable and valid measure of classification of SCI. A systematic review of studies examining the reliability and validity of the sensory component of the ISNCSCI published between 1982 and February 2013 was conducted. The electronic databases MEDLINE via Ovid, CINAHL, PEDro, and Scopus were searched for relevant articles. A secondary search of reference lists was also completed. Chosen articles were assessed according to the Oxford Centre for Evidence-Based Medicine hierarchy of evidence and critically appraised using the McMasters Critical Review Form. A statistical analysis was conducted to investigate the variability of the results given by reliability studies. Twelve studies were identified: 9 reviewed reliability and 3 reviewed validity. All studies demonstrated low levels of evidence and moderate critical appraisal scores. The majority of the articles (~67%; 6/9) assessing the reliability suggested that training was positively associated with better posttest results. The results of the 3 studies that assessed the validity of the ISNCSCI scale were confounding. Due to the low to moderate quality of the current literature, the sensory component of the ISNCSCI requires further revision and investigation if it is to be a useful tool in clinical trials.
Hales, M.; Biros, E.
2015-01-01
Background: Since 1982, the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) has been used to classify sensation of spinal cord injury (SCI) through pinprick and light touch scores. The absence of proprioception, pain, and temperature within this scale creates questions about its validity and accuracy. Objectives: To assess whether the sensory component of the ISNCSCI represents a reliable and valid measure of classification of SCI. Methods: A systematic review of studies examining the reliability and validity of the sensory component of the ISNCSCI published between 1982 and February 2013 was conducted. The electronic databases MEDLINE via Ovid, CINAHL, PEDro, and Scopus were searched for relevant articles. A secondary search of reference lists was also completed. Chosen articles were assessed according to the Oxford Centre for Evidence-Based Medicine hierarchy of evidence and critically appraised using the McMasters Critical Review Form. A statistical analysis was conducted to investigate the variability of the results given by reliability studies. Results: Twelve studies were identified: 9 reviewed reliability and 3 reviewed validity. All studies demonstrated low levels of evidence and moderate critical appraisal scores. The majority of the articles (~67%; 6/9) assessing the reliability suggested that training was positively associated with better posttest results. The results of the 3 studies that assessed the validity of the ISNCSCI scale were confounding. Conclusions: Due to the low to moderate quality of the current literature, the sensory component of the ISNCSCI requires further revision and investigation if it is to be a useful tool in clinical trials. PMID:26363591
[Reliability and validity of depression scales of Chinese version: a systematic review].
Sun, X Y; Li, Y X; Yu, C Q; Li, L M
2017-01-10
Objective: To evaluate the psychometric properties of Chinese-version depression scales for different groups by systematically reviewing their reliability and validity in adults in China. Methods: Eligible studies published before 6 May 2016 were retrieved from the following databases: CNKI, Wanfang, PubMed and Embase. The HSROC model for meta-analysis of diagnostic test accuracy (DTA) was used to calculate the pooled sensitivity and specificity of the PHQ-9. Results: A total of 44 papers evaluating the performance of depression scales were included. The reliability and validity of the common depression scales were acceptable, including the Beck Depression Inventory (BDI), the Hamilton Depression Scale (HAMD), the Center for Epidemiologic Studies Depression Scale (CES-D), the Patient Health Questionnaire (PHQ) and the Geriatric Depression Scale (GDS). The Cronbach's alpha coefficients of most tools were larger than 0.8, while the test-retest reliability and split-half reliability were larger than 0.7, indicating good internal consistency and stability. The criterion validity, convergent validity, discriminant validity and screening validity were acceptable, though different cut-off points were recommended by different studies. The pooled sensitivity of the 11 studies evaluating the PHQ-9 was 0.88 (95% CI: 0.85-0.91), while the pooled specificity was 0.89 (95% CI: 0.82-0.94), which demonstrated the applicability of the PHQ-9 in screening for depression. Conclusion: The reliability and validity of the different Chinese-version depression scales are acceptable. The characteristics of the different tools and of the study population should be taken into consideration when choosing a specific scale.
Software development predictors, error analysis, reliability models and software metric analysis
NASA Technical Reports Server (NTRS)
Basili, Victor
1983-01-01
The use of dynamic characteristics as predictors for software development was studied. It was found that there are some significant factors that could be useful as predictors. From a study on software errors and complexity, it was shown that meaningful results can be obtained which allow insight into software traits and the environment in which it is developed. Reliability models were studied. The research included the field of program testing because the validity of some reliability models depends on the answers to some unanswered questions about testing. In studying software metrics, data collected from seven software engineering laboratory (FORTRAN) projects were examined and three effort reporting accuracy checks were applied to demonstrate the need to validate a data base. Results are discussed.
A Meta-Analysis of the Reliability of Free and For-Pay Big Five Scales.
Hamby, Tyler; Taylor, Wyn; Snowden, Audrey K; Peterson, Robert A
2016-01-01
The present study meta-analytically compared coefficient alpha reliabilities reported for free and for-pay Big Five scales. We collected 288 studies from five previous meta-analyses of Big Five traits and harvested 1,317 alphas from these studies. We found that free and for-pay scales measuring Big Five traits possessed comparable reliabilities. However, after we controlled for the numbers of items in the scales with the Spearman-Brown formula, we found that free scales possessed significantly higher alpha coefficients than for-pay scales for each of the Big Five traits. Thus, the study offers initial evidence that Big Five scales that are free more efficiently measure these traits for research purposes than do for-pay scales.
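The length adjustment mentioned in this abstract can be illustrated with the standard Spearman-Brown prophecy formula. The sketch below is a minimal illustration, not the authors' analysis code: the helper names, the common target length of 20 items, and the example alphas are all hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a scale's length is multiplied by length_factor."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


def alpha_at_common_length(alpha: float, n_items: int, target_items: int) -> float:
    """Project an observed alpha onto a hypothetical common scale length."""
    return spearman_brown(alpha, target_items / n_items)


# Hypothetical example: a short scale with a lower raw alpha versus a long
# scale with a higher raw alpha, both projected to 20 items.
short_scale = alpha_at_common_length(alpha=0.80, n_items=10, target_items=20)
long_scale = alpha_at_common_length(alpha=0.85, n_items=40, target_items=20)

print(f"short scale at 20 items: {short_scale:.3f}")
print(f"long scale at 20 items:  {long_scale:.3f}")
```

In this made-up case the shorter scale looks less reliable on raw alpha but more reliable once both are put on the same length, which is the kind of reversal the abstract describes.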
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
Ten major studies are included in this report; a separate abstract was prepared for each. In addition, there are 4 appendices related to reliability, namely: (A) Load and Generation Uncertainty, by Norton Savage, DOE; (B) Comparison of Service-Interruption Cost Studies, by Dr. Gay Lamb, DOE; (C) Impact of Large-Scale Fuel-Supply Disruptions on Regional Electric-Power Reliability, by Anthony J. Como and Mark Gielecki, DOE; and (D) Reliability Effects of the 1980 Florida Conservation Act, by William E. Scott and Thomas R. Hitz, Jr., DOE. (LCL)
Infant polysomnography: reliability and validity of infant arousal assessment.
Crowell, David H; Kulp, Thomas D; Kapuniai, Linda E; Hunt, Carl E; Brooks, Lee J; Weese-Mayer, Debra E; Silvestri, Jean; Ward, Sally Davidson; Corwin, Michael; Tinsley, Larry; Peucker, Mark
2002-10-01
Infant arousal scoring based on the Atlas Task Force definition of transient EEG arousal was evaluated to determine (1) whether transient arousals can be identified and assessed reliably in infants and (2) whether arousal and no-arousal epochs scored previously by trained raters can be validated reliably by independent sleep experts. Phase I for inter- and intrarater reliability scoring was based on two datasets of sleep epochs selected randomly from nocturnal polysomnograms of healthy full-term, preterm, idiopathic apparent life-threatening event cases, and siblings of Sudden Infant Death Syndrome infants of 35 to 64 weeks postconceptional age. After training, test set 1 reliability was assessed and discrepancies identified. After retraining, test set 2 was scored by the same raters to determine interrater reliability. Later, three raters from the trained group rescored test set 2 to assess inter- and intrarater reliabilities. Interrater and intrarater reliability kappas, with 95% confidence intervals, ranged from substantial to almost perfect levels of agreement. Interrater reliabilities for spontaneous arousals were initially moderate and then substantial. During the validation phase, 315 previously scored epochs were presented to four sleep experts to rate as containing arousal or no-arousal events. Interrater expert agreements were diverse and considered noninterpretable. Concordance in sleep experts' agreements, based on identification of the previously sampled arousal and no-arousal epochs, was used as a secondary evaluative technique. Results showed agreement by two or more experts on 86% of the Collaborative Home Infant Monitoring Evaluation Study arousal scored events. Conversely, only 1% of the Collaborative Home Infant Monitoring Evaluation Study-scored no-arousal epochs were rated as an arousal.
In summary, this study presents an empirically tested model with procedures and criteria for attaining improved reliability in transient EEG arousal assessments in infants using the modified Atlas Task Force standards. With training based on specific criteria, substantial inter- and intrarater agreement in identifying infant arousals was demonstrated. Corroborative validation results were too disparate for meaningful interpretation. Alternate evaluation based on concordance agreements supports reliance on infant EEG criteria for assessment. Results mandate additional confirmatory validation studies with specific training on infant EEG arousal assessment criteria.
DOT National Transportation Integrated Search
2018-01-11
Background: This study sought to systematically search the literature to identify reliable and valid survey instruments for fatigue measurement in the Emergency Medical Services (EMS) occupational setting. Methods: A systematic review study design wa...
Research Measures for Dyscalculia: A Validity and Reliability Study.
ERIC Educational Resources Information Center
Geiman, R. M.
1986-01-01
This study sought to evaluate a measure of dyscalculia to determine its validity and reliability. It also tested use of the instrument with seventh graders and ascertained whether errors attributed to dyscalculia were also present in an average sample of seventh graders. Results varied. (MNS)
DOT National Transportation Integrated Search
2010-12-01
The purpose of this qualitative case study was to identify the types of obstacles and patterns experienced by a single heavy rail transit agency located in North America that embedded a Reliability Centered Maintenance (RCM) Process. The outcome of t...
Measures To Monitor Developmental Disabilities Quality Assurance: A Study of Reliability.
ERIC Educational Resources Information Center
Dodder, Richard A.; Foster, Luann H.; Bolin, Brien L.
1999-01-01
This study examined the reliability of an instrument used to evaluate services for people with developmental disabilities. Seven types of variables were analyzed: demographic data, residential arrangements, medical needs, adaptive behavior, severity of challenging behavior, frequency of challenging behavior, and the perception that disabled…
Cha, Young Joo; Lee, Jae Jin; Kim, Do Hyun; You, Joshua Sung H
2017-10-23
Core stabilization plays an important role in the regulation of postural stability. To overcome shortcomings associated with pain and severe core instability during conventional core stabilization tests, we recently developed the dynamic neuromuscular stabilization-based heel sliding (DNS-HS) test. The purpose of this study was to establish the criterion validity and test-retest reliability of the novel DNS-HS test. Twenty young adults with core instability completed both the bilateral straight leg lowering test (BSLLT) and the DNS-HS test for the criterion validity study and repeated the DNS-HS test for the test-retest reliability study. Criterion validity was determined by comparing hip joint angle data obtained from the BSLLT and DNS-HS measures. The test-retest reliability was determined by comparing hip joint angle data. Criterion validity was ICC(2,3) = 0.700 (p < 0.05), suggesting a good relationship between the two core stability measures. Test-retest reliability was ICC(3,3) = 0.953 (p < 0.05), indicating excellent consistency between the repeated DNS-HS measurements. Criterion validity data demonstrated a good relationship between the gold standard BSLLT and DNS-HS core stability measures. Test-retest reliability data suggest that the DNS-HS test is a reliable test of core stability. Clinically, the DNS-HS test is useful to objectively quantify core instability and allow early detection and evaluation.
McCreesh, Karen M; Anjum, Shakeel; Crotty, James M; Lewis, Jeremy S
2016-01-01
Rotator cuff (RC) tendinopathy has been widely ascribed to impingement of the supraspinatus tendon (SsT) in the subacromial space, measured as the acromiohumeral distance (AHD). Ultrasound (US) is suitable for measuring AHD and SsT thickness, but few reliability studies have been carried out in symptomatic populations, and interrater reliability is unconfirmed. This study aimed to examine the intrarater and interrater reliability of US measurements of AHD and SsT thickness in asymptomatic control subjects and patients with RC tendinopathy. Seventy participants were recruited and grouped as healthy controls (n = 25) and RC tendinopathy (n = 45). Repeated US measurements of AHD and SsT thickness were obtained by one rater in both groups and by two raters in the RC tendinopathy group. Intrarater and interrater reliability coefficients were excellent for both measurements (intraclass correlation > 0.92), but the intrarater reliability was superior. The minimal detectable change values in the symptomatic group were 0.7 mm for AHD and 0.6 mm for SsT thickness for a single experienced examiner; the values rose to 1.2 mm and 1.3 mm, respectively, for the pair of examiners. The results support the reliability of US for the measurement of AHD and SsT thickness in patients with symptomatic RC tendinopathy and provide minimal detectable change values for use in future research studies. © 2015 Wiley Periodicals, Inc.
A study of the reliability of the Nociception Coma Scale.
Riganello, F; Cortese, M D; Arcuri, F; Candelieri, A; Guglielmino, F; Dolce, G; Sannita, W G; Schnakers, C
2015-04-01
In this study, we investigated the reliability of the Nociception Coma Scale which has recently been developed to assess nociception in non-communicative, severely brain-injured patients. Prospective cross-sequential study. Semi-intensive care unit and long-term brain injury care. Forty-four patients diagnosed as being in a vegetative state (n=26) or in a minimally conscious state (n=18). Patients were assessed by two experts (rater A and rater B) on two consecutive weeks to measure inter-rater agreement and test-retest reliability. Total scores and subscores of the Nociception Coma Scale. We performed a total of 176 assessments. The inter-rater agreement was moderate for the total scores (k = 0.57) and fair to substantial for the subscores (0.33 ≤ k ≤ 0.62) on week 2. The test-retest reliability was substantial for the total scores (k = 0.66) and moderate to almost perfect for the subscores (0.53 ≤ k ≤ 0.96) for rater A. The inter-rater agreement was weaker on week 1, whereas the test-retest reliability was lower for the least experienced rater (rater B). This study provides further evidence of the psychometric qualities of the Nociception Coma Scale. Future studies should assess the impact of practical experience and background on administration and scoring of the scale. © The Author(s) 2014.
Gómez-Cabello, Alba; Vicente-Rodríguez, Germán; Albers, Ulrike; Mata, Esmeralda; Rodriguez-Marroyo, Jose A.; Olivares, Pedro R.; Gusi, Narcis; Villa, Gerardo; Aznar, Susana; Gonzalez-Gross, Marcela; Casajús, Jose A.; Ara, Ignacio
2012-01-01
Background: The elderly EXERNET multi-centre study aims to collect normative anthropometric data for old functionally independent adults living in Spain. Purpose: To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. Materials and Methods: A total of 98 elderly from five different regions participated in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) participated in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. Results: For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. Conclusion: The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in the elderly population. PMID:22860013
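For readers unfamiliar with the technical error of measurement (TEM), the sketch below uses the formula commonly applied to paired repeat measurements, TEM = sqrt(sum(d^2) / 2n), together with the usual reliability coefficient R = 1 - TEM^2 / SD^2. The abstract does not state which exact formulas the authors used, so this is an assumption, and the height values are invented for illustration only.

```python
import math
from statistics import stdev


def technical_error_of_measurement(first, second):
    """TEM for two repeated measurements of the same subjects (same units as input)."""
    n = len(first)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * n))


def tem_reliability(tem, sd):
    """Share of between-subject variance not attributable to measurement error."""
    return 1.0 - (tem ** 2) / (sd ** 2)


# Hypothetical heights in cm, measured twice on the same four people.
first_pass = [160.0, 172.5, 155.2, 168.0]
second_pass = [160.2, 172.3, 155.4, 168.1]

tem = technical_error_of_measurement(first_pass, second_pass)
pooled_sd = stdev(first_pass + second_pass)  # SD over all measurements (an assumption)
reliability = tem_reliability(tem, pooled_sd)

print(f"TEM = {tem:.3f} cm, reliability = {reliability:.4f}")
```

Because the reliability coefficient depends on the between-subject SD, the same TEM gives a lower coefficient in a more homogeneous sample.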
Issues in benchmarking human reliability analysis methods : a literature review.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lois, Erasmia; Forester, John Alan; Tran, Tuan Q.
There is a diversity of human reliability analysis (HRA) methods available for use in assessing human performance within probabilistic risk assessment (PRA). Due to the significant differences in the methods, including the scope, approach, and underlying models, there is a need for an empirical comparison investigating the validity and reliability of the methods. To accomplish this empirical comparison, a benchmarking study is currently underway that compares HRA methods with each other and against operator performance in simulator studies. In order to account for as many effects as possible in the construction of this benchmarking study, a literature review was conducted, reviewing past benchmarking studies in the areas of psychology and risk assessment. A number of lessons learned through these studies are presented in order to aid in the design of future HRA benchmarking endeavors.
Discrete component bonding and thick film materials study
NASA Technical Reports Server (NTRS)
Kinser, D. L.
1975-01-01
The results of an investigation of discrete component bonding reliability and a fundamental study of new thick film resistor materials are summarized. The component bonding study examined several types of solder bonded components, with some processing variable studies to determine their influence upon bonding reliability. The bonding reliability was assessed using the thermal cycle: 15 minutes at room temperature, 15 minutes at +125 C, 15 minutes at room temperature, and 15 minutes at -55 C. The thick film resistor materials examined were of the transition metal oxide-phosphate glass family with several elemental metal additions of the same transition metal. These studies were conducted by preparing a paste of the subject composition, printing, drying, and firing using both air and reducing atmospheres. The resulting resistors were examined for adherence, resistance, thermal coefficient of resistance, and voltage coefficient of resistance.
Phillips, Nicole Margaret; Street, Maryann; Haesler, Emily
2016-02-01
Patient participation in healthcare is recognised internationally as essential for consumer-centric, high-quality healthcare delivery. Its measurement as part of continuous quality improvement requires development of agreed standards and measurable indicators. This systematic review sought to identify strategies to measure patient participation in healthcare and to report their reliability and validity. In the context of this review, patient participation was constructed as shared decision-making, acknowledging the patient as having critical knowledge regarding their own health and care needs and promoting self-care/autonomy. Following a comprehensive search, studies reporting reliability or validity of an instrument used in a healthcare setting to measure patient participation, published in English between January 2004 and March 2014 were eligible for inclusion. From an initial search, which identified 1582 studies, 156 studies were retrieved and screened against inclusion criteria. Thirty-three studies reporting 24 patient participation measurement tools met inclusion criteria, and were critically appraised. The majority of studies were descriptive psychometric studies using prospective, cross-sectional designs. Almost all the tools completed by patients, family caregivers, observers or more than one stakeholder focused on aspects of patient-professional communication. Few tools designed for completion by patients or family caregivers provided valid and reliable measures of patient participation. There was low correlation between many of the tools and other measures of patient satisfaction. Few reliable and valid tools for measurement of patient participation in healthcare have been recently developed. Of those reported in this review, the dyadic Observing Patient Involvement in Decision Making (dyadic-OPTION) tool presents the most promise for measuring core components of patient participation. 
There remains a need for further study into valid, reliable and feasible strategies for measuring patient participation as part of continuous quality improvement. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
ERIC Educational Resources Information Center
Smith, Stacey L.; Vannest, Kimberly J.; Davis, John L.
2011-01-01
The reliability of data is a critical issue in decision-making for practitioners in the school. Percent Agreement and Cohen's kappa are the two most widely reported indices of inter-rater reliability; however, a recent Monte Carlo study on the reliability of multi-category scales found other indices to be more trustworthy given the type of data…
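The two indices named in this abstract can be compared with a minimal sketch. The functions below implement the standard definitions of percent agreement and unweighted Cohen's kappa; the rating vectors are invented, and they are chosen to show how kappa can drop when nearly all ratings fall in one category even though raw agreement is unchanged.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters gave the same rating."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical balanced sample: ratings split evenly between 0 and 1.
balanced_a = [0] * 10 + [1] * 10
balanced_b = [0] * 9 + [1] + [1] * 9 + [0]

# Hypothetical skewed sample: almost every rating is 0.
skewed_a = [0] * 18 + [1] * 2
skewed_b = [0] * 17 + [1] + [0] + [1]

for label, a, b in [("balanced", balanced_a, balanced_b), ("skewed", skewed_a, skewed_b)]:
    print(f"{label}: agreement = {percent_agreement(a, b):.2f}, kappa = {cohen_kappa(a, b):.2f}")
```

Both samples have 90% raw agreement, yet kappa is 0.80 for the balanced ratings and about 0.44 for the skewed ones, because chance agreement is much higher when one category dominates.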
O'Connor, S; McCaffrey, N; Whyte, E; Moran, K
2016-07-01
To adapt the trunk stability test to facilitate further sub-classification of higher levels of core stability in athletes for use as a screening tool. To establish the inter-tester and intra-tester reliability of this adapted core stability test. Reliability study. Collegiate athletic therapy facilities. Fifteen physically active male subjects (19.46 ± 0.63) free from any orthopaedic or neurological disorders were recruited from a convenience sample of collegiate students. The intraclass correlation coefficients (ICC) and 95% Confidence Intervals (CI) were computed to establish inter-tester and intra-tester reliability. Excellent ICC values were observed in the adapted core stability test for inter-tester reliability (0.97) and good to excellent values for intra-tester reliability (0.73-0.90). While the 95% CIs were narrow for inter-tester reliability, the 95% CIs for Testers A and C were wide compared with those for Tester B. The adapted core stability test developed in this study is a quick and simple field-based test to administer that can further subdivide athletes with high levels of core stability. The test demonstrated high inter-tester and intra-tester reliability. Copyright © 2015 Elsevier Ltd. All rights reserved.
Keller, Lisa A; Clauser, Brian E; Swanson, David B
2010-12-01
In recent years, demand for performance assessments has continued to grow. However, performance assessments are notorious for lower reliability, and in particular, low reliability resulting from task specificity. Since reliability analyses typically treat the performance tasks as randomly sampled from an infinite universe of tasks, these estimates of reliability may not be accurate. For tests built according to a table of specifications, tasks are randomly sampled from different strata (content domains, skill areas, etc.). If these strata remain fixed in the test construction process, ignoring this stratification in the reliability analysis results in an underestimate of "parallel forms" reliability, and an overestimate of the person-by-task component. This research explores the effect of representing and misrepresenting the stratification appropriately in estimation of reliability and the standard error of measurement. Both multivariate and univariate generalizability studies are reported. Results indicate that the proper specification of the analytic design is essential in yielding the proper information both about the generalizability of the assessment and the standard error of measurement. Further, illustrative D studies present the effect under a variety of situations and test designs. Additional benefits of multivariate generalizability theory in test design and evaluation are also discussed.
Skinner, Ian W; Hübscher, Markus; Moseley, G Lorimer; Lee, Hopin; Wand, Benedict M; Traeger, Adrian C; Gustin, Sylvia M; McAuley, James H
2017-08-15
Eyetracking is commonly used to investigate attentional bias. Although some studies have investigated the internal consistency of eyetracking, data are scarce on the test-retest reliability and agreement of eyetracking to investigate attentional bias. This study reports the test-retest reliability, measurement error, and internal consistency of 12 commonly used outcome measures thought to reflect the different components of attentional bias: overall attention, early attention, and late attention. Healthy participants completed a preferential-looking eyetracking task that involved the presentation of threatening (sensory words, general threat words, and affective words) and nonthreatening words. We used intraclass correlation coefficients (ICCs) to measure test-retest reliability (ICC > .70 indicates adequate reliability). The ICCs(2, 1) ranged from -.31 to .71. Reliability varied according to the outcome measure and threat word category. Sensory words had a lower mean ICC (.08) than either affective words (.32) or general threat words (.29). A longer exposure time was associated with higher test-retest reliability. All of the outcome measures, except second-run dwell time, demonstrated low measurement error (<6%). Most of the outcome measures reported high internal consistency (α > .93). Recommendations are discussed for improving the reliability of eyetracking tasks in future research.
Objective measurements of excess skin in post bariatric patients--inter-rater reliability.
Biörserud, Christina; Fagevik Olsén, Monika; Elander, Anna; Wiklund, Malin
2016-01-01
An ability to reliably assess excess skin after massive weight loss using well-described and transferrable methods is important. The aim of this trial was to evaluate inter-rater reliability of ptosis and circumference measurements in patients with excess skin after bariatric surgery. Twenty-five postbariatric patients were included in the study, and their excess skin was measured 18 months after surgery. A protocol was designed to measure excess skin in a standardised way. To evaluate the inter-rater reliability in the measuring protocol, all patients were measured twice, by a specialist nurse and a specialist physiotherapist. All circumference measurements on different body parts had an ICC > 0.9, indicating high reliability. Furthermore, all breast and abdominal ptosis measurements had high reliability. In contrast, visual evaluation of abdominal ptosis had poor reliability. Measurements of ptoses on different body parts had an ICC > 0.6. There were no systematic differences between the results of the two testers, except for measurements of the buttocks and maximal knee circumference. The measuring protocol presented in this study has high reliability and, therefore, represents a useful instrument to provide a consistent and objective assessment of excess skin in the postbariatric patient.
Cyril, Sheila; Oldroyd, John C; Renzaho, Andre
2013-05-28
Despite a plethora of studies examining the effect of increased urbanisation on health, no single study has systematically examined the measurement properties of scales used to measure urbanicity. It is critical to distinguish findings from studies that use surrogate measures of urbanicity (e.g. population density) from those that use measures rigorously tested for reliability and validity. The purpose of this study was to assess the measurement reliability and validity of the available urbanicity scales and identify areas where more research is needed to facilitate the development of a standardised measure of urbanicity. Databases searched were MEDLINE with Full Text, CINAHL with Full Text, and PsycINFO (EBSCOhost) as well as Embase (Ovid) covering the period from January 1970 to April 2012. Studies included in this systematic review were those that focused on the development of an urbanicity scale with clearly defined items or the adoption of an existing scale, included at least one outcome measure related to health, published in peer-reviewed journals, the full text was available in English and tested for validity and reliability. Eleven studies met our inclusion criteria which were conducted in Sri Lanka, Austria, China, Nigeria, India and Philippines. They ranged in size from 3327 to 33,404 participants. The number of scale items ranged from 7 to 12 items in 5 studies. One study measured urban area socioeconomic disadvantage instead of urbanicity. The emerging evidence is that increased urbanisation is associated with deleterious health outcomes. It is possible that increased urbanisation is also associated with access and utilisation of health services. However, urbanicity measures differed across studies, and the reliability and validity properties of the used scales were not well established. There is an urgent need for studies to standardise measures of urbanicity. 
Longitudinal cohort studies to confirm the relationship between increased urbanisation and health outcomes are urgently needed.
2013-01-01
Background: Despite a plethora of studies examining the effect of increased urbanisation on health, no single study has systematically examined the measurement properties of scales used to measure urbanicity. It is critical to distinguish findings from studies that use surrogate measures of urbanicity (e.g. population density) from those that use measures rigorously tested for reliability and validity. The purpose of this study was to assess the measurement reliability and validity of the available urbanicity scales and identify areas where more research is needed to facilitate the development of a standardised measure of urbanicity. Methods: Databases searched were MEDLINE with Full Text, CINAHL with Full Text, and PsycINFO (EBSCOhost) as well as Embase (Ovid) covering the period from January 1970 to April 2012. Studies included in this systematic review were those that focused on the development of an urbanicity scale with clearly defined items or the adoption of an existing scale, included at least one outcome measure related to health, published in peer-reviewed journals, the full text was available in English and tested for validity and reliability. Results: Eleven studies met our inclusion criteria which were conducted in Sri Lanka, Austria, China, Nigeria, India and Philippines. They ranged in size from 3327 to 33,404 participants. The number of scale items ranged from 7 to 12 items in 5 studies. One study measured urban area socioeconomic disadvantage instead of urbanicity. The emerging evidence is that increased urbanisation is associated with deleterious health outcomes. It is possible that increased urbanisation is also associated with access and utilisation of health services. However, urbanicity measures differed across studies, and the reliability and validity properties of the used scales were not well established. Conclusion: There is an urgent need for studies to standardise measures of urbanicity.
Longitudinal cohort studies to confirm the relationship between increased urbanisation and health outcomes are urgently needed. PMID:23714282
Reliability and Validity of the "Decision-Making Skills Instrument for Children"
ERIC Educational Resources Information Center
Pekdogan, Serpil; Ulutas, Ilkay
2016-01-01
The purpose of this study is to develop a valid and reliable data collection tool to assess the decision-making skills of children at the age of 5 to 6. The study group is composed of 300 children attending independent pre-schools located in the central district of Amasya province and their parents. In the study, four-factor and 29-item…
The purpose of this study is to examine the feasibility of collecting, transmitting, and analyzing 3-D ultrasound data in the context of a multi-center study of pregnant women. The study will also examine the reliability of measurements obtained from 3-D imag...
USDA-ARS's Scientific Manuscript database
This study assessed the feasibility, reliability and validity of reflection spectroscopy (RS) to assess skin carotenoids in a racially diverse sample. Study 1 was a cross-sectional study of corner store customers (n= 479) in Eastern North Carolina USA who completed the National Cancer Institute Frui...
A Validity and Reliability Study on the Development of the Values Scale in Turkey
ERIC Educational Resources Information Center
Dilmac, Bulent; Aricak, Osman Tolga; Cesur, Sevim
2014-01-01
The purpose of the present study is to examine the initial psychometric properties of the Values Scale for adults. While developing the first stage of the Values Scale, open-ended data on the values held by 216 university students were obtained. During the second stage, the validity and reliability studies of the 60-item Values Scale obtained by…
Lee, Ji-Hyun; Cynn, Heon-Seock; Choi, Woo-Jeong; Jeong, Hyo-Jung; Yoon, Tae-Lim
2016-05-01
The objective of this study was to introduce levator scapulae (LS) measurement using a caliper and the levator scapulae index (LSI) and to investigate intra- and interrater reliability of the LSI in subjects with and without scapular downward rotation syndrome (SDRS). Two raters measured LS length twice in 38 subjects (19 with SDRS and 19 without SDRS). For reliability testing, intraclass correlation coefficients (ICCs), standard error of measurement (SEM), and minimal detectable change (MDC) were calculated. Intrarater reliability analysis yielded ICCs ranging from 0.94 to 0.98 in subjects with SDRS and 0.96 to 0.98 in subjects without SDRS. These results indicated that intrarater reliability in both groups was excellent for measuring LS length with the LSI. Interrater reliability was good (ICC: 0.82) in subjects with SDRS; however, interrater reliability was moderate (ICC: 0.75) in subjects without SDRS. Additionally, SEM and MDC were 0.13% and 0.36% in subjects with SDRS and 0.35% and 0.97% in subjects without SDRS. In subjects with SDRS, the dispersion of the measurement errors and the MDC were low. This study suggested that the LSI is a reliable method to measure LS length and is more reliable for subjects with SDRS. Copyright © 2015 Elsevier Ltd. All rights reserved.
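The relationship between ICC, SEM, and MDC in this abstract can be sketched with the conventional formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. The abstract does not state which formulas were used, so this is an assumption; the reported SEM/MDC pairs are, however, numerically consistent with the MDC95 expression. The SD value in the first example is hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical input: SD of 2.0 units and an ICC of 0.96.
example_sem = sem_from_icc(sd=2.0, icc=0.96)
print(f"SEM = {example_sem:.2f}, MDC95 = {mdc95(example_sem):.2f}")

# SEM values reported in the abstract (in %), passed through the same MDC95 formula.
for reported_sem in (0.13, 0.35):
    print(f"SEM {reported_sem:.2f}% -> MDC95 {mdc95(reported_sem):.2f}%")
```

A change smaller than the MDC cannot be distinguished from measurement noise at the chosen confidence level, which is why the MDC is reported alongside the ICC.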
Safipour, Jalal; Tessma, Mesfin Kassaye; Higginbottom, Gina; Emami, Azita
2010-12-01
The objective of the study is to translate and examine the reliability and validity of the Jessor and Jessor Social Alienation Scale for use in a Swedish context. The study involved four phases of testing: (1) Translation and back-translation; (2) a pilot test to evaluate the translation; (3) reliability testing; and (4) a validity test. Main participants of this study were 446 students (Age = 15-19, SD = 1.01, Mean = 17). Results from the reliability test showed high internal consistency and stability. Face, content and construct validity were demonstrated using experts and confirmatory factor analysis. The results of testing the Swedish version of the alienation scale revealed an acceptable level of reliability and validity, and is appropriate for use in the Swedish context. © 2010 The Authors. Scandinavian Journal of Psychology © 2010 The Scandinavian Psychological Associations.
Study samples are too small to produce sufficiently precise reliability coefficients.
Charter, Richard A
2003-04-01
In a survey of journal articles, test manuals, and test critique books, the author found that a mean sample size (N) of 260 participants had been used for reliability studies on 742 tests. The distribution was skewed because the median sample size for the total sample was only 90. The median sample sizes for the internal consistency, retest, and interjudge reliabilities were 182, 64, and 36, respectively. The author presented sample size statistics for the various internal consistency methods and types of tests. In general, the author found that the sample sizes that were used in the internal consistency studies were too small to produce sufficiently precise reliability coefficients, which in turn could cause imprecise estimates of examinee true-score confidence intervals. The results also suggest that larger sample sizes have been used in the last decade compared with those that were used in earlier decades.
The Children's Play Therapy Instrument (CPTI): Description, Development, and Reliability Studies
Kernberg, Paulina F.; Chazan, Saralea E.; Normandin, Lina
1998-01-01
The Children's Play Therapy Instrument (CPTI), its development, and reliability studies are described. The CPTI is a new instrument to examine a child's play activity in individual psychotherapy. Three independent raters used the CPTI to rate eight videotaped play therapy vignettes. Results were compared with the authors' consensual scores from a preliminary study. Generally good to excellent levels of interrater reliability were obtained for the independent raters on intraclass correlation coefficients for ordinal categories of the CPTI. Likewise, kappa levels were acceptable to excellent for nominal categories of the scale. The CPTI holds promise to become a reliable measure of play activity in child psychotherapy. Further research is needed to assess discriminant validity of the CPTI for use as a diagnostic tool and as a measure of process and outcome.(The Journal of Psychotherapy Practice and Research 1998; 7:196–207) PMID:9631341
Spector, Aimee; Hebditch, Molly; Stoner, Charlotte R; Gibbor, Luke
2016-09-01
The ability to identify biological, social, and psychological issues for people with dementia is an important skill for healthcare professionals. Therefore, valid and reliable measures are needed to assess this ability. This study involves the development of a vignette style measure to capture the extent to which health professionals use "Biopsychosocial" thinking in dementia care (VIG-Dem), based on the framework of the model developed by Spector and Orrell (2010). The development process consisted of Phase 1: Developing and refining the vignettes; Phase 2: Field testing (N = 9), and Phase 3: A pilot study to assess reliability and validity (N = 131). The VIG-Dem, consisting of two vignettes with open-ended questions and a standardized scoring scheme, was developed. Evidence for the good inter-rater reliability, convergent validity, and test-retest reliability were established. The VIG-Dem has good psychometric properties and may provide a useful tool in dementia care research and practice.
Developing scale for colleague solidarity among nurses in Turkey.
Uslusoy, Esin Cetinkaya; Alpar, Sule Ecevit
2013-02-01
There is a need for an appropriate instrument to measure colleague solidarity among nurses. This study was carried out to develop a Colleague Solidarity of Nurses' Scale (CSNS). This study was planned to be descriptive and methodological. The CSNS examined content validity, construct validity, test-retest reliability and internal consistency reliability. The trial form of the CSNS, which was composed of 44 items, was given to 200 nurses, followed by validity and reliability analyses. Following the analyses, 21 items were excluded from the scale, leaving an attitude scale made up of 23 items. Factor analysis of the data showed that the scale has a three sub-factor structure: emotional solidarity, academic solidarity and negative opinions about solidarity. The Cronbach's alpha reliability of the whole scale was 0.80. This study provides evidence that the CSNS possesses robust solidarity among nurses. © 2013 Wiley Publishing Asia Pty Ltd.
The reliability of in-training assessment when performance improvement is taken into account.
van Lohuizen, Mirjam T; Kuks, Jan B M; van Hell, Elisabeth A; Raat, A N; Stewart, Roy E; Cohen-Schotanus, Janke
2010-12-01
During in-training assessment students are frequently assessed over a longer period of time and therefore it can be expected that their performance will improve. We studied whether there really is a measurable performance improvement when students are assessed over an extended period of time and how this improvement affects the reliability of the overall judgement. In-training assessment results were obtained from 104 students on rotation at our university hospital or at one of the six affiliated hospitals. Generalisability theory was used in combination with multilevel analysis to obtain reliability coefficients and to estimate the number of assessments needed for reliable overall judgement, both including and excluding performance improvement. Students' clinical performance ratings improved significantly from a mean of 7.6 at the start to a mean of 7.8 at the end of their clerkship. When taking performance improvement into account, reliability coefficients were higher. The number of assessments needed to achieve a reliability of 0.80 or higher decreased from 17 to 11. Therefore, when studying reliability of in-training assessment, performance improvement should be considered.
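The study above used generalisability theory with multilevel analysis to estimate how many assessments are needed for a reliability of 0.80. As a much simpler stand-in, not the authors' method, the Spearman-Brown prophecy formula gives the same kind of estimate under the assumption that all assessments are parallel. The single-assessment reliability of 0.20 used below is a hypothetical value chosen for illustration only.

```python
import math


def spearman_brown(single_reliability: float, n_assessments: int) -> float:
    """Projected reliability of the mean of n parallel assessments."""
    r = single_reliability
    return n_assessments * r / (1.0 + (n_assessments - 1) * r)


def assessments_needed(single_reliability: float, target: float) -> int:
    """Smallest number of parallel assessments whose mean reaches the target."""
    r = single_reliability
    if not (0.0 < r < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    exact = target * (1.0 - r) / (r * (1.0 - target))
    # Subtract a tiny tolerance so floating-point noise does not add an extra assessment.
    return max(1, math.ceil(exact - 1e-9))


hypothetical_r = 0.20  # illustrative value, not taken from the study
n = assessments_needed(hypothetical_r, target=0.80)
print(n, round(spearman_brown(hypothetical_r, n), 3))
```

A higher single-assessment reliability lowers the required number of assessments, which is the direction of the effect the abstract reports when performance improvement is modelled.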
NASA Technical Reports Server (NTRS)
Wilson, Larry W.
1989-01-01
The longterm goal of this research is to identify or create a model for use in analyzing the reliability of flight control software. The immediate tasks addressed are the creation of data useful to the study of software reliability and production of results pertinent to software reliability through the analysis of existing reliability models and data. The completed data creation portion of this research consists of a Generic Checkout System (GCS) design document created in cooperation with NASA and Research Triangle Institute (RTI) experimenters. This will lead to design and code reviews with the resulting product being one of the versions used in the Terminal Descent Experiment being conducted by the Systems Validations Methods Branch (SVMB) of NASA/Langley. An appended paper details an investigation of the Jelinski-Moranda and Geometric models for software reliability. The models were given data from a process that they have correctly simulated and asked to make predictions about the reliability of that process. It was found that either model will usually fail to make good predictions. These problems were attributed to randomness in the data and replication of data was recommended.
Keppler, Hannah; Dhooge, Ingeborg; Maes, Leen; D'haenens, Wendy; Bockstael, Annelies; Philips, Birgit; Swinnen, Freya; Vinck, Bart
2010-02-01
Knowledge regarding the variability of transient-evoked otoacoustic emissions (TEOAEs) and distortion product otoacoustic emissions (DPOAEs) is essential in clinical settings and improves their utility in monitoring hearing status over time. In the current study, TEOAEs and DPOAEs were measured with commercially available OAE-equipment in 56 normally-hearing ears during three sessions. Reliability was analysed for the retest measurement without probe-refitting, the immediate retest measurement with probe-refitting, and retest measurements after one hour and one week. The highest reliability was obtained in the retest measurement without probe-refitting, and decreased with increasing time-interval between measurements. For TEOAEs, the lowest reliability was seen at half-octave frequency bands 1.0 and 1.4 kHz; whereas for DPOAEs half-octave frequency band 8.0 kHz also had poor reliability. Higher primary tone level combinations for DPOAEs yielded better reliability of DPOAE amplitudes. External environmental noise seemed to be the dominating noise source in normal-hearing subjects, decreasing the reliability of emission amplitudes especially in the low-frequency region.
Reliability in perceptual analysis of voice quality.
Bele, Irene Velsvik
2005-12-01
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.
Michels, Nele R M; Driessen, Erik W; Muijtjens, Arno M M; Van Gaal, Luc F; Bossaert, Leo L; De Winter, Benedicte Y
2009-12-01
A portfolio is used to mentor and assess students' clinical performance at the workplace. However, students and raters often perceive the portfolio as a time-consuming instrument. In this study, we investigated whether assessment during medical internship by a portfolio can combine reliability and feasibility. The domain-oriented reliability of 61 double-rated portfolios was measured, using a generalisability analysis with portfolio tasks and raters as sources of variation in measuring the performance of a student. We obtained reliability (Phi coefficient) of 0.87 with this internship portfolio containing 15 double-rated tasks. The generalisability analysis showed that an acceptable level of reliability (Phi = 0.80) was maintained when the amount of portfolio tasks was decreased to 13 or 9 using one and two raters, respectively. Our study shows that a portfolio can be a reliable method for the assessment of workplace learning. The possibility of reducing the amount of tasks or raters while maintaining a sufficient level of reliability suggests an increase in feasibility of portfolio use for both students and raters.
Reliability and validity of the McDonald Play Inventory.
McDonald, Ann E; Vigen, Cheryl
2012-01-01
This study examined the ability of a two-part self-report instrument, the McDonald Play Inventory, to reliably and validly measure the play activities and play styles of 7- to 11-yr-old children and to discriminate between the play of neurotypical children and children with known learning and developmental disabilities. A total of 124 children ages 7-11 recruited from a sample of convenience and a subsample of 17 parents participated in this study. Reliability estimates yielded moderate correlations for internal consistency, total test intercorrelations, and test-retest reliability. Validity estimates were established for content and construct validity. The results suggest that a self-report instrument yields reliable and valid measures of a child's perceived play performance and discriminates between the play of children with and without disabilities. Copyright © 2012 by the American Occupational Therapy Association, Inc.
Reliability modelling and analysis of thermal MEMS
NASA Astrophysics Data System (ADS)
Muratet, Sylvaine; Lavu, Srikanth; Fourniols, Jean-Yves; Bell, George; Desmulliez, Marc P. Y.
2006-04-01
This paper presents a MEMS reliability study methodology based on the novel concept of 'virtual prototyping'. This methodology can be used for the development of reliable sensors or actuators and also to characterize their behaviour in specific use conditions and applications. The methodology is demonstrated on the U-shaped micro electro thermal actuator used as test vehicle. To demonstrate this approach, a 'virtual prototype' has been developed with the modeling tools MatLab and VHDL-AMS. A best practice FMEA (Failure Mode and Effect Analysis) is applied on the thermal MEMS to investigate and assess the failure mechanisms. Reliability study is performed by injecting the identified defaults into the 'virtual prototype'. The reliability characterization methodology predicts the evolution of the behavior of these MEMS as a function of the number of cycles of operation and specific operational conditions.
A reliability analysis of the revised competitiveness index.
Harris, Paul B; Houston, John M
2010-06-01
This study examined the reliability of the Revised Competitiveness Index by investigating the test-retest reliability, interitem reliability, and factor structure of the measure based on a sample of 280 undergraduates (200 women, 80 men) ranging in age from 18 to 28 years (M = 20.1, SD = 2.1). The findings indicate that the Revised Competitiveness Index has high test-retest reliability, high inter-item reliability, and a stable factor structure. The results support the assertion that the Revised Competitiveness Index assesses competitiveness as a stable trait rather than a dynamic state.
COPES Report: System Reliability Study.
ERIC Educational Resources Information Center
Foothill-De Anza Community Coll. District, Los Altos Hills, CA.
The study examines the reliability of the Community College Occupational Programs Evaluation System (COPES). The COPES process is a system for evaluating program strengths and needs. A two-way test, college self-appraisal with third party validation of the self-appraisal, is utilized to assist community colleges in future institutional planning…
On the Scaling Behavior of Reliability-Resilience-Vulnerability Indices in Agricultural Watersheds
Risk indices such as reliability-resilience-vulnerability (R-R-V) have been proposed to assess watershed health. In this study, the spatial scaling behavior of R-R-V indices has been explored for five agricultural watersheds in the midwestern United States. The study was conduc...
Repeat interviews from 4,088 Iowa pesticide applicators participating in the Agricultural Health Study provided the opportunity to evaluate the reliability of self-reported information on pesticide use and various demographic and lifestyle factors. Self-completed questionnaire...
physical phenomena, PV package reliability, and outdoor PV performance. At NREL, he performs research in advanced concept PV modules. Dr. Silverman studies the performance and reliability of PV modules, including previously studied the degradation of solder joints in high-concentration PV and the outdoor performance of
Qualification Users' Perceptions and Experiences of Assessment Reliability
ERIC Educational Resources Information Center
Chamberlain, Suzanne
2013-01-01
This paper presents the findings of a study designed to explore qualification users' perceptions and experiences of reliability in the context of national assessment outcomes in England. The study consisted of 17 focus groups conducted across six sectors of qualification users: students, teachers, trainee teachers, job-seekers, employers and…
Wise Crowd Content Assessment and Educational Rubrics
ERIC Educational Resources Information Center
Passonneau, Rebecca J.; Poddar, Ananya; Gite, Gaurav; Krivokapic, Alisa; Yang, Qian; Perin, Dolores
2018-01-01
Development of reliable rubrics for educational intervention studies that address reading and writing skills is labor-intensive, and could benefit from an automated approach. We compare a main ideas rubric used in a successful writing intervention study to a highly reliable wise-crowd content assessment method developed to evaluate…
Monte Carlo Approach for Reliability Estimations in Generalizability Studies.
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
A Monte Carlo approach is proposed, using the Statistical Analysis System (SAS) programming language, for estimating reliability coefficients in generalizability theory studies. Test scores are generated by a probabilistic model that considers the probability for a person with a given ability score to answer an item with a given difficulty…
de Vries, Nienke M; Staal, J Bart; Olde Rikkert, Marcel G M; Nijhuis-van der Sanden, Maria W G
2013-04-01
Physical activity is assumed to be important in the prevention and treatment of frailty. It is unclear, however, to what extent frailty can be influenced because instruments designed to assess frailty have not been validated as evaluative outcome instruments in clinical practice. The aims of this study were: (1) to develop a frailty index (i.e., the evaluative frailty index for physical activity [EFIP]) based on the method of deficit accumulation and (2) to test the clinimetric properties of the EFIP. The content of the EFIP was determined using a written Delphi procedure. Intrarater reliability, interrater reliability, and construct validity were determined in an observational study (n=24). Intrarater reliability and interrater reliability were calculated using Cohen kappa and intraclass correlation coefficients (ICCs). Construct validity was determined by correlating the score on the EFIP with those on the timed "up & go" test (TUG), the performance-oriented mobility assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Fifty items were included in the EFIP. Interrater reliability (Cohen kappa=0.72, ICC=.96) and intrarater reliability (Cohen kappa=0.77 and 0.80, ICC=.93 and .98) were good. As expected, a fair to moderate correlation with the TUG, POMA, and CIRS-G was found (.61, -.70, and .66, respectively). Reliability and validity of the EFIP have been tested in a small sample. These and other clinimetric properties, such as responsiveness, will be assessed or reassessed in a larger study population. The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice.
Burke, Shane M; Hwang, Steven W; Mehan, William A; Bedi, Harprit S; Ogbuji, Richard; Riesenburger, Ron I
2016-07-01
Cross-specialty inter-rater reliability has not been explicitly reported for imaging characteristics that are thought to be important in lumbar intervertebral disc degeneration. Sufficient cross-specialty reliability is an essential consideration if radiographic stratification of symptomatic patients to specific treatment modalities is to ever be realized. Therefore the purpose of this study was to directly compare the assessment of such characteristics between neurosurgeons and neuroradiologists. Sixty consecutive patients with a diagnosis of lumbago and appropriate imaging were selected for inclusion. Lumbar MRI were evaluated using the Tufts Degenerative Disc Classification by two neurosurgeons and two neuroradiologists. Inter-rater reliability was assessed using Cohen's κ values both within and between specialties. A sensitivity analysis was performed for a modified grading system, which excluded high intensity zones (HIZ), due to poor cross-specialty inter-rater reliability of HIZ between specialties. The reliability of HIZ between neurosurgeons and neuroradiologists was fair in two of the four cross-specialty comparisons in this study (neurosurgeon 1 versus both radiologists κ=0.364 and κ=0.290). Removing HIZ from the classification improved inter-rater reliability for all comparisons within and between specialties (0.465⩽κ⩽0.576). In addition, intra-rater reliability remained in the moderate to substantial range (0.523⩽κ⩽0.649). Given our findings and corroboration with previous studies, identification of HIZ seems to have a markedly variable reliability. Thus we recommend modification of the original Tufts Degenerative Disc Classification by removing HIZ in order to make the overall grade provided by this classification more reproducible when scored by practitioners of different training backgrounds. Copyright © 2015 Elsevier Ltd. All rights reserved.
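Cohen's κ, the statistic used in the study above, corrects the observed agreement between two raters for the agreement expected by chance. The sketch below is a plain-Python illustration with made-up ratings (not data from the study). It also shows a known property of κ: when one category dominates, κ can be low even though percent agreement is high.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: both pairs agree on 18 of 20 items (90%).
balanced_a = ["yes"] * 10 + ["no"] * 10
balanced_b = ["yes"] * 9 + ["no"] + ["no"] * 9 + ["yes"]
skewed_a = ["no"] * 18 + ["yes"] * 2
skewed_b = ["no"] * 17 + ["yes"] + ["no"] + ["yes"]

print(round(cohen_kappa(balanced_a, balanced_b), 3))  # balanced categories
print(round(cohen_kappa(skewed_a, skewed_b), 3))      # one dominant category
```

With identical percent agreement, the skewed example yields a clearly lower κ than the balanced one, so κ values are best read alongside percent agreement and category prevalence.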
Kevern, Mark A.; Beecher, Michael; Rao, Smita
2014-01-01
Context: Athletes who participate in throwing and racket sports consistently demonstrate adaptive changes in glenohumeral-joint internal and external rotation in the dominant arm. Measurements of these motions have demonstrated excellent intrarater and poor interrater reliability. Objective: To determine intrarater reliability, interrater reliability, and standard error of measurement for shoulder internal rotation, external rotation, and total arc of motion using an inclinometer in 3 testing procedures in National Collegiate Athletic Association Division I baseball and softball athletes. Design: Cross-sectional study. Setting: Athletic department. Patients or Other Participants: Thirty-eight players participated in the study. Shoulder internal rotation, external rotation, and total arc of motion were measured by 2 investigators in 3 test positions. The standard supine position was compared with a side-lying test position, as well as a supine test position without examiner overpressure. Results: Excellent intrarater reliability was noted for all 3 test positions and ranges of motion, with intraclass correlation coefficient values ranging from 0.93 to 0.99. Results for interrater reliability were less favorable. Reliability for internal rotation was highest in the side-lying position (0.68) and reliability for external rotation and total arc was highest in the supine-without-overpressure position (0.774 and 0.713, respectively). The supine-with-overpressure position yielded the lowest interrater reliability results in all positions. The side-lying position had the most consistent results, with very little variation among intraclass correlation coefficient values for the various test positions. Conclusions: The results of our study clearly indicate that the side-lying test procedure is of equal or greater value than the traditional supine-with-overpressure method. PMID:25188316
Validity, Reliability, and the Questionable Role of Psychometrics in Plastic Surgery
2014-01-01
Summary: This report examines the meaning of validity and reliability and the role of psychometrics in plastic surgery. Study titles increasingly include the word “valid” to support the authors’ claims. Studies by other investigators may be labeled “not validated.” Validity simply refers to the ability of a device to measure what it intends to measure. Validity is not an intrinsic test property. It is a relative term most credibly assigned by the independent user. Similarly, the word “reliable” is subject to interpretation. In psychometrics, its meaning is synonymous with “reproducible.” The definitions of valid and reliable are analogous to accuracy and precision. Reliability (both the reliability of the data and the consistency of measurements) is a prerequisite for validity. Outcome measures in plastic surgery are intended to be surveys, not tests. The role of psychometric modeling in plastic surgery is unclear, and this discipline introduces difficult jargon that can discourage investigators. Standard statistical tests suffice. The unambiguous term “reproducible” is preferred when discussing data consistency. Study design and methodology are essential considerations when assessing a study’s validity. PMID:25289354
Adigozali, Hakimeh; Shadmehr, Azadeh; Ebrahimi, Esmail; Rezasoltani, Asghar; Naderi, Farrokh
2017-01-01
In the present study, the intra-rater reliability of upper trapezius morphology, its mechanical properties and intramuscular blood circulation in females with myofascial pain syndrome were assessed using ultrasonography. A total of 37 patients (31.05 ± 10 years old) participated in this study. The ultrasonography procedure was set up in three stages: a) Gray-scale: to measure muscle thickness, size and area of trigger points; b) Ultrasound elastography: to measure muscle stiffness; and c) Doppler imaging: to assess blood flow indices. According to data analysis, all variables, except End Diastolic Velocity (EDV), had excellent reliability (>0.806). Intra-class Correlation Coefficient (ICC) for EDV was 0.738, which was considered a poor to good reliability. The results of this study introduced a reliable method for developing details of upper trapezius features using muscular ultrasonography in female patients. These variables could be used for objective examination and provide guidelines for treatment plans in clinical settings. Copyright © 2016 Elsevier Ltd. All rights reserved.
A simulation model for risk assessment of turbine wheels
NASA Technical Reports Server (NTRS)
Safie, Fayssal M.; Hage, Richard T.
1991-01-01
A simulation model has been successfully developed to evaluate the risk of the Space Shuttle auxiliary power unit (APU) turbine wheels for a specific inspection policy. Besides being an effective tool for risk/reliability evaluation, the simulation model also allows the analyst to study the trade-offs between wheel reliability, wheel life, inspection interval, and rejection crack size. For example, in the APU application, sensitivity analysis results showed that the wheel life limit has the least effect on wheel reliability when compared to the effect of the inspection interval and the rejection crack size. In summary, the simulation model developed represents a flexible tool to predict turbine wheel reliability and study the risk under different inspection policies.
Validity and Reliability of Farsi Version of Youth Sport Environment Questionnaire
Eshghi, Mohammad Ali; Kordi, Ramin; Memari, Amir Hossein; Ghaziasgar, Ahmad; Mansournia, Mohammad-Ali; Zamani Sani, Seyed Hojjat
2015-01-01
The Youth Sport Environment Questionnaire (YSEQ) had been developed from Group Environment Questionnaire, a well-known measure of team cohesion. The aim of this study was to adapt and examine the reliability and validity of the Farsi version of the YSEQ. This version was completed by 455 athletes aged 13–17 years. Results of confirmatory factor analysis indicated that two-factor solution showed a good fit to the data. The results also revealed that the Farsi YSEQ showed high internal consistency, test-retest reliability, and good concurrent validity. This study indicated that the Farsi version of the YSEQ is a valid and reliable measure to assess team cohesion in sport setting. PMID:26464900
A simulation model for risk assessment of turbine wheels
NASA Astrophysics Data System (ADS)
Safie, Fayssal M.; Hage, Richard T.
A simulation model has been successfully developed to evaluate the risk of the Space Shuttle auxiliary power unit (APU) turbine wheels for a specific inspection policy. Besides being an effective tool for risk/reliability evaluation, the simulation model also allows the analyst to study the trade-offs between wheel reliability, wheel life, inspection interval, and rejection crack size. For example, in the APU application, sensitivity analysis results showed that the wheel life limit has the least effect on wheel reliability when compared to the effect of the inspection interval and the rejection crack size. In summary, the simulation model developed represents a flexible tool to predict turbine wheel reliability and study the risk under different inspection policies.
Intra- and Interobserver Reliability of Three Classification Systems for Hallux Rigidus.
Dillard, Sarita; Schilero, Christina; Chiang, Sharon; Pham, Peter
2018-04-18
There are over ten classification systems currently used in the staging of hallux rigidus. This results in confusion and inconsistency with radiographic interpretation and treatment. The reliability of hallux rigidus classification systems has not yet been tested. The purpose of this study was to evaluate intra- and interobserver reliability using three commonly used classifications for hallux rigidus. Twenty-one plain radiograph sets were presented to ten ACFAS board-certified foot and ankle surgeons. Each physician classified each radiograph based on clinical experience and knowledge according to the Regnauld, Roukis, and Hattrup and Johnson classification systems. The two-way mixed single-measure consistency intraclass correlation was used to calculate intra- and interrater reliability. The intrarater reliability of individual sets for the Roukis and Hattrup and Johnson classification systems was "fair to good" (Roukis, 0.62±0.19; Hattrup and Johnson, 0.62±0.28), whereas the intrarater reliability of individual sets for the Regnauld system bordered between "fair to good" and "poor" (0.43±0.24). The interrater reliability of the mean classification was "excellent" for all three classification systems. Conclusions: Reliable and reproducible classification systems are essential for treatment and prognostic implications in hallux rigidus. In our study, the Roukis classification system had the best intrarater reliability. Although there are various classification systems for hallux rigidus, our results indicate that all three of these classification systems show reliability and reproducibility.
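The two-way mixed, single-measure, consistency intraclass correlation named in the abstract above is often written ICC(3,1) = (MSR − MSE) / (MSR + (k − 1) × MSE), where MSR is the between-subject mean square, MSE the residual mean square, and k the number of raters. The sketch below computes it from a subjects-by-raters table in plain Python. The ratings are toy values invented for illustration, and the function name is hypothetical; this is not the authors' analysis code.

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed, single measure, consistency.

    `ratings` is a list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Toy data: the second rater always scores one grade higher than the first.
offset_only = [[1, 2], [2, 3], [3, 4]]
# Toy data: the raters disagree on the ordering within pairs of subjects.
partly_inconsistent = [[1, 2], [2, 1], [3, 4], [4, 3]]

print(icc_3_1(offset_only))
print(round(icc_3_1(partly_inconsistent), 3))
```

Because this is a consistency (not absolute-agreement) coefficient, a constant offset between raters does not lower it; only disagreement about the relative ordering or spacing of subjects does.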
Balaguier, Romain; Madeleine, Pascal; Vuillerme, Nicolas
2016-01-01
The assessment of pressure pain threshold (PPT) provides a quantitative value related to the mechanical sensitivity to pain of deep structures. Although excellent reliability of PPT has been reported in numerous anatomical locations, its absolute and relative reliability in the lower back region remains to be determined. Because of the high prevalence of low back pain in the general population and because low back pain is one of the leading causes of disability in industrialized countries, assessing pressure pain thresholds over the low back is particularly of interest. The purpose of this study was (1) to evaluate the intra- and inter- absolute and relative reliability of PPT within 14 locations covering the low back region of asymptomatic individuals and (2) to determine the number of trials required to ensure reliable PPT measurements. Fifteen asymptomatic subjects were included in this study. PPTs were assessed among 14 anatomical locations in the low back region over two sessions separated by a one-hour interval. For the two sessions, three PPT assessments were performed on each location. Reliability was assessed computing intraclass correlation coefficients (ICC), standard error of measurement (SEM) and minimum detectable change (MDC) for all possible combinations between trials and sessions. Bland-Altman plots were also generated to assess potential bias in the dataset. Relative reliability for both intra- and inter- session was almost perfect with ICCs ranging from 0.85 to 0.99. With respect to the intra-session, no statistical difference was reported for ICCs and SEM regardless of the conducted comparisons between trials. Conversely, for inter-session, ICCs and SEM values were significantly larger when two consecutive PPT measurements were used for data analysis. No significant difference was observed for the comparison between two consecutive measurements and three measurements.
Excellent relative and absolute reliabilities were reported for both intra- and inter-session. Reliable measurements can be equally achieved when using the mean of two or three consecutive PPT measurements, as usually proposed in the literature, or with only the first one. Although reliability was almost perfect regardless of the conducted comparison between PPT assessments, our results suggest using two consecutive measurements to obtain higher short term absolute reliability.
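As a rough illustration of how the absolute reliability indices named in the abstract above relate to the ICC, the sketch below uses one common formulation: SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM. The abstract does not state which formulas the authors used, so the formulas, the function names, and the numbers (a between-subject SD of 100 kPa and an ICC of 0.91) are assumptions for illustration only.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from a between-subject SD and an ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimum detectable change (95% level) for a test-retest difference."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical values, not taken from the study.
sem = sem_from_icc(sd=100.0, icc=0.91)
print(round(sem, 1), round(mdc95(sem), 1))
```

Under these assumptions a higher ICC shrinks the SEM, and the MDC scales linearly with it, which is why relative and absolute reliability are usually reported together.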
NDE reliability and probability of detection (POD) evolution and paradigm shift
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Surendra
2014-02-18
The subject of NDE Reliability and POD has gone through multiple phases since its humble beginning in the late 1960s. This was followed by several programs including the important one nicknamed “Have Cracks – Will Travel” or in short “Have Cracks” by Lockheed Georgia Company for US Air Force during 1974–1978. This and other studies ultimately led to a series of developments in the field of reliability and POD starting from the introduction of fracture mechanics and Damaged Tolerant Design (DTD) to statistical framework by Bernes and Hovey in 1981 for POD estimation to MIL-STD HDBK 1823 (1999) and 1823A (2009). During the last decade, various groups and researchers have further studied the reliability and POD using Model Assisted POD (MAPOD), Simulation Assisted POD (SAPOD), and applying Bayesian Statistics. All and each of these developments had one objective, i.e., improving accuracy of life prediction in components that to a large extent depends on the reliability and capability of NDE methods. Therefore, it is essential to have a reliable detection and sizing of large flaws in components. Currently, POD is used for studying reliability and capability of NDE methods, though POD data offers no absolute truth regarding NDE reliability, i.e., system capability, effects of flaw morphology, and quantifying the human factors. Furthermore, reliability and POD have been reported alike in meaning but POD is not NDE reliability. POD is a subset of the reliability that consists of six phases: 1) samples selection using DOE, 2) NDE equipment setup and calibration, 3) System Measurement Evaluation (SME) including Gage Repeatability and Reproducibility (Gage R and R) and Analysis Of Variance (ANOVA), 4) NDE system capability and electronic and physical saturation, 5) acquiring and fitting data to a model, and data analysis, and 6) POD estimation.
This paper provides an overview of all major POD milestones for the last several decades and discusses the rationale for using Integrated Computational Materials Engineering (ICME), MAPOD, SAPOD, and Bayesian statistics for studying controllable and non-controllable variables including human factors for estimating POD. Another objective is to list gaps between “hoped for” versus validated or fielded failed hardware.
Myer, Gregory D; Wordeman, Samuel C; Sugimoto, Dai; Bates, Nathaniel A; Roewer, Benjamin D; Medina McKeon, Jennifer M; DiCesare, Christopher A; Di Stasi, Stephanie L; Barber Foss, Kim D; Thomas, Staci M; Hewett, Timothy E
2014-05-01
Multi-center collaborations provide a powerful alternative to overcome the inherent limitations to single-center investigations. Specifically, multi-center projects can support large-scale prospective, longitudinal studies that investigate relatively uncommon outcomes, such as anterior cruciate ligament injury. This project was conceived to assess within- and between-center reliability of an affordable, clinical nomogram utilizing two-dimensional video methods to screen for risk of knee injury. The authors hypothesized that the two-dimensional screening methods would provide good-to-excellent reliability within and between institutions for assessment of frontal and sagittal plane biomechanics. Nineteen female, high school athletes participated. Two-dimensional video kinematics of the lower extremity during a drop vertical jump task were collected on all 19 study participants at each of the three facilities. Within-center and between-center reliability were assessed with intra- and inter-class correlation coefficients. Within-center reliability of the clinical nomogram variables was consistently excellent, but between-center reliability was fair-to-good. Within-center intra-class correlation coefficient for all nomogram variables combined was 0.98, while combined between-center inter-class correlation coefficient was 0.63. Injury risk screening protocols were reliable within and repeatable between centers. These results demonstrate the feasibility of multi-site biomechanical studies and establish a framework for further dissemination of injury risk screening algorithms. Specifically, multi-center studies may allow for further validation and optimization of two-dimensional video screening tools. 2b.
Reliability and Agreement in Student Ratings of the Class Environment
ERIC Educational Resources Information Center
Nelson, Peter M.; Christ, Theodore J.
2016-01-01
The current study estimated the reliability and agreement of student ratings of the classroom environment obtained using the Responsive Environmental Assessment for Classroom Teaching (REACT; Christ, Nelson, & Demers, 2012; Nelson, Demers, & Christ, 2014). Coefficient alpha, class-level reliability, and class agreement indices were…
Test Assembly Implications for Providing Reliable and Valid Subscores
ERIC Educational Resources Information Center
Lee, Minji K.; Sweeney, Kevin; Melican, Gerald J.
2017-01-01
This study investigates the relationships among factor correlations, inter-item correlations, and the reliability estimates of subscores, providing a guideline with respect to psychometric properties of useful subscores. In addition, it compares subscore estimation methods with respect to reliability and distinctness. The subscore estimation…
Effect of knee angle on neuromuscular assessment of plantar flexor muscles: A reliability study
Cornu, Christophe; Jubeau, Marc
2018-01-01
Introduction This study aimed to determine the intra- and inter-session reliability of neuromuscular assessment of plantar flexor (PF) muscles at three knee angles. Methods Twelve young adults were tested for three knee angles (90°, 30° and 0°) and at three time points separated by 1 hour (intra-session) and 7 days (inter-session). Electrical (H reflex, M wave) and mechanical (evoked and maximal voluntary torque, activation level) parameters were measured on the PF muscles. Intraclass correlation coefficients (ICC) and coefficients of variation were calculated to determine intra- and inter-session reliability. Results The mechanical measurements presented excellent (ICC>0.75) intra- and inter-session reliabilities regardless of the knee angle considered. The reliability of electrical measurements was better for the 90° knee angle compared to the 0° and 30° angles. Conclusions Changes in the knee angle may influence the reliability of neuromuscular assessments, which indicates the importance of considering the knee angle to collect consistent outcomes on the PF muscles. PMID:29596480
Reliability and Validity of the Dyadic Observed Communication Scale (DOCS).
Hadley, Wendy; Stewart, Angela; Hunter, Heather L; Affleck, Katelyn; Donenberg, Geri; Diclemente, Ralph; Brown, Larry K
2013-02-01
We evaluated the reliability and validity of the Dyadic Observed Communication Scale (DOCS) coding scheme, which was developed to capture a range of communication components between parents and adolescents. Adolescents and their caregivers were recruited from mental health facilities for participation in a large, multi-site family-based HIV prevention intervention study. Seventy-one dyads were randomly selected from the larger study sample and coded using the DOCS at baseline. Preliminary validity and reliability of the DOCS was examined using various methods, such as comparing results to self-report measures and examining interrater reliability. Results suggest that the DOCS is a reliable and valid measure of observed communication among parent-adolescent dyads that captures both verbal and nonverbal communication behaviors that are typical intervention targets. The DOCS is a viable coding scheme for use by researchers and clinicians examining parent-adolescent communication. Coders can be trained to reliably capture individual and dyadic components of communication for parents and adolescents and this complex information can be obtained relatively quickly.
Between-day reliability of the trapezius muscle H-reflex and M-wave.
Vangsgaard, Steffen; Hansen, Ernst A; Madeleine, Pascal
2015-12-01
The aim of this study was to investigate the between-day reliability of the trapezius muscle H-reflex and M-wave. Sixteen healthy subjects were studied on 2 consecutive days. Trapezius muscle H-reflexes were evoked by electrical stimulation of the C3/4 cervical nerves; M-waves were evoked by electrical stimulation of the accessory nerve. Relative reliability was estimated by intraclass correlation coefficients (ICC2,1 ). Absolute reliability was estimated by computing the standard error of measurement (SEM) and the smallest real difference (SRD). Bland-Altman plots were constructed to detect any systematic bias. Variables showed substantial to excellent relative reliability (ICC = 0.70-0.99). The relative SEM ranged from 1.4% to 34.8%; relative SRD ranged from 3.8% to 96.5%. No systematic bias was present in the data. The amplitude and latency of the trapezius muscle H-reflex and M-wave in healthy young subjects can be measured reliably across days. © 2015 Wiley Periodicals, Inc.
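The Bland-Altman check for systematic bias mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the usual computation (mean difference and 95% limits of agreement at ±1.96 SD of the differences), not the authors' analysis; the function name and the sample amplitudes are hypothetical.

```python
import statistics


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)             # systematic difference between days
    spread = 1.96 * statistics.stdev(diffs)   # half-width of the 95% limits
    return bias, bias - spread, bias + spread


# Hypothetical day-1 and day-2 amplitudes, not data from the study.
day1 = [10, 12, 14, 16]
day2 = [11, 11, 15, 15]
print(bland_altman(day1, day2))
```

A bias near zero whose limits of agreement include zero is the pattern usually read as "no systematic bias".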
Inter-rater and intra-rater reliability of the Bahasa Melayu version of Rose Angina Questionnaire.
Hassan, N B; Choudhury, S R; Naing, L; Conroy, R M; Rahman, A R A
2007-01-01
The objective of the study is to translate the Rose Questionnaire (RQ) into a Bahasa Melayu version and adapt it cross-culturally, and to measure its inter-rater and intra-rater reliability. This cross-sectional study was conducted in the respondents' homes or workplaces in Kelantan, Malaysia. One hundred respondents aged 30 and above with different socio-demographic status were interviewed for face validity. For each of the inter-rater and intra-rater reliability assessments, a sample of 150 respondents was interviewed. Inter-rater and intra-rater reliabilities were assessed by Cohen's kappa. The overall inter-rater agreement among the five pairs of interviewers at points one and two was 0.86, and intra-rater reliability for the five interviewers on the seven-item questionnaire at points one and two was 0.88, as measured by the kappa coefficient. The translated Malay version of RQ demonstrated almost perfect inter-rater and intra-rater reliability, and further validation of this translated questionnaire, such as sensitivity and specificity analysis, is highly recommended.
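For readers unfamiliar with the statistic used in the abstract above, unweighted Cohen's kappa for two raters can be sketched as below. This is a generic textbook computation, kappa = (p_o − p_e) / (1 − p_e); the function name and the toy ratings are hypothetical and unrelated to the questionnaire data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: 3 of 4 agree, chance agreement is 0.5.
print(cohen_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))
```

Because chance agreement depends on the marginal frequencies, kappa can be low even when percent agreement is high if nearly all ratings fall in one category, so both figures are often reported side by side.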
Proposed reliability cost model
NASA Technical Reports Server (NTRS)
Delionback, L. M.
1973-01-01
The research investigations which were involved in the study include: cost analysis/allocation, reliability and product assurance, forecasting methodology, systems analysis, and model-building. This is a classic example of an interdisciplinary problem, since the model-building requirements include the need for understanding and communication between technical disciplines on one hand, and the financial/accounting skill categories on the other. The systems approach is utilized within this context to establish a clearer and more objective relationship between reliability assurance and the subcategories (or subelements) that provide, or reinforce, the reliability assurance for a system. Subcategories are further subdivided as illustrated by a tree diagram. The reliability assurance elements can be seen to be potential alternative strategies, or approaches, depending on the specific goals/objectives of the trade studies. The scope was limited to the establishment of a proposed reliability cost-model format. The model format/approach is dependent upon the use of a series of subsystem-oriented CER's and sometimes possible CTR's, in devising a suitable cost-effective policy.
Holm, Søren; Hofmann, Bjørn
2017-10-01
A precondition for reducing scientific misconduct is evidence about scientists' attitudes. We need reliable survey instruments, and this study investigates the reliability of Kalichman's "Survey 2: research misconduct" questionnaire. The study is a post hoc analysis of data from three surveys among biomedical doctoral students in Scandinavia (2010-2015). We perform reliability analysis, and exploratory and confirmatory factor analysis using a split-sample design as a partial validation. The results indicate that a reliable 13-item scale can be formed (Cronbach's α = .705), and factor analysis indicates that there are four reliable subscales each tapping a different construct: (a) general attitude to misconduct (α = .768), (b) attitude to personal misconduct (α = .784), (c) attitude to whistleblowing (α = .841), and (d) attitude to blameworthiness/punishment (α = .877). A full validation of the questionnaire requires further research. We, nevertheless, hope that the results will facilitate the increased use of the questionnaire in research.
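The internal-consistency figures quoted in the abstract above are Cronbach's alpha values. A minimal sketch of the standard formula, alpha = k/(k−1) × (1 − Σ item variances / variance of total score), is given below; the function name and the tiny response matrix are hypothetical and only show the mechanics.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = statistics.variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: two items that move together perfectly.
print(cronbach_alpha([[1, 2], [2, 3], [3, 4]]))
```

In this toy case the items are perfectly correlated with equal variance, so alpha reaches its upper bound of 1; real scales such as the one in the abstract fall below that.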
NASA Astrophysics Data System (ADS)
Chaitusaney, Surachai; Yokoyama, Akihiko
In a distribution system, Distributed Generation (DG) is expected to improve system reliability by serving as backup generation. However, the DG contribution to fault current may cause the loss of existing protection coordination, e.g. recloser-fuse coordination and breaker-breaker coordination. This problem can drastically deteriorate system reliability, and it is more serious and complicated when there are several DG sources in the system. Hence, this conflict in reliability needs a detailed investigation before DG is installed or expanded. A model of composite DG fault current is proposed to find the threshold beyond which existing protection coordination is lost. Cases of protection miscoordination are described, together with their consequences. Since a distribution system may be tied with another system, the issues of tie line and on-site DG are integrated into this study. Reliability indices are evaluated and compared in the distribution reliability test system RBTS Bus 2.
Reliability and validity of the workplace social distance scale.
Yoshii, Hatsumi; Mandai, Nozomu; Saito, Hidemitsu; Akazawa, Kouhei
2014-10-29
Self-stigma, defined by a negative attitude toward oneself combined with the consciousness of being a target of prejudice, is a critical problem for psychiatric patients. Self-stigma studies among psychiatric patients have indicated that high stigma is predictive of detrimental effects such as the delay of treatment and decreases in social participation in patients, and levels of self-stigma should be statistically evaluated. In this study, we developed the Workplace Social Distance Scale (WSDS), rephrasing the eight items of the Japanese version of the Social Distance Scale (SDSJ) to apply to the work setting in Japan. We examined the reliability and validity of the WSDS among 83 psychiatric patients. Factor analysis extracted three factors from the scale items: "work relations," "shallow relationships," and "employment." These factors are similar to the assessment factors of the SDSJ. Cronbach's alpha coefficient for the WSDS was 0.753. The split-half reliability for the WSDS was 0.801, indicating significant correlations. In addition, the WSDS was significantly correlated with the SDSJ. These findings suggest that the WSDS represents an approximation of self-stigma in the workplace among psychiatric patients. Our study assessed the reliability and validity of the WSDS for measuring self-stigma in Japan. Future studies should investigate the reliability and validity of the scale in other countries.
Kurland, Jacquie; Naeser, Margaret A.; Baker, Errol H.; Doron, Karl; Martin, Paula I.; Seekins, Heidi E.; Bogdan, Andrew; Renshaw, Perry; Yurgelun-Todd, Deborah
2005-01-01
Cortical reorganization in poststroke aphasia is not well understood. Few studies have investigated neural mechanisms underlying language recovery in severe aphasia patients, who are typically viewed as having a poor prognosis for language recovery. Although test-retest reliability is routinely demonstrated during collection of language data in single-subject aphasia research, this is rarely examined in fMRI studies investigating the underlying neural mechanisms in aphasia recovery. The purpose of this study was to acquire fMRI test-retest data examining semantic decisions both within and between two aphasia patients. Functional MRI was utilized to image individuals with chronic, moderate-severe nonfluent aphasia during nonverbal, yes/no button-box semantic judgments of iconic sentences presented in the Computer-assisted Visual Communication (C-ViC) program. We investigated the critical issue of intra-subject reliability by exploring similarities and differences in regions of activation during participants’ performance of identical tasks twice on the same day. Each participant demonstrated high intra-subject reliability, with response decrements typical of task familiarity. Differences between participants included greater left hemisphere perilesional activation in the individual with better response to C-ViC training. This study provides fMRI reliability in chronic nonfluent aphasia, and adds to evidence supporting differences in individual cortical reorganization in aphasia recovery. PMID:15706052
[Turkish validity and reliability study of fear of pain questionnaire-III].
Ünver, Seher; Turan, Fatma Nesrin
2018-01-01
This study aimed to develop a Turkish version of the Fear of Pain Questionnaire-III developed by McNeil and Rainwater (1998) and examine its validity and reliability indicators. The study was conducted with 459 university students studying in the nursing department. The Turkish translation of the scale was conducted by language experts and the original scale owner. Expert opinions were taken for language validity, and Lawshe's content validity ratio formula was used to calculate the content validity. Exploratory factor analysis was used to assess the construct validity. The factors were rotated using the Varimax (orthogonal) rotation method. For reliability indicators of the questionnaire, the internal consistency coefficient and test-retest reliability were utilized. Exploratory factor analyses using the three-factor model (explaining 50.5% of the total variance) revealed that the item factor loadings were above the limit value of 0.30, which indicated that the questionnaire had good construct validity. The Cronbach's alpha value for the total questionnaire was 0.938, and the test-retest value was 0.846 for the total scale. The Turkish version of the Fear of Pain Questionnaire-III had sufficiently high reliability and validity to be used as a tool in evaluating the fear of pain among the young Turkish population.
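Lawshe's content validity ratio, named in the abstract above, is simple enough to show directly: CVR = (n_e − N/2) / (N/2), where n_e is the number of experts rating an item "essential" and N is the panel size. The function name and the panel numbers below are hypothetical; the abstract does not report the panel size.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR for one item: ranges from -1 (no expert) to +1 (all experts)."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel: 9 of 10 experts rate the item as essential.
print(content_validity_ratio(9, 10))
```

A value of 0 means exactly half of the panel rated the item essential; items are usually retained only when the CVR exceeds a critical value that depends on the panel size.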
Reliability study of high-brightness multiple single emitter diode lasers
NASA Astrophysics Data System (ADS)
Zhu, Jing; Yang, Thomas; Zhang, Cuipeng; Lang, Chao; Jiang, Xiaochen; Liu, Rui; Gao, Yanyan; Guo, Weirong; Jiang, Yuhua; Liu, Yang; Zhang, Luyan; Chen, Louisa
2015-03-01
In this study the chip bonding processes for various chips from various chip suppliers around the world have been optimized to achieve reliable chip on sub-mount for high performance. Three types of chip on sub-mount are reported: 8xx nm 1.2 W/10.0 W indium-bonded lasers, 9xx nm 10 W-20 W AuSn-bonded lasers, and 1470 nm 6 W indium-bonded lasers. The MTTF@25 of the 9xx nm chip on sub-mount (COS) is calculated to be more than 203,896 hours. These chips from various chip suppliers are packaged into multiple single emitter laser modules, using similar packaging techniques, with from 2 up to 7 emitters per module. A reliability study including an aging test is performed on those multiple single emitter laser modules. Drawing on the research team's 12 years of experience in packaging design and techniques, precise optical and fiber alignment processes, and superior chip bonding capability, we have achieved a total MTTF exceeding 177,710 hours of lifetime at a 60% confidence level for those multiple single emitter laser modules. Furthermore, a separate reliability study on wavelength-stabilized laser modules has shown that this wavelength-stabilized module packaging process is reliable as well.
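A statement such as "MTTF exceeding X hours with 60% confidence" is often a one-sided lower bound from an aging test. The abstract above does not say how its figures were obtained, so the following is only a sketch under explicit assumptions: exponentially distributed lifetimes (constant failure rate), a time-terminated test, and zero observed failures. In that special case the chi-square bound reduces to MTTF_lower = T / (−ln(1 − CL)), where T is the accumulated device-hours. The function name and the device-hours figure are hypothetical.

```python
import math


def mttf_lower_bound_zero_failures(device_hours: float, confidence: float) -> float:
    """One-sided lower MTTF bound for a zero-failure, time-terminated test.

    Assumes exponential lifetimes; not valid if any failures were observed.
    """
    if device_hours <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need device_hours > 0 and 0 < confidence < 1")
    return device_hours / -math.log(1.0 - confidence)


# Hypothetical test: 1000 accumulated device-hours with no failures.
print(round(mttf_lower_bound_zero_failures(1000.0, 0.60), 1))
```

At a 60% confidence level the divisor is about 0.916, so the bound is slightly above the accumulated device-hours; a higher confidence level gives a more conservative (lower) bound.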
Golden, Sherita Hill; Sánchez, Brisa N.; DeSantis, Amy S.; Wu, Meihua; Castro, Cecilia; Seeman, Teresa E.; Tadros, Sameh; Shrager, Sandi; Diez Roux, Ana V.
2014-01-01
Collection of salivary cortisol has become increasingly popular in large population-based studies. However, the impact of protocol compliance on day-to-day reliabilities of measures, and the extent to which reliabilities differ systematically according to socio-demographic characteristics, has not been well characterized in large-scale population-based studies to date. Using data on 935 men and women from the Multi-ethnic Study of Atherosclerosis, we investigated whether sampling protocol compliance differs systematically according to socio-demographic factors and whether compliance was associated with cortisol estimates, as well as whether associations of cortisol with both compliance and socio-demographic characteristics were robust to adjustments for one another. We further assessed the day-to-day reliability for cortisol features and the extent to which reliabilities vary according to socio-demographic factors and sampling protocol compliance. Overall, we found higher compliance among persons with higher levels of income and education. Lower compliance was significantly associated with a less pronounced cortisol awakening response (CAR) but was not associated with any other cortisol features, and adjustment for compliance did not affect associations of socio-demographic characteristics with cortisol. Reliability was higher for area under the curve (AUC) and wake up values than for other features, but generally did not vary according to socio-demographic characteristics, with few exceptions. Our findings regarding intra-class correlation coefficients (ICCs) support prior research indicating that multiple day collection is preferable to single day collection, particularly for CAR and slopes, more so than wakeup and AUC. There were few differences in reliability by socio-demographic characteristics. Thus, it is unlikely that group-specific sampling protocols are warranted. PMID:24703168
Singh, Amika S; Chinapaw, Mai J M; Uijtdewilligen, Léonie; Vik, Froydis N; van Lippevelde, Wendy; Fernández-Alvira, Juan M; Stomfai, Sarolta; Manios, Yannis; van der Sluijs, Maria; Terwee, Caroline; Brug, Johannes
2012-08-13
Insight into parental energy balance-related behaviours, their determinants and parenting practices is important to inform childhood obesity prevention. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. The objective of the current study was to examine the test-retest reliability and construct validity of the parent questionnaire used in the ENERGY-project, assessing parental energy balance-related behaviours, their determinants, and parenting practices among parents of 10-12 year old children. We collected data among parents (n = 316 in the test-retest reliability study; n = 109 in the construct validity study) of 10-12 year-old children in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent interview was assessed using ICC and percentage agreement. All but one item showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Construct validity appeared to be good to excellent for 92 out of 121 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. Of the other 29 items, construct validity was moderate for 24 and poor for 5 items. The reliability and construct validity of the items of the ENERGY-parent questionnaire on multiple energy balance-related behaviours, their potential determinants, and parenting practices appear to be good. Based on the results of the validity study, we strongly recommend adapting parts of the ENERGY-parent questionnaire if used in future research.
Ryman, Tove K; Boyer, Bert B; Hopkins, Scarlett; Philip, Jacques; O'Brien, Diane; Thummel, Kenneth; Austin, Melissa A
2015-02-28
FFQ data can be used to characterise dietary patterns for diet-disease association studies. In the present study, we evaluated three previously defined dietary patterns--'subsistence foods', market-based 'processed foods' and 'fruits and vegetables'--among a sample of Yup'ik people from Southwest Alaska. We tested the reproducibility and reliability of the dietary patterns, as well as the associations of these patterns with dietary biomarkers and participant characteristics. We analysed data from adult study participants who completed at least one FFQ with the Center for Alaska Native Health Research 9/2009-5/2013. To test the reproducibility of the dietary patterns, we conducted a confirmatory factor analysis (CFA) of a hypothesised model using eighteen food items to measure the dietary patterns (n 272). To test the reliability of the dietary patterns, we used the CFA to measure composite reliability (n 272) and intra-class correlation coefficients for test-retest reliability (n 113). Finally, to test the associations, we used linear regression (n 637). All factor loadings, except one, in CFA indicated acceptable correlations between foods and dietary patterns (r>0·40), and model-fit criteria were >0·90. Composite and test-retest reliability of the dietary patterns were, respectively, 0·56 and 0·34 for 'subsistence foods', 0·73 and 0·66 for 'processed foods', and 0·72 and 0·54 for 'fruits and vegetables'. In the multi-predictor analysis, the dietary patterns were significantly associated with dietary biomarkers, community location, age, sex and self-reported lifestyle. This analysis confirmed the reproducibility and reliability of the dietary patterns in the present study population. These dietary patterns can be used for future research and development of dietary interventions in this underserved population.
Reliability of Phase Velocity Measurements of Flexural Acoustic Waves in the Human Tibia In-Vivo.
Vogl, Florian; Schnüriger, Karin; Gerber, Hans; Taylor, William R
2016-01-01
Axial-transmission acoustics has been shown to be a promising technique to measure individual bone properties and detect bone pathologies. With the ultimate goal being the in-vivo application of such systems, quantification of the key aspects governing the reliability is crucial to bring this method towards clinical use. This work presents a systematic reliability study quantifying the sources of variability and their magnitudes of in-vivo measurements using axial-transmission acoustics. 42 healthy subjects were measured by an experienced operator twice per week, over a four-month period, resulting in over 150000 wave measurements. In a complementary study to assess the influence of different operators performing the measurements, 10 novice operators were trained, and each measured 5 subjects on a single occasion, using the same measurement protocol as in the first part of the study. The estimated standard error for the measurement protocol used to collect the study data was ∼ 17 m/s (∼ 4% of the grand mean) and the index of dependability, as a measure of reliability, was Φ = 0.81. It was shown that the method is suitable for multi-operator use and that the reliability can be improved efficiently by additional measurements with device repositioning, while additional measurements without repositioning cannot improve the reliability substantially. Phase velocity values were found to be significantly higher in males than in females (p < 10^-5) and an intra-class correlation coefficient of r = 0.70 was found between the legs of each subject. The high reliability of this non-invasive approach and its intrinsic sensitivity to mechanical properties open perspectives for the rapid and inexpensive clinical assessment of bone pathologies, as well as for monitoring programmes without any radiation exposure for the patient.
Bessette, Katie L; Jenkins, Lisanne M; Skerrett, Kristy A; Gowins, Jennifer R; DelDonno, Sophie R; Zubieta, Jon-Kar; McInnis, Melvin G; Jacobs, Rachel H; Ajilore, Olusola; Langenecker, Scott A
2018-01-01
There is substantial variability across studies of default mode network (DMN) connectivity in major depressive disorder, and reliability and time-invariance are not reported. This study evaluates whether DMN dysconnectivity in remitted depression (rMDD) is reliable over time and symptom-independent, and explores convergent relationships with cognitive features of depression. A longitudinal study was conducted with 82 young adults free of psychotropic medications (47 rMDD, 35 healthy controls) who completed clinical structured interviews, neuropsychological assessments, and 2 resting-state fMRI scans across 2 study sites. Functional connectivity analyses from bilateral posterior cingulate and anterior hippocampal formation seeds in DMN were conducted at both time points within a repeated-measures analysis of variance to compare groups and evaluate reliability of group-level connectivity findings. Eleven hyper- (from posterior cingulate) and 6 hypo- (from hippocampal formation) connectivity clusters in rMDD were obtained with moderate to adequate reliability in all but one cluster (ICC's range = 0.50 to 0.76 for 16 of 17). The significant clusters were reduced with a principal component analysis (5 components obtained) to explore these connectivity components, and were then correlated with cognitive features (rumination, cognitive control, learning and memory, and explicit emotion identification). At the exploratory level, for convergent validity, components consisting of posterior cingulate with cognitive control network hyperconnectivity in rMDD were related to cognitive control (inverse) and rumination (positive). Components consisting of anterior hippocampal formation with social emotional network and DMN hypoconnectivity were related to memory (inverse) and happy emotion identification (positive). Thus, time-invariant DMN connectivity differences exist early in the lifespan course of depression and are reliable.
The nuanced results suggest a ventral within-network hypoconnectivity associated with poor memory and a dorsal cross-network hyperconnectivity linked to poorer cognitive control and elevated rumination. Study of early course remitted depression with attention to reliability and symptom independence could lead to more readily translatable clinical assessment tools for biomarkers.
Ramírez-Vélez, Robinson; Rodrigues-Bezerra, Diogo; Correa-Bautista, Jorge Enrique; Izquierdo, Mikel; Lobelo, Felipe
2015-01-01
Substantial evidence indicates that youth physical fitness levels are an important marker of lifestyle and cardio-metabolic health profiles and predict future risk of chronic diseases. The reliability of physical fitness tests has not been explored in the Latin American youth population. This study's aim was to examine the reliability of health-related physical fitness tests that were used in the Colombian health promotion "Fuprecol study". Participants were 229 Colombian youth (boys n = 124 and girls n = 105) aged 9 to 17.9 years old. Five components of health-related physical fitness were measured: 1) morphological component: height, weight, body mass index (BMI), waist circumference, triceps skinfold, subscapular skinfold, and body fat (%) via impedance; 2) musculoskeletal component: handgrip and standing long jump test; 3) motor component: speed/agility test (4x10 m shuttle run); 4) flexibility component (hamstring and lumbar extensibility, sit-and-reach test); 5) cardiorespiratory component: 20-meter shuttle-run test (SRT) to estimate maximal oxygen consumption. The tests were performed two times, 1 week apart on the same day of the week, except for the SRT, which was performed only once. Intra-observer technical errors of measurement (TEMs) and inter-rater reliability were assessed in the morphological component. Reliability for the musculoskeletal, motor and cardiorespiratory fitness components was examined using Bland-Altman tests. For the morphological component, TEMs were small and reliability was greater than 95% in all cases. For the musculoskeletal, motor, flexibility and cardiorespiratory components, we found adequate reliability patterns in terms of systematic errors (bias) and random error (95% limits of agreement). When the fitness assessments were performed twice, the systematic error was nearly 0 for all tests, except for the sit and reach (mean difference: -1.03% [95% CI = -4.35% to -2.28%]).
The results from this study indicate that the "Fuprecol study" health-related physical fitness battery, administered by physical education teachers, was reliable for measuring health-related components of fitness in children and adolescents aged 9-17.9 years old in a school setting in Colombia.
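The Bland-Altman quantities named in the abstract above (systematic error, or bias, and 95% limits of agreement) can be illustrated with a minimal sketch. The function name `bland_altman` and the handgrip values are hypothetical and are not taken from the Fuprecol study; the sketch assumes the common convention of bias plus or minus 1.96 sample standard deviations of the paired differences.

```python
from statistics import mean, stdev


def bland_altman(test, retest):
    """Return (bias, lower_limit, upper_limit) for paired test-retest values.

    bias is the mean of the paired differences (retest - test); the 95%
    limits of agreement are bias +/- 1.96 * SD of those differences.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical handgrip strength (kg) measured one week apart.
week1 = [20.0, 25.0, 30.0, 22.0, 28.0]
week2 = [21.0, 24.0, 31.0, 23.0, 27.0]

bias, lower, upper = bland_altman(week1, week2)
print(f"bias = {bias:.2f} kg, 95% limits of agreement = [{lower:.2f}, {upper:.2f}] kg")
```

A bias close to 0 suggests no systematic shift between sessions, while the width of the limits reflects random error.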
Reliability Issues and Solutions in Flexible Electronics Under Mechanical Fatigue
NASA Astrophysics Data System (ADS)
Yi, Seol-Min; Choi, In-Suk; Kim, Byoung-Joon; Joo, Young-Chang
2018-07-01
Flexible devices are of significant interest due to their potential expansion of the application of smart devices into various fields, such as energy harvesting, biological applications and consumer electronics. Due to the mechanically dynamic operations of flexible electronics, their mechanical reliability must be thoroughly investigated to understand their failure mechanisms and lifetimes. Reliability issues caused by bending fatigue, one of the typical operational limitations of flexible electronics, have been studied using various test methodologies; however, electromechanical evaluations, which are essential to assess the reliability of electronic devices for flexible applications, had not been investigated because the testing method was not established. By employing the in situ bending fatigue test, we have studied the failure mechanism for various conditions and parameters, such as bending strain, fatigue area, film thickness, and lateral dimensions. Moreover, various methods for improving the bending reliability have been developed based on the failure mechanism. Nanostructures such as holes, pores, wires and composites of nanoparticles and nanotubes have been suggested for better reliability. Flexible devices were also investigated to find the potential failures initiated by complex structures under bending fatigue strain. In this review, the recent advances in test methodology, mechanism studies, and practical applications are introduced. Additionally, perspectives including the future advance to stretchable electronics are discussed based on the current achievements in research.
Li, Qi; Yu, Hongtao; Wu, Yan; Gao, Ning
2016-08-26
The integration of multiple sensory inputs is essential for perception of the external world. The spatial factor is a fundamental property of multisensory audiovisual integration. Previous studies of the spatial constraints on bimodal audiovisual integration have mainly focused on the spatial congruity of audiovisual information. However, the effect of spatial reliability within audiovisual information on bimodal audiovisual integration remains unclear. In this study, we used event-related potentials (ERPs) to examine the effect of spatial reliability of task-irrelevant sounds on audiovisual integration. Three relevant ERP components emerged: the first at 140-200ms over a wide central area, the second at 280-320ms over the fronto-central area, and a third at 380-440ms over the parieto-occipital area. Our results demonstrate that ERP amplitudes elicited by audiovisual stimuli with reliable spatial relationships are larger than those elicited by stimuli with inconsistent spatial relationships. In addition, we hypothesized that spatial reliability within an audiovisual stimulus enhances feedback projections to the primary visual cortex from multisensory integration regions. Overall, our findings suggest that the spatial linking of visual and auditory information depends on spatial reliability within an audiovisual stimulus and occurs at a relatively late stage of processing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Dental hygiene faculty calibration in the evaluation of calculus detection.
Garland, Kandis V; Newell, Kathleen J
2009-03-01
The purpose of this pilot study was to explore the impact of faculty calibration training on intra- and interrater reliability regarding calculus detection. After IRB approval, twelve dental hygiene faculty members were recruited from a pool of twenty-two for voluntary participation and randomized into two groups. All subjects provided two pre- and two posttest scorings of calculus deposits on each of three typodonts by recording yes or no indicating if they detected calculus. Accuracy and consistency of calculus detection were evaluated using an answer key. The experimental group received three two-hour training sessions to practice a prescribed exploring sequence and technique for calculus detection. Participants immediately corrected their answers, received feedback from the trainer, and reconciled missed areas. Intra- and interrater reliability (pre- and posttest) was determined using Cohen's Kappa and compared between groups using repeated measures (split-plot) ANOVA. The groups did not differ from pre- to posttraining (intrarater reliability p=0.64; interrater reliability p=0.20). Training had no effect on reliability levels for simulated calculus detection in this study. Recommendations for future studies of faculty calibration when evaluating students include using patients for assessing rater reliability, employing larger samples at multiple sites, and assessing the impact on students' attitudes and learning outcomes.
NASA Astrophysics Data System (ADS)
Franck, Bas A. M.; Dreschler, Wouter A.; Lyzenga, Johannes
2004-12-01
In this study we investigated the reliability and convergence characteristics of an adaptive multidirectional pattern search procedure, relative to a nonadaptive multidirectional pattern search procedure. The procedure was designed to optimize three speech-processing strategies. These comprise noise reduction, spectral enhancement, and spectral lift. The search is based on a paired-comparison paradigm, in which subjects evaluated the listening comfort of speech-in-noise fragments. The procedural and nonprocedural factors that influence the reliability and convergence of the procedure are studied using various test conditions. The test conditions combine different tests, initial settings, background noise types, and step size configurations. Seven normal hearing subjects participated in this study. The results indicate that the reliability of the optimization strategy may benefit from the use of an adaptive step size. Decreasing the step size increases accuracy, while increasing the step size can be beneficial to create clear perceptual differences in the comparisons. The reliability also depends on starting point, stop criterion, step size constraints, background noise, algorithms used, as well as the presence of drifting cues and suboptimal settings. There appears to be a trade-off between reliability and convergence, i.e., when the step size is enlarged the reliability improves, but the convergence deteriorates.
Reliability Issues and Solutions in Flexible Electronics Under Mechanical Fatigue
NASA Astrophysics Data System (ADS)
Yi, Seol-Min; Choi, In-Suk; Kim, Byoung-Joon; Joo, Young-Chang
2018-03-01
Flexible devices are of significant interest due to their potential expansion of the application of smart devices into various fields, such as energy harvesting, biological applications and consumer electronics. Due to the mechanically dynamic operations of flexible electronics, their mechanical reliability must be thoroughly investigated to understand their failure mechanisms and lifetimes. Reliability issues caused by bending fatigue, one of the typical operational limitations of flexible electronics, have been studied using various test methodologies; however, electromechanical evaluations, which are essential to assess the reliability of electronic devices for flexible applications, had not been investigated because the testing method was not established. By employing the in situ bending fatigue test, we have studied the failure mechanism for various conditions and parameters, such as bending strain, fatigue area, film thickness, and lateral dimensions. Moreover, various methods for improving the bending reliability have been developed based on the failure mechanism. Nanostructures such as holes, pores, wires and composites of nanoparticles and nanotubes have been suggested for better reliability. Flexible devices were also investigated to find the potential failures initiated by complex structures under bending fatigue strain. In this review, the recent advances in test methodology, mechanism studies, and practical applications are introduced. Additionally, perspectives including the future advance to stretchable electronics are discussed based on the current achievements in research.
A proposed method to investigate reliability throughout a questionnaire.
Wentzel-Larsen, Tore; Norekvål, Tone M; Ulvik, Bjørg; Nygård, Ottar; Pripp, Are H
2011-10-05
Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect the reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of the intraclass correlation coefficient (ICC). The method was also applied to a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure, to assess whether respondents provide only a random answer or one based on a substantial cognitive effort. Scales from different factor structures of the Jalowiec Coping Scale had different effects on this awareness measure. Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales.
Yildirim, Yücel; Ergin, Gülbin
2013-01-01
Fatigue is primarily a subjective experience and self-report is the most common approach used to measure fatigue. Numerous self-report instruments have been developed to measure fatigue. Unfortunately, each of these measures was tailored for the situation in which fatigue was studied. Therefore, the aim of this study was to determine the reliability and validity of the Turkish language version of the Multidimensional Assessment of Fatigue Scale (MAF-T) in chronic musculoskeletal physical therapy patients. The MAF-T was supplied by the MAPI Research Institute, and 69 chronic musculoskeletal physical therapy patients were evaluated. To validate MAF-T, all participants completed the MAF-T and Short Form-36 (SF-36). The MAF was administered again one week later to assess test-retest reliability. Using Cronbach α, the internal consistency reliability of the MAF-T was 0.90, the Intraclass Correlation Coefficient (ICC) reliability was 0.96. Item-discriminant validity was calculated between r=0.14 and r=0.82. The correlations between the total scores of the MAF-T scale and the subscale scores of SF-36 were negative and significant (p< 0.01). The MAF-T is a valid and reliable scale for assessing fatigue in chronic musculoskeletal physical therapy patients.
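The internal consistency figure reported above is a Cronbach's alpha. As a minimal sketch of how that coefficient is commonly computed (the function name `cronbach_alpha` and the response matrix are hypothetical and unrelated to the MAF-T data), alpha is k/(k-1) times one minus the ratio of summed item variances to the variance of the total score:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to a three-item scale.
responses = [
    [1, 2, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 5, 4],
    [5, 5, 5],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values near 1 indicate that the items vary together; the sketch uses sample variances throughout, which is one common convention.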
Wise, Frances M; Harris, Darren W; Olver, John H
2017-01-01
Considerable research has been undertaken in evaluating the DASS-21 in a variety of clinical populations, but studies of the instrument's psychometric adequacy in healthcare professionals are lacking. This study aimed to establish and improve the construct validity and reliability of the DASS-21 in a cohort of Australian health professionals. 343 rehabilitation health professionals completed the DASS-21, along with a demographic questionnaire. Principal components analysis was performed to identify potential factors in the DASS-21. Factors were interpreted against theoretical constructs underlying the instrument. Items loading on separate factors were then subjected to reliability analysis to determine internal consistency of subscales. Items that demonstrated poor fit, or loaded onto more than one factor, were deleted to maximise the reliability of each subscale. Principal components analysis identified three dimensions (depression, anxiety, stress) in a modified version of the DASS-21 (renamed DASS-14), with appropriate construct validity and good reliability (α=0.73 to 0.88). The three dimensions accounted for over 62% of variance between items. The modified DASS-14 scale is a more parsimonious measure of depression, anxiety, and stress, with acceptable reliability and construct validity, in rehabilitation health professionals and is appropriate for use in studies of similar populations.
Dichter, Martin Nikolaus; Schwab, Christian G G; Meyer, Gabriele; Bartholomeyczik, Sabine; Halek, Margareta
2016-02-01
For people with dementia, the concept of quality of life (Qol) reflects the disease's impact on the whole person. Thus, Qol is an increasingly used outcome measure in dementia research. This systematic review was performed to identify available dementia-specific Qol measurements and to assess the quality of linguistic validations and reliability studies of these measurements (PROSPERO 2013: CRD42014008725). The MEDLINE, CINAHL, EMBASE, PsycINFO, and Cochrane Methodology Register databases were systematically searched without any date restrictions. Forward and backward citation tracking were performed on the basis of selected articles. A total of 70 articles addressing 19 dementia-specific Qol measurements were identified; nine measurements were adapted to nonorigin countries. The quality of the linguistic validations varied from insufficient to good. Internal consistency was the most frequently tested reliability property. Most of the reliability studies lacked internal validity. Qol measurements for dementia are insufficiently linguistically validated and not well tested for reliability. None of the identified measurements can be recommended without further research. The application of international guidelines and quality criteria is strongly recommended for the performance of linguistic validations and reliability studies of dementia-specific Qol measurements. Copyright © 2016 Elsevier Inc. All rights reserved.
Anderson, Donald D; Segal, Neil A; Kern, Andrew M; Nevitt, Michael C; Torner, James C; Lynch, John A
2012-01-01
Recent findings suggest that contact stress is a potent predictor of subsequent symptomatic osteoarthritis development in the knee. However, much larger numbers of knees (likely on the order of hundreds, if not thousands) need to be reliably analyzed to achieve the statistical power necessary to clarify this relationship. This study assessed the reliability of new semiautomated computational methods for estimating contact stress in knees from large population-based cohorts. Ten knees of subjects from the Multicenter Osteoarthritis Study were included. Bone surfaces were manually segmented from sequential 1.0 Tesla magnetic resonance imaging slices by three individuals on two nonconsecutive days. Four individuals then registered the resulting bone surfaces to corresponding bone edges on weight-bearing radiographs, using a semi-automated algorithm. Discrete element analysis methods were used to estimate contact stress distributions for each knee. Segmentation and registration reliabilities (day-to-day and interrater) for peak and mean medial and lateral tibiofemoral contact stress were assessed with Shrout-Fleiss intraclass correlation coefficients (ICCs). The segmentation and registration steps of the modeling approach were found to have excellent day-to-day (ICC 0.93-0.99) and good inter-rater reliability (0.84-0.97). This approach for estimating compartment-specific tibiofemoral contact stress appears to be sufficiently reliable for use in large population-based cohorts.
Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan
2008-04-01
In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health, difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogeneous sample and the recommended weighted kappa (kw) statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data were analysed using the kw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower than acceptable levels of kw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of the kw statistic. Conclusions point to the interpretation of kw in tandem with percentage agreement scores. Suggestions that kw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.
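The 'base rate problem' described above can be reproduced with a small sketch. The ratings below are hypothetical (30 clients on a 0 to 3 ordinal scale, nearly all rated 0) and are not the I-NMDS data; the function names are illustrative, and linear disagreement weights are assumed because the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Share of cases on which the two raters gave the identical score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def weighted_kappa(r1, r2, categories):
    """Linearly weighted kappa for two raters on an ordinal scale.

    Computed as 1 - (observed weighted disagreement / expected weighted
    disagreement), with weight |i - j| / (k - 1) between categories i and j.
    """
    n, k = len(r1), len(categories)
    rank = {c: i for i, c in enumerate(categories)}
    m1, m2 = Counter(r1), Counter(r2)  # marginal counts per rater
    observed = sum(abs(rank[a] - rank[b]) for a, b in zip(r1, r2)) / (n * (k - 1))
    expected = sum(
        abs(rank[a] - rank[b]) * m1[a] * m2[b]
        for a in categories
        for b in categories
    ) / (n * n * (k - 1))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return 1 - observed / expected


# Hypothetical homogeneous sample: almost every client scores 0.
scale = [0, 1, 2, 3]
nurse_a = [0] * 28 + [1, 1]
nurse_b = [0] * 27 + [1, 0, 0]

pa = percent_agreement(nurse_a, nurse_b)
kw = weighted_kappa(nurse_a, nurse_b, scale)
print(f"percent agreement = {pa:.2f}, weighted kappa = {kw:.3f}")
```

With these made-up ratings the nurses agree on 90% of clients, yet the weighted kappa is slightly below zero, because nearly all of that agreement is what chance alone would produce when one category dominates.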
Day-to-day reliability of gait characteristics in rats.
Raffalt, Peter C; Nielsen, Louise R; Madsen, Stefan; Munk Højberg, Laurits; Pingel, Jessica; Nielsen, Jens Bo; Wienecke, Jacob; Alkjær, Tine
2018-04-27
The purpose of the present study was to determine the day-to-day reliability in stride characteristics in rats during treadmill walking obtained with two-dimensional (2D) motion capture. Kinematics were recorded from 26 adult rats during walking at 8 m/min, 12 m/min and 16 m/min on two separate days. Stride length, stride time, contact time, swing time and hip, knee and ankle joint range of motion were extracted from 15 strides. The relative reliability was assessed using intra-class correlation coefficients (ICC(1,1)) and (ICC(3,1)). The absolute reliability was determined using measurement error (ME). Across walking speeds, the relative reliability ranged from fair to good (ICCs between 0.4 and 0.75). The ME was below 91 mm for stride lengths, below 55 ms for the temporal stride variables and below 6.4° for the joint angle range of motion. In general, the results indicated an acceptable day-to-day reliability of the gait pattern parameters observed in rats during treadmill walking. The results of the present study may serve as a reference material that can help future intervention studies on rat gait characteristics both with respect to the selection of outcome measures and in the interpretation of the results. Copyright © 2018 Elsevier Ltd. All rights reserved.
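The two coefficients named above, ICC(1,1) and ICC(3,1), can be derived from the mean squares of a subjects-by-sessions table. The sketch below assumes the standard Shrout-Fleiss single-measure definitions; the function name and the data matrix are illustrative only and are not the rat gait measurements.

```python
def icc_1_1_and_3_1(data):
    """Return (ICC(1,1), ICC(3,1)) for a table of n subjects (rows) by k sessions or raters (columns)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_between = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_within = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    ss_columns = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_error = ss_within - ss_columns                          # residual

    bms = ss_between / (n - 1)
    wms = ss_within / (n * (k - 1))
    ems = ss_error / ((n - 1) * (k - 1))

    icc11 = (bms - wms) / (bms + (k - 1) * wms)  # one-way random effects
    icc31 = (bms - ems) / (bms + (k - 1) * ems)  # two-way mixed, consistency
    return icc11, icc31


# Illustrative scores: six subjects, four sessions (or raters).
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc11, icc31 = icc_1_1_and_3_1(scores)
print(f"ICC(1,1) = {icc11:.3f}, ICC(3,1) = {icc31:.3f}")
```

ICC(3,1) removes systematic differences between columns from the error term, so it exceeds ICC(1,1) whenever sessions or raters differ in their mean level, as they do in this illustrative table.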
Interrater reliability: the kappa statistic.
McHugh, Mary L
2012-01-01
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While there have been a variety of methods to measure interrater reliability, traditionally it was measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement due to its inability to account for chance agreement. He introduced Cohen's kappa, developed to account for the possibility that raters actually guess on at least some variables due to uncertainty. Like most correlation statistics, the kappa can range from -1 to +1. While the kappa is one of the most commonly used statistics to test interrater reliability, it has limitations. Judgments about what level of kappa should be acceptable for health research are questioned. Cohen's suggested interpretation may be too lenient for health related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels for both kappa and percent agreement that should be demanded in healthcare studies are suggested.
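To make the contrast between the two statistics concrete, the following sketch computes percent agreement (agreements divided by total scores, as defined above) and the unweighted Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from each rater's marginal frequencies. The function names and the yes/no ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Number of identical scores divided by the total number of scores."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(r1)
    p_observed = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_chance = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical yes/no scores from two raters on ten items.
rater_1 = ["y"] * 6 + ["n"] * 4
rater_2 = ["y"] * 5 + ["n"] * 4 + ["y"]

po = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"percent agreement = {po:.2f}, kappa = {kappa:.3f}")  # prints 0.80 and 0.583
```

Here the raters agree on 8 of 10 items, but 0.52 agreement would be expected by chance, so kappa is noticeably lower than the raw percent agreement.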
Software reliability models for critical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, H.; Pham, M.
This report presents the results of the first phase of the ongoing EG&G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the second place. 407 refs., 4 figs., 2 tabs.
Reliability of physical functioning tests in patients with low back pain: a systematic review.
Denteneer, Lenie; Van Daele, Ulrike; Truijen, Steven; De Hertogh, Willem; Meirte, Jill; Stassijns, Gaetane
2018-01-01
The aim of this study was to provide a comprehensive overview of physical functioning tests in patients with low back pain (LBP) and to investigate their reliability. A systematic computerized search was finalized in four different databases on June 24, 2017: PubMed, Web of Science, Embase, and MEDLINE. Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines were followed during all stages of this review. Clinical studies that investigate the reliability of physical functioning tests in patients with LBP were eligible. The methodological quality of the included studies was assessed with the use of the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist. To come to final conclusions on the reliability of the identified clinical tests, the current review assessed three factors, namely, outcome assessment, methodological quality, and consistency of description. A total of 20 studies were found eligible and 38 clinical tests were identified. Good overall test-retest reliability was concluded for the extensor endurance test (intraclass correlation coefficient [ICC]=0.93-0.97), the flexor endurance test (ICC=0.90-0.97), the 5-minute walking test (ICC=0.89-0.99), the 50-ft walking test (ICC=0.76-0.96), the shuttle walk test (ICC=0.92-0.99), the sit-to-stand test (ICC=0.91-0.99), and the loaded forward reach test (ICC=0.74-0.98). For inter-rater reliability, only one test, namely, the Biering-Sörensen test (ICC=0.88-0.99), could be concluded to have an overall good inter-rater reliability. None of the identified clinical tests could be concluded to have a good intrarater reliability. Further investigation should focus on a better overall study methodology and the use of identical protocols for the description of clinical tests. The assessment of reliability is only a first step in the recommendation process for the use of clinical tests. 
In future research, the identified clinical tests in the current review should be further investigated for validity. Only when these clinimetric properties of a clinical test have been thoroughly investigated can a final conclusion regarding the clinical and scientific use of the identified tests be made. Copyright © 2017 Elsevier Inc. All rights reserved.
Kolodziejczyk, Julia K; Norman, Gregory J; Rock, Cheryl L; Arredondo, Elva M; Roesch, Scott C; Madanat, Hala; Patrick, Kevin
2016-01-01
This study evaluates the reliability and validity of the strategies for weight management (SWM) measure, a questionnaire that assesses weight management strategies for adults. The SWM includes 20 items that are categorized within the following subscales: (1) energy intake, (2) energy expenditure, (3) self-monitoring, and (4) self-regulation. Baseline and 6-month data were collected from 404 overweight/obese adults (mean age=22±3.8 years, 68% ethnic minority) enrolled in a randomized controlled trial aiming to reduce weight by improving diet and physical activity behaviours. Reliability and validity were assessed for each subscale separately. Cronbach's alpha was calculated to assess reliability. Concurrent, construct I (sensitivity to the study treatment condition), and construct II (relationship to the outcomes) validity were assessed using linear regressions with the following outcome measures: weight, self-reported diet, and weekly energy expenditure. All subscales showed strong internal consistency. The strength of the validity evidence depended on subscale and validity type. The strongest validity evidence was concurrent validity of the energy intake and energy expenditure subscales; construct I validity of the energy intake and self-monitoring subscales; and construct II validity of the energy intake, energy expenditure, and self-regulation subscales. Results indicate that the SWM can be used to assess weight management strategies among an ethnically diverse sample of adults as each subscale showed evidence of reliability and select types of validity. As validity is an accumulation of evidence over multiple studies, this study provides initial reliability and validity evidence in one population segment. Copyright © 2015 Asia Oceania Association for the Study of Obesity. Published by Elsevier Ltd. All rights reserved.
Retest reliability of individual P3 topography assessed by high density electroencephalography.
Vázquez-Marrufo, Manuel; González-Rosa, Javier J; Galvao-Carmona, Alejandro; Hidalgo-Muñoz, Antonio; Borges, Mónica; Peña, Juan Luis Ruiz; Izquierdo, Guillermo
2013-01-01
Some controversy remains about the potential applicability of cognitive potentials for evaluating the cerebral activity associated with cognitive capacity. A fundamental requirement is that these neurophysiological parameters show a high level of stability over time. Previous studies have shown that the reliability of diverse parameters of the P3 component (latency and amplitude) ranges between moderate and high. However, few studies have paid attention to the retest reliability of the P3 topography in groups or individuals. Considering that changes in P3 topography have been related to different pathologies and healthy aging, the main objective of this article was to evaluate in a longitudinal study (two sessions) the reliability of P3 topography in a group and at the individual level. The correlation between sessions for P3 topography in the grand average of groups was high (r = 0.977, p<0.001). The within-subject correlation values ranged from 0.626 to 0.981 (mean: 0.888). In the between-subjects topography comparisons, the correlation was always lower for comparisons between different subjects than for within-subjects correlations in the first session but not in the second session. The present study shows that P3 topography is highly reliable for group analysis (comprising the same subjects) in different sessions. The results also confirmed that retest reliability for individual P3 maps is suitable for follow-up studies for a particular subject. Moreover, P3 topography appears to be a specific marker considering that the between-subjects correlations were lower than the within-subject correlations. However, P3 topography appears more similar between subjects in the second session, demonstrating that it is modulated by experience. Possible clinical applications of all these results are discussed.
Hakimian, Pantea; Lak, Azadeh
2016-01-01
Background: In spite of the increased range of inactivity and obesity among Iranian adults, insufficient research has been done on environmental factors influencing physical activity. As a result, adapting a subjective (self-report) measurement tool for assessment of the physical environment in Iran is critical. Accordingly, in this study the Neighborhood Environment Walkability Scale (NEWS) was adapted for Iran and its reliability was evaluated. Methods: This study was conducted using a systematic adaptation method consisting of 3 steps: translation and back-translation procedures, revision by a multidisciplinary panel of local experts, and a cognitive study. Then NEWS-Iran was completed twice by adults aged 18 to 65 years (N=19) with an interval of 15 days. The intraclass correlation coefficient (ICC) was used to evaluate the reliability of the adapted questionnaire. Results: NEWS-Iran is an adapted version of NEWS-A (abbreviated). In the adaptation process, five items were added from other versions of NEWS, two subscales were significantly modified for a shorter and more effective questionnaire, and five new items were added about climate factors and site-specific uses. NEWS-Iran showed almost perfect reliability (ICCs: more than 0.8) for all subscales, with items having moderate to almost perfect reliability scores (ICCs: 0.56-0.96). Conclusion: This study introduced NEWS-Iran, which is a reliable version of NEWS for measuring environmental perceptions related to physical activity behavior adapted for Iran. It is the first adapted version of NEWS which demonstrates a systematic adaptation process used by earlier studies. It can be used for other developing countries with similar environmental, social and cultural context. PMID:28210592
The development of form two mathematics i-Think module (Mi-T2)
NASA Astrophysics Data System (ADS)
Yao, Foo Jing; Abdullah, Mohd Faizal Nizam Lee; Tien, Lee Tien
2017-05-01
This study aims to develop a training module, i-THINK Mathematics Form Two (Mi-T2), to increase the higher-order thinking skills of students. The Mi-T2 training module was built based on the Sidek Module Development Model (2001). Constructivist learning theory, cognitive learning theory, the i-THINK map and higher-order thinking skills were the building blocks of the module development. In this study, the researcher determined the validity and reliability of the Mi-T2 module. The study used a descriptive design. To determine the need for the Mi-T2 module, questionnaires and a literature review were used to collect data. Once the need for the module was determined, the module was built and a pilot study was conducted to test its reliability. The pilot study was conducted at a secondary school in North Kinta, Perak. A Form Two class was selected as the study sample through cluster random sampling. The pilot study ran for two months and covered one topic. The Mi-T2 module was evaluated by five expert panels to determine its content validity. The instruments used in the study were questionnaires about the necessity of the Mi-T2 module for guidance, questionnaires about the validity of the module and questionnaires concerning the reliability of the module. Statistical analysis was conducted to determine the validity and reliability coefficients of the Mi-T2 module. The content validity of the Mi-T2 module was determined by Cohen's kappa (1968) agreement coefficient and its reliability by Cronbach's alpha. The content validity of the Mi-T2 module was 0.89 and its Cronbach's alpha was 0.911.
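As an illustration of the reliability coefficient named in the abstract above, the following is a minimal sketch of how Cronbach's alpha is commonly computed from item-level questionnaire scores. The function name and the example data are hypothetical; the abstract does not describe the study's actual data or software.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 3 respondents x 3 items (made-up values).
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example)
```

In this made-up example every item ranks the respondents identically, so alpha reaches its maximum of 1; real questionnaire data would give a lower value.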
1985-09-01
COMPARISON OF QUANTITY VERSUS QUALITY USING PERFORMANCE, RELIABILITY, AND LIFE CYCLE COST DATA. A CASE STUDY OF THE F-15, F-16, AND A-10 AIRCRAFT. THESIS. Air University, Wright-Patterson Air Force Base, Ohio. David...
Evensen, Natalie M; Kvåle, Alice; Braekken, Ingeborg H
2015-09-01
There is a lack of functional objective tests available to measure functional status in women with pelvic girdle pain (PGP). The purpose of this study was to establish test-retest and intertester reliability of the Timed Up and Go (TUG) test and Ten-metre Timed Walk Test (10mTWT) in pregnant women with PGP. A convenience sample of women was recruited over a 4-month period and tested on two occasions, 1 week apart, to determine test-retest reliability. Intertester reliability was established between two assessors at the first testing session. Subjects were instructed to undertake the TUG and 10mTWT at maximum speed. One practise trial and two timed trials for each walking test were undertaken on Day 1 and one practise trial and one timed trial on Day 2. Seventeen women with PGP aged 31.1 years (SD [standard deviation] = 2.3) and 28.7 weeks pregnant (SD = 7.4) completed gait testing. Test-retest reliability using the intraclass correlation coefficient (ICC) was excellent for the TUG (0.88) and good for the 10mTWT (0.74). Intertester reliability was determined in the first 13 participants with excellent ICC values being found for both walking tests (TUG: 0.95; 10mTWT: 0.94). This study demonstrated that the TUG and 10mTWT undertaken at fast pace are reliable, objective functional tests in pregnant women with PGP. While both tests are suitable for use in clinical and research settings, we would recommend the TUG given the findings of higher test-retest reliability and as this test requires less space and time to set up and score. Future studies with a larger sample size are warranted to confirm the results of this study. Copyright © 2015 John Wiley & Sons, Ltd.
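The abstract above reports intraclass correlation coefficients without stating which ICC form was used. As a hedged sketch, the code below computes one common choice, the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss, from a subjects-by-sessions table. The function name and the timing values are assumptions made for illustration only.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of n subjects (rows) by k sessions or raters (columns)."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two sessions/raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test-retest times in seconds (made-up): each row is one
# participant, columns are session 1 and session 2.
times = [[6.1, 6.3], [7.4, 7.2], [5.8, 6.0], [8.0, 8.3]]
icc = icc_2_1(times)
```

Because ICC(2,1) measures absolute agreement, a constant shift between sessions lowers it even when participants keep the same rank order; a consistency-type ICC would not penalise that shift.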
Chen, Y-W; HajGhanbari, B; Road, J D; Coxson, H O; Camp, P G; Reid, W D
2018-06-08
Pain is prevalent in chronic obstructive pulmonary disease (COPD) and the Brief Pain Inventory (BPI) appears to be a feasible questionnaire to assess this symptom. However, the reliability and validity of the BPI have not been determined in individuals with COPD. This study aimed to determine the internal consistency, test-retest reliability and validity (construct, convergent, divergent and discriminant) of the BPI in individuals with COPD. In order to examine the test-retest reliability, individuals with COPD were recruited from pulmonary rehabilitation programmes to complete the BPI twice 1 week apart. In order to investigate validity, de-identified data was retrieved from two previous studies, including forced expiratory volume in 1-s, age, sex and data from four questionnaires: the BPI, short-form McGill Pain Questionnaire (SF-MPQ), 36-Item Short Form Survey (SF-36) and Community Health Activities Model Program for Seniors (CHAMPS) questionnaire. In total, 123 participants were included in the analyses (eligible data were retrieved from 86 participants and additional 37 participants were recruited). The BPI demonstrated excellent internal consistency and test-retest reliability. It also showed convergent validity with the SF-MPQ and divergent validity with the SF-36. The factor analysis yielded two factors of the BPI, which demonstrated that the two domains of the BPI measure the intended constructs. The BPI can also discriminate pain levels among COPD patients with varied levels of quality of life (SF-36) and physical activity (CHAMPS). The BPI is a reliable and valid pain questionnaire that can be used to evaluate pain in COPD. This study formally established the reliability and validity of the BPI in individuals with COPD, which have not been determined in this patient group. The results of this study provide strong evidence that assessment results from this pain questionnaire are reliable and valid. © 2018 European Pain Federation - EFIC®.
Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire
Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra
2018-01-01
Background: Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. Aims: To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Study Design: Methodological and cross sectional study. Methods: A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. Results: The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. Conclusion: The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain. 
PMID:29843496
Reliability of infarct volumetry: Its relevance and the improvement by a software-assisted approach.
Friedländer, Felix; Bohmann, Ferdinand; Brunkhorst, Max; Chae, Ju-Hee; Devraj, Kavi; Köhler, Yvette; Kraft, Peter; Kuhn, Hannah; Lucaciu, Alexandra; Luger, Sebastian; Pfeilschifter, Waltraud; Sadler, Rebecca; Liesz, Arthur; Scholtyschik, Karolina; Stolz, Leonie; Vutukuri, Rajkumar; Brunkhorst, Robert
2017-08-01
Despite the efficacy of neuroprotective approaches in animal models of stroke, their translation has so far failed from bench to bedside. One reason is presumed to be a low quality of preclinical study design, leading to bias and a low a priori power. In this study, we propose that the key read-out of experimental stroke studies, the volume of the ischemic damage as commonly measured by free-handed planimetry of TTC-stained brain sections, is subject to an unrecognized low inter-rater and test-retest reliability with strong implications for statistical power and bias. As an alternative approach, we suggest a simple, open-source, software-assisted method, taking advantage of automatic-thresholding techniques. The validity and the improvement of reliability by an automated method to tMCAO infarct volumetry are demonstrated. In addition, we show the probable consequences of increased reliability for precision, p-values, effect inflation, and power calculation, exemplified by a systematic analysis of experimental stroke studies published in the year 2015. Our study reveals an underappreciated quality problem in translational stroke research and suggests that software-assisted infarct volumetry might help to improve reproducibility and therefore the robustness of bench to bedside translation.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
ERIC Educational Resources Information Center
Kim, Seonghoon; Feldt, Leonard S.
2010-01-01
The primary purpose of this study is to investigate the mathematical characteristics of the test reliability coefficient rho[subscript XX'] as a function of item response theory (IRT) parameters and present the lower and upper bounds of the coefficient. Another purpose is to examine relative performances of the IRT reliability statistics and two…
Kalichman, Leonid; Klindukhov, Alexander; Li, Ling; Linov, Lina
2016-11-01
A reliability and cross-sectional observational study. To introduce a scoring system for visible fat infiltration in paraspinal muscles; to evaluate intertester and intratester reliability of this system and its relationship with indices of muscle density; to evaluate the association between indices of paraspinal muscle degeneration and facet joint osteoarthritis. Current evidence suggests that the paraspinal muscles degeneration is associated with low back pain, facet joint osteoarthritis, spondylolisthesis, and degenerative disc disease. However, the evaluation of paraspinal muscles on computed tomography is not radiological routine, probably because of absence of simple and reliable indices of paraspinal degeneration. One hundred fifty consecutive computed tomography scans of the lower back (N=75) or abdomen (N=75) were evaluated. Mean radiographic density (in Hounsfield units) and SD of the density of multifidus and erector spinae were evaluated at the L4-L5 spinal level. A new index of muscle degeneration, radiographic density ratio=muscle density/SD of density, was calculated. To evaluate the visible fat infiltration in paraspinal muscles, we proposed a 3-graded scoring system. The prevalence of facet joint osteoarthritis was also evaluated. Intraclass correlation and κ statistics were used to evaluate inter-rater and intra-rater reliability. Logistic regression examined the association between paraspinal muscle indices and facet joint osteoarthritis. Intra-rater reliability for fat infiltration score (κ) ranged between 0.87 and 0.92; inter-rater reliability between 0.70 and 0.81. Intra-rater reliability (intraclass correlation) for mean density of paraspinal muscles ranged between 0.96 and 0.99, inter-rater reliability between 0.95 and 0.99; SD intra-rater reliability ranged between 0.82 and 0.91, inter-rater reliability between 0.80 and 0.89. 
Significant associations (P<0.01) were found between facet joint osteoarthritis, fat infiltration score, and radiographic density ratio. Two suggested indices of paraspinal muscle degeneration showed excellent reliability and were significantly associated with facet joint osteoarthritis. Additional studies are needed to evaluate the associations with other spinal degeneration features and low back pain.
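The radiographic density ratio defined in the abstract above (muscle density divided by the SD of density) can be sketched in a few lines. The input is assumed here to be a list of Hounsfield-unit values sampled from a muscle region of interest, and the sample (n-1) standard deviation is an assumption; the abstract does not say how the region was sampled or which SD estimator was used.

```python
from statistics import mean, stdev


def radiographic_density_ratio(hu_values):
    """Mean density divided by the standard deviation of density (both in HU)."""
    if len(hu_values) < 2:
        raise ValueError("need at least two density samples")
    spread = stdev(hu_values)  # sample SD; an assumption, see lead-in
    if spread == 0:
        raise ValueError("density SD is zero; ratio is undefined")
    return mean(hu_values) / spread


# Hypothetical HU samples from one muscle region (made-up values).
ratio = radiographic_density_ratio([40, 50, 60])
```

Under this definition, a muscle with lower mean density or more heterogeneous density yields a lower ratio.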
The effect of Web-based Braden Scale training on the reliability of Braden subscale ratings.
Magnan, Morris A; Maklebust, JoAnn
2009-01-01
The primary purpose of this study was to evaluate the effect of Web-based Braden Scale training on the reliability of Braden Scale subscale ratings made by nurses working in acute care hospitals. A secondary purpose was to describe the distribution of reliable Braden subscale ratings before and after Web-based Braden Scale training. Secondary analysis of data from a recently completed quasi-experimental, pretest-posttest, interrater reliability study. A convenience sample of RNs working at 3 Michigan medical centers voluntarily participated in the study. RN participants included nurses who used the Braden Scale regularly at their place of employment ("regular users") as well as nurses who did not use the Braden Scale at their place of employment ("new users"). Using a pretest-posttest, quasi-experimental design, pretest interrater reliability data were collected to identify the percentage of nurses making reliable Braden subscale assessments. Nurses then completed a Web-based Braden Scale training module after which posttest interrater reliability data were collected. The reliability of nurses' Braden subscale ratings was determined by examining the level of agreement/disagreement between ratings made by an RN and an "expert" rating the same patient. In total, 381 RN-to-expert dyads were available for analysis. During both the pretest and posttest periods, the percentage of reliable subscale ratings was highest for the activity subscale, lowest for the moisture subscale, and second lowest for the nutrition subscale. With Web-based Braden Scale training, the percentage of reliable Braden subscale ratings made by new users increased for all 6 subscales with statistically significant improvements in the percentage of reliable assessments made on 3 subscales: sensory-perception, moisture, and mobility. Training had virtually no effect on the percentage of reliable subscale ratings made by regular users of the Braden Scale. 
With Web-based Braden Scale training the percentage of nurses making reliable ratings increased for all 6 subscales, but this was true for new users only. Additional research is needed to identify educational approaches that effectively improve and sustain the reliability of subscale ratings among regular users of the Braden Scale. Moreover, special attention needs to be given to ensuring that all nurses working with the Braden Scale have a clear understanding of the intended meanings and correct approaches to rating moisture and nutrition subscales.
Lee, Ya-Chen; Yu, Wan-Hui; Hsueh, I-Ping; Chen, Sheng-Shiung; Hsieh, Ching-Lin
2017-10-01
A lack of evidence on the test-retest reliability and responsiveness limits the utility of the BI-based Supplementary Scales (BI-SS) in both clinical and research settings. To examine the test-retest reliability and responsiveness of the BI-based Supplementary Scales (BI-SS) in patients with stroke. A repeated-assessments design (1 week apart) was used to examine the test-retest reliability of the BI-SS. For the responsiveness study, the participants were assessed with the BI-SS and BI (treated as an external criterion) at admission to and discharge from rehabilitation wards. Seven outpatient rehabilitation units and one inpatient rehabilitation unit. Outpatients with chronic stroke. Eighty-four outpatients with chronic stroke participated in the test-retest reliability study. Fifty-seven inpatients completed baseline and follow-up assessments in the responsiveness study. For the test-retest reliability study, the values of the intra-class correlation coefficient and the overall percentage of minimal detectable change for the Ability Scale and Self-perceived Difficulty Scale were 0.97, 12.8%, and 0.78, 35.8%, respectively. For the responsiveness study, the standardized effect size and standardized response mean (representing internal responsiveness) of the Ability Scale and Self-perceived Difficulty Scale were 1.17 and 1.56, and 0.78 and 0.89, respectively. Regarding external responsiveness, the change in score of the Ability Scale had significant and moderate association with that of the BI (r=0.61, P<0.001). The change in score of the Self-perceived Difficulty Scale had non-significant and weak association with that of the BI (r=0.23, P=0.080). The Ability Scale of the BI-SS has satisfactory test-retest reliability and sufficient responsiveness for patients with stroke. However, the Self-perceived Difficulty Scale of the BI-SS has substantial random measurement error and insufficient external responsiveness, which may affect its utility in clinical settings. 
The findings of this study provide empirical evidence of psychometric properties of the BI-SS for assessing ability and self-perceived difficulty of ADL in patients with stroke.
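The indices reported in the abstract above (minimal detectable change, standardized effect size, standardized response mean) are usually derived from a few simple formulas. The sketch below uses the conventional definitions, which are an assumption here because the abstract does not spell out its computations: SEM = SD x sqrt(1 - ICC), MDC95 = 1.96 x sqrt(2) x SEM, effect size = mean change / SD of baseline, and SRM = mean change / SD of change. All names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence from a score SD and an ICC."""
    sem = sd * sqrt(1 - icc)  # standard error of measurement
    return 1.96 * sqrt(2) * sem


def effect_size(baseline, followup):
    """Mean change divided by the SD of the baseline scores."""
    change = [f - b for b, f in zip(baseline, followup)]
    return mean(change) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of the change scores."""
    change = [f - b for b, f in zip(baseline, followup)]
    return mean(change) / stdev(change)


# Hypothetical admission and discharge scores for three patients (made-up).
admission = [10, 12, 14]
discharge = [13, 14, 18]
es = effect_size(admission, discharge)
srm = standardized_response_mean(admission, discharge)
```

A percentage MDC is obtained by dividing the MDC by a reference score, such as the scale maximum or the sample mean, and multiplying by 100; which reference the study used is not stated in the abstract.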
Hua, Bin; Abbas, Estelle; Hayes, Alan; Ryan, Peter; Nelson, Lisa; O'Brien, Kylie
2012-11-01
Chinese medicine (CM) has its own diagnostic indicators that are used as evidence of change in a patient's condition. The majority of studies investigating efficacy of Chinese herbal medicine (CHM) have utilized biomedical diagnostic endpoints. For CM clinical diagnostic variables to be incorporated into clinical trial designs, there would need to be evidence that these diagnostic variables are reliable. Previous studies have indicated that the reliability of CM syndrome diagnosis is variable. Little information is known about where the variability stems from--the basic data collection level or the synthesis of diagnostic data, or both. No previous studies have investigated systematically the reliability of all four diagnostic methods used in the CM diagnostic process (Inquiry, Inspection, Auscultation/Olfaction, and Palpation). The objective of this study was to assess the inter-rater reliability of data collected using the four diagnostic methods of CM in Australian patients with knee osteoarthritis (OA), in order to investigate if CM variables could be used with confidence as diagnostic endpoints in a clinical trial investigating the efficacy of a CHM in treating OA. An inter-rater reliability study was conducted as a substudy of a clinical trial investigating the treatment of knee OA with Chinese herbal medicine. Two (2) experienced CM practitioners conducted a CM examination separately, within 2 hours of each other, in 40 participants. A CM assessment form was utilized to record the diagnostic data. Cohen's κ coefficient was used as a measure of the level of agreement between 2 practitioners. There was a relatively good level of agreement for Inquiry and Auscultation variables, and, in general, a low level of agreement for (visual) Inspection and Palpation variables. There was variation in the level of agreement between 2 practitioners on clinical information collected using the Four Diagnostic Methods of a CM examination. 
Some aspects of CM diagnosis appear to be reliable, while others are not. Based on these results, it was inappropriate to use CM diagnostic variables as diagnostic endpoints in the main study, which was an investigation of efficacy of CHM treatment of knee OA.
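The agreement statistic used in the abstract above, Cohen's κ for two raters, corrects observed agreement for the agreement expected by chance. The following is a minimal sketch of the unweighted form; the function name and the category labels are hypothetical and do not come from the study's assessment form.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of cases")
    # Proportion of cases on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of one diagnostic sign by two practitioners (made-up).
practitioner_1 = ["present", "present", "absent", "absent"]
practitioner_2 = ["present", "absent", "absent", "absent"]
kappa = cohen_kappa(practitioner_1, practitioner_2)
```

In this made-up example the raters agree on 3 of 4 cases (75%), chance agreement is 50%, and κ is therefore 0.5. When nearly all cases fall into one category, chance agreement approaches 1 and κ can be low despite high percent agreement.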
ERIC Educational Resources Information Center
Yesil, Rüstü
2017-01-01
The purpose of this study was to develop a valid and reliable scale that can be used in determining the civic-mindedness levels of teaching staff working at universities. The study group of the research consisted of 758 students, 256 of whom were male and 524 were female. The item list, which was based on the literature and expert opinions, was…
ERIC Educational Resources Information Center
Maiano, Christophe; Morin, Alexandre J. S.; Begarie, Jerome
2011-01-01
The purpose of this study was to test the factor validity and reliability of the Center for Epidemiologic Studies Depression Scale (CES-D) within a sample of adolescents with mild to moderate Intellectual Disability (ID). A total sample of 189 adolescents (121 boys and 68 girls), aged between 12 and 18 years old, with mild to moderate ID were…
ERIC Educational Resources Information Center
Caliskan, Gokhan
2015-01-01
The current study aims to test the reliability and validity of the Leader-Member Exchange (LMX 7) scale with regard to coach--player relationships in sports settings. A total of 330 professional soccer players from the Turkish Super League as well as from the First and Second Leagues participated in this study. Factor analyses were performed to…
Developing and testing the patient-centred innovation questionnaire for hospital nurses.
Huang, Ching-Yuan; Weng, Rhay-Hung; Wu, Tsung-Chin; Lin, Tzu-En; Hsu, Ching-Tai; Hung, Chiu-Hsia; Tsai, Yu-Chen
2018-03-01
Develop the patient-centred innovation questionnaire for hospital nurses and establish its validity and reliability. Patient-centred care has been adopted by health care managers in their efforts to improve health care quality. It is regarded as a core concept for developing innovation. A cross-sectional study was employed to collect data from hospital nurses in Taiwan. This study was divided into two stages: pilot study and main study. In the main study, 596 valid responses were collected. This study adopted reliability analysis, exploratory factor analysis and confirmatory factor analysis, and selected the nurse innovation scale as a criterion to test criterion-related validity. A five-dimension patient-centred innovation questionnaire was proposed: access and practicability, co-ordination and communication, sharing power and responsibility, care continuity, family and person focus. Each dimension demonstrated a reliability of 0.89-0.98. All dimensions had acceptable convergent and discriminant validity. The patient-centred innovation questionnaire and nurse innovation scale exhibited a significantly positive correlation. The patient-centred innovation questionnaire not only had a good theoretical basis but also had sufficient reliability, construct validity and criterion-related validity. The patient-centred innovation questionnaire could provide a measure for evaluating the implementation of patient-centred care and could be used as a management tool during the process of nurse innovation. © 2017 John Wiley & Sons Ltd.
Validity and Reliability of the School Physical Activity Environment Questionnaire
ERIC Educational Resources Information Center
Martin, Jeffrey J.; McCaughtry, Nate; Flory, Sara; Murphy, Anne; Wisdom, Kimberlydawn
2011-01-01
The goal of the current study was to establish the factor validity of the Questionnaire Assessing School Physical Activity Environment (Robertson-Wilson, Levesque, & Holden, 2007) using confirmatory factor analysis procedures. Another goal was to establish internal reliability and test-retest reliability. The confirmatory factor analysis…
NEPP DDR Device Reliability FY13 Report
NASA Technical Reports Server (NTRS)
Guertin, Steven M.; Armbar, Mehran
2014-01-01
This document reports the status of the NEPP Double Data Rate (DDR) Device Reliability effort for FY2013. The task targeted general reliability of > 100 DDR2 devices from Hynix, Samsung, and Micron. Detailed characterization of some devices when stressed by several data storage patterns was studied, targeting ability of the data cells to store the different data patterns without refresh, highlighting the weakest bits. DDR2, Reliability, Data Retention, Temperature Stress, Test System Evaluation, General Reliability, IDD measurements, electronic parts, parts testing, microcircuits
Larsen, Camilla Marie; Juul-Kristensen, Birgit; Lund, Hans; Søgaard, Karen
2014-10-01
The aims were to compile a schematic overview of clinical scapular assessment methods and critically appraise the methodological quality of the involved studies. A systematic, computer-assisted literature search using Medline, CINAHL, SportDiscus and EMBASE was performed from inception to October 2013. Reference lists in articles were also screened for publications. From 50 articles, 54 method names were identified and categorized into three groups: (1) Static positioning assessment (n = 19); (2) Semi-dynamic (n = 13); and (3) Dynamic functional assessment (n = 22). Fifteen studies were excluded for evaluation due to no/few clinimetric results, leaving 35 studies for evaluation. Graded according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN checklist), the methodological quality in the reliability and validity domains was "fair" (57%) to "poor" (43%), with only one study rated as "good". The reliability domain was most often investigated. Few of the assessment methods in the included studies that had "fair" or "good" measurement property ratings demonstrated acceptable results for both reliability and validity. We found a substantially larger number of clinical scapular assessment methods than previously reported. Using the COSMIN checklist the methodological quality of the included measurement properties in the reliability and validity domains were in general "fair" to "poor". None were examined for all three domains: (1) reliability; (2) validity; and (3) responsiveness. Observational evaluation systems and assessment of scapular upward rotation seem suitably evidence-based for clinical use. Future studies should test and improve the clinimetric properties, and especially diagnostic accuracy and responsiveness, to increase utility for clinical practice.
Hanskamp-Sebregts, Mirelle; Zegers, Marieke; Vincent, Charles; van Gurp, Petra J; de Vet, Henrica C W; Wollersheim, Hub
2016-01-01
Objectives: Record review is the most used method to quantify patient safety. We systematically reviewed the reliability and validity of adverse event detection with record review. Design: A systematic review of the literature. Methods: We searched PubMed, EMBASE, CINAHL, PsycINFO and the Cochrane Library from their inception through February 2015. We included all studies that aimed to describe the reliability and/or validity of record review. Two reviewers conducted data extraction. We pooled κ values and analysed the differences in subgroups according to number of reviewers, reviewer experience and training level, adjusted for the prevalence of adverse events. Results: In 25 studies, the psychometric data of the Global Trigger Tool (GTT) and the Harvard Medical Practice Study (HMPS) were reported and 24 studies were included for statistical pooling. The inter-rater reliability of the GTT and HMPS showed a pooled κ of 0.65 and 0.55, respectively. The inter-rater agreement was statistically significantly higher when the group of reviewers within a study consisted of a maximum of five reviewers. We found no studies reporting on the validity of the GTT and HMPS. Conclusions: The reliability of record review is moderate to substantial and improved when a small group of reviewers carried out record review. The validity of the record review method has never been evaluated, while clinical data registries, autopsy or direct observations of patient care are potential reference methods that can be used to test concurrent validity. PMID:27550650
Romano-Riquer, S. Patricia; Hernández-Ávila, Mauricio; Gladen, Beth C.; Cupul-Uicab, Lea A.; Longnecker, Matthew P.
2013-01-01
Summary: Development of the perineum as well as the external genitalia is determined by dihydrotestosterone, resulting in a greater anogenital distance (AGD) in males than females. In animal experiments with hormonally active agents, anogenital distance is used as a bioassay of fetal androgen action. Use of anogenital distance in human studies has been rare. Because anogenital distance has been an easy-to-measure, sensitive outcome in animal studies, we developed an anthropometric protocol for measurement of anogenital distance in human males. In this paper we describe the method for measurement of three anogenital distances, their reliability, and an assessment of predictors for each in the context of an epidemiological study. We compare the reliabilities and predictors to those for stretched penis length and penis width. A cross-sectional study of 781 newly-delivered male infants was conducted in 2002–2003 in Chiapas, México. Replicate measures were obtained on nearly all subjects. The reliability of the measures of anogenital distance (0.82–0.91) was higher than for stretched penis length (0.78) and width (0.75). Birthweight and gestational length were more strongly related to anogenital distance than to penis length. Anogenital distance was not related to penis length (r = 0.03). Our large study clearly shows that AGD can be measured well in newborn males, and that the measurements were more reliable than those of penis length. Whether AGD measures in humans relate to clinically important outcomes, however, remains to be determined, as does its utility as a measure of androgen action in epidemiological studies. PMID:17439530
Real-time software failure characterization
NASA Technical Reports Server (NTRS)
Dunham, Janet R.; Finelli, George B.
1990-01-01
A series of studies aimed at characterizing the fundamentals of the software failure process has been undertaken as part of a NASA project on the modeling of a real-time aerospace vehicle software reliability. An overview of these studies is provided, and the current study, an investigation of the reliability of aerospace vehicle guidance and control software, is examined. The study approach provides for the collection of life-cycle process data, and for the retention and evaluation of interim software life-cycle products.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Logan, Jeffrey S.; Paranhos, Elizabeth; Kozak, Tracy G.
This study focuses on onshore natural gas operations and examines the extent to which oil and gas firms have embraced certain organizational characteristics that lead to 'high reliability' - understood here as strong safety and reliability records over extended periods of operation. The key questions that motivated this study include whether onshore oil and gas firms engaged in exploration and production (E&P) and midstream (i.e., natural gas transmission and storage) are implementing practices characteristic of high reliability organizations (HROs) and the extent to which any such practices are being driven by industry innovations and standards and/or regulatory requirements.
Modeling of a bubble-memory organization with self-checking translators to achieve high reliability.
NASA Technical Reports Server (NTRS)
Bouricius, W. G.; Carter, W. C.; Hsieh, E. P.; Wadia, A. B.; Jessep, D. C., Jr.
1973-01-01
Study of the design and modeling of a highly reliable bubble-memory system that has the capabilities of: (1) correcting a single 16-adjacent bit-group error resulting from failures in a single basic storage module (BSM), and (2) detecting with a probability greater than 0.99 any double errors resulting from failures in BSM's. The results of the study justify the design philosophy adopted of employing memory data encoding and a translator to correct single group errors and detect double group errors to enhance the overall system reliability.
The Reliability and Validity of the Computerized Double Inclinometer in Measuring Lumbar Mobility
MacDermid, Joy Christine; Arumugam, Vanitha; Vincent, Joshua Israel; Carroll, Krista L
2014-01-01
Study Design: Repeated measures reliability/validity study. Objectives: To determine the concurrent validity, test-retest, inter-rater and intra-rater reliability of lumbar flexion and extension measurements using the Tracker M.E. computerized dual inclinometer (CDI) in comparison to the modified-modified Schober (MMS). Summary of Background: Numerous studies have evaluated the reliability and validity of the various methods of measuring spinal motion, but the results are inconsistent. Differences in equipment and techniques make it difficult to correlate results. Methods: Twenty subjects with back pain and twenty without back pain were selected through convenience sampling. Two examiners measured sagittal plane lumbar range of motion for each subject. Two separate tests with the CDI and one test with the MMS were conducted. Each test consisted of three trials. Instrument and examiner order was randomly assigned. Intra-class correlations (ICCs 2, 2 and 2, 2) and Pearson correlation coefficients (r) were used to calculate reliability and concurrent validity respectively. Results: Intra-trial reliability was high to very high for both the CDI (ICCs 0.85 - 0.96) and MMS (ICCs 0.84 - 0.98). However, the reliability was poor to moderate when the CDI unit had to be repositioned either by the same rater (ICCs 0.16 - 0.59) or a different rater (ICCs 0.45 - 0.52). Inter-rater reliability for the MMS was moderate to high (ICCs 0.75 - 0.82), which bettered the moderate correlation obtained for the CDI (ICCs 0.45 - 0.52). Correlations between the CDI and MMS were poor for flexion (0.32; p<0.05) and poor to moderate (-0.42 - -0.51; p<0.05) for extension measurements. Conclusion: When using the CDI, an average of subsequent tests is required to obtain moderate reliability. The MMS was more reliable than the CDI. The MMS and the CDI measure lumbar movement on different metrics that are not highly related to each other. PMID:25352928
Ringdal, Kjetil G; Skaga, Nils Oddvar; Steen, Petter Andreas; Hestnes, Morten; Laake, Petter; Jones, J Mary; Lossius, Hans Morten
2013-01-01
Pre-injury comorbidities can influence the outcomes of severely injured patients. Pre-injury comorbidity status, graded according to the American Society of Anesthesiologists Physical Status (ASA-PS) classification system, is an independent predictor of survival in trauma patients and is recommended as a comorbidity score in the Utstein Trauma Template for Uniform Reporting of Data. Little is known about the reliability of pre-injury ASA-PS scores. The objective of this study was to examine whether the pre-injury ASA-PS system was a reliable scale for grading comorbidity in trauma patients. Nineteen Norwegian trauma registry coders were invited to participate in a reliability study in which 50 real but anonymised patient medical records were distributed. Reliability was analysed using quadratic weighted kappa (κ(w)) analysis with 95% CI as the primary outcome measure and unweighted kappa (κ) analysis, which included unknown values, as a secondary outcome measure. Fifteen of the invitees responded to the invitation, and ten participated. We found moderate (κ(w)=0.77 [95% CI: 0.64-0.87]) to substantial (κ(w)=0.95 [95% CI: 0.89-0.99]) rater-against-reference standard reliability using κ(w) and fair (κ=0.46 [95% CI: 0.29-0.64]) to substantial (κ=0.83 [95% CI: 0.68-0.94]) reliability using κ. The inter-rater reliability ranged from moderate (κ(w)=0.66 [95% CI: 0.45-0.81]) to substantial (κ(w)=0.96 [95% CI: 0.88-1.00]) for κ(w) and from slight (κ=0.36 [95% CI: 0.21-0.54]) to moderate (κ=0.75 [95% CI: 0.62-0.89]) for κ. The rater-against-reference standard reliability varied from moderate to substantial for the primary outcome measure and from fair to substantial for the secondary outcome measure. The study findings indicate that the pre-injury ASA-PS scale is a reliable score for classifying comorbidity in trauma patients. Copyright © 2012 Elsevier Ltd. All rights reserved.
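The difference between unweighted kappa and quadratic weighted kappa reported above can be illustrated with a minimal sketch. The function name `cohen_kappa` and the example ratings are hypothetical and are not taken from the study; the code assumes two raters scoring the same items on an ordered scale and does not reproduce the study's handling of unknown values or its confidence intervals.

```python
def cohen_kappa(r1, r2, categories, weighted=False):
    """Cohen's kappa for two raters.

    With weighted=True, disagreements are penalized by the squared
    distance between ordered categories (quadratic weights).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions of (rater 1 category, rater 2 category).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement_weight(i, j):
        if weighted:
            return ((i - j) ** 2) / ((k - 1) ** 2)
        return 0.0 if i == j else 1.0

    d_obs = sum(disagreement_weight(i, j) * obs[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement_weight(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp


# Hypothetical ratings on a 4-level ordered scale (illustration only).
rater_a = [1, 2, 3, 4, 1, 2, 3, 4]
rater_b = [1, 2, 3, 4, 2, 2, 3, 3]
kappa = cohen_kappa(rater_a, rater_b, [1, 2, 3, 4])
kappa_w = cohen_kappa(rater_a, rater_b, [1, 2, 3, 4], weighted=True)
```

In this example both disagreements are between adjacent categories, so the weighted statistic penalizes them less than the unweighted one. The sketch divides by the expected disagreement, so it is undefined when both raters use a single category throughout.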
The reliability of cause-of-death coding in The Netherlands.
Harteloh, Peter; de Bruin, Kim; Kardaun, Jan
2010-08-01
Cause-of-death statistics are a major source of information for epidemiological research or policy decisions. Information on the reliability of these statistics is important for interpreting trends in time or differences between populations. Variations in coding the underlying cause of death could hinder the attribution of observed differences to determinants of health. Therefore we studied the reliability of cause-of-death statistics in The Netherlands. We performed a double coding study. Death certificates from the month of May 2005 were coded again in 2007. Each death certificate was coded manually by four coders. Reliability was measured by calculating agreement between coders (intercoder agreement) and by calculating the consistency of each individual coder in time (intracoder agreement). Our analysis covered 10,833 death certificates. The intercoder agreement of four coders on the underlying cause of death was 78%. In 2.2% of the cases coders agreed on a change of the code assigned in 2005. The (mean) intracoder agreement of four coders was 89%. Agreement was associated with the specificity of the ICD-10 code (chapter, three digits, four digits), the age of the deceased, the number of coders and the number of diseases reported on the death certificate. The reliability of cause-of-death statistics turned out to be high (>90%) for major causes of death such as cancers and acute myocardial infarction. For chronic diseases, such as diabetes and renal insufficiency, reliability was low (<70%). The reliability of cause-of-death statistics varies by ICD-10 code/chapter. A statistical office should provide coders with (additional) rules for coding diseases with a low reliability and evaluate these rules regularly. Users of cause-of-death statistics should exercise caution when interpreting causes of death with a low reliability. Studies of reliability should take into account the number of coders involved and the number of codes on a death certificate.
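How agreement can depend on code specificity may be sketched as follows. The abstract does not say whether intercoder agreement was computed as unanimous or pairwise agreement; this sketch assumes unanimous agreement, and the function name `all_agree_rate` and the example codes are hypothetical.

```python
def all_agree_rate(codings, digits=None):
    """Share of certificates on which every coder assigned the same code.

    codings: list of tuples, one tuple of code strings per certificate.
    digits:  compare only the first `digits` characters of each code
             (e.g. 3 for the three-character ICD-10 category); None
             compares the full code.
    """
    agree = 0
    for codes in codings:
        # Drop the dot so that "E11.9" truncates to "E11" at digits=3.
        normalized = {code.replace(".", "")[:digits] for code in codes}
        if len(normalized) == 1:
            agree += 1
    return agree / len(codings)


# Hypothetical codes from four coders for four certificates.
certificates = [
    ("I21.9", "I21.9", "I21.9", "I21.9"),
    ("E11.9", "E11.2", "E11.9", "E11.9"),
    ("C34.1", "C34.1", "C34.1", "C34.1"),
    ("N18.9", "I12.0", "N18.9", "N18.9"),
]
agreement_four_char = all_agree_rate(certificates, digits=4)
agreement_three_char = all_agree_rate(certificates, digits=3)
```

Comparing at three characters counts the second certificate as an agreement because all four codes share the category E11, which is why coarser specificity gives higher agreement in this toy data.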
2014-01-01
Background A balance test provides important information such as the standard to judge an individual’s functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Methods Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). Results The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. Conclusion The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769
Park, Dae-Sung; Lee, GyuChang
2014-06-10
A balance test provides important information such as the standard to judge an individual's functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment.
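The ICC and SEM values reported above can be related through a short sketch. The abstract does not state which ICC form or SEM formula was used; this example assumes the two-way random-effects, single-measure form ICC(2,1) and the common definition SEM = SD × sqrt(1 − ICC). The function names and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1) from a subjects-by-raters table (list of equal-length rows)."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters (or sessions)
    values = [v for row in scores for v in row]
    grand = mean(values)
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def standard_error_of_measurement(scores, icc):
    """SEM = SD of all scores * sqrt(1 - ICC), clamped at zero."""
    values = [v for row in scores for v in row]
    return stdev(values) * sqrt(max(0.0, 1.0 - icc))


# Hypothetical COP path lengths (cm): 5 participants, 2 raters.
path_length = [
    [52.1, 53.0],
    [61.4, 60.2],
    [47.9, 49.5],
    [70.3, 68.8],
    [55.0, 56.1],
]
icc = icc_2_1(path_length)
sem = standard_error_of_measurement(path_length, icc)
```

Under these assumptions a higher ICC yields a smaller SEM for the same spread of scores, so the two statistics give complementary views of measurement consistency.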
Décary, Simon; Ouellet, Philippe; Vendittoli, Pascal-André; Desmeules, François
2016-12-01
Clinicians often rely on physical examination tests to guide them in the diagnostic process of knee disorders. However, reliability of these tests is often overlooked and may influence the consistency of results and overall diagnostic validity. Therefore, the objective of this study was to systematically review evidence on the reliability of physical examination tests for the diagnosis of knee disorders. A structured literature search was conducted in databases up to January 2016. Included studies needed to report reliability measures of at least one physical test for any knee disorder. Methodological quality was evaluated using the QAREL checklist. A qualitative synthesis of the evidence was performed. Thirty-three studies were included with a mean QAREL score of 5.5 ± 0.5. Based on low to moderate quality evidence, the Thessaly test for meniscal injuries reached moderate inter-rater reliability (k = 0.54). Based on moderate to excellent quality evidence, the Lachman for anterior cruciate ligament injuries reached moderate to excellent inter-rater reliability (k = 0.42 to 0.81). Based on low to moderate quality evidence, the Tibiofemoral Crepitus, Joint Line and Patellofemoral Pain/Tenderness, Bony Enlargement and Joint Pain on Movement tests for knee osteoarthritis reached fair to excellent inter-rater reliability (k = 0.29 to 0.93). Based on low to moderate quality evidence, the Lateral Glide, Lateral Tilt, Lateral Pull and Quality of Movement tests for patellofemoral pain reached moderate to good inter-rater reliability (k = 0.49 to 0.73). Many physical tests appear to reach good inter-rater reliability, but this is based on low-quality and conflicting evidence. High-quality research is required to evaluate the reliability of knee physical examination tests. Copyright © 2016 Elsevier Ltd. All rights reserved.
An Examination of Reliability and Validity Claims of a Foreign Language Proficiency Test
ERIC Educational Resources Information Center
Mircea-Pines, Walter J.
2009-01-01
This dissertation study examined the reliability and validity claims of a modified version of the Spanish Modern Language Association Foreign Language Proficiency Test for Teachers and Advanced Students administered at George Mason University (GMU). The study used the 1999 computerized GMU version that was administered to 277 test-takers via…
Reflective Thinking Scale: A Validity and Reliability Study
ERIC Educational Resources Information Center
Basol, Gulsah; Evin Gencel, Ilke
2013-01-01
The purpose of this study was to adapt the Reflective Thinking Scale to Turkish and investigate its validity and reliability in a sample of Turkish university students. The Reflective Thinking Scale (RTS) is a 5-point Likert scale (ranging from 1, corresponding to Agree Completely, through 3, Neutral, to 5, Not Agree Completely), designed to measure reflective…
A Study of E-Readiness Assessment: The Case of Three Universities in Nigeria
ERIC Educational Resources Information Center
Eweni, Samuel O.
2012-01-01
This study investigated the readiness of three higher educational institutions in Nigeria in their attempt to introduce and maintain technology-driven services to students, faculty, and support staff. The prerequisites for participation in the digital, networked economy include affordable ICT, reliable electric supply, reliable and up-to-date…