NASA Astrophysics Data System (ADS)
Chen, Fan; Huang, Shaoxiong; Ding, Jinjin; Ding, Jinjin; Gao, Bo; Xie, Yuguang; Wang, Xiaoming
2018-01-01
This paper proposes a fast reliability assessment method for distribution grids with distributed renewable energy generation. First, the Weibull distribution and the Beta distribution are used to describe the probability distributions of wind speed and solar irradiance, respectively, and models of the wind farm, solar park and local load are built for reliability assessment. Then, based on power system production cost simulation, probability discretization and linearized power flow, an optimal power flow problem whose objective is the minimum cost of conventional power generation is solved. In this way, the reliability assessment of the distribution grid is carried out quickly and accurately. The Loss Of Load Probability (LOLP) and Expected Energy Not Supplied (EENS) are selected as the reliability indices. A simulation of the IEEE RBTS BUS6 system in MATLAB indicates that the fast reliability assessment method calculates the reliability indices much faster than the Monte Carlo method while maintaining accuracy.
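As a rough illustration of the two indices named in this abstract, the following sketch estimates LOLP and EENS with plain Monte Carlo sampling, i.e. the baseline the abstract compares against, not the authors' fast method. Every number here (turbine power curve, Weibull scale and shape, load, conventional capacity, outage probability) is a hypothetical assumption chosen only to make the example run; solar generation and network constraints are left out.

```python
import random

def wind_power(v, rated_mw=2.0, v_in=3.0, v_rated=12.0, v_out=25.0):
    """Piecewise-linear turbine power curve in MW (hypothetical parameters)."""
    if v < v_in or v >= v_out:
        return 0.0                      # below cut-in or above cut-out
    if v >= v_rated:
        return rated_mw                 # rated output
    return rated_mw * (v - v_in) / (v_rated - v_in)

def monte_carlo_reliability(n_samples=100_000, load_mw=5.0, conventional_mw=4.0,
                            conv_outage_prob=0.05, scale=8.0, shape=2.0,
                            hours=8760, seed=0):
    """Return (LOLP, EENS in MWh per `hours`) for a single-bus toy system."""
    rng = random.Random(seed)
    loss_count = 0
    shortfall_sum = 0.0
    for _ in range(n_samples):
        v = rng.weibullvariate(scale, shape)          # Weibull wind speed sample
        conv = 0.0 if rng.random() < conv_outage_prob else conventional_mw
        shortfall = max(0.0, load_mw - (conv + wind_power(v)))
        if shortfall > 0.0:
            loss_count += 1
        shortfall_sum += shortfall
    lolp = loss_count / n_samples                      # fraction of states with load loss
    eens = shortfall_sum / n_samples * hours           # mean shortfall times period length
    return lolp, eens
```

Calling `monte_carlo_reliability()` returns the pair `(lolp, eens)`; the accuracy of both estimates depends on `n_samples`, which is the cost that analytical methods such as the one in the abstract aim to avoid.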
Lord, Sarah Peregrine; Can, Doğan; Yi, Michael; Marin, Rebeca; Dunn, Christopher W.; Imel, Zac E.; Georgiou, Panayiotis; Narayanan, Shrikanth; Steyvers, Mark; Atkins, David C.
2014-01-01
The current paper presents novel methods for collecting MISC data and accurately assessing reliability of behavior codes at the level of the utterance. The MISC 2.1 was used to rate MI interviews from five randomized trials targeting alcohol and drug use. Sessions were coded at the utterance-level. Utterance-based coding reliability was estimated using three methods and compared to traditional reliability estimates of session tallies. Session-level reliability was generally higher compared to reliability using utterance-based codes, suggesting that typical methods for MISC reliability may be biased. These novel methods in MI fidelity data collection and reliability assessment provided rich data for therapist feedback and further analyses. Beyond implications for fidelity coding, utterance-level coding schemes may elucidate important elements in the counselor-client interaction that could inform theories of change and the practice of MI. PMID:25242192
Lord, Sarah Peregrine; Can, Doğan; Yi, Michael; Marin, Rebeca; Dunn, Christopher W; Imel, Zac E; Georgiou, Panayiotis; Narayanan, Shrikanth; Steyvers, Mark; Atkins, David C
2015-02-01
The current paper presents novel methods for collecting MISC data and accurately assessing reliability of behavior codes at the level of the utterance. The MISC 2.1 was used to rate MI interviews from five randomized trials targeting alcohol and drug use. Sessions were coded at the utterance-level. Utterance-based coding reliability was estimated using three methods and compared to traditional reliability estimates of session tallies. Session-level reliability was generally higher compared to reliability using utterance-based codes, suggesting that typical methods for MISC reliability may be biased. These novel methods in MI fidelity data collection and reliability assessment provided rich data for therapist feedback and further analyses. Beyond implications for fidelity coding, utterance-level coding schemes may elucidate important elements in the counselor-client interaction that could inform theories of change and the practice of MI. Copyright © 2015 Elsevier Inc. All rights reserved.
Chen, J D; Sun, H L
1999-04-01
Objective. To assess and predict the reliability of equipment dynamically by making full use of the various test information available during product development. Method. A new reliability growth assessment method based on the Army Materiel Systems Analysis Activity (AMSAA) model was developed. The method is composed of the AMSAA model and test data conversion technology. Result. The assessment and prediction results for a space-borne equipment item conform to expectations. Conclusion. It is suggested that this method should be further researched and popularized.
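For readers unfamiliar with the AMSAA (Crow-AMSAA) reliability growth model, the sketch below shows its standard maximum-likelihood fit for a time-truncated test, in which the expected number of failures by time t is lambda * t^beta. It covers only the basic model; the test data conversion technology mentioned in the abstract is not described there and is not reproduced here. The function names and any failure times passed in are hypothetical.

```python
import math

def amsaa_fit(failure_times, total_time):
    """MLE of (lambda, beta) for a time-truncated test ending at total_time."""
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    return lam, beta

def instantaneous_mtbf(lam, beta, t):
    """Reciprocal of the failure intensity lambda * beta * t^(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))
```

A fitted beta below 1 indicates reliability growth (failures arriving more slowly over time), in which case `instantaneous_mtbf` increases with t.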
Integrating Formal Methods and Testing 2002
NASA Technical Reports Server (NTRS)
Cukic, Bojan
2002-01-01
Traditionally, qualitative program verification methodologies and program testing are studied in separate research communities. None of them alone is powerful and practical enough to provide sufficient confidence in ultra-high reliability assessment when used exclusively. Significant advances can be made by accounting not only for formal verification and program testing, but also for the impact of many other standard V&V techniques, in a unified software reliability assessment framework. The first year of this research resulted in the statistical framework that, given the assumptions on the success of the qualitative V&V and QA procedures, significantly reduces the amount of testing needed to confidently assess reliability at so-called high and ultra-high levels (10^-4 or higher). The coming years shall address the methodologies to realistically estimate the impacts of various V&V techniques on system reliability and include the impact of operational risk in reliability assessment. A) Combine formal correctness verification, process and product metrics, and other standard qualitative software assurance methods with statistical testing with the aim of gaining higher confidence in software reliability assessment for high-assurance applications. B) Quantify the impact of these methods on software reliability. C) Demonstrate that accounting for the effectiveness of these methods reduces the number of tests needed to attain a certain confidence level. D) Quantify and justify the reliability estimate for systems developed using various methods.
Eliasson, Kristina; Palm, Peter; Nyman, Teresia; Forsman, Mikael
2017-07-01
A common way to conduct practical risk assessments is to observe a job and report the observed long term risks for musculoskeletal disorders. The aim of this study was to evaluate the inter- and intra-observer reliability of ergonomists' risk assessments without the support of an explicit risk assessment method. Twenty-one experienced ergonomists assessed the risk level (low, moderate, high risk) of eight upper body regions, as well as the global risk of 10 video recorded work tasks. Intra-observer reliability was assessed by having nine of the ergonomists repeat the procedure at least three weeks after the first assessment. The ergonomists made their risk assessments based on their own experience and knowledge. The statistical parameters of reliability included agreement in %, kappa, linearly weighted kappa, intraclass correlation and Kendall's coefficient of concordance. The average inter-observer agreement of the global risk was 53% and the corresponding weighted kappa (K w ) was 0.32, indicating fair reliability. The intra-observer agreement was 61% and 0.41 (K w ). This study indicates that risk assessments of the upper body, without the use of an explicit observational method, have non-acceptable reliability. It is therefore recommended to use systematic risk assessment methods to a higher degree. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
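The linearly weighted kappa reported above can be computed for one pair of raters as sketched below. This is a generic textbook formulation, not the authors' analysis code; how the study aggregated kappa across its 21 observers is not stated in the abstract. Ratings are assumed to be coded as integers 0..k-1 (for example 0 = low, 1 = moderate, 2 = high), and the function name is hypothetical.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa with linear disagreement weights |i - j| for two raters."""
    n = len(ratings_a)
    k = n_categories
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]                       # rater A marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B marginals
    # Weighted observed and chance-expected disagreement; the 1/(k-1) factor cancels.
    obs_dis = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - obs_dis / exp_dis
```

The function is undefined (division by zero) when both raters use a single identical category for every item, since chance-expected disagreement is then zero.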
Assessing the reliability of ecotoxicological studies: An overview of current needs and approaches.
Moermond, Caroline; Beasley, Amy; Breton, Roger; Junghans, Marion; Laskowski, Ryszard; Solomon, Keith; Zahner, Holly
2017-07-01
In general, reliable studies are well designed and well performed, and enough details on study design and performance are reported to assess the study. For hazard and risk assessment in various legal frameworks, many different types of ecotoxicity studies need to be evaluated for reliability. These studies vary in study design, methodology, quality, and level of detail reported (e.g., reviews, peer-reviewed research papers, or industry-sponsored studies documented under Good Laboratory Practice [GLP] guidelines). Regulators have the responsibility to make sound and verifiable decisions and should evaluate each study for reliability in accordance with scientific principles regardless of whether they were conducted in accordance with GLP and/or standardized methods. Thus, a systematic and transparent approach is needed to evaluate studies for reliability. In this paper, 8 different methods for reliability assessment were compared using a number of attributes: categorical versus numerical scoring methods, use of exclusion and critical criteria, weighting of criteria, whether methods are tested with case studies, domain of applicability, bias toward GLP studies, incorporation of standard guidelines in the evaluation method, number of criteria used, type of criteria considered, and availability of guidance material. Finally, some considerations are given on how to choose a suitable method for assessing reliability of ecotoxicity studies. Integr Environ Assess Manag 2017;13:640-651. © 2016 The Authors. Integrated Environmental Assessment and Management published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC).
Time-Tagged Risk/Reliability Assessment Program for Development and Operation of Space System
NASA Astrophysics Data System (ADS)
Kubota, Yuki; Takegahara, Haruki; Aoyagi, Junichiro
We have investigated a new method of risk/reliability assessment for the development and operation of space systems. It is difficult to evaluate the risk of a spacecraft because of its long operating time, the absence of maintenance and the difficulty of testing under ground conditions. Conventional methods include FMECA, FTA and ETA, among others. These are not sufficient for assessing anomalies chronologically, and sharing information during R&D is a problem. A new method of risk and reliability assessment, T-TRAP (Time-tagged Risk/Reliability Assessment Program), is proposed as a management tool for the development and operation of space systems. T-TRAP, consisting of time-resolved Fault Tree and Criticality Analyses, helps the responsible personnel to quickly identify the cause of a failure and decide on corrective actions when an anomaly occurs in the system. This paper describes the T-TRAP method and its availability.
Validity and inter-observer reliability of subjective hand-arm vibration assessments.
Coenen, Pieter; Formanoy, Margriet; Douwes, Marjolein; Bosch, Tim; de Kraker, Heleen
2014-07-01
Exposure to mechanical vibrations at work (e.g., due to handling powered tools) is a potential occupational risk as it may cause upper extremity complaints. However, reliable and valid assessment methods for vibration exposure at work are lacking. Measuring hand-arm vibration objectively is often difficult and expensive, while often used information provided by manufacturers lacks detail. Therefore, a subjective hand-arm vibration assessment method was tested on validity and inter-observer reliability. In an experimental protocol, sixteen tasks handling powered tools were executed by two workers. Hand-arm vibration was assessed subjectively by 16 observers according to the proposed subjective assessment method. As a gold standard reference, hand-arm vibration was measured objectively using a vibration measurement device. Weighted κ's were calculated to assess validity, intra-class-correlation coefficients (ICCs) were calculated to assess inter-observer reliability. Inter-observer reliability of the subjective assessments depicting the agreement among observers can be expressed by an ICC of 0.708 (0.511-0.873). The validity of the subjective assessments as compared to the gold-standard reference can be expressed by a weighted κ of 0.535 (0.285-0.785). Besides, the percentage of exact agreement of the subjective assessment compared to the objective measurement was relatively low (i.e., 52% of all tasks). This study shows that subjectively assessed hand-arm vibrations are fairly reliable among observers and moderately valid. This assessment method is a first attempt to use subjective risk assessments of hand-arm vibration. Although, this assessment method can benefit from some future improvement, it can be of use in future studies and in field-based ergonomic assessments. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Getts, Katherine M; Quinn, Emilee L; Johnson, Donna B; Otten, Jennifer J
2017-11-01
Measuring food waste (ie, plate waste) in school cafeterias is an important tool to evaluate the effectiveness of school nutrition policies and interventions aimed at increasing consumption of healthier meals. Visual assessment methods are frequently applied in plate waste studies because they are more convenient than weighing. The visual quarter-waste method has become a common tool in studies of school meal waste and consumption, but previous studies of its validity and reliability have used correlation coefficients, which measure association but not necessarily agreement. The aims of this study were to determine, using a statistic measuring interrater agreement, whether the visual quarter-waste method is valid and reliable for assessing food waste in a school cafeteria setting when compared with the gold standard of weighed plate waste. To evaluate validity, researchers used the visual quarter-waste method and weighed food waste from 748 trays at four middle schools and five high schools in one school district in Washington State during May 2014. To assess interrater reliability, researcher pairs independently assessed 59 of the same trays using the visual quarter-waste method. Both validity and reliability were assessed using a weighted κ coefficient. For validity, as compared with the measured weight, 45% of foods assessed using the visual quarter-waste method were in almost perfect agreement, 42% of foods were in substantial agreement, 10% were in moderate agreement, and 3% were in slight agreement. For interrater reliability between pairs of visual assessors, 46% of foods were in perfect agreement, 31% were in almost perfect agreement, 15% were in substantial agreement, and 8% were in moderate agreement. These results suggest that the visual quarter-waste method is a valid and reliable tool for measuring plate waste in school cafeteria settings. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bucknor, Matthew; Grabaskas, David; Brunett, Acacia
2015-04-26
Advanced small modular reactor designs include many advantageous design features such as passively driven safety systems that are arguably more reliable and cost effective relative to conventional active systems. Despite their attractiveness, a reliability assessment of passive systems can be difficult using conventional reliability methods due to the nature of passive systems. Simple deviations in boundary conditions can induce functional failures in a passive system, and intermediate or unexpected operating modes can also occur. As part of an ongoing project, Argonne National Laboratory is investigating various methodologies to address passive system reliability. The Reliability Method for Passive Systems (RMPS), a systematic approach for examining reliability, is one technique chosen for this analysis. This methodology is combined with the Risk-Informed Safety Margin Characterization (RISMC) approach to assess the reliability of a passive system and the impact of its associated uncertainties. For this demonstration problem, an integrated plant model of an advanced small modular pool-type sodium fast reactor with a passive reactor cavity cooling system is subjected to a station blackout using RELAP5-3D. This paper discusses important aspects of the reliability assessment, including deployment of the methodology, the uncertainty identification and quantification process, and identification of key risk metrics.
ASSESSING AND COMBINING RELIABILITY OF PROTEIN INTERACTION SOURCES
LEACH, SONIA; GABOW, AARON; HUNTER, LAWRENCE; GOLDBERG, DEBRA S.
2008-01-01
Integrating diverse sources of interaction information to create protein networks requires strategies sensitive to differences in accuracy and coverage of each source. Previous integration approaches calculate reliabilities of protein interaction information sources based on congruity to a designated ‘gold standard.’ In this paper, we provide a comparison of the two most popular existing approaches and propose a novel alternative for assessing reliabilities which does not require a gold standard. We identify a new method for combining the resultant reliabilities and compare it against an existing method. Further, we propose an extrinsic approach to evaluation of reliability estimates, considering their influence on the downstream tasks of inferring protein function and learning regulatory networks from expression data. Results using this evaluation method show 1) our method for reliability estimation is an attractive alternative to those requiring a gold standard and 2) the new method for combining reliabilities is less sensitive to noise in reliability assignments than the similar existing technique. PMID:17990508
Janssen, Ellen M; Marshall, Deborah A; Hauber, A Brett; Bridges, John F P
2017-12-01
The recent endorsement of discrete-choice experiments (DCEs) and other stated-preference methods by regulatory and health technology assessment (HTA) agencies has placed a greater focus on demonstrating the validity and reliability of preference results. Areas covered: We present a practical overview of tests of validity and reliability that have been applied in the health DCE literature and explore other study qualities of DCEs. From the published literature, we identify a variety of methods to assess the validity and reliability of DCEs. We conceptualize these methods to create a conceptual model with four domains: measurement validity, measurement reliability, choice validity, and choice reliability. Each domain consists of three categories that can be assessed using one to four procedures (for a total of 24 tests). We present how these tests have been applied in the literature and direct readers to applications of these tests in the health DCE literature. Based on a stakeholder engagement exercise, we consider the importance of study characteristics beyond traditional concepts of validity and reliability. Expert commentary: We discuss study design considerations to assess the validity and reliability of a DCE, consider limitations to the current application of tests, and discuss future work to consider the quality of DCEs in healthcare.
Impact of mounting methods in computerized axiography on assessment of condylar inclination.
Schierz, Oliver; Wagner, Philipp; Rauch, Angelika; Reissmann, Daniel R
2017-08-30
Valid and reliable recording is a key requirement for accurately simulating individual jaw movements. Horizontal condylar inclination (HCI) and Bennett's angle were measured using a digital jaw tracker (Cadiax® Compact 2) in 27 young adults. Three mounting methods (paraocclusal tray adapter, periocclusal tray adapter, and tray adapter with mandibular clamp) were tested. The mean values of the HCI differed by up to 10° between the mounting methods; however, the values for Bennett's angle did not differ substantially. While the intersession reliability of the Bennett's angle assessment did not depend on the mounting method, the reliability of the HCI assessment was only fair to good for the paraocclusal mounting method but poor for both periocclusal mounting methods. For attaching the tracing bow of jaw trackers to the mandible, a paraocclusal tray adapter should be applied, to achieve the most reliable results.
Reliability and risk assessment of structures
NASA Technical Reports Server (NTRS)
Chamis, C. C.
1991-01-01
Development of reliability and risk assessment of structural components and structures is a major activity at Lewis Research Center. It consists of five program elements: (1) probabilistic loads; (2) probabilistic finite element analysis; (3) probabilistic material behavior; (4) assessment of reliability and risk; and (5) probabilistic structural performance evaluation. Recent progress includes: (1) the evaluation of the various uncertainties in terms of cumulative distribution functions for various structural response variables based on known or assumed uncertainties in primitive structural variables; (2) evaluation of the failure probability; (3) reliability and risk-cost assessment; and (4) an outline of an emerging approach for eventual certification of man-rated structures by computational methods. Collectively, the results demonstrate that the structural durability/reliability of man-rated structural components and structures can be effectively evaluated by using formal probabilistic methods.
Training and Maintaining System-Wide Reliability in Outcome Management.
Barwick, Melanie A; Urajnik, Diana J; Moore, Julia E
2014-01-01
The Child and Adolescent Functional Assessment Scale (CAFAS) is widely used for outcome management, for providing real-time client- and program-level data, and for monitoring evidence-based practices. Methods of reliability training and the assessment of rater drift are critical for service decision-making within organizations and systems of care. We assessed two approaches for CAFAS training: external technical assistance and internal technical assistance. To this end, we sampled 315 practitioners trained by the external technical assistance approach from 2,344 Ontario practitioners who had achieved reliability on the CAFAS. To assess the internal technical assistance approach as a reliable alternative training method, 140 practitioners trained internally were selected from the same pool of certified raters. Reliabilities were high for practitioners trained by both the external and internal technical assistance approaches (.909-.995 and .915-.997, respectively). One- and 3-year estimates showed some drift on several scales. High and consistent reliabilities over time and across training methods have implications for CAFAS training of behavioral health care practitioners, and for the maintenance of CAFAS as a global outcome management tool in systems of care.
Cai, Gaigai; Chen, Xuefeng; Li, Bing; Chen, Baojia; He, Zhengjia
2012-01-01
The reliability of cutting tools is critical to machining precision and production efficiency. The conventional statistic-based reliability assessment method aims at providing a general and overall estimation of reliability for a large population of identical units under given and fixed conditions. However, it has limited effectiveness in depicting the operational characteristics of a cutting tool. To overcome this limitation, this paper proposes an approach to assess the operation reliability of cutting tools. A proportional covariate model is introduced to construct the relationship between operation reliability and condition monitoring information. The wavelet packet transform and an improved distance evaluation technique are used to extract sensitive features from vibration signals, and a covariate function is constructed based on the proportional covariate model. Ultimately, the failure rate function of the cutting tool being assessed is calculated using the baseline covariate function obtained from a small sample of historical data. Experimental results and a comparative study show that the proposed method is effective for assessing the operation reliability of cutting tools. PMID:23201980
Issues in Benchmarking Human Reliability Analysis Methods: A Literature Review
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ronald L. Boring; Stacey M. L. Hendrickson; John A. Forester
There is a diversity of human reliability analysis (HRA) methods available for use in assessing human performance within probabilistic risk assessments (PRA). Due to the significant differences in the methods, including the scope, approach, and underlying models, there is a need for an empirical comparison investigating the validity and reliability of the methods. To accomplish this empirical comparison, a benchmarking study comparing and evaluating HRA methods in assessing operator performance in simulator experiments is currently underway. In order to account for as many effects as possible in the construction of this benchmarking study, a literature review was conducted, reviewing past benchmarking studies in the areas of psychology and risk assessment. A number of lessons learned through these studies are presented in order to aid in the design of future HRA benchmarking endeavors.
Reducing random measurement error in assessing postural load on the back in epidemiologic surveys.
Burdorf, A
1995-02-01
The goal of this study was to design strategies to assess postural load on the back in occupational epidemiology by taking into account the reliability of measurement methods and the variability of exposure among the workers under study. Intermethod reliability studies were evaluated to estimate the systematic bias (accuracy) and random measurement error (precision) of various methods to assess postural load on the back. Intramethod reliability studies were reviewed to estimate random variability of back load over time. Intermethod surveys have shown that questionnaires have a moderate reliability for gross activities such as sitting, whereas duration of trunk flexion and rotation should be assessed by observation methods or inclinometers. Intramethod surveys indicate that exposure variability can markedly affect the reliability of estimates of back load if the estimates are based upon a single measurement over a certain time period. Equations have been presented to evaluate various study designs according to the reliability of the measurement method, the optimum allocation of the number of repeated measurements per subject, and the number of subjects in the study. Prior to a large epidemiologic study, an exposure-oriented survey should be conducted to evaluate the performance of measurement instruments and to estimate sources of variability for back load. The strategy for assessing back load can be optimized by balancing the number of workers under study and the number of repeated measurements per worker.
Murphy, Douglas J; Bruce, David A; Mercer, Stewart W; Eva, Kevin W
2009-05-01
To investigate the reliability and feasibility of six potential workplace-based assessment methods in general practice training: criterion audit, multi-source feedback from clinical and non-clinical colleagues, patient feedback (the CARE Measure), referral letters, significant event analysis, and video analysis of consultations. Performance of GP registrars (trainees) was evaluated with each tool to assess the reliabilities of the tools and feasibility, given raters and number of assessments needed. Participant experience of process determined by questionnaire. 171 GP registrars and their trainers, drawn from nine deaneries (representing all four countries in the UK), participated. The ability of each tool to differentiate between doctors (reliability) was assessed using generalisability theory. Decision studies were then conducted to determine the number of observations required to achieve an acceptably high reliability for "high-stakes assessment" using each instrument. Finally, descriptive statistics were used to summarise participants' ratings of their experience using these tools. Multi-source feedback from colleagues and patient feedback on consultations emerged as the two methods most likely to offer a reliable and feasible opinion of workplace performance. Reliability co-efficients of 0.8 were attainable with 41 CARE Measure patient questionnaires and six clinical and/or five non-clinical colleagues per doctor when assessed on two occasions. For the other four methods tested, 10 or more assessors were required per doctor in order to achieve a reliable assessment, making the feasibility of their use in high-stakes assessment extremely low. Participant feedback did not raise any major concerns regarding the acceptability, feasibility, or educational impact of the tools. 
The combination of patient and colleague views of doctors' performance, coupled with reliable competence measures, may offer a suitable evidence-base on which to monitor progress and completion of doctors' training in general practice.
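The decision-study step described above, finding how many observations are needed to reach a target reliability, reduces in the simplest single-facet case to the Spearman-Brown prophecy formula. The sketch below uses that simplification; the study itself used generalisability theory, whose variance-component model may include more facets (for example occasions), and the single-observation reliability passed in is a hypothetical input, not a figure from the abstract.

```python
import math

def spearman_brown(single_reliability, n_observations):
    """Reliability of the mean of n parallel observations."""
    r = single_reliability
    return n_observations * r / (1.0 + (n_observations - 1) * r)

def observations_needed(single_reliability, target=0.8):
    """Smallest whole number of observations whose mean reaches the target."""
    r = single_reliability
    n = target * (1.0 - r) / (r * (1.0 - target))
    return max(1, math.ceil(n))
```

For example, `observations_needed(0.3)` gives the number of raters per doctor required for a coefficient of 0.8 when a single rating has reliability 0.3 under these assumptions.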
Mosmuller, David G M; Maal, Thomas J; Prahl, Charlotte; Tan, Robin A; Mulder, Frans J; Schwirtz, Roderic M F; de Vet, Henrica C W; Bergé, Stefaan J; Don Griot, J P W
2017-08-01
For the assessment of the nasolabial appearance in cleft patients, a widely accepted, reliable scoring system is not available. In this study four different methods of assessment are compared, including 2D and 3D asymmetry and aesthetic assessments. The data and ratings from an earlier study using the Asher-McDade aesthetic index on 3D photographs and the outcomes of 3D facial distance mapping were compared to a 2D aesthetic assessment, the Cleft Aesthetic Rating Scale, and to SymNose, a computerized 2D asymmetry assessment technique. The reliability and correlation between the four assessment techniques were tested using a sample of 79 patients. The 3D asymmetry assessment had the highest reliability and could be performed by just one observer (Intraclass correlation coefficient (ICC): 0.99). The 2D asymmetry assessment of the nose was highly reliable when performed by just one observer (ICC: 0.89). However, for the 2D asymmetry assessment of the lip more observers were needed. For the 2D aesthetic assessments 3 observers were needed. The 3D aesthetic assessment had the lowest single-observer reliability (ICC: 0.38-0.56) of all four techniques. The agreement between the different assessment methods is poor to very poor. The highest correlation (R: 0.48) was found between 2D and 3D aesthetic assessments. Remarkably, the lowest correlations were found between 2D and 3D asymmetry assessments (0.08-0.17). Different assessment methods are not in agreement and seem to measure different nasolabial aspects. More research is needed to establish exactly what each assessment technique measures and which measurements or outcomes are relevant for the patients. Copyright © 2017 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Standard setting: comparison of two methods.
George, Sanju; Haque, M Sayeed; Oyebode, Femi
2006-09-14
The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method. The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice examination (MCQ). Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method. The pass rate with the norm-reference method was 85% (66/78) and that by the Angoff method was 100% (78 out of 78). The percentage agreement between Angoff method and norm-reference was 78% (95% CI 69% - 87%). The modified Angoff method had an inter-rater reliability of 0.81-0.82 and a test-retest reliability of 0.59-0.74. There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
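The two standard-setting rules compared in this abstract can be sketched as follows. The norm-reference cut is the cohort mean minus one standard deviation, as stated above; whether the study used the sample or population standard deviation is not given, so the sample form is an assumption here. The Angoff cut is shown in its common form, where each judge estimates the probability that a borderline candidate answers each item correctly and the cut score is the average over judges of the per-judge item sums. Function names and input data are hypothetical.

```python
import statistics

def norm_reference_cut(scores):
    """Cut score = cohort mean minus one (sample) standard deviation."""
    return statistics.mean(scores) - statistics.stdev(scores)

def angoff_cut(judge_estimates):
    """judge_estimates[j][i] = judge j's probability for item i (borderline candidate)."""
    per_judge_totals = [sum(items) for items in judge_estimates]
    return statistics.mean(per_judge_totals)

def pass_rate(scores, cut):
    """Fraction of candidates scoring at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)
```

Because the norm-reference cut moves with the cohort, it fails a roughly fixed share of candidates, whereas the Angoff cut is fixed by the judges before scores are seen; this is one way to read the 85% versus 100% pass rates reported above.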
Reliability-based trajectory optimization using nonintrusive polynomial chaos for Mars entry mission
NASA Astrophysics Data System (ADS)
Huang, Yuechen; Li, Haiyang
2018-06-01
This paper presents the reliability-based sequential optimization (RBSO) method to settle the trajectory optimization problem with parametric uncertainties in entry dynamics for Mars entry mission. First, the deterministic entry trajectory optimization model is reviewed, and then the reliability-based optimization model is formulated. In addition, the modified sequential optimization method, in which the nonintrusive polynomial chaos expansion (PCE) method and the most probable point (MPP) searching method are employed, is proposed to solve the reliability-based optimization problem efficiently. The nonintrusive PCE method contributes to the transformation between the stochastic optimization (SO) and the deterministic optimization (DO) and to the approximation of trajectory solution efficiently. The MPP method, which is used for assessing the reliability of constraints satisfaction only up to the necessary level, is employed to further improve the computational efficiency. The cycle including SO, reliability assessment and constraints update is repeated in the RBSO until the reliability requirements of constraints satisfaction are satisfied. Finally, the RBSO is compared with the traditional DO and the traditional sequential optimization based on Monte Carlo (MC) simulation in a specific Mars entry mission to demonstrate the effectiveness and the efficiency of the proposed method.
Systematic review of methods for quantifying teamwork in the operating theatre
Marshall, D.; Sykes, M.; McCulloch, P.; Shalhoub, J.; Maruthappu, M.
2018-01-01
Background: Teamwork in the operating theatre is becoming increasingly recognized as a major factor in clinical outcomes. Many tools have been developed to measure teamwork. Most fall into two categories: self‐assessment by theatre staff and assessment by observers. A critical and comparative analysis of the validity and reliability of these tools is lacking. Methods: MEDLINE and Embase databases were searched following PRISMA guidelines. Content validity was assessed using measurements of inter‐rater agreement, predictive validity and multisite reliability, and interobserver reliability using statistical measures of inter‐rater agreement and reliability. Quantitative meta‐analysis was deemed unsuitable. Results: Forty‐eight articles were selected for final inclusion; self‐assessment tools were used in 18 and observational tools in 28, and there were two qualitative studies. Self‐assessment of teamwork by profession varied with the profession of the assessor. The most robust self‐assessment tool was the Safety Attitudes Questionnaire (SAQ), although this failed to demonstrate multisite reliability. The most robust observational tool was the Non‐Technical Skills (NOTECHS) system, which demonstrated both test–retest reliability (P > 0·09) and interobserver reliability (Rwg = 0·96). Conclusion: Self‐assessment of teamwork by the theatre team was influenced by professional differences. Observational tools, when used by trained observers, circumvented this.
Hinds, P S; Scandrett-Hibden, S; McAulay, L S
1990-04-01
The reliability and validity of qualitative research findings are viewed with scepticism by some scientists. This scepticism is derived from the belief that qualitative researchers give insufficient attention to estimating reliability and validity of data, and the differences between quantitative and qualitative methods in assessing data. The danger of this scepticism is that relevant and applicable research findings will not be used. Our purpose is to describe an evaluative strategy for use with qualitative data, a strategy that is a synthesis of quantitative and qualitative assessment methods. Results of the strategy and factors that influence its use are also described.
ERIC Educational Resources Information Center
Pantzare, Anna Lind
2015-01-01
In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examines if teachers' ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality…
Rahman, Mohd Nasrull Abdol; Mohamad, Siti Shafika
2017-01-01
Computer work is associated with musculoskeletal disorders (MSDs). Several methods have been developed to assess computer work risk factors related to MSDs. This review aims to give an overview of current pen-and-paper-based observational methods for assessing ergonomic risk factors of computer work. We searched an electronic database for materials from 1992 until 2015. The selected methods focused on computer work, pen-and-paper observational methods, office risk factors and musculoskeletal disorders. This review was developed to assess the risk factors, reliability and validity of pen-and-paper observational methods associated with computer work. Two evaluators independently carried out this review. Seven observational methods used to assess exposure to office risk factors for work-related musculoskeletal disorders were identified. The risk factors covered by current pen-and-paper-based observational tools were postures, office components, force and repetition. Of the seven methods, only five had been tested for reliability; they were proven to be reliable and were rated as moderate to good. Only four of the seven methods had been tested for validity, with moderate results. Many observational tools already exist, but no single tool appears to cover all of the risk factors, including working posture, office component, force, repetition and office environment, at office workstations and computer work. Although the most important factor in developing a tool is proper validation of exposure assessment techniques, not all of the existing observational methods have been tested for reliability and validity. Furthermore, this review could provide researchers with ways to improve the pen-and-paper-based observational method for assessing ergonomic risk factors of computer work.
McCreesh, Karen M; Crotty, James M; Lewis, Jeremy S
2015-03-01
Narrowing of the subacromial space has been noted as a common feature of rotator cuff (RC) tendinopathy. It has been implicated in the development of symptoms and forms the basis for some surgical and rehabilitation approaches. Various radiological methods have been used to measure the subacromial space, which is represented by a two-dimensional measurement of acromiohumeral distance (AHD). A reliable method of measurement could be used to assess the impact of rehabilitation or surgical interventions for RC tendinopathy; however, there are no published reviews assessing the reliability of AHD measurement. The aim of this review was to systematically assess the evidence for the intrarater and inter-rater reliability of radiological methods of measuring AHD, in order to identify the most reliable method for use in RC tendinopathy. An electronic literature search was carried out and studies describing the reliability of any radiological method of measuring AHD in either healthy or RC tendinopathy groups were included. Eighteen studies met the inclusion criteria and were appraised by two reviewers using the Quality Appraisal for reliability Studies checklist. Eight studies were deemed to be of high methodological quality. Study weaknesses included lack of tester blinding, inadequate description of tester experience, lack of inclusion of symptomatic populations, poor reporting of statistical methods and unclear diagnosis. There was strong evidence for the reliability of ultrasound for measuring AHD, with moderate evidence for MRI and CT measures and conflicting evidence for radiographic methods. Overall, there was lack of research in RC tendinopathy populations, with only six studies including participants with shoulder pain. The results support the reliability of ultrasound and CT or MRI for the measurement of AHD; however, more studies in symptomatic populations are required. 
The reliability of AHD measurement using radiographs has not been supported by the studies reviewed. Published by the BMJ Publishing Group Limited.
The Reliability, Impact, and Cost-Effectiveness of Value-Added Teacher Assessment Methods
ERIC Educational Resources Information Center
Yeh, Stuart S.
2012-01-01
This article reviews evidence regarding the intertemporal reliability of teacher rankings based on value-added methods. Value-added methods exhibit low reliability, yet are broadly supported by prominent educational researchers and are increasingly being used to evaluate and fire teachers. The article then presents a cost-effectiveness analysis…
Larsen, Camilla Marie; Juul-Kristensen, Birgit; Lund, Hans; Søgaard, Karen
2014-10-01
The aims were to compile a schematic overview of clinical scapular assessment methods and critically appraise the methodological quality of the involved studies. A systematic, computer-assisted literature search using Medline, CINAHL, SportDiscus and EMBASE was performed from inception to October 2013. Reference lists in articles were also screened for publications. From 50 articles, 54 method names were identified and categorized into three groups: (1) Static positioning assessment (n = 19); (2) Semi-dynamic (n = 13); and (3) Dynamic functional assessment (n = 22). Fifteen studies were excluded for evaluation due to no/few clinimetric results, leaving 35 studies for evaluation. Graded according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN checklist), the methodological quality in the reliability and validity domains was "fair" (57%) to "poor" (43%), with only one study rated as "good". The reliability domain was most often investigated. Few of the assessment methods in the included studies that had "fair" or "good" measurement property ratings demonstrated acceptable results for both reliability and validity. We found a substantially larger number of clinical scapular assessment methods than previously reported. Using the COSMIN checklist the methodological quality of the included measurement properties in the reliability and validity domains were in general "fair" to "poor". None were examined for all three domains: (1) reliability; (2) validity; and (3) responsiveness. Observational evaluation systems and assessment of scapular upward rotation seem suitably evidence-based for clinical use. Future studies should test and improve the clinimetric properties, and especially diagnostic accuracy and responsiveness, to increase utility for clinical practice.
Comparing Methods for Assessing Reliability Uncertainty Based on Pass/Fail Data Collected Over Time
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abes, Jeff I.; Hamada, Michael S.; Hills, Charles R.
2017-12-20
In this paper, we compare statistical methods for analyzing pass/fail data collected over time; some methods are traditional and one (the RADAR or Rationale for Assessing Degradation Arriving at Random) was recently developed. These methods are used to provide uncertainty bounds on reliability. We make observations about the methods' assumptions and properties. Finally, we illustrate the differences between two traditional methods, logistic regression and Weibull failure time analysis, and the RADAR method using a numerical example.
Peterson, Eleanor B; Calhoun, Aaron W; Rider, Elizabeth A
2014-09-01
With increased recognition of the importance of sound communication skills and communication skills education, reliable assessment tools are essential. This study reports on the psychometric properties of an assessment tool based on the Kalamazoo Consensus Statement Essential Elements Communication Checklist. The Gap-Kalamazoo Communication Skills Assessment Form (GKCSAF), a modified version of an existing communication skills assessment tool, the Kalamazoo Essential Elements Communication Checklist-Adapted, was used to assess learners in a multidisciplinary, simulation-based communication skills educational program using multiple raters. 118 simulated conversations were available for analysis. Internal consistency and inter-rater reliability were determined by calculating a Cronbach's alpha score and intra-class correlation coefficients (ICC), respectively. The GKCSAF demonstrated high internal consistency with a Cronbach's alpha score of 0.844 (faculty raters) and 0.880 (peer observer raters), and high inter-rater reliability with an ICC of 0.830 (faculty raters) and 0.89 (peer observer raters). The Gap-Kalamazoo Communication Skills Assessment Form is a reliable method of assessing the communication skills of multidisciplinary learners using multi-rater methods within the learning environment. The Gap-Kalamazoo Communication Skills Assessment Form can be used by educational programs that wish to implement a reliable assessment and feedback system for a variety of learners. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
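For readers unfamiliar with the internal-consistency statistic reported above, Cronbach's alpha can be computed directly from a subjects-by-items score table. The sketch below uses the standard formula with sample variances; the small score table is invented for illustration and is not GKCSAF data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per subject and one column per item.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings: four conversations scored on three checklist items.
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]

alpha = cronbach_alpha(example_scores)
```

Alpha approaches 1 when items rise and fall together across subjects, which is why it is read as internal consistency rather than agreement between raters; the latter is what the ICC in the abstract addresses.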
Comparing Interrater reliability between eye examination and eye self-examination
de Lima, Maria Alzete; Pagliuca, Lorita Marlena Freitag; do Nascimento, Jennara Cândido; Caetano, Joselany Áfio
2017-01-01
Objective: to compare interrater reliability between two eye assessment methods. Method: quasi-experimental study conducted with 324 college students at a public university, including eye self-examination and eye assessment performed by the researchers. Kappa coefficient was used to verify agreement. Results: interrater reliability coefficients ranged from 0.85 to 0.95, with statistical significance at 0.05. The exams to check for near acuity and peripheral vision presented a reasonable kappa >0.2. The remaining coefficients were higher, ranging from very to totally reliable. Conclusion: comparatively, the results of both methods were similar. The virtual manual on eye self-examination can be used to screen for eye conditions. PMID:29069269
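The abstract does not say which kappa variant was used; assuming the common two-rater Cohen's kappa, agreement between a self-examination and a researcher's examination could be computed as below. The ten paired findings are hypothetical (1 = finding present, 0 = absent).

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from the two raters' marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical paired results for ten participants (not data from the study).
self_exam = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
researcher_exam = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa(self_exam, researcher_exam)
```

Here the raw agreement is 80%, but kappa is lower because both raters mark "present" often, so some of that agreement is expected by chance.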
A proposed method to investigate reliability throughout a questionnaire
2011-01-01
Background: Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. Methods: A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of intraclass correlation coefficient (ICC). The method was also applied on a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. Results: The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure for assessing whether respondents provide only a random answer or one based on a substantial cognitive effort. Scales from different factor structures of Jalowiec Coping Scale had different effect on this awareness measure. Conclusions: Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales. PMID:21974842
Reliability and Validity of the Footprint Assessment Method Using Photoshop CS5 Software.
Gutiérrez-Vilahú, Lourdes; Massó-Ortigosa, Núria; Costa-Tutusaus, Lluís; Guerra-Balic, Myriam
2015-05-01
Several sophisticated methods of footprint analysis currently exist. However, it is sometimes useful to apply standard measurement methods of recognized evidence with an easy and quick application. We sought to assess the reliability and validity of a new method of footprint assessment in a healthy population using Photoshop CS5 software (Adobe Systems Inc, San Jose, California). Forty-two footprints, corresponding to 21 healthy individuals (11 men with a mean ± SD age of 20.45 ± 2.16 years and 10 women with a mean ± SD age of 20.00 ± 1.70 years) were analyzed. Footprints were recorded in static bipedal standing position using optical podography and digital photography. Three trials for each participant were performed. The Hernández-Corvo, Chippaux-Smirak, and Staheli indices and the Clarke angle were calculated by manual method and by computerized method using Photoshop CS5 software. Test-retest was used to determine reliability. Validity was obtained by intraclass correlation coefficient (ICC). The reliability test for all of the indices showed high values (ICC, 0.98-0.99). Moreover, the validity test clearly showed no difference between techniques (ICC, 0.99-1). The reliability and validity of a method to measure, assess, and record the podometric indices using Photoshop CS5 software has been demonstrated. This provides a quick and accurate tool useful for the digital recording of morphostatic foot study parameters and their control.
Varga, Zsuzsanna; Cassoly, Estelle; Li, Qiyu; Oehlschlegel, Christian; Tapia, Coya; Lehr, Hans Anton; Klingbiel, Dirk; Thürlimann, Beat; Ruhstaller, Thomas
2015-01-01
Background: Proliferative activity (Ki-67 Labelling Index) in breast cancer increasingly serves as an additional tool in the decision for or against adjuvant chemotherapy in midrange hormone receptor positive breast cancer. Ki-67 Index has been previously shown to suffer from high inter-observer variability especially in midrange (G2) breast carcinomas. In this study we conducted a systematic approach using different Ki-67 assessments on large tissue sections in order to identify the method with the highest reliability and the lowest variability. Materials and Methods: Five breast pathologists retrospectively analyzed proliferative activity of 50 G2 invasive breast carcinomas using large tissue sections by assessing Ki-67 immunohistochemistry. Ki-67 assessments were done on light microscopy and on digital images following these methods: 1) assessing five regions, 2) assessing only darkly stained nuclei and 3) considering only condensed proliferative areas (‘hotspots’). An individual review (the first described assessment from 2008) was also performed. The assessments on light microscopy were done by estimation. All measurements were performed three times. Inter-observer and intra-observer reliabilities were calculated using the approach proposed by Eliasziw et al. Clinical cutoffs (14% and 20%) were tested using Fleiss’ Kappa. Results: There was a good intra-observer reliability in 5 of 7 methods (ICC: 0.76–0.89). The two highest inter-observer reliabilities were fair to moderate (ICC: 0.71 and 0.74), found for 2 methods (region-analysis and individual-review) on light microscopy. Fleiss’ kappa values (14% cut-off) were the highest (moderate) using the original recommendation on light microscopy (Kappa 0.58). Fleiss’ kappa values (20% cut-off) were the highest (Kappa 0.48 each) in analyzing hotspots on light microscopy and digital analysis. No digital-analysis method was superior to the light-microscopy methods. 
Conclusion: Our results show that all methods on light microscopy for Ki-67 assessment in large tissue sections resulted in a good intra-observer reliability. Region analysis and individual review (the original recommendation) on light microscopy yielded the highest inter-observer reliability. These results show a slight improvement over previously published data on poor reproducibility, and these methods thus might be a practical, pragmatic way for routine assessment of Ki-67 Index in G2 breast carcinomas. PMID:25885288
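To make the cut-off analysis concrete, the sketch below dichotomizes each observer's Ki-67 estimate at a clinical cut-off and computes Fleiss' kappa across observers. The percentages are invented, and treating values at or above the cut-off as "high" is an assumption; the abstract does not state how boundary values were handled.

```python
def dichotomize(estimates_by_case, cutoff):
    """Per case, count observers below the cutoff and at or above it."""
    return [
        [sum(v < cutoff for v in case), sum(v >= cutoff for v in case)]
        for case in estimates_by_case
    ]

def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters placing case i in category j.

    Every case must be rated by the same number of raters.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_cases * n_raters
    # Overall proportion of assignments falling in each category.
    p_category = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Per-case agreement among all pairs of raters.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical Ki-67 estimates (%) from five observers for four cases.
ki67_estimates = [
    [10, 12, 13, 11, 9],
    [20, 25, 22, 18, 30],
    [13, 15, 16, 12, 14],
    [8, 10, 15, 9, 11],
]

counts_14 = dichotomize(ki67_estimates, cutoff=14)
kappa_14 = fleiss_kappa(counts_14)
```

Cases whose estimates straddle the cut-off (the third and fourth rows here) are the ones that pull kappa down, which matches the abstract's point that midrange tumours are the hardest to classify consistently.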
Issues in benchmarking human reliability analysis methods : a literature review.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lois, Erasmia; Forester, John Alan; Tran, Tuan Q.
There is a diversity of human reliability analysis (HRA) methods available for use in assessing human performance within probabilistic risk assessment (PRA). Due to the significant differences in the methods, including the scope, approach, and underlying models, there is a need for an empirical comparison investigating the validity and reliability of the methods. To accomplish this empirical comparison, a benchmarking study is currently underway that compares HRA methods with each other and against operator performance in simulator studies. In order to account for as many effects as possible in the construction of this benchmarking study, a literature review was conducted, reviewing past benchmarking studies in the areas of psychology and risk assessment. A number of lessons learned through these studies are presented in order to aid in the design of future HRA benchmarking endeavors.
3D photography is a reliable method of measuring infantile haemangioma volume over time.
Robertson, Sarah A; Kimble, Roy M; Storey, Kristen J; Gee Kee, Emma L; Stockton, Kellie A
2016-09-01
Infantile haemangiomas are common lesions of infancy. With the development of novel treatments utilised to accelerate their regression, there is a need for a method of assessing these lesions over time. Volume is an ideal assessment method because of its quantifiable nature. This study investigated whether 3D photography is a valid tool for measuring the volume of infantile haemangiomas over time. Thirteen children with infantile haemangiomas presenting to the Vascular Anomalies Clinic, Royal Children's Hospital/Lady Cilento Children's Hospital treated with propranolol were included in the study. Lesion volume was assessed using 3D photography at presentation, one month and three months follow up. Intrarater reliability was determined by retracing all images several months after the initial mapping. Interrater reliability of the 3D camera software was determined by two investigators, blinded to each other's results, independently assessing infantile haemangioma volume. Lesion volume decreased significantly between presentation and three-month follow-up (p<0.001). Volume intra- and interrater reliability were excellent with ICC 0.991 (95% CI 0.982, 0.995) and 0.978 (95% CI 0.955, 0.989), respectively. This study demonstrates images taken with the 3D LifeViz™ camera and lesion volume calculated with Dermapix® software is a reliable method for assessing infantile haemangioma volume over time. Copyright © 2016 Elsevier Inc. All rights reserved.
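The abstract reports ICCs without naming the model. As one common choice, the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), can be computed from a subjects-by-raters table using ANOVA mean squares. The sketch below assumes that form; the ratings are illustrative numbers, not measurements from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the value given to subject i by rater j.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Illustrative table: six subjects (rows) measured by four raters (columns).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc = icc_2_1(example_ratings)
```

In this example the raters rank subjects similarly but differ systematically in level, so the absolute-agreement ICC is low; values near 1, like those reported for 3D photography, require raters to agree on the actual numbers.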
Reliability of Hypernasality Rating: Comparison of 3 Different Methods for Perceptual Assessment.
Yamashita, Renata Paciello; Borg, Elisabet; Granqvist, Svante; Lohmander, Anette
2018-01-01
To compare reliability in auditory-perceptual assessment of hypernasality for 3 different methods and to explore the influence of language background. Comparative methodological study. Participants and Materials: Audio recordings of 5-year-old Swedish-speaking children with repaired cleft lip and palate consisting of 73 stimuli of 9 nonnasal single-word strings in 3 different randomized orders. Four experienced speech-language pathologists (2 native speakers of Brazilian-Portuguese and 2 native speakers of Swedish) participated as listeners. After individual training, each listener performed the hypernasality rating task. Each order of stimuli was analyzed individually using the 2-step, VISOR and Borg centiMax scale methods. Comparison of intra- and inter-rater reliability, and consistency for each method within language of the listener and between listener languages (Swedish and Brazilian-Portuguese). Good to excellent intra-rater reliability was found within each listener for all methods, 2-step: κ = 0.59-0.93; VISOR: intraclass correlation coefficient (ICC) = 0.80-0.99; Borg centiMax (cM) scale: ICC = 0.80-1.00. The highest inter-rater reliability was demonstrated for VISOR (ICC = 0.60-0.90) and Borg cM-scale (ICC = 0.40-0.80). High consistency within each method was found with the highest for the Borg cM scale (ICC = 0.89-0.91). There was a significant difference in the ratings between the Swedish and the Brazilian listeners for all methods. The category-ratio scale Borg cM was considered most reliable in the assessment of hypernasality. Language background of Brazilian-Portuguese listeners influenced the perceptual ratings of hypernasality in Swedish speech samples, despite their experience in perceptual assessment of cleft palate speech disorders.
Reliability Evaluation of Machine Center Components Based on Cascading Failure Analysis
NASA Astrophysics Data System (ADS)
Zhang, Ying-Zhi; Liu, Jin-Tong; Shen, Gui-Xiang; Long, Zhe; Sun, Shu-Guang
2017-07-01
In order to rectify the problems that the component reliability model exhibits deviation and the evaluation result is low because failure propagation is overlooked in traditional reliability evaluation of machine center components, a new reliability evaluation method based on cascading failure analysis and failure influenced degree assessment is proposed. A directed graph model of cascading failure among components is established according to cascading failure mechanism analysis and graph theory. The failure influenced degrees of the system components are assessed by the adjacency matrix and its transpose, combined with the PageRank algorithm. Based on the comprehensive failure probability function and the total probability formula, the inherent failure probability function is determined to realize the reliability evaluation of the system components. Finally, the method is applied to a machine center, and the results show the following: 1) The reliability evaluation values of the proposed method are at least 2.5% higher than those of the traditional method; 2) The difference between the comprehensive and inherent reliability of the system component presents a positive correlation with the failure influenced degree of the system component, which provides a theoretical basis for reliability allocation of machine center system.
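The abstract does not give the exact scoring formula, so the following is only a sketch of the general idea: run PageRank on a directed failure-propagation graph to score how strongly each component is influenced by others, and on the transposed graph to score how strongly it influences others. The three-component graph, the damping factor of 0.85 and the handling of components with no outgoing edges are all assumptions.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; adj[i][j] = 1 if there is an edge i -> j."""
    n = len(adj)
    out_degree = [sum(row) for row in adj]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        new_rank = []
        for j in range(n):
            inflow = sum(
                rank[i] * adj[i][j] / out_degree[i]
                for i in range(n)
                if out_degree[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (inflow + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank

def transpose(adj):
    """Reverse every edge of the graph."""
    return [list(col) for col in zip(*adj)]

# Hypothetical cascade: a failure of component 0 can propagate to 1 and 2,
# and a failure of component 1 can propagate to 2.
failure_graph = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

influenced = pagerank(failure_graph)              # high = often hit by cascades
influencing = pagerank(transpose(failure_graph))  # high = often the origin
```

In this toy graph component 2 scores highest on the original graph and component 0 on the transposed one, which is the kind of asymmetry a failure influenced degree is meant to capture.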
A proposed method to investigate reliability throughout a questionnaire.
Wentzel-Larsen, Tore; Norekvål, Tone M; Ulvik, Bjørg; Nygård, Ottar; Pripp, Are H
2011-10-05
Questionnaires are used extensively in medical and health care research and depend on validity and reliability. However, participants may differ in interest and awareness throughout long questionnaires, which can affect reliability of their answers. A method is proposed for "screening" of systematic change in random error, which could assess changed reliability of answers. A simulation study was conducted to explore whether systematic change in reliability, expressed as changed random error, could be assessed using unsupervised classification of subjects by cluster analysis (CA) and estimation of intraclass correlation coefficient (ICC). The method was also applied on a clinical dataset from 753 cardiac patients using the Jalowiec Coping Scale. The simulation study showed a relationship between the systematic change in random error throughout a questionnaire and the slope between the estimated ICC for subjects classified by CA and successive items in a questionnaire. This slope was proposed as an awareness measure for assessing whether respondents provide only a random answer or one based on a substantial cognitive effort. Scales from different factor structures of Jalowiec Coping Scale had different effect on this awareness measure. Even though assumptions in the simulation study might be limited compared to real datasets, the approach is promising for assessing systematic change in reliability throughout long questionnaires. Results from a clinical dataset indicated that the awareness measure differed between scales.
A study of fault prediction and reliability assessment in the SEL environment
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Patnaik, Debabrata
1986-01-01
An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
Simpson, V; Hughes, M; Wilkinson, J; Herrick, A L; Dinsdale, G
2018-03-01
Digital ulcers are a major problem in patients with systemic sclerosis (SSc), causing severe pain and impairment of hand function. In addition, digital ulcers heal slowly and sometimes become infected, which can lead to gangrene and necessitate amputation if appropriate intervention is not taken. A reliable, objective method for assessing digital ulcer healing or progression is needed in both the clinical and research arenas. This study was undertaken to compare 2 computer-assisted planimetry methods of measurement of digital ulcer area on photographs (ellipse and freehand regions of interest [ROIs]), and to assess the reliability of photographic calibration and the 2 methods of area measurement. Photographs were taken of 107 digital ulcers in 36 patients with SSc spectrum disease. Three raters assessed the photographs. Custom software allowed raters to calibrate photograph dimensions and draw ellipse or freehand ROIs. The shapes and dimensions of the ROIs were saved for further analysis. Calibration (by a single rater performing 5 repeats per image) produced an intraclass correlation coefficient (intrarater reliability) of 0.99. The mean ± SD areas of digital ulcers assessed using ellipse and freehand ROIs were 18.7 ± 20.2 mm² and 17.6 ± 19.3 mm², respectively. Intrarater and interrater reliability of the ellipse ROI were 0.97 and 0.77, respectively. For the freehand ROI, the intrarater and interrater reliability were 0.98 and 0.76, respectively. Our findings indicate that computer-assisted planimetry methods applied to SSc-related digital ulcers can be extremely reliable. Further work is needed to move toward applying these methods as outcome measures for clinical trials and in clinical settings. © 2017, American College of Rheumatology.
Reliability of in-Shoe Plantar Pressure Measurements in Rheumatoid Arthritis Patients
ERIC Educational Resources Information Center
Vidmar, Gaj; Novak, Primoz
2009-01-01
Plantar pressures measurement is a frequently used method in rehabilitation and related research. Metric characteristics of the F-Scan system have been assessed from different standpoints and in different patients, but not its reliability in rheumatoid arthritis patients. Therefore, our objective was to assess reliability of the F-Scan plantar…
Bowman, Gene L.; Shannon, Jackilen; Ho, Emily; Traber, Maret G.; Frei, Balz; Oken, Barry S.; Kaye, Jeffery A.; Quinn, Joseph F.
2010-01-01
Introduction There is great interest in nutritional strategies for the prevention of age-related cognitive decline, yet the best methods for nutritional assessment in populations at risk for dementia are still evolving. Our study objective was to test the reliability and validity of two common nutritional assessments (plasma nutrient biomarkers and Food Frequency Questionnaire [FFQ]) in people at risk for dementia. Methods Thirty-eight elders, half with amnestic Mild Cognitive Impairment (MCI) and half with intact cognition, were recruited. Nutritional assessments were collected together at baseline and again at 1 month. Intraclass and Pearson correlation coefficients quantified reliability and validity. Results Twenty-six nutrients were examined and reliability was very good or better for 77% (20/26, ICC ≥ .75) of the plasma nutrient biomarkers and for 88% of the FFQ estimates. Twelve of the plasma nutrient estimates were as reliable as the commonly measured plasma cholesterol (ICC = .92). FFQ and plasma long-chain fatty acids (docosahexaenoic acid, r = .39; eicosapentaenoic acid, r = .39) and carotenoids (α-carotene, r = .49; lutein + zeaxanthin, r = .48; β-carotene, r = .43; β-cryptoxanthin, r = .41) were correlated, but no other FFQ estimates correlated with respective nutrient biomarkers. Correlations between FFQ and plasma fatty acids and carotenoids were significantly stronger after removing subjects with MCI. Conclusion The reliability and validity of plasma and FFQ nutrient estimates vary according to the nutrient of interest. Memory deficit attenuates FFQ estimate validity and inflates FFQ estimate reliability. Many plasma nutrient biomarkers have very good reliability over 1 month regardless of memory state. This method can circumvent sources of error seen in other less direct methods of nutritional assessment. PMID:20856100
Schiffman, Eric L.; Truelove, Edmond L.; Ohrbach, Richard; Anderson, Gary C.; John, Mike T.; List, Thomas; Look, John O.
2011-01-01
AIMS The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. An overview is presented, including Axis I and II methodology and descriptive statistics for the study participant sample. This paper details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. Validity testing for the Axis II biobehavioral instruments was based on previously validated reference standards. METHODS The Axis I reference standards were based on the consensus of 2 criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion exam reliability was also assessed within study sites. RESULTS Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion exam agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). CONCLUSION The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods. PMID:20213028
Otis, Colombe; Gervais, Julie; Guillot, Martin; Gervais, Julie-Anne; Gauvin, Dominique; Péthel, Catherine; Authier, Simon; Dansereau, Marc-André; Sarret, Philippe; Martel-Pelletier, Johanne; Pelletier, Jean-Pierre; Beaudry, Francis; Troncy, Eric
2016-06-23
Lack of validity in osteoarthritis pain models and assessment methods is suspected. Our goal was to 1) assess the repeatability and reproducibility of measurement and the influence of environment, and acclimatization, to different pain assessment outcomes in normal rats, and 2) test the concurrent validity of the most reliable methods in relation to the expression of different spinal neuropeptides in a chemical model of osteoarthritic pain. Repeatability and inter-rater reliability of reflexive nociceptive mechanical thresholds, spontaneous static weight-bearing, treadmill, rotarod, and operant place escape/avoidance paradigm (PEAP) were assessed by the intraclass correlation coefficient (ICC). The most reliable acclimatization protocol was determined by comparing coefficients of variation. In a pilot comparative study, the sensitivity and responsiveness to treatment of the most reliable methods were tested in the monosodium iodoacetate (MIA) model over 21 days. Two MIA (2 mg) groups (including one lidocaine treatment group) and one sham group (0.9 % saline) received an intra-articular (50 μL) injection. No effect of environment (observer, inverted circadian cycle, or exercise) was observed; all tested methods except mechanical sensitivity (ICC <0.3), offered good repeatability (ICC ≥0.7). The most reliable acclimatization protocol included five assessments over two weeks. MIA-related osteoarthritic change in pain was demonstrated with static weight-bearing, punctate tactile allodynia evaluation, treadmill exercise and operant PEAP, the latter being the most responsive to analgesic intra-articular lidocaine. Substance P and calcitonin gene-related peptide were higher in MIA groups compared to naive (adjusted P (adj-P) = 0.016) or sham-treated (adj-P = 0.029) rats. 
Repeated post-MIA lidocaine injection resulted in 34 times lower downregulation for spinal substance P compared to MIA alone (adj-P = 0.029), with a concomitant increase of 17 % in time spent on the PEAP dark side (indicative of increased comfort). This study of normal rats and rats with pain established the most reliable and sensitive pain assessment methods and an optimized acclimatization protocol. Operant PEAP testing was more responsive to lidocaine analgesia than other tests used, while neuropeptide spinal concentration is an objective quantification method attractive to support and validate different centralized pain functional assessment methods.
Risk assessment for construction projects of transport infrastructure objects
NASA Astrophysics Data System (ADS)
Titarenko, Boris
2017-10-01
The paper analyzes and compares different methods of risk assessment for construction projects of transport objects. Managing projects of this type demands special probabilistic and statistical methods because of the high level of uncertainty in their implementation. The aim of the work is to develop a methodology for using traditional methods in combination with robust methods that yield reliable risk assessments in projects. The robust approach is based on the principle of maximum likelihood and, in assessing risk, allows the researcher to obtain reliable results in situations of great uncertainty. Robust procedures make it possible to carry out a quantitative assessment of the main risk indicators of projects when managing innovation-investment projects. Any competent specialist can calculate the damage from the onset of a risky event, whereas assessing the probability that a risky event occurs requires special probabilistic methods based on the proposed robust approaches. Practice shows the effectiveness and reliability of the results. The methodology developed in the article can be used to create information technologies and apply them in automated control systems for complex projects.
Qin, D L; Jin, X N; Wang, S J; Wang, J J; Mamat, N; Wang, F J; Wang, Y; Shen, Z A; Sheng, L G; Forsman, M; Yang, L Y; Wang, S; Zhang, Z B; He, L H
2018-06-18
To develop a new method for comprehensively assessing postural workload that covers both the dynamic and static postural workload of workers during their work process, to analyze its reliability and validity, and to study the relation between workers' postural workload and work-related musculoskeletal disorders (WMSDs). In the study, 844 workers from electronic and railway vehicle manufacturing factories were selected as subjects and surveyed with the China Musculoskeletal Questionnaire (CMQ) to form the postural workload comprehensive assessment method. Cronbach's α, cluster analysis and factor analysis were used to assess the reliability and validity of the new assessment method. Non-conditional logistic regression was used to analyze the relation between workers' postural workload and WMSDs. Reliability of the assessment method for postural workload: internal consistency analysis showed that Cronbach's α was 0.934, and the split-half reliability results indicated that the Spearman-Brown coefficient was 0.881 and the correlation coefficient between the first part and the second was 0.787. Validity of the assessment method for postural workload: the results of cluster analysis indicated that the squared Euclidean distance between dynamic and static postural workload assessment in the same part or work posture was the shortest. The results of factor analysis showed that 2 components were extracted and the cumulative percentage of variance reached 65.604%. The postural workload scores of workers in different occupations differed significantly (P<0.05) by covariance analysis. The results of non-conditional logistic regression indicated that alcohol intake (OR=2.141, 95%CI 1.337-3.428) and obesity (OR=3.408, 95%CI 1.629-7.130) were risk factors for WMSDs. The risk for WMSDs rose as workers' postural workload rose (OR=1.035, 95%CI 1.022-1.048). 
The risk for WMSDs differed significantly among groups of workers distinguished by work type, gender and age. Female workers (OR=2.626, 95%CI 1.414-4.879) and workers between 30-40 years of age (OR=1.909, 95%CI 1.237-2.946, as compared with those under 30) exhibited a higher prevalence of WMSDs. This method for comprehensively assessing postural workload is reliable and effective when used in assembly workers, and there is a certain relation between postural workload and WMSDs.
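The two split-half figures reported above are linked by the standard Spearman-Brown prophecy formula, which steps the correlation between two half-tests up to an estimate for the full-length instrument. The short sketch below assumes the authors used the usual two-half form of that formula, which the abstract does not state explicitly; the function name is hypothetical.

```python
def spearman_brown(half_correlation):
    """Step up a split-half correlation to full-test reliability: 2r / (1 + r)."""
    return 2.0 * half_correlation / (1.0 + half_correlation)


# Correlation between the two halves reported in the abstract above.
full_test_reliability = spearman_brown(0.787)
print(round(full_test_reliability, 3))  # 0.881
```

The computed value agrees with the Spearman-Brown coefficient of 0.881 given in the abstract.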
Reliability and validity of a brief method to assess nociceptive flexion reflex (NFR) threshold.
Rhudy, Jamie L; France, Christopher R
2011-07-01
The nociceptive flexion reflex (NFR) is a physiological tool to study spinal nociception. However, NFR assessment can take several minutes and expose participants to repeated suprathreshold stimulations. The 4 studies reported here assessed the reliability and validity of a brief method to assess NFR threshold that uses a single ascending series of stimulations (Peak 1 NFR), by comparing it to a well-validated method that uses 3 ascending/descending staircases of stimulations (Staircase NFR). Correlations between the NFR definitions were high, were on par with test-retest correlations of Staircase NFR, and were not affected by participant sex or chronic pain status. Results also indicated the test-retest reliabilities for the 2 definitions were similar. Using larger stimulus increments (4 mAs) to assess Peak 1 NFR tended to result in higher NFR threshold estimates than using the Staircase NFR definition, whereas smaller stimulus increments (2 mAs) tended to result in lower NFR threshold estimates than the Staircase NFR definition. Neither NFR definition was correlated with anxiety, pain catastrophizing, or anxiety sensitivity. In sum, a single ascending series of electrical stimulations results in a reliable and valid estimate of NFR threshold. However, caution may be warranted when comparing NFR thresholds across studies that differ in the ascending stimulus increments. This brief method to assess NFR threshold is reliable and valid; therefore, it should be useful to clinical pain researchers interested in quickly assessing inter- and intra-individual differences in spinal nociceptive processes. Copyright © 2011 American Pain Society. Published by Elsevier Inc. All rights reserved.
Navarro-Ramirez, Rodrigo; Berlin, Connor; Lang, Gernot; Hussain, Ibrahim; Janssen, Insa; Sloan, Stephen; Askin, Gulce; Avila, Mauricio J; Zubkov, Micaella; Härtl, Roger
2018-01-01
Two-dimensional radiographic methods have been proposed to evaluate the radiographic outcome after indirect decompression through extreme lateral interbody fusion (XLIF). However, the assessment of neural decompression in a single plane may underestimate the effect of indirect decompression on central canal and foraminal volumes. The present study aimed to assess the reliability and consistency of a novel 3-dimensional radiographic method that assesses neural decompression by volumetric analysis using a new generation of intraoperative fan-beam computed tomography scanner in patients undergoing XLIF. Prospectively collected data from 7 patients (9 levels) undergoing XLIF was retrospectively analyzed. Three independent, blind raters using imaging analysis software performed volumetric measurements pre- and postoperatively to determine central canal and foraminal volumes. Intrarater and Interrater reliability tests were performed to assess the reliability of this novel volumetric method. The interrater reliability between the three raters ranged from 0.800 to 0.952, P < 0.0001. The test-retest analysis on a randomly selected subset of three patients showed good to excellent internal reliability (range of 0.78-1.00) for all 3 raters. There was a significant increase in mean volume ≈20% for right foramen, left foramen, and central canal volumes postoperatively (P = 0.0472; P = 0.0066; P = 0.0003, respectively). Here we demonstrate a new volumetric analysis technique that is feasible, reliable, and reproducible amongst independent raters for central canal and foraminal volumes in the lumbar spine using an intraoperative computed tomography scanner. Copyright © 2017. Published by Elsevier Inc.
Sawchuk, Dena; Currie, Kris; Vich, Manuel Lagravere; Palomo, Juan Martin
2016-01-01
Objective To evaluate the accuracy and reliability of the diagnostic tools available for assessing maxillary transverse deficiencies. Methods An electronic search of three databases was performed from their date of establishment to April 2015, with manual searching of reference lists of relevant articles. Articles were considered for inclusion if they reported the accuracy or reliability of a diagnostic method or evaluation technique for maxillary transverse dimensions in mixed or permanent dentitions. Risk of bias was assessed in the included articles, using the Quality Assessment of Diagnostic Accuracy Studies tool-2. Results Nine articles were selected. The studies were heterogeneous, with moderate to low methodological quality, and all had a high risk of bias. Four suggested that the use of arch width prediction indices with dental cast measurements is unreliable for use in diagnosis. Frontal cephalograms derived from cone-beam computed tomography (CBCT) images were reportedly more reliable for assessing intermaxillary transverse discrepancies than posteroanterior cephalograms. Two studies proposed new three-dimensional transverse analyses with CBCT images that were reportedly reliable, but have not been validated for clinical sensitivity or specificity. No studies reported sensitivity, specificity, positive or negative predictive values or likelihood ratios, or ROC curves of the methods for the diagnosis of transverse deficiencies. Conclusions Current evidence does not enable solid conclusions to be drawn, owing to a lack of reliable high quality diagnostic studies evaluating maxillary transverse deficiencies. CBCT images are reportedly more reliable for diagnosis, but further validation is required to confirm CBCT's accuracy and diagnostic superiority. PMID:27668196
Barrett, Eva; McCreesh, Karen; Lewis, Jeremy
2014-02-01
A wide array of instruments are available for non-invasive thoracic kyphosis measurement. Guidelines for selecting outcome measures for use in clinical and research practice recommend that properties such as validity and reliability are considered. This systematic review reports on the reliability and validity of non-invasive methods for measuring thoracic kyphosis. A systematic search of 11 electronic databases located studies assessing reliability and/or validity of non-invasive thoracic kyphosis measurement techniques. Two independent reviewers used a critical appraisal tool to assess the quality of retrieved studies. Data was extracted by the primary reviewer. The results were synthesized qualitatively using a level of evidence approach. 27 studies satisfied the eligibility criteria and were included in the review. The reliability, validity and both reliability and validity were investigated by sixteen, two and nine studies respectively. 17/27 studies were deemed to be of high quality. In total, 15 methods of thoracic kyphosis were evaluated in retrieved studies. All investigated methods showed high (ICC ≥ .7) to very high (ICC ≥ .9) levels of reliability. The validity of the methods ranged from low to very high. The strongest levels of evidence for reliability exists in support of the Debrunner kyphometer, Spinal Mouse and Flexicurve index, and for validity supports the arcometer and Flexicurve index. Further reliability and validity studies are required to strengthen the level of evidence for the remaining methods of measurement. This should be addressed by future research. Copyright © 2013 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Lee, Guemin; Park, In-Yong
2012-01-01
Previous assessments of the reliability of test scores for testlet-composed tests have indicated that item-based estimation methods overestimate reliability. This study was designed to address issues related to the extent to which item-based estimation methods overestimate the reliability of test scores composed of testlets and to compare several…
NASA Astrophysics Data System (ADS)
Liu, Haixing; Savić, Dragan; Kapelan, Zoran; Zhao, Ming; Yuan, Yixing; Zhao, Hongbin
2014-07-01
Flow entropy is a measure of uniformity of pipe flows in water distribution systems. By maximizing flow entropy one can identify reliable layouts or connectivity in networks. In order to overcome the disadvantage of the common definition of flow entropy that does not consider the impact of pipe diameter on reliability, an extended definition of flow entropy, termed as diameter-sensitive flow entropy, is proposed. This new methodology is then assessed by using other reliability methods, including Monte Carlo Simulation, a pipe failure probability model, and a surrogate measure (resilience index) integrated with water demand and pipe failure uncertainty. The reliability assessment is based on a sample of WDS designs derived from an optimization process for each of the two benchmark networks. Correlation analysis is used to evaluate quantitatively the relationship between entropy and reliability. To ensure reliability, a comparative analysis between the flow entropy and the new method is conducted. The results demonstrate that the diameter-sensitive flow entropy shows consistently much stronger correlation with the three reliability measures than simple flow entropy. Therefore, the new flow entropy method can be taken as a better surrogate measure for reliability and could be potentially integrated into the optimal design problem of WDSs. Sensitivity analysis results show that the velocity parameters used in the new flow entropy has no significant impact on the relationship between diameter-sensitive flow entropy and reliability.
Culture Representation in Human Reliability Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
David Gertman; Julie Marble; Steven Novack
Understanding human-system response is critical to being able to plan and predict mission success in the modern battlespace. Commonly, human reliability analysis has been used to predict failures of human performance in complex, critical systems. However, most human reliability methods fail to take culture into account. This paper takes an easily understood state-of-the-art human reliability analysis method and extends that method to account for the influence of culture, including acceptance of new technology, upon performance. The cultural parameters used to modify the human reliability analysis were determined from two standard industry approaches to cultural assessment: Hofstede’s (1991) cultural factors and Davis’ (1989) technology acceptance model (TAM). The result is called the Culture Adjustment Method (CAM). An example is presented that (1) reviews human reliability assessment with and without cultural attributes for a Supervisory Control and Data Acquisition (SCADA) system attack, (2) demonstrates how country specific information can be used to increase the realism of HRA modeling, and (3) discusses the differences in human error probability estimates arising from cultural differences.
Insightful practice: a reliable measure for medical revalidation
Guthrie, Bruce; Sullivan, Frank M; Mercer, Stewart W; Russell, Andrew; Bruce, David A
2012-01-01
Background Medical revalidation decisions need to be reliable if they are to reassure on the quality and safety of professional practice. This study tested an innovative method in which general practitioners (GPs) were assessed on their reflection and response to a set of externally specified feedback. Setting and participants 60 GPs and 12 GP appraisers in the Tayside region of Scotland, UK. Methods A feedback dataset was specified as (1) GP-specific data collected by GPs themselves (patient and colleague opinion; open book self-evaluated knowledge test; complaints) and (2) Externally collected practice-level data provided to GPs (clinical quality and prescribing safety). GPs' perceptions of whether the feedback covered UK General Medical Council specified attributes of a ‘good doctor’ were examined using a mapping exercise. GPs' professionalism was examined in terms of appraiser assessment of GPs' level of insightful practice, defined as: engagement with, insight into and appropriate action on feedback data. The reliability of assessment of insightful practice and subsequent recommendations on GPs' revalidation by face-to-face and anonymous assessors were investigated using Generalisability G-theory. Main outcome measures Coverage of General Medical Council attributes by specified feedback and reliability of assessor recommendations on doctors' suitability for revalidation. Results Face-to-face assessment proved unreliable. Anonymous global assessment by three appraisers of insightful practice was highly reliable (G=0.85), as were revalidation decisions using four anonymous assessors (G=0.83). Conclusions Unlike face-to-face appraisal, anonymous assessment of insightful practice offers a valid and reliable method to decide GP revalidation. Further validity studies are needed. PMID:22653078
Brownson, Ross C.; Chang, Jen Jen; Eyler, Amy A.; Ainsworth, Barbara E.; Kirtland, Karen A.; Saelens, Brian E.; Sallis, James F.
2004-01-01
Objectives. We tested the reliability of 3 instruments that assessed social and physical environments. Methods. We conducted a test–retest study among US adults (n = 289). We used telephone survey methods to measure suitableness of the perceived (vs objective) environment for recreational physical activity and nonmotorized transportation. Results. Most questions in our surveys that attempted to measure specific characteristics of the built environment showed moderate to high reliability. Questions about the social environment showed lower reliability than those that assessed the physical environment. Certain blocks of questions appeared to be selectively more reliable for urban or rural respondents. Conclusions. Despite differences in content and in response formats, all 3 surveys showed evidence of reliability, and most items are now ready for use in research and in public health surveillance. PMID:14998817
ERIC Educational Resources Information Center
Scherer, Marcia J.; McKee, Barbara G.
Validity and reliability data are presented for two instruments for assessing the predispositions that people have toward the use of assistive and educational technologies. The two instruments, the Assistive Technology Device Predisposition Assessment (ATDPA) and the Educational Technology Predisposition Assessment (ETPA), are self-report…
Computer-aided analysis with Image J for quantitatively assessing psoriatic lesion area.
Sun, Z; Wang, Y; Ji, S; Wang, K; Zhao, Y
2015-11-01
Body surface area is important in determining the severity of psoriasis. However, an objective, reliable, and practical method for this purpose is still needed. We performed a computer image analysis (CIA) of psoriatic area using the Image J freeware to determine whether this method could be used for objective evaluation of psoriatic area. Fifteen psoriasis patients were randomized to be treated with adalimumab or placebo in a clinical trial. At each visit, the psoriasis area of each body site was estimated by two physicians (E-method), and standard photographs were taken. The psoriasis area in the pictures was assessed with CIA using semi-automatic threshold selection (T-method) or manual selection (M-method, gold standard). The results of the three methods were analyzed, and reliability and influencing factors were evaluated. Both the T- and E-methods correlated strongly with the M-method, and the T-method had a slightly stronger correlation with the M-method. Both the T- and E-methods showed good consistency between evaluators. All three methods were able to detect the change in psoriatic area after treatment, although the E-method tended to overestimate. CIA with the Image J freeware is reliable and practicable for quantitatively assessing psoriatic lesion area. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Park, Namgyoo K.; Chun, Monica Youngshin; Lee, Jinju
2016-01-01
Compared to the significant development of creativity studies, individual creativity research has not reached a meaningful consensus regarding the most valid and reliable method for assessing individual creativity. This study revisited 2 of the most popular methods for assessing individual creativity: subjective and objective methods. This study…
Bayesian methods in reliability
NASA Astrophysics Data System (ADS)
Sander, P.; Badoux, R.
1991-11-01
The present proceedings from a course on Bayesian methods in reliability encompasses Bayesian statistical methods and their computational implementation, models for analyzing censored data from nonrepairable systems, the traits of repairable systems and growth models, the use of expert judgment, and a review of the problem of forecasting software reliability. Specific issues addressed include the use of Bayesian methods to estimate the leak rate of a gas pipeline, approximate analyses under great prior uncertainty, reliability estimation techniques, and a nonhomogeneous Poisson process. Also addressed are the calibration sets and seed variables of expert judgment systems for risk assessment, experimental illustrations of the use of expert judgment for reliability testing, and analyses of the predictive quality of software-reliability growth models such as the Weibull order statistics.
Reliability and discriminatory power of methods for dental plaque quantification
RAGGIO, Daniela Prócida; BRAGA, Mariana Minatel; RODRIGUES, Jonas Almeida; FREITAS, Patrícia Moreira; IMPARATO, José Carlos Pettorossi; MENDES, Fausto Medeiros
2010-01-01
Objective This in situ study evaluated the discriminatory power and reliability of methods of dental plaque quantification and the relationship between visual indices (VI) and fluorescence camera (FC) to detect plaque. Material and Methods Six volunteers used palatal appliances with six bovine enamel blocks presenting different stages of plaque accumulation. The presence of plaque with and without disclosing was assessed using VI. Images were obtained with FC and digital camera in both conditions. The area covered by plaque was assessed. Examinations were done by two independent examiners. Data were analyzed by Kruskal-Wallis and Kappa tests to compare different conditions of samples and to assess the inter-examiner reproducibility. Results Some methods presented adequate reproducibility. The Turesky index and the assessment of area covered by disclosed plaque in the FC images presented the highest discriminatory powers. Conclusions The Turesky index and images with FC with disclosing present good reliability and discriminatory power in quantifying dental plaque. PMID:20485931
Langarika-Rocafort, Argia; Emparanza, José Ignacio; Aramendi, José F; Castellano, Julen; Calleja-González, Julio
2017-01-01
To examine the intra-observer reliability and agreement between five methods of measurement for dorsiflexion during Weight Bearing Dorsiflexion Lunge Test and to assess the degree of agreement between three methods in female athletes. Repeated measurements study design. Volleyball club. Twenty-five volleyball players. Dorsiflexion was evaluated using five methods: heel-wall distance, first toe-wall distance, inclinometer at tibia, inclinometer at Achilles tendon and the dorsiflexion angle obtained by a simple trigonometric function. For the statistical analysis, agreement was studied using the Bland-Altman method, the Standard Error of Measurement and the Minimum Detectable Change. Reliability analysis was performed using the Intraclass Correlation Coefficient. Measurement methods using the inclinometer had more than 6° of measurement error. The angle calculated by trigonometric function had 3.28° error. The inclinometer-based methods had ICC values < 0.90. Distance-based methods and trigonometric angle measurement had ICC values > 0.90. Concerning the agreement between methods, bias ranged from 1.93° to 14.42° and random error from 4.24° to 7.96°. To assess DF angle in WBLT, the angle calculated by a trigonometric function is the most repeatable method. The methods of measurement cannot be used interchangeably. Copyright © 2016 Elsevier Ltd. All rights reserved.
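The agreement statistics named above can be illustrated with their commonly used textbook forms. This is only a sketch under stated assumptions: the abstract does not say which formulas the authors applied, so the definitions below (SEM = SD·√(1 − ICC), MDC95 = 1.96·√2·SEM, Bland-Altman limits = bias ± 1.96·SD of the differences) are assumed, and all function names and input values are hypothetical.

```python
import math
from statistics import mean, stdev


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (assumed form)."""
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_change_95(sem):
    """MDC at the 95% level for a difference between two measurements (assumed form)."""
    return 1.96 * math.sqrt(2.0) * sem


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


# Hypothetical values, not taken from the study.
sem = standard_error_of_measurement(sd=5.0, icc=0.91)
mdc = minimum_detectable_change_95(sem)
bias, lower, upper = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 12.0])
```

A large bias between two methods, as in the 1.93° to 14.42° range reported above, is the kind of result that supports the conclusion that the methods cannot be used interchangeably.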
A Reliable, Feasible Method to Observe Neighborhoods at High Spatial Resolution
Kepper, Maura M.; Sothern, Melinda S.; Theall, Katherine P.; Griffiths, Lauren A.; Scribner, Richard; Tseng, Tung-Sung; Schaettle, Paul; Cwik, Jessica M.; Felker-Kantor, Erica; Broyles, Stephanie T.
2016-01-01
Introduction Systematic social observation (SSO) methods traditionally measure neighborhoods at street level and have been performed reliably using virtual applications to increase feasibility. Research indicates that collection at even higher spatial resolution may better elucidate the health impact of neighborhood factors, but whether virtual applications can reliably capture social determinants of health at the smallest geographic resolution (parcel level) remains uncertain. This paper presents a novel, parcel-level SSO methodology and assesses whether this new method can be collected reliably using Google Street View and is feasible. Methods Multiple raters (N=5) observed 42 neighborhoods. In 2016, inter-rater reliability (observed agreement and kappa coefficient) was compared for four SSO methods: (1) street-level in person; (2) street-level virtual; (3) parcel-level in person; and (4) parcel-level virtual. Intra-rater reliability (observed agreement and kappa coefficient) was calculated to determine whether parcel-level methods produce results comparable to traditional street-level observation. Results Substantial levels of inter-rater agreement were documented across all four methods; all methods had >70% of items with at least substantial agreement. Only physical decay showed higher levels of agreement (83% of items with >75% agreement) for direct versus virtual rating source. Intra-rater agreement comparing street- versus parcel-level methods resulted in observed agreement >75% for all but one item (90%). Conclusions Results support the use of Google Street View as a reliable, feasible tool for performing SSO at the smallest geographic resolution. Validation of a new parcel-level method collected virtually may improve the assessment of social determinants contributing to disparities in health behaviors and outcomes. PMID:27989289
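The two agreement measures mentioned above, observed agreement and the kappa coefficient, can be sketched for a pair of raters as follows. This is an illustration only: the study used five raters, and the abstract does not say which kappa variant was computed, so the two-rater Cohen's kappa shown here is an assumption, and the function names and example ratings are hypothetical.

```python
from collections import Counter


def observed_agreement(rater1, rater2):
    """Proportion of items on which two raters assign the same category."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on categorical items."""
    n = len(rater1)
    p_observed = observed_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(counts1) | set(counts2)
    # Expected agreement if both raters assigned categories independently.
    p_expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of four parcels (1 = feature present, 0 = absent).
in_person = [1, 1, 0, 0]
virtual = [1, 0, 0, 0]
print(observed_agreement(in_person, virtual))  # 0.75
print(cohens_kappa(in_person, virtual))        # 0.5
```

Kappa is undefined when the expected agreement equals 1 (for example, when both raters use a single category for every item), so a real analysis would need to handle that case.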
Validation of a method for assessing resident physicians' quality improvement proposals.
Leenstra, James L; Beckman, Thomas J; Reed, Darcy A; Mundell, William C; Thomas, Kris G; Krajicek, Bryan J; Cha, Stephen S; Kolars, Joseph C; McDonald, Furman S
2007-09-01
Residency programs involve trainees in quality improvement (QI) projects to evaluate competency in systems-based practice and practice-based learning and improvement. Valid approaches to assess QI proposals are lacking. We developed an instrument for assessing resident QI proposals, the Quality Improvement Proposal Assessment Tool (QIPAT-7), and determined its validity and reliability. QIPAT-7 content was initially obtained from a national panel of QI experts. Through an iterative process, the instrument was refined, pilot-tested, and revised. Seven raters used the instrument to assess 45 resident QI proposals. Principal factor analysis was used to explore the dimensionality of instrument scores. Cronbach's alpha and intraclass correlations were calculated to determine internal consistency and interrater reliability, respectively. QIPAT-7 items comprised a single factor (eigenvalue = 3.4) suggesting a single assessment dimension. Interrater reliability for each item (range 0.79 to 0.93) and internal consistency reliability among the items (Cronbach's alpha = 0.87) were high. This method for assessing resident physician QI proposals is supported by content and internal structure validity evidence. QIPAT-7 is a useful tool for assessing resident QI proposals. Future research should determine the reliability of QIPAT-7 scores in other residency and fellowship training programs. Correlations should also be made between assessment scores and criteria for QI proposal success such as implementation of QI proposals, resident scholarly productivity, and improved patient outcomes.
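The internal-consistency statistic reported here, Cronbach's alpha, can be sketched with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the score matrix are hypothetical; the real QIPAT-7 data are not given in the abstract.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per proposal, one column per item."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two rated proposals")
    # Sample variance of each item across proposals.
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the summed score per proposal.
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: five proposals rated on three items (not study data).
example_scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(example_scores), 3))
```

Alpha approaches 1 when items rise and fall together across proposals, which is the pattern a single underlying assessment dimension would produce.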
NASA Astrophysics Data System (ADS)
Abdenov, A. Zh; Trushin, V. A.; Abdenova, G. A.
2018-01-01
The paper considers how to populate the relevant SIEM nodes using calculated objective assessments in order to improve the reliability of subjective expert assessments. The proposed methodology is intended to support the most accurate possible security risk assessment of information systems and to establish real-time operational information protection in enterprise information systems. Risk calculations are based on objective estimates of the probabilities that adverse events occur and on predictions of the magnitude of damage from information security violations. Calculations of objective assessments are necessary to increase the reliability of the proposed expert assessments.
Beard, J D; Marriott, J; Purdie, H; Crossley, J
2011-01-01
To compare user satisfaction and acceptability, reliability and validity of three different methods of assessing the surgical skills of trainees by direct observation in the operating theatre across a range of different surgical specialties and index procedures. A 2-year prospective, observational study in the operating theatres of three teaching hospitals in Sheffield. The assessment methods were procedure-based assessment (PBA), Objective Structured Assessment of Technical Skills (OSATS) and Non-technical Skills for Surgeons (NOTSS). The specialties were obstetrics and gynaecology (O&G) and upper gastrointestinal, colorectal, cardiac, vascular and orthopaedic surgery. Two to four typical index procedures were selected from each specialty. Surgical trainees were directly observed performing typical index procedures and assessed using a combination of two of the three methods (OSATS or PBA and NOTSS for O&G, PBA and NOTSS for the other specialties) by the consultant clinical supervisor for the case and the anaesthetist and/or scrub nurse, as well as one or more independent assessors from the research team. Information on user satisfaction and acceptability of each assessment method from both assessor and trainee perspectives was obtained from structured questionnaires. The reliability of each method was measured using generalisability theory. Aspects of validity included the internal structure of each tool and correlation between tools, construct validity, predictive validity, interprocedural differences, the effect of assessor designation and the effect of assessment on performance. Of the 558 patients who were consented, a total of 437 (78%) cases were included in the study: 51 consultant clinical supervisors, 56 anaesthetists, 39 nurses, 2 surgical care practitioners and 4 independent assessors provided 1635 assessments on 85 trainees undertaking the 437 cases. A total of 749 PBAs, 695 NOTSS and 191 OSATSs were performed. 
Non-O&G clinical supervisors and trainees provided mixed, but predominantly positive, responses about a range of applications of PBA. Most felt that PBA was important in surgical education, and would use it again in the future and did not feel that it added time to the operating list. The overall satisfaction of O&G clinical supervisors and trainees with OSATS was not as high, and a majority of those who used both preferred PBA. A majority of anaesthetists and nurses felt that NOTSS allowed them to rate interpersonal skills (communication, teamwork and leadership) more easily than cognitive skills (situation awareness and decision-making), that it had formative value and that it was a valuable adjunct to the assessment of technical skills. PBA demonstrated high reliability (G > 0.8 for only three assessor judgements on the same index procedure). OSATS had lower reliability (G > 0.8 for five assessor judgements on the same index procedure). Both were less reliable on a mix of procedures because of strong procedure-specific factors. A direct comparison of PBA between O&G and non-O&G cases showed a striking difference in reliability. Within O&G, a good level of reliability (G > 0.8) could not be obtained using a feasible number of assessments. Conversely, the reliability within non-O&G cases was exceptionally high, with only two assessor judgements being required. The reasons for this difference probably include the more summative purpose of assessment in O&G and the much higher proportion of O&G trainees in this study with training concerns (42% vs 4%). The reliability of NOTSS was lower than that for PBA. Reliability for the same procedure (G > 0.8) required six assessor judgements. However, as procedure-specific factors exerted a lesser influence on NOTSS, reliability on a mix of procedures could be achieved using only eight assessor judgements. NOTSS also demonstrated a valid internal structure. 
The strongest correlations between NOTSS and PBA or OSATS were in the 'decision-making' domain. PBA and NOTSS showed better construct validity than OSATS, the year of training and the number of recent index procedures performed being significant independent predictors of performance. There was little variation in scoring between different procedures or different designations of assessor. The results suggest that PBA is a reliable and acceptable method of assessing surgical skills, with good construct validity. Specialties that use OSATS may wish to consider changing the design or switching to PBA. Whatever workplace-based assessment method is used, the purpose, timing and frequency of assessment require detailed guidance. NOTSS is a promising tool for the assessment of non-technical skills, and surgical specialties may wish to consider its inclusion in their assessment framework. Further research is required into the use of health-care professionals other than consultant surgeons to assess trainees, the relationship between performance and experience, the educational impact of assessment and the additional value of video recording.
Broyles, S T; Drazba, K T; Church, T S; Chaput, J-P; Fogelholm, M; Hu, G; Kuriyan, R; Kurpad, A; Lambert, E V; Maher, C; Maia, J; Matsudo, V; Olds, T; Onywera, V; Sarmiento, O L; Standage, M; Tremblay, M S; Tudor-Locke, C; Zhao, P; Katzmarzyk, P T
2015-01-01
Objectives: Schools are an important setting to enable and promote physical activity. Researchers have created a variety of tools to perform objective environmental assessments (or ‘audits’) of other settings, such as neighborhoods and parks; yet, methods to assess the school physical activity environment are less common. The purpose of this study is to describe the approach used to objectively measure the school physical activity environment across 12 countries representing all inhabited continents, and to report on the reliability and feasibility of this methodology across these diverse settings. Methods: The International Study of Childhood Obesity, Lifestyle and the Environment (ISCOLE) school audit tool (ISAT) data collection required an in-depth training (including field practice and certification) and was facilitated by various supporting materials. Certified data collectors used the ISAT to assess the environment of all schools enrolled in ISCOLE. Sites completed a reliability audit (simultaneous audits by two independent, certified data collectors) for a minimum of two schools or at least 5% of their school sample. Item-level agreement between data collectors was assessed with both the kappa statistic and percent agreement. Inter-rater reliability of school summary scores was measured using the intraclass correlation coefficient. Results: Across the 12 sites, 256 schools participated in ISCOLE. Reliability audits were conducted at 53 schools (20.7% of the sample). For the assessed environmental features, inter-rater reliability (kappa) ranged from 0.37 to 0.96; 18 items (42%) were assessed with almost perfect reliability (κ=0.80–0.96), and a further 24 items (56%) were assessed with substantial reliability (κ=0.61–0.79). Likewise, scores that summarized a school's support for physical activity were highly reliable, with the exception of scores assessing aesthetics and perceived suitability of the school grounds for sport, informal games and general play.
Conclusions: This study suggests that the ISAT can be used to conduct reliable objective audits of the school physical activity environment across diverse, international school settings. PMID:27152183
Schiffman, Eric L; Truelove, Edmond L; Ohrbach, Richard; Anderson, Gary C; John, Mike T; List, Thomas; Look, John O
2010-01-01
The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. The aim of this article is to provide an overview of the project's methodology, descriptive statistics, and data for the study participant sample. This article also details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. The Axis I reference standards were based on the consensus of two criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion examination reliability was also assessed within study sites. Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion examiner agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively). The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods.
White, Sarah A; van den Broek, Nynke R
2004-05-30
Before introducing a new measurement tool it is necessary to evaluate its performance. Several statistical methods have been developed, or used, to evaluate the reliability and validity of a new assessment method in such circumstances. In this paper we review some commonly used methods. Data from a study that was conducted to evaluate the usefulness of a specific measurement tool (the WHO Colour Scale) are then used to illustrate the application of these methods. The WHO Colour Scale was developed under the auspices of the WHO to provide a simple, portable and reliable method of detecting anaemia. This Colour Scale is a discrete interval scale, whereas the actual haemoglobin values it is used to estimate are on a continuous interval scale and can be measured accurately using electrical laboratory equipment. The methods we consider are: linear regression; correlation coefficients; paired t-tests; plotting differences against mean values and deriving limits of agreement; kappa and weighted kappa statistics; sensitivity and specificity; an intraclass correlation coefficient; and the repeatability coefficient. We note that although the definition and properties of each of these methods are well established, inappropriate methods continue to be used in the medical literature for assessing reliability and validity, as evidenced in the context of the evaluation of the WHO Colour Scale. Copyright 2004 John Wiley & Sons, Ltd.
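One of the methods listed above, deriving limits of agreement from paired differences, can be sketched as follows. The 1.96 multiplier assumes roughly normal differences, and the haemoglobin values are invented for illustration; they are not data from the WHO Colour Scale study.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # systematic difference between the two methods
    spread = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * spread, bias + z * spread


# Hypothetical haemoglobin values in g/dL (not study data).
colour_scale = [8.0, 10.0, 12.0, 10.0, 14.0, 8.0]
laboratory = [8.6, 9.4, 12.9, 10.3, 13.1, 7.2]

bias, lower, upper = limits_of_agreement(colour_scale, laboratory)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Unlike a correlation coefficient, the limits are expressed in the measurement's own units, so they can be compared directly with a clinically acceptable error.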
Development of a probabilistic analysis methodology for structural reliability estimation
NASA Technical Reports Server (NTRS)
Torng, T. Y.; Wu, Y.-T.
1991-01-01
A novel probabilistic analysis method for assessing structural reliability is presented that combines fast convolution with an efficient structural reliability analysis. After identifying the most important point of a limit state, the method establishes a quadratic performance function, transforms the quadratic function into a linear one, and then applies fast convolution. The method is applicable to problems requiring computer-intensive structural analysis. Five illustrative examples of the method's application are given.
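Why a linear performance function is convenient can be seen from the textbook special case below. It is not the authors' fast-convolution procedure, only the closed-form result for a linear limit state g = R - S with independent, normally distributed resistance R and load S. All names and numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def linear_limit_state(mu_r, sigma_r, mu_s, sigma_s):
    """Reliability index and failure probability P(R - S < 0).

    Assumes R and S are independent normal variables, so g = R - S is normal
    with mean mu_r - mu_s and variance sigma_r**2 + sigma_s**2.
    """
    beta = (mu_r - mu_s) / sqrt(sigma_r**2 + sigma_s**2)
    p_fail = NormalDist().cdf(-beta)
    return beta, p_fail


# Hypothetical resistance and load statistics (arbitrary consistent units).
beta, p_fail = linear_limit_state(mu_r=300.0, sigma_r=30.0, mu_s=200.0, sigma_s=40.0)
print(f"beta={beta:.2f}, P(failure)={p_fail:.4f}")
```

For non-normal inputs the distribution of a linear combination generally has no closed form and must be obtained numerically, for example by convolving the input densities.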
The Arthroscopic Surgical Skill Evaluation Tool (ASSET).
Koehler, Ryan J; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Bramen, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J; Nicandri, Gregg T
2013-06-01
Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability when used to assess the technical ability of surgeons performing diagnostic knee arthroscopic surgery on cadaveric specimens. Cross-sectional study; Level of evidence, 3. Content validity was determined by a group of 7 experts using the Delphi method. Intra-articular performance of a right and left diagnostic knee arthroscopic procedure was recorded for 28 residents and 2 sports medicine fellowship-trained attending surgeons. Surgeon performance was assessed by 2 blinded raters using the ASSET. Concurrent criterion-oriented validity, interrater reliability, and test-retest reliability were evaluated. Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in the total ASSET score (P < .05) between novice, intermediate, and advanced experience groups were identified. Interrater reliability: The ASSET scores assigned by each rater were strongly correlated (r = 0.91, P < .01), and the intraclass correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: There was a significant correlation between ASSET scores for both procedures attempted by each surgeon (r = 0.79, P < .01). The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopic surgery in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live operating room and other simulated environments.
Reliability Analysis of the MSC System
NASA Astrophysics Data System (ADS)
Kim, Young-Soo; Lee, Do-Kyoung; Lee, Chang-Ho; Woo, Sun-Hee
2003-09-01
MSC (Multi-Spectral Camera) is the payload of KOMPSAT-2, which is being developed for earth imaging in the optical and near-infrared regions. The design of the MSC is complete, and its reliability has been assessed from the part level to the MSC system level. The reliability was analyzed for the worst case, and the results showed that the value complies with the required value of 0.9. In this paper, a method for calculating the reliability of the MSC system is described, and the assessment result is presented and discussed.
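A roll-up from part level to system level can be sketched as below. The abstract does not describe the MSC reliability model, so the series structure with independent failures, the function names, and the part values are all assumptions; only the 0.9 requirement comes from the abstract.

```python
from math import prod


def series_reliability(part_reliabilities):
    """Reliability of a series system: every part must work, failures independent."""
    for r in part_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie in [0, 1]")
    return prod(part_reliabilities)


def meets_requirement(part_reliabilities, required=0.9):
    """Check the system value against the required reliability."""
    return series_reliability(part_reliabilities) >= required


# Hypothetical part-level values (not MSC data).
parts = [0.99, 0.98, 0.97]
print(series_reliability(parts), meets_requirement(parts))
```

A real payload model would typically also include redundant (parallel) blocks, which raise the system value above the pure series product.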
Anatomical landmark asymmetry assessment in the lumbar spine and pelvis: a review of reliability.
Stovall, Bradley Alan; Kumar, Shrawan
2010-01-01
The purpose of this article is to review current research investigating the reliability of bony anatomical landmark positional asymmetry assessment in the lumbar spine and pelvis, to determine the agreement on findings between authors, and to explore future directions in the area to address the significant issues. The databases MEDLINE, CINAHL, AMED, MANTIS, Academic Search Complete, and Web of Knowledge were searched. A total of 23 articles were identified and reviewed, 10 of which met the inclusion criteria. For these 10 articles, the average interexaminer reliability for bony anatomical landmark positional asymmetry assessment was slightly above chance for all landmarks except medial malleolus, which had fair reliability. Interexaminer reliability on average was less than intraexaminer reliability (anterior superior iliac spine, k = 0.128/0.414; posterior superior iliac spine, k = 0.092/0.371). All interexaminer reliability averages were below values of clinical significance. From the current literature review, bony anatomical landmark positional asymmetry assessment in the lumbar spine and pelvis has not been demonstrated to be a reliable assessment method. However, there are unexplored factors that, after standardization, may improve reliability and further the understanding of musculoskeletal palpatory examination.
How Reliable is the Acetabular Cup Position Assessment from Routine Radiographs?
Carvajal Alba, Jaime A.; Vincent, Heather K.; Sodhi, Jagdeep S.; Latta, Loren L.; Parvataneni, Hari K.
2017-01-01
Background: Cup position is crucial for optimal outcomes in total hip arthroplasty. Radiographic assessment of component position is routinely performed in the early postoperative period. Aims: The aims of this study were to determine in a controlled environment if routine radiographic methods accurately and reliably assess the acetabular cup position and to assess if there is a statistical difference related to the rater’s level of training. Methods: A pelvic model was mounted in a spatial frame. An acetabular cup was fixed in different degrees of version and inclination. Standardized radiographs were obtained. Ten observers including five fellowship-trained orthopaedic surgeons and five orthopaedic residents performed a blind assessment of cup position. Inclination was assessed from anteroposterior radiographs of the pelvis and version from cross-table lateral radiographs of the hip. Results: The radiographic methods used proved to be imprecise, especially when the cup was positioned at the extremes of version and inclination. Excellent inter-observer reliability (intra-class correlation coefficient > 0.9) was found. There were no differences related to the level of training of the raters. Conclusions: These widely used radiographic methods should be interpreted cautiously and computed tomography should be utilized in cases when further intervention is contemplated. PMID:28852355
Duff, Kevin
2012-01-01
Repeated assessments are a relatively common occurrence in clinical neuropsychology. The current paper will review some of the relevant concepts (e.g., reliability, practice effects, alternate forms) and methods (e.g., reliable change index, standardized based regression) that are used in repeated neuropsychological evaluations. The focus will be on the understanding and application of these concepts and methods in the evaluation of the individual patient through examples. Finally, some future directions for assessing change will be described. PMID:22382384
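The reliable change index mentioned above can be sketched in its basic Jacobson-Truax form. This version does not adjust for practice effects, which the abstract lists as a separate concept, and the patient scores and test statistics are hypothetical.

```python
def reliable_change_index(baseline, retest, sd_baseline, test_retest_r):
    """Basic reliable change index: score difference over its standard error."""
    if not 0.0 <= test_retest_r < 1.0:
        raise ValueError("test-retest reliability must lie in [0, 1)")
    # Standard error of measurement of a single score.
    sem = sd_baseline * (1 - test_retest_r) ** 0.5
    # Standard error of the difference between two scores.
    se_diff = (2 ** 0.5) * sem
    return (retest - baseline) / se_diff


# Hypothetical patient and test statistics (not from the paper).
rci = reliable_change_index(baseline=100, retest=110, sd_baseline=10, test_retest_r=0.75)
print(round(rci, 2))
```

A common convention treats |RCI| > 1.96 as change beyond what measurement error alone would explain at the 95% level; in this example the 10-point gain falls short of that threshold.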
Estimation of the KR20 Reliability Coefficient When Data Are Incomplete.
ERIC Educational Resources Information Center
Huynh, Huynh
Three techniques for estimating Kuder Richardson reliability (KR20) coefficients for incomplete data are contrasted. The methods are: (1) Henderson's Method 1 (analysis of variance, or ANOVA); (2) Henderson's Method 3 (FITCO); and (3) Koch's method of symmetric sums (SYSUM). A Monte Carlo simulation was used to assess the precision of the three…
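For reference, the ordinary KR20 coefficient for complete dichotomous data is sketched below. It is not one of the three incomplete-data estimators contrasted in this entry; the response matrix is invented, and the population-variance convention is an assumption.

```python
from statistics import pvariance


def kr20(responses):
    """KR20 for complete 0/1 data: one row per examinee, one column per item."""
    n_items = len(responses[0])
    n_people = len(responses)
    if n_items < 2 or n_people < 2:
        raise ValueError("need at least two items and two examinees")
    # Proportion answering each item correctly.
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    # Population variance of total scores, matching the p*q item variances.
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)


# Hypothetical complete response matrix (4 examinees, 3 items).
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(answers))
```

With missing responses the item proportions and the total-score variance are no longer based on the same examinees, which is the difficulty the incomplete-data methods address.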
Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis
NASA Technical Reports Server (NTRS)
Dezfuli, Homayoon; Kelly, Dana; Smith, Curtis; Vedros, Kurt; Galyean, William
2009-01-01
This document, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis, is intended to provide guidelines for the collection and evaluation of risk and reliability-related data. It is aimed at scientists and engineers familiar with risk and reliability methods and provides a hands-on approach to the investigation and application of a variety of risk and reliability data assessment methods, tools, and techniques. This document provides both (1) a broad perspective on data analysis collection and evaluation issues and (2) a narrow focus on the methods to implement a comprehensive information repository. The topics addressed herein cover the fundamentals of how data and information are to be used in risk and reliability analysis models and their potential role in decision making. Understanding these topics is essential to attaining a risk informed decision making environment that is being sought by NASA requirements and procedures such as 8000.4 (Agency Risk Management Procedural Requirements), NPR 8705.05 (Probabilistic Risk Assessment Procedures for NASA Programs and Projects), and the System Safety requirements of NPR 8715.3 (NASA General Safety Program Requirements).
The reliability of the pass/fail decision for assessments comprised of multiple components
Möltner, Andreas; Tımbıl, Sevgi; Jünger, Jana
2015-01-01
Objective: The decision having the most serious consequences for a student taking an assessment is the one to pass or fail that student. For this reason, the reliability of the pass/fail decision must be determined for high quality assessments, just as the measurement reliability of the point values. Assessments in a particular subject (graded course credit) are often composed of multiple components that must be passed independently of each other. When “conjunctively” combining separate pass/fail decisions, as with other complex decision rules for passing, adequate methods of analysis are necessary for estimating the accuracy and consistency of these classifications. To date, very few papers have addressed this issue; a generally applicable procedure was published by Douglas and Mislevy in 2010. Using the example of an assessment comprised of several parts that must be passed separately, this study analyzes the reliability underlying the decision to pass or fail students and discusses the impact of an improved method for identifying those who do not fulfill the minimum requirements. Method: The accuracy and consistency of the decision to pass or fail an examinee in the subject cluster Internal Medicine/General Medicine/Clinical Chemistry at the University of Heidelberg’s Faculty of Medicine was investigated. This cluster requires students to separately pass three components (two written exams and an OSCE), whereby students may reattempt to pass each component twice. Our analysis was carried out using the method described by Douglas and Mislevy. Results: Frequently, when complex logical connections exist between the individual pass/fail decisions in the case of low failure rates, only a very low reliability for the overall decision to grant graded course credit can be achieved, even if high reliabilities exist for the various components. 
For the example analyzed here, the classification accuracy and consistency when conjunctively combining the three individual parts is relatively low with κ=0.49 or κ=0.47, despite the good reliability of over 0.75 for each of the three components. The option to repeat each component twice leads to a situation in which only about half of the candidates who do not satisfy the minimum requirements would fail the overall assessment, while the other half is able to continue their studies despite having deficient knowledge and skills. Conclusion: The method put forth by Douglas and Mislevy allows the analysis of the decision accuracy and consistency for complex combinations of scores from different components. Even in the case of highly reliable components, it is not necessarily so that a reliable pass/fail decision has been reached – for instance in the case of low failure rates. Assessments must be administered with the explicit goal of identifying examinees that do not fulfill the minimum requirements. PMID:26483855
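The effect of repeat attempts described above can be illustrated with a deliberately simplified calculation. This is not the Douglas and Mislevy procedure: it assumes every attempt is an independent trial with the same pass probability, which real examinees will not satisfy exactly. The per-attempt probabilities are hypothetical.

```python
def pass_probability_with_retakes(p_single_attempt, attempts=3):
    """Chance of passing a component at least once in the allowed attempts.

    attempts=3 corresponds to a first attempt plus two repeats.
    """
    if not 0.0 <= p_single_attempt <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1 - (1 - p_single_attempt) ** attempts


def conjunctive_pass_probability(component_probs, attempts=3):
    """Chance of passing every component (conjunctive rule), components independent."""
    result = 1.0
    for p in component_probs:
        result *= pass_probability_with_retakes(p, attempts)
    return result


# Hypothetical examinee below the minimum standard on all three components,
# with a 50% chance of passing any single attempt by measurement error.
weak_candidate = [0.5, 0.5, 0.5]
print(conjunctive_pass_probability(weak_candidate, attempts=1))
print(conjunctive_pass_probability(weak_candidate, attempts=3))
```

Under these assumptions the hypothetical candidate's chance of passing the whole cluster rises from 12.5% with one attempt to about 67% with three, which shows how retakes can let examinees below the standard through.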
ERIC Educational Resources Information Center
Lam, Ling Chi Tenny
2010-01-01
In writing assessment, there are quite a number of factors influencing the marking stability and the reliability of the assessment such as the attitude towards marking and consistency of markers, the physical environment, the design of the items, and marking rubrics. Even the methods to train markers have effects on the reliability of the…
Weyers, Simone; Jemi, Iman; Karger, André; Raski, Bianca; Rotthoff, Thomas; Pentzek, Michael; Mortsiefer, Achim
2016-01-01
Background: Imparting communication skills has been given great importance in medical curricula. In addition to standardized assessments, students should communicate with real patients in actual clinical situations during workplace-based assessments and receive structured feedback on their performance. The aim of this project was to pilot a formative testing method for workplace-based assessment. Our investigation centered in particular on whether or not physicians view the method as feasible and how high acceptance is among students. In addition, we assessed the reliability of the method. Method: As part of the project, 16 students held two consultations each with chronically ill patients at the medical practice where they were completing GP training. These consultations were video-recorded. The trained mentoring physician rated the student’s performance and provided feedback immediately following the consultations using the Berlin Global Rating scale (BGR). Two impartial, trained raters also evaluated the videos using BGR. For qualitative and quantitative analysis, information on how physicians and students viewed feasibility and their levels of acceptance was collected in written form in a partially standardized manner. To test for reliability, the test-retest reliability was calculated for both of the overall evaluations given by each rater. The inter-rater reliability was determined for the three evaluations of each individual consultation. Results: The formative assessment method was rated positively by both physicians and students. It is relatively easy to integrate into daily routines. Its significant value lies in the personal, structured and recurring feedback. The two overall scores for each patient consultation given by the two impartial raters correlate moderately. The degree of uniformity among the three raters in respect to the individual consultations is low. 
Discussion: Within the scope of this pilot project, only a small sample of physicians and students could be surveyed to a limited extent. There are indications that the assessment can be improved by integrating more information on medical context and student self-assessments. Despite the current limitations regarding test criteria, it is clear that workplace-based assessment of communication skills in the clinical setting is a valuable addition to the communication curricula of medical schools. PMID:27990466
Michels, Nele R M; Driessen, Erik W; Muijtjens, Arno M M; Van Gaal, Luc F; Bossaert, Leo L; De Winter, Benedicte Y
2009-12-01
A portfolio is used to mentor and assess students' clinical performance at the workplace. However, students and raters often perceive the portfolio as a time-consuming instrument. In this study, we investigated whether assessment during medical internship by a portfolio can combine reliability and feasibility. The domain-oriented reliability of 61 double-rated portfolios was measured, using a generalisability analysis with portfolio tasks and raters as sources of variation in measuring the performance of a student. We obtained reliability (Phi coefficient) of 0.87 with this internship portfolio containing 15 double-rated tasks. The generalisability analysis showed that an acceptable level of reliability (Phi = 0.80) was maintained when the number of portfolio tasks was decreased to 13 or 9 using one and two raters, respectively. Our study shows that a portfolio can be a reliable method for the assessment of workplace learning. The possibility of reducing the number of tasks or raters while maintaining a sufficient level of reliability suggests an increase in feasibility of portfolio use for both students and raters.
Gutiérrez-Vilahú, Lourdes; Massó-Ortigosa, Núria; Rey-Abella, Ferran; Costa-Tutusaus, Lluís; Guerra-Balic, Myriam
2016-05-01
People with Down syndrome present skeletal abnormalities in their feet that can be analyzed by commonly used gold standard indices (the Hernández-Corvo index, the Chippaux-Smirak index, the Staheli arch index, and the Clarke angle) based on footprint measurements. The use of Photoshop CS5 software (Adobe Systems Software Ireland Ltd, Dublin, Ireland) to measure footprints has been validated in the general population. The present study aimed to assess the reliability and validity of this footprint assessment technique in the population with Down syndrome. Using optical podography and photography, 44 footprints from 22 patients with Down syndrome (11 men [mean ± SD age, 23.82 ± 3.12 years] and 11 women [mean ± SD age, 24.82 ± 6.81 years]) were recorded in a static bipedal standing position. A blinded observer performed the measurements using a validated manual method three times during the 4-month study, with 2 months between measurements. Test-retest was used to check the reliability of the Photoshop CS5 software measurements. Validity and reliability were obtained by intraclass correlation coefficient (ICC). The reliability test for all of the indices showed very good values for the Photoshop CS5 method (ICC, 0.982-0.995). Validity testing also found no differences between the techniques (ICC, 0.988-0.999). The Photoshop CS5 software method is reliable and valid for the study of footprints in young people with Down syndrome.
Mosmuller, David; Tan, Robin; Mulder, Frans; Bachour, Yara; de Vet, Henrica; Don Griot, Peter
2016-10-01
It is essential to have a reliable assessment method in order to compare the results of cleft lip and palate surgery. In this study the computer-based program SymNose, a method for quantitative assessment of the nose and lip, is assessed for usability and reliability. The symmetry of the nose and lip was measured twice in 50 six-year-old complete and incomplete unilateral cleft lip and palate patients by four observers. For the frontal view the asymmetry level of the nose and upper lip was evaluated and for the basal view the asymmetry level of the nose and nostrils was evaluated. The mean inter-observer reliability when tracing each image once or twice was 0.70 and 0.75, respectively. Tracing the photographs with 2 observers and 4 observers gave a mean inter-observer score of 0.86 and 0.92, respectively. The mean intra-observer reliability varied between 0.80 and 0.84. SymNose is a practical and reliable tool for the retrospective assessment of large caseloads of 2D photographs of cleft patients for research purposes. Moderate to high single inter-observer reliability was found. For future research with SymNose reliable outcomes can be achieved by using the average outcomes of single tracings of two observers. Copyright © 2016 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
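The gain in reliability from averaging several observers, as reported in the abstract above, is the kind of effect the Spearman-Brown prophecy formula describes. The sketch below is illustrative only: the abstract does not say that the authors used this formula, and the function name `spearman_brown` is hypothetical. It projects the reliability of the mean of k parallel ratings from a single-rating reliability.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Projected reliability of the mean of k parallel ratings.

    Assumes the k ratings are parallel (equal reliability, equal variance).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Illustrative input: a single-rating reliability of 0.75.
for k in (2, 4):
    print(k, round(spearman_brown(0.75, k), 2))
# prints:
# 2 0.86
# 4 0.92
```

With an input of 0.75, the projection happens to match the two- and four-observer values quoted in the abstract, which suggests (but does not prove) that a comparable projection underlies them.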
Scale for positive aspects of caregiving experience: development, reliability, and factor structure.
Kate, N; Grover, S; Kulhara, P; Nehra, R
2012-06-01
OBJECTIVE. To develop an instrument (Scale for Positive Aspects of Caregiving Experience [SPACE]) that evaluates positive caregiving experience and assess its psychometric properties. METHODS. Available scales which assess some aspects of positive caregiving experience were reviewed and a 50-item questionnaire with a 5-point rating was constructed. In all, 203 primary caregivers of patients with severe mental disorders were asked to complete the questionnaire. Internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity were evaluated. Principal component factor analysis was run to assess the factorial validity of the scale. RESULTS. The scale developed as part of the study was found to have good internal consistency, test-retest reliability, cross-language reliability, split-half reliability, and face validity. Principal component factor analysis yielded a 4-factor structure, which also had good test-retest reliability and cross-language reliability. There was a strong correlation between the 4 factors obtained. CONCLUSION. The SPACE developed as part of this study has good psychometric properties.
Saloheimo, T; González, S A; Erkkola, M; Milauskas, D M; Meisel, J D; Champagne, C M; Tudor-Locke, C; Sarmiento, O; Katzmarzyk, P T; Fogelholm, M
2015-01-01
Objective: The main aim of this study was to assess the reliability and validity of a food frequency questionnaire with 23 food groups (I-FFQ) among a sample of 9–11-year-old children from three different countries that differ in economic development and income distribution, and to assess differences between country sites. Furthermore, we assessed factors associated with I-FFQ's performance. Methods: This was an ancillary study of the International Study of Childhood Obesity, Lifestyle and the Environment. Reliability (n=321) and validity (n=282) components of this study had the same participants. Participation rates were 95% and 70%, respectively. Participants completed two I-FFQs with a mean interval of 4.9 weeks to assess reliability. A 3-day pre-coded food diary (PFD) was used as the reference method in the validity analyses. Wilcoxon signed-rank tests, intraclass correlation coefficients and cross-classifications were used to assess the reliability of I-FFQ. Spearman correlation coefficients, percentage difference and cross-classifications were used to assess the validity of I-FFQ. A logistic regression model was used to assess the relation of selected variables with the estimate of validity. Analyses based on information in the PFDs were performed to assess how participants interpreted food groups. Results: Reliability correlation coefficients ranged from 0.37 to 0.78 and gross misclassification for all food groups was <5%. Validity correlation coefficients were below 0.5 for 22/23 food groups, and they differed among country sites. For validity, gross misclassification was <5% for 22/23 food groups. Over- or underestimation did not appear for 19/23 food groups. Logistic regression showed that country of participation and parental education were associated (P⩽0.05) with the validity of I-FFQ. Analyses of children's interpretation of food groups suggested that the meaning of most food groups was understood by the children.
Conclusion: I-FFQ is a moderately reliable method and its validity ranged from low to moderate, depending on food group and country site. PMID:27152180
Validation and Improvement of Reliability Methods for Air Force Building Systems
focusing primarily on HVAC systems. This research used contingency analysis to assess the performance of each model for HVAC systems at six Air Force...probabilistic model produced inflated reliability calculations for HVAC systems. In light of these findings, this research employed a stochastic method, a...Nonhomogeneous Poisson Process (NHPP), in an attempt to produce accurate HVAC system reliability calculations. This effort ultimately concluded that
Development of a Peer Teaching-Assessment Program and a Peer Observation and Evaluation Tool
Trujillo, Jennifer M.; Barr, Judith; Gonyeau, Michael; Van Amburgh, Jenny A.; Matthews, S. James; Qualters, Donna
2008-01-01
Objectives To develop a formalized, comprehensive, peer-driven teaching assessment program and a valid and reliable assessment tool. Methods A volunteer taskforce was formed and a peer-assessment program was developed using a multistep, sequential approach and the Peer Observation and Evaluation Tool (POET). A pilot study was conducted to evaluate the efficiency and practicality of the process and to establish interrater reliability of the tool. Intra-class correlation coefficients (ICC) were calculated. Results ICCs for 8 separate lectures evaluated by 2-3 observers ranged from 0.66 to 0.97, indicating good interrater reliability of the tool. Conclusion Our peer assessment program for large classroom teaching, which includes a valid and reliable evaluation tool, is comprehensive, feasible, and can be adopted by other schools of pharmacy. PMID:19325963
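Several records in this listing report interrater reliability as an intraclass correlation coefficient. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater form usually written ICC(2,1) (the abstract does not state which ICC form was used), the coefficient can be computed from the mean squares of a subjects-by-raters table. The function name and the ratings are illustrative and are not data from any study listed here.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per rated subject (e.g. a lecture),
    one column per rater. Every subject must be rated by every rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative table: 6 subjects rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(example), 2))  # prints 0.29
```

Because this form penalises systematic differences between raters, it can be much lower than a consistency-type ICC on the same table, so the ICC variant should be reported alongside the value.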
Vinco, L J; Giacomelli, S; Campana, L; Chiari, M; Vitale, N; Lombardi, G; Veldkamp, T; Hocking, P M
2018-02-01
1. An experiment was conducted to compare 5 different methods for the evaluation of litter moisture. 2. For litter collection and assessment, 55 farms were selected, one shed from each farm was inspected and 9 points were identified within each shed. 3. For each device used for the evaluation of litter moisture, the mean and standard deviation of wetness measures per collection point were assessed. 4. The reliability and overall consistency between the 5 instruments used to measure wetness were high (α = 0.72). 5. Measurements at three of the 9 collection points were sufficient to provide a reliable assessment of litter moisture throughout the shed. 6. Based on the direct correlation between litter moisture and footpad lesions, litter moisture measurement can be used as a resource-based on-farm animal welfare indicator. 7. Among the 5 methods analysed, visual scoring is the simplest and most practical, and therefore the best candidate to be used on-farm for animal welfare assessment.
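The α reported above is presumably Cronbach's alpha, which treats each instrument as an "item" and each collection point as an observation. The following sketch shows the standard formula under that assumption; the function name and the small data table are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per observation
    and one column per item (here: per measuring instrument)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 observations, 3 items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))  # prints 0.75
```

Instruments that use different units would normally be standardised first; that step is omitted here for brevity.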
Perraton, Luke G.; Bower, Kelly J.; Adair, Brooke; Pua, Yong-Hao; Williams, Gavin P.; McGaw, Rebekah
2015-01-01
Introduction Hand-held dynamometry (HHD) has never previously been used to examine isometric muscle power. Rate of force development (RFD) is often used for muscle power assessment; however, no consensus currently exists on the most appropriate method of calculation. The aim of this study was to examine the reliability of different algorithms for RFD calculation and to examine the intra-rater, inter-rater, and inter-device reliability of HHD as well as the concurrent validity of HHD for the assessment of isometric lower limb muscle strength and power. Methods 30 healthy young adults (age: 23±5yrs, male: 15) were assessed in two sessions. Isometric muscle strength and power were measured using peak force and RFD respectively using two HHDs (Lafayette Model-01165 and Hoggan microFET2) and a criterion-reference KinCom dynamometer. Statistical analysis of reliability and validity comprised intraclass correlation coefficients (ICC), Pearson correlations, concordance correlations, standard error of measurement, and minimal detectable change. Results Comparison of RFD methods revealed that a peak 200ms moving window algorithm provided optimal reliability results. Intra-rater, inter-rater, and inter-device reliability analysis of peak force and RFD revealed mostly good to excellent reliability (coefficients ≥ 0.70) for all muscle groups. Concurrent validity analysis showed moderate to excellent relationships between HHD and fixed dynamometry for the hip and knee (ICCs ≥ 0.70) for both peak force and RFD, with mostly poor to good results shown for the ankle muscles (ICCs = 0.31–0.79). Conclusions Hand-held dynamometry has good to excellent reliability and validity for most measures of isometric lower limb strength and power in a healthy population, particularly for proximal muscle groups. To aid implementation we have created freely available software to extract these variables from data stored on the Lafayette device.
Future research should examine the reliability and validity of these variables in clinical populations. PMID:26509265
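The abstract above names a "peak 200ms moving window" algorithm for RFD but does not spell it out. One plausible reading, sketched here as an assumption rather than as the authors' implementation, is to compute the average slope of the force trace over every 200 ms window and keep the largest value. The function name, parameters, and the synthetic ramp are hypothetical.

```python
def peak_rfd(force, sample_rate_hz, window_s=0.2):
    """Largest average rate of force development over any window.

    `force` is a sequence of force samples (N) taken at `sample_rate_hz`.
    Returns the peak slope in N/s over windows of roughly `window_s` seconds.
    """
    w = int(round(window_s * sample_rate_hz))  # window length in samples
    if w < 1 or len(force) <= w:
        raise ValueError("force trace is shorter than the window")
    duration = w / sample_rate_hz  # actual window duration after rounding
    return max(
        (force[i + w] - force[i]) / duration
        for i in range(len(force) - w)
    )


# Synthetic check: a linear ramp of 100 N/s sampled at 1000 Hz for 1 s.
ramp = [100 * i / 1000 for i in range(1001)]
print(round(peak_rfd(ramp, 1000), 6))  # prints 100.0
```

A real trace would usually be filtered and have its contraction onset detected first; those steps are outside this sketch.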
Wei, Wei; Larrey-Lassalle, Pyrène; Faure, Thierry; Dumoulin, Nicolas; Roux, Philippe; Mathias, Jean-Denis
2016-03-01
The comparative decision-making process is widely used to identify which option (system, product, service, etc.) has the smaller environmental footprint and to provide recommendations that help stakeholders take future decisions. However, uncertainty complicates both the comparison and the decision making. Probability-based decision support in LCA is a way to help stakeholders in their decision-making process. It calculates the decision confidence probability, which expresses the probability that one option has a smaller environmental impact than another. Here we apply reliability theory to approximate the decision confidence probability. We compare the traditional Monte Carlo method with a reliability method called the FORM method. The Monte Carlo method needs high computational time to calculate the decision confidence probability. The FORM method enables us to approximate the decision confidence probability with fewer simulations than the Monte Carlo method by approximating the response surface. Moreover, the FORM method calculates the associated importance factors, which correspond to a sensitivity analysis in relation to the probability. The importance factors allow stakeholders to determine which factors influence their decision. Our results clearly show that the reliability method provides additional useful information to stakeholders and reduces the computational time.
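To make the contrast between the two approaches concrete, the sketch below uses a deliberately simple toy case that is not taken from the paper: the impacts of two options A and B are assumed to be independent normal variables, so the limit state g = B - A is linear and the FORM result is exact. All names and numbers are hypothetical. The closed-form reliability index β gives the decision confidence probability Φ(β) and the importance factors directly, whereas Monte Carlo needs many samples to reach the same answer.

```python
import math
import random


def form_linear_normal(mu_a, sd_a, mu_b, sd_b):
    """FORM for g = B - A with independent normal A and B.

    Returns P(A < B) and the squared importance factors of A and B.
    """
    s = math.hypot(sd_a, sd_b)        # standard deviation of B - A
    beta = (mu_b - mu_a) / s          # reliability index
    prob = 0.5 * (1 + math.erf(beta / math.sqrt(2)))  # Phi(beta)
    importance = {"A": (sd_a / s) ** 2, "B": (sd_b / s) ** 2}
    return prob, importance


def monte_carlo(mu_a, sd_a, mu_b, sd_b, n=100_000, seed=0):
    """Estimate P(A < B) by direct sampling."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(mu_a, sd_a) < rng.gauss(mu_b, sd_b) for _ in range(n))
    return hits / n


p_form, factors = form_linear_normal(10.0, 2.0, 12.0, 2.0)
p_mc = monte_carlo(10.0, 2.0, 12.0, 2.0)
print(round(p_form, 3), factors)
print(abs(p_form - p_mc) < 0.01)
```

For non-linear models or non-normal inputs, FORM requires a search for the design point and is only an approximation; the toy case above avoids that step.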
Mani, Suresh; Sharma, Shobha; Omar, Baharudin; Paungmali, Aatit; Joseph, Leonard
2017-04-01
Purpose The purpose of this review is to systematically explore and summarise the validity and reliability of telerehabilitation (TR)-based physiotherapy assessment for musculoskeletal disorders. Method A comprehensive systematic literature review was conducted using a number of electronic databases (PubMed, EMBASE, PsycINFO, Cochrane Library and CINAHL) for studies published between January 2000 and May 2015. Studies that examined the validity and the inter- and intra-rater reliability of TR-based physiotherapy assessment for musculoskeletal conditions were included. Two independent reviewers used the Quality Appraisal Tool for studies of diagnostic Reliability (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to assess the methodological quality of reliability and validity studies respectively. Results A total of 898 hits were retrieved, of which 11 articles meeting the inclusion criteria were reviewed. Nine studies explored the concurrent validity and the inter- and intra-rater reliability, while two studies examined only the concurrent validity. Reviewed studies were moderate to good in methodological quality. Physiotherapy assessments such as pain, swelling, range of motion, muscle strength, balance, gait and functional assessment demonstrated good concurrent validity. However, the reported concurrent validity of lumbar spine posture, special orthopaedic tests, neurodynamic tests and scar assessments ranged from low to moderate. Conclusion TR-based physiotherapy assessment was technically feasible with overall good concurrent validity and excellent reliability, except for lumbar spine posture, orthopaedic special tests, neurodynamic tests and scar assessment.
[Modification and evaluation of assessment of medication literacy].
Zheng, Feng; Zhong, Zhuqing; Ding, Siqing; Luo, Aijing; Liu, Zina
2016-11-28
Objective: To translate and revise the Medication Literacy Assessment in English (MedLitRxSE-English) and evaluate its validity and reliability. Methods: We introduced MedLitRxSE-English from abroad. Following Brislin's translation principles and cultural adjustment, we revised it into a Chinese edition. Using a random sampling method, from October 2014 to January 2015, 461 non-hospitalized patients from the outpatient departments of the top three hospitals in Changsha city were investigated. The reliability and validity of the scale were tested. Results: The test-retest reliability of the Chinese version of the medication literacy scale was 0.885; the split-half reliability was 0.840; K-R was 0.820; the correlations between the assessment of medication literacy and the corresponding items were 0.427-0.587; the confirmatory factor analysis revealed overall good fit. Root mean square error of approximation (RMSEA), χ2/df, goodness of fit index (GFI) and comparative fit index (CFI) were 0.08, 3.06, 0.91 and 0.94, respectively. Conclusion: The Chinese version of the assessment of medication literacy has good reliability and validity, and it can be used to evaluate medication literacy in China.
NASA Technical Reports Server (NTRS)
Kleinhammer, Roger K.; Graber, Robert R.; DeMott, D. L.
2016-01-01
Reliability practitioners advocate getting reliability involved early in a product development process. However, when assigned to estimate or assess the (potential) reliability of a product or system early in the design and development phase, they are faced with a lack of reasonable models or methods for useful reliability estimation. Developing specific data is costly and time consuming. Instead, analysts rely on available data to assess reliability. Finding data relevant to the specific use and environment for any project is difficult, if not impossible. Instead, analysts attempt to develop the "best" or composite analog data to support the assessments. Industries, consortia and vendors across many areas have spent decades collecting, analyzing and tabulating fielded item and component reliability performance in terms of observed failures and operational use. This data resource provides a huge compendium of information for potential use, but can also be compartmented by industry and difficult to find out about, access, or manipulate. One method used incorporates processes for reviewing these existing data sources and identifying the available information based on similar equipment, then using that generic data to derive an analog composite. Dissimilarities in equipment descriptions, environment of intended use, quality and even failure modes impact the "best" data incorporated in an analog composite. Once developed, this composite analog data provides a "better" representation of the reliability of the equipment or component. It can be used to support early risk or reliability trade studies, or analytical models to establish the predicted reliability data points. It also establishes a baseline prior that may be updated based on test data or observed operational constraints and failures, i.e., using Bayesian techniques.
This tutorial presents a descriptive compilation of historical data sources across numerous industries and disciplines, along with examples of contents and data characteristics. It then presents methods for combining failure information from different sources and mathematical use of this data in early reliability estimation and analyses.
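The abstract above mentions updating a baseline prior with test or operational data "using Bayesian techniques" without naming a model. A common choice for constant failure rates, used here purely as an assumption, is the conjugate gamma-Poisson update: a gamma prior on the failure rate (shape in pseudo-failures, rate in pseudo-hours) is combined with observed failures over an exposure time. The function name and the numbers are hypothetical.

```python
def update_failure_rate(prior_shape, prior_rate_hours, failures, exposure_hours):
    """Gamma-Poisson conjugate update for a constant failure rate.

    prior_shape       -- gamma shape, interpretable as pseudo-failures
    prior_rate_hours  -- gamma rate, interpretable as pseudo-exposure hours
    failures          -- failures observed in test or operation
    exposure_hours    -- operating hours over which they were observed
    Returns (posterior shape, posterior rate, posterior mean failure rate).
    """
    post_shape = prior_shape + failures
    post_rate = prior_rate_hours + exposure_hours
    return post_shape, post_rate, post_shape / post_rate


# Hypothetical analog prior: mean 2/1000 = 0.002 failures per hour.
# Hypothetical evidence: 1 failure in 1000 operating hours.
shape, rate, mean = update_failure_rate(2.0, 1000.0, 1, 1000.0)
print(shape, rate, mean)  # prints 3.0 2000.0 0.0015
```

The prior's rate parameter controls how strongly the analog composite is weighted against new evidence, so a weakly relevant analog would be given a small pseudo-exposure.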
The Validation of a Case-Based, Cumulative Assessment and Progressions Examination
Coker, Adeola O.; Copeland, Jeffrey T.; Gottlieb, Helmut B.; Horlen, Cheryl; Smith, Helen E.; Urteaga, Elizabeth M.; Ramsinghani, Sushma; Zertuche, Alejandra; Maize, David
2016-01-01
Objective. To assess content and criterion validity, as well as reliability of an internally developed, case-based, cumulative, high-stakes third-year Annual Student Assessment and Progression Examination (P3 ASAP Exam). Methods. Content validity was assessed through the writing-reviewing process. Criterion validity was assessed by comparing student scores on the P3 ASAP Exam with the nationally validated Pharmacy Curriculum Outcomes Assessment (PCOA). Reliability was assessed with psychometric analysis comparing student performance over four years. Results. The P3 ASAP Exam showed content validity through representation of didactic courses and professional outcomes. Similar scores on the P3 ASAP Exam and PCOA with Pearson correlation coefficient established criterion validity. Consistent student performance using Kuder-Richardson coefficient (KR-20) since 2012 reflected reliability of the examination. Conclusion. Pharmacy schools can implement internally developed, high-stakes, cumulative progression examinations that are valid and reliable using a robust writing-reviewing process and psychometric analyses. PMID:26941435
Wise Crowd Content Assessment and Educational Rubrics
ERIC Educational Resources Information Center
Passonneau, Rebecca J.; Poddar, Ananya; Gite, Gaurav; Krivokapic, Alisa; Yang, Qian; Perin, Dolores
2018-01-01
Development of reliable rubrics for educational intervention studies that address reading and writing skills is labor-intensive, and could benefit from an automated approach. We compare a main ideas rubric used in a successful writing intervention study to a highly reliable wise-crowd content assessment method developed to evaluate…
Jung, Kyoung-Sim; Jung, Jin-Hwa; In, Tae-Sung; Cho, Hwi-Young
2016-09-01
[Purpose] The purpose of this study was to establish the reliability and validity of the Short Musculoskeletal Function Assessment questionnaire, which was translated into Korean, for patients with musculoskeletal disorders. [Subjects and Methods] Fifty-five subjects (26 males and 29 females) with musculoskeletal diseases participated in the study. The Short Musculoskeletal Function Assessment questionnaire focuses on a limited range of physical functions and includes a dysfunction index and a bother index. Reliability was determined using the intraclass correlation coefficient, and validity was examined by correlating Short Musculoskeletal Function Assessment scores with the 36-item Short-Form Health Survey (SF-36) score. [Results] The reliability was 0.97 for the dysfunction index and 0.94 for the bother index. Validity was established by comparison with the Korean version of the SF-36. [Conclusion] This study demonstrated that the Korean version of the Short Musculoskeletal Function Assessment questionnaire is a reliable and valid instrument for the assessment of musculoskeletal disorders.
TCOPPE school environmental audit tool: assessing safety and walkability of school environments.
Lee, Chanam; Kim, Hyung Jin; Dowdy, Diane M; Hoelscher, Deanna M; Ory, Marcia G
2013-09-01
Several environmental audit instruments have been developed for assessing streets, parks and trails, but none for schools. This paper introduces a school audit tool that includes 3 subcomponents: 1) street audit, 2) school site audit, and 3) map audit. It presents the conceptual basis and the development process of this instrument, and the methods and results of the reliability assessments. Reliability tests were conducted by 2 trained auditors on 12 study schools (high-low income and urban-suburban-rural settings). Kappa statistics (categorical, factual items) and ICC (Likert-scale, perceptual items) were used to assess a) interrater, b) test-retest, and c) peak vs. off-peak hour reliability tests. For the interrater reliability test, the average Kappa was 0.839 and the ICC was 0.602. For the test-retest reliability, the average Kappa was 0.903 and the ICC was 0.774. The peak-off peak reliability was 0.801. Rural schools showed the most consistent results in the peak-off peak and test-retest assessments. For interrater tests, urban schools showed the highest ICC, and rural schools showed the highest Kappa. Most items achieved moderate to high levels of reliabilities in all study schools. With proper training, this audit can be used to assess school environments reliably for research, outreach, and policy-support purposes.
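The Kappa statistics reported above for categorical audit items correct raw agreement for the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa for two raters follows; the function name and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no audit item scored at four sites by two auditors.
auditor_1 = ["y", "y", "n", "n"]
auditor_2 = ["y", "n", "n", "n"]
print(cohen_kappa(auditor_1, auditor_2))  # prints 0.5
```

Here the auditors agree on 3 of 4 sites (0.75), but half of that agreement is expected by chance, which is why kappa is only 0.5.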
The Arthroscopic Surgical Skill Evaluation Tool (ASSET)
Koehler, Ryan J.; Amsdell, Simon; Arendt, Elizabeth A; Bisson, Leslie J; Braman, Jonathan P; Butler, Aaron; Cosgarea, Andrew J; Harner, Christopher D; Garrett, William E; Olson, Tyson; Warme, Winston J.; Nicandri, Gregg T.
2014-01-01
Background Surgeries employing arthroscopic techniques are among the most commonly performed in orthopaedic clinical practice; however, valid and reliable methods of assessing the arthroscopic skill of orthopaedic surgeons are lacking. Hypothesis The Arthroscopic Surgery Skill Evaluation Tool (ASSET) will demonstrate content validity, concurrent criterion-oriented validity, and reliability, when used to assess the technical ability of surgeons performing diagnostic knee arthroscopy on cadaveric specimens. Study Design Cross-sectional study; Level of evidence, 3 Methods Content validity was determined by a group of seven experts using a Delphi process. Intra-articular performance of a right and left diagnostic knee arthroscopy was recorded for twenty-eight residents and two sports medicine fellowship trained attending surgeons. Subject performance was assessed by two blinded raters using the ASSET. Concurrent criterion-oriented validity, inter-rater reliability, and test-retest reliability were evaluated. Results Content validity: The content development group identified 8 arthroscopic skill domains to evaluate using the ASSET. Concurrent criterion-oriented validity: Significant differences in total ASSET score (p<0.05) between novice, intermediate, and advanced experience groups were identified. Inter-rater reliability: The ASSET scores assigned by each rater were strongly correlated (r=0.91, p <0.01) and the intra-class correlation coefficient between raters for the total ASSET score was 0.90. Test-retest reliability: there was a significant correlation between ASSET scores for both procedures attempted by each individual (r = 0.79, p<0.01). Conclusion The ASSET appears to be a useful, valid, and reliable method for assessing surgeon performance of diagnostic knee arthroscopy in cadaveric specimens. Studies are ongoing to determine its generalizability to other procedures as well as to the live OR and other simulated environments. PMID:23548808
Reliability studies of diagnostic methods in Indian traditional Ayurveda medicine: An overview
Kurande, Vrinda Hitendra; Waagepetersen, Rasmus; Toft, Egon; Prasad, Ramjee
2013-01-01
Recently, a need to develop supportive new scientific evidence for contemporary Ayurveda has emerged. One of the research objectives is an assessment of the reliability of diagnoses and treatment. Reliability is a quantitative measure of consistency. It is a crucial issue in classification (such as prakriti classification), method development (pulse diagnosis), quality assurance for diagnosis and treatment and in the conduct of clinical studies. Several reliability studies are conducted in western medicine. The investigation of the reliability of traditional Chinese, Japanese and Sasang medicine diagnoses is in the formative stage. However, reliability studies in Ayurveda are in the preliminary stage. In this paper, examples are provided to illustrate relevant concepts of reliability studies of diagnostic methods and their implication in practice, education, and training. An introduction to reliability estimates and different study designs and statistical analysis is given for future studies in Ayurveda. PMID:23930037
Empirical methods for assessing meaningful neuropsychological change following epilepsy surgery.
Sawrie, S M; Chelune, G J; Naugle, R I; Lüders, H O
1996-11-01
Traditional methods for assessing the neurocognitive effects of epilepsy surgery are confounded by practice effects, test-retest reliability issues, and regression to the mean. This study employs 2 methods for assessing individual change that allow direct comparison of changes across both individuals and test measures. Fifty-one medically intractable epilepsy patients completed a comprehensive neuropsychological battery twice, approximately 8 months apart, prior to any invasive monitoring or surgical intervention. First, a Reliable Change (RC) index score was computed for each test score to take into account the reliability of that measure, and a cutoff score was empirically derived to establish the limits of statistically reliable change. These indices were subsequently adjusted for expected practice effects. The second approach used a regression technique to establish "change norms" along a common metric that models both expected practice effects and regression to the mean. The RC index scores provide the clinician with a statistical means of determining whether a patient's retest performance is "significantly" changed from baseline. The regression norms for change allow the clinician to evaluate the magnitude of a given patient's change on 1 or more variables along a common metric that takes into account the reliability and stability of each test measure. Case data illustrate how these methods provide an empirically grounded means for evaluating neurocognitive outcomes following medical interventions such as epilepsy surgery.
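The Reliable Change index described above scales a retest difference by the measurement error of the test. The abstract does not give the formula, so the sketch below assumes the widely used Jacobson-Truax form, with an optional subtraction of the mean practice effect; the function name, parameters, and numbers are hypothetical.

```python
import math


def reliable_change(baseline, retest, sd_baseline, test_retest_r,
                    practice_effect=0.0):
    """Practice-adjusted Reliable Change index (Jacobson-Truax form, assumed).

    sd_baseline     -- standard deviation of the test in the reference sample
    test_retest_r   -- test-retest reliability of the measure
    practice_effect -- mean retest gain expected without any intervention
    """
    sem = sd_baseline * math.sqrt(1 - test_retest_r)  # standard error of measurement
    se_diff = math.sqrt(2) * sem                      # standard error of a difference
    return (retest - baseline - practice_effect) / se_diff


# Hypothetical patient: 100 at baseline, 110 at retest, SD 10, r = 0.80.
print(round(reliable_change(100, 110, 10, 0.80), 2))                     # prints 1.58
print(round(reliable_change(100, 110, 10, 0.80, practice_effect=3), 2))  # prints 1.11
```

The index is then compared with a cutoff; the study derived its cutoff empirically, so any fixed threshold (for example ±1.645 for a 90% interval) would be a separate assumption.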
Validity of radiographic assessment of the knee joint space using automatic image analysis.
Komatsu, Daigo; Hasegawa, Yukiharu; Kojima, Toshihisa; Seki, Taisuke; Ikeuchi, Kazuma; Takegami, Yasuhiko; Amano, Takafumi; Higuchi, Yoshitoshi; Kasai, Takehiro; Ishiguro, Naoki
2016-09-01
The present study investigated whether there were differences between automatic and manual measurements of the minimum joint space width (mJSW) on knee radiographs. Knee radiographs of 324 participants in a systematic health screening were analyzed using the following three methods: manual measurement of film-based radiographs (Manual), manual measurement of digitized radiographs (Digital), and automatic measurement of digitized radiographs (Auto). The mean mJSWs on the medial and lateral sides of the knees were determined using each method, and measurement reliability was evaluated using intra-class correlation coefficients. Measurement errors were compared between normal knees and knees with radiographic osteoarthritis. All three methods demonstrated good reliability, although the reliability was slightly lower with the Manual method than with the other methods. On the medial and lateral sides of the knees, the mJSWs were the largest in the Manual method and the smallest in the Auto method. The measurement errors of each method were significantly larger for normal knees than for radiographic osteoarthritis knees. The mJSW measurements are more accurate and reliable with the Auto method than with the Manual or Digital method, especially for normal knees. Therefore, the Auto method is ideal for the assessment of the knee joint space.
Stovall, Bradley A; Kumar, Shrawan
2010-11-01
The objective of this review is to establish the current state of knowledge on the reliability of clinical assessment of asymmetry in the lumbar spine and pelvis. To search the literature, the authors consulted the databases of MEDLINE, CINAHL, AMED, MANTIS, Academic Search Complete, and Web of Knowledge using different combinations of the following keywords: palpation, asymmetry, inter or intraexaminer reliability, tissue texture, assessment, and anatomic landmark. Of the 23 studies identified, 14 did not meet the inclusion criteria and were excluded. The quality and methods of studies investigating the reliability of bony anatomic landmark asymmetry assessment are variable. The κ statistic ranges without training for interexaminer reliability were as follows: anterior superior iliac spine (ASIS), -0.01 to 0.19; posterior superior iliac spine (PSIS), 0.04 to 0.15; inferior lateral angle, transverse plane (ILA-A/P), -0.03 to 0.11; inferior lateral angles, coronal plane (ILA-S/I), -0.01 to 0.08; sacral sulcus (SS), -0.4 to 0.37; lumbar spine transverse processes L1 through L5, 0.04 to 0.17. The corresponding ranges for intraexaminer reliability were higher for all associated landmarks: ASIS, 0.19 to 0.4; PSIS, 0.13 to 0.49; ILA-A/P, 0.1 to 0.2; ILA-S/I, 0.03 to 0.21; SS, 0.24 to 0.28; lumbar spine transverse processes L1 through L5, not applicable. Further research is needed to better understand the reliability of asymmetry assessment methods in manipulative medicine.
Stovall, Bradley A.; Kumar, Shrawan
2011-01-01
The objective of this review is to establish the current state of knowledge on the reliability of clinical assessment of asymmetry in the lumbar spine and pelvis. To search the literature, the authors consulted the databases of MEDLINE, CINAHL, AMED, MANTIS, Academic Search Complete, and Web of Knowledge using different combinations of the following keywords: palpation, asymmetry, inter- or intraexaminer reliability, tissue texture, assessment, and anatomic landmark. Of the 23 studies identified, 14 did not meet the inclusion criteria and were excluded. The quality and methods of studies investigating the reliability of bony anatomic landmark asymmetry assessment are variable. The κ statistic ranges without training for interexaminer reliability were as follows: anterior superior iliac spine (ASIS), −0.01 to 0.19; posterior superior iliac spine (PSIS), 0.04 to 0.15; inferior lateral angle, transverse plane (ILA-A/P), −0.03 to 0.11; inferior lateral angles, coronal plane (ILA-S/I), −0.01 to 0.08; sacral sulcus (SS), −0.4 to 0.37; lumbar spine transverse processes L1 through L5, 0.04 to 0.17. The corresponding ranges for intraexaminer reliability were higher for all associated landmarks: ASIS, 0.19 to 0.4; PSIS, 0.13 to 0.49; ILA-A/P, 0.1 to 0.2; ILA-S/I, 0.03 to 0.21; SS, 0.24 to 0.28; lumbar spine transverse processes L1 through L5, not applicable. Further research is needed to better understand the reliability of asymmetry assessment methods in manipulative medicine. PMID:21135198
Guseva Canu, Irina; Jezewski-Serra, Delphine; Delabre, Laurène; Ducamp, Stéphane; Iwatsubo, Yuriko; Audignon-Durand, Sabine; Ducros, Cécile; Radauceanu, Anca; Durand, Catherine; Witschger, Olivier; Flahaut, Emmanuel
2017-01-01
The relatively recent development of industries working with nanomaterials has created challenges for exposure assessment. In this article, we propose a relatively simple approach to assessing nanomaterial exposures for the purposes of epidemiological studies of workers in these industries. This method consists of an onsite industrial hygiene visit of facilities carried out individually and a description of workstations where nano-objects and their agglomerates and aggregates (NOAA) are present using a standardized tool, the Onsite technical logbook. To assess its reliability, we implemented this approach for assessing exposure to NOAA in workplaces at seven workstations which synthesize and functionalize carbon nanotubes. The prediction of exposure to NOAA using this method exhibited substantial agreement with that of the reference method, the latter being based on an onsite group visit, an expert's report and exposure measurements (Cohen kappa = 0.70, sensitivity = 0.88, specificity = 0.92). Intramethod comparison of results for exposure prediction showed moderate agreement between the three evaluators (two program team evaluators and one external evaluator) (weighted Fleiss kappa = 0.60, P = 0.003). Interevaluator reliability of the semiquantitative exposure characterization results was excellent between the two evaluators from the program team (Spearman rho = 0.93, P = 0.03) and fair when these two evaluators' results were compared with the external evaluator's results. The project was undertaken within the framework of the French epidemiological surveillance program EpiNano. This study allowed a first reliability assessment of the EpiNano method. However, to further validate this method a comparison with robust quantitative exposure measurement data is necessary. © The Author 2017. Published by Oxford University Press on behalf of the British Occupational Hygiene Society.
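As a rough illustration of the agreement statistics reported in this abstract (Cohen's kappa, sensitivity and specificity), the following Python sketch computes them for two sets of binary exposure predictions. The function names and the example labels are hypothetical; they are not EpiNano data and this is not the authors' analysis code.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's marginal label counts.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(reference, test, positive=1):
    """Sensitivity and specificity of `test` labels against `reference` labels."""
    pairs = list(zip(reference, test))
    tp = sum(r == positive and t == positive for r, t in pairs)
    fn = sum(r == positive and t != positive for r, t in pairs)
    tn = sum(r != positive and t != positive for r, t in pairs)
    fp = sum(r != positive and t == positive for r, t in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 1 = exposed, 0 = not exposed.
reference_method = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
simple_method = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(reference_method, simple_method)
sens, spec = sensitivity_specificity(reference_method, simple_method)
print(f"kappa={kappa:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Kappa corrects the raw agreement rate for the agreement two raters would reach by chance, which is why it can be low even when percent agreement looks high.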
Mash, Bob; Derese, Anselme
2013-01-01
Abstract Background Competency-based education and the validity and reliability of workplace-based assessment of postgraduate trainees have received increasing attention worldwide. Family medicine was recognised as a speciality in South Africa six years ago and a satisfactory portfolio of learning is a prerequisite to sit the national exit exam. A massive scaling up of the number of family physicians is needed in order to meet the health needs of the country. Aim The aim of this study was to develop a reliable, robust and feasible portfolio assessment tool (PAT) for South Africa. Methods Six raters each rated nine portfolios from the Stellenbosch University programme, using the PAT, to test for inter-rater reliability. This rating was repeated three months later to determine test–retest reliability. Following initial analysis and feedback the PAT was modified and the inter-rater reliability again assessed on nine new portfolios. An acceptable intra-class correlation was considered to be > 0.80. Results The total score was found to be reliable, with a coefficient of 0.92. For test–retest reliability, the difference in mean total score was 1.7%, which was not statistically significant. Amongst the subsections, only assessment of the educational meetings and the logbook showed reliability coefficients > 0.80. Conclusion This was the first attempt to develop a reliable, robust and feasible national portfolio assessment tool to assess postgraduate family medicine training in the South African context. The tool was reliable for the total score, but the low reliability of several sections in the PAT helped us to develop 12 recommendations regarding the use of the portfolio, the design of the PAT and the training of raters.
NASA Astrophysics Data System (ADS)
Goh, A. T. C.; Kulhawy, F. H.
2005-05-01
In urban environments, one major concern with deep excavations in soft clay is the potentially large ground deformations in and around the excavation. Excessive movements can damage adjacent buildings and utilities. There are many uncertainties associated with the calculation of the ultimate or serviceability performance of a braced excavation system. These include the variabilities of the loadings, geotechnical soil properties, and engineering and geometrical properties of the wall. A risk-based approach to serviceability performance failure is necessary to incorporate systematically the uncertainties associated with the various design parameters. This paper demonstrates the use of an integrated neural network-reliability method to assess the risk of serviceability failure through the calculation of the reliability index. By first performing a series of parametric studies using the finite element method and then approximating the non-linear limit state surface (the boundary separating the safe and failure domains) through a neural network model, the reliability index can be determined with the aid of a spreadsheet. Two illustrative examples are presented to show how the serviceability performance for braced excavation problems can be assessed using the reliability index.
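To make the notion of a reliability index concrete, here is a minimal Python sketch for the simplest textbook case: a linear limit state g = limit − demand with independent, normally distributed variables. This is an assumption made purely for illustration. The paper itself approximates a non-linear limit state surface with a neural network, which this sketch does not attempt, and all numbers below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def reliability_index(mu_limit, sd_limit, mu_demand, sd_demand):
    """Reliability index for g = limit - demand, independent normal variables."""
    return (mu_limit - mu_demand) / sqrt(sd_limit ** 2 + sd_demand ** 2)


def failure_probability(beta):
    """Probability that g < 0 for a normally distributed g with index beta."""
    return NormalDist().cdf(-beta)


# Hypothetical serviceability check: allowable wall deflection versus
# predicted wall deflection, both in millimetres.
beta = reliability_index(mu_limit=50.0, sd_limit=5.0, mu_demand=35.0, sd_demand=12.0)
print(f"beta={beta:.3f} pf={failure_probability(beta):.4f}")
```

A larger index means the mean of the limit state function lies more standard deviations away from the failure boundary, and so corresponds to a smaller probability of serviceability failure.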
Reliability of resting-state microstate features in electroencephalography.
Khanna, Arjun; Pascual-Leone, Alvaro; Farzan, Faranak
2014-01-01
Electroencephalographic (EEG) microstate analysis is a method of identifying quasi-stable functional brain states ("microstates") that are altered in a number of neuropsychiatric disorders, suggesting their potential use as biomarkers of neurophysiological health and disease. However, use of EEG microstates as neurophysiological biomarkers requires assessment of the test-retest reliability of microstate analysis. We analyzed resting-state, eyes-closed, 30-channel EEG from 10 healthy subjects over 3 sessions spaced approximately 48 hours apart. We identified four microstate classes and calculated the average duration, frequency, and coverage fraction of these microstates. Using Cronbach's α and the standard error of measurement (SEM) as indicators of reliability, we examined: (1) the test-retest reliability of microstate features using a variety of different approaches; (2) the consistency between TAAHC and k-means clustering algorithms; and (3) whether microstate analysis can be reliably conducted with 19 and 8 electrodes. The approach of identifying a single set of "global" microstate maps showed the highest reliability (mean Cronbach's α > 0.8, SEM ≈ 10% of mean values) compared to microstates derived by each session or each recording. There was notably low reliability in features calculated from maps extracted individually for each recording, suggesting that the analysis is most reliable when maps are held constant. Features were highly consistent across clustering methods (Cronbach's α > 0.9). All features had high test-retest reliability with 19 and 8 electrodes. High test-retest reliability and cross-method consistency of microstate features suggests their potential as biomarkers for assessment of the brain's neurophysiological health.
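The two reliability indicators named in this abstract can be sketched in a few lines of Python. The example treats each session as an "item" and each subject as a row. The SEM formula used here (standard deviation of the per-subject means times the square root of 1 − α) is one common convention and is an assumption, since the abstract does not state which form was used; the data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` has one row per subject, one column per session."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two subjects and two sessions")
    # Variance of each session across subjects, and of the per-subject totals.
    session_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(session_vars) / total_var)


def standard_error_of_measurement(scores):
    """SEM = SD * sqrt(1 - alpha), with SD taken over per-subject means (assumed)."""
    alpha = cronbach_alpha(scores)
    sd = stdev([mean(row) for row in scores])
    # Clamp at zero to guard against tiny negative values from rounding.
    return sd * sqrt(max(0.0, 1 - alpha))


# Hypothetical microstate durations (ms): 4 subjects x 3 sessions.
durations = [
    [82.0, 80.0, 84.0],
    [95.0, 97.0, 93.0],
    [70.0, 73.0, 71.0],
    [88.0, 86.0, 90.0],
]
print(round(cronbach_alpha(durations), 3))
print(round(standard_error_of_measurement(durations), 3))
```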
Evaluation of power system security and development of transmission pricing method
NASA Astrophysics Data System (ADS)
Kim, Hyungchul
The electric power utility industry is presently undergoing a change towards a deregulated environment. This has resulted in the unbundling of generation, transmission and distribution services. The introduction of competition into unbundled electricity services may lead system operation closer to its security boundaries, resulting in smaller operating safety margins. The competitive environment is expected to lead to lower price rates for customers and higher efficiency for power suppliers in the long run. Under this deregulated environment, security assessment and pricing of transmission services have become important issues in power systems. This dissertation provides new methods for power system security assessment and transmission pricing. In power system security assessment, the following issues are discussed: (1) the description of probabilistic methods for power system security assessment; (2) the computation time of simulation methods; (3) on-line security assessment for operation. A probabilistic method using Monte-Carlo simulation is proposed for power system security assessment. This method takes into account dynamic and static effects corresponding to contingencies. Two different Kohonen networks, Self-Organizing Maps and Learning Vector Quantization, are employed to speed up the probabilistic method. The combination of Kohonen networks and Monte-Carlo simulation can reduce computation time in comparison with straight Monte-Carlo simulation. A technique for security assessment employing a Bayes classifier is also proposed. This method can be useful for system operators to make security decisions during on-line power system operation. This dissertation also suggests an approach for allocating transmission transaction costs based on reliability benefits in transmission services. The proposed method shows the transmission transaction cost of reliability benefits when transmission line capacities are considered.
The ratio between allocation by transmission line capacity-use and allocation by reliability benefits is computed using the probability of system failure.
Boser, Quinn A; Valevicius, Aïda M; Lavoie, Ewen B; Chapman, Craig S; Pilarski, Patrick M; Hebert, Jacqueline S; Vette, Albert H
2018-04-27
Quantifying angular joint kinematics of the upper body is a useful method for assessing upper limb function. Joint angles are commonly obtained via motion capture, tracking markers placed on anatomical landmarks. This method is associated with limitations including administrative burden, soft tissue artifacts, and intra- and inter-tester variability. An alternative method involves the tracking of rigid marker clusters affixed to body segments, calibrated relative to anatomical landmarks or known joint angles. The accuracy and reliability of applying this cluster method to the upper body has, however, not been comprehensively explored. Our objective was to compare three different upper body cluster models with an anatomical model, with respect to joint angles and reliability. Non-disabled participants performed two standardized functional upper limb tasks with anatomical and cluster markers applied concurrently. Joint angle curves obtained via the marker clusters with three different calibration methods were compared to those from an anatomical model, and between-session reliability was assessed for all models. The cluster models produced joint angle curves which were comparable to and highly correlated with those from the anatomical model, but exhibited notable offsets and differences in sensitivity for some degrees of freedom. Between-session reliability was comparable between all models, and good for most degrees of freedom. Overall, the cluster models produced reliable joint angles that, however, cannot be used interchangeably with anatomical model outputs to calculate kinematic metrics. Cluster models appear to be an adequate, and possibly advantageous alternative to anatomical models when the objective is to assess trends in movement behavior. Copyright © 2018 Elsevier Ltd. All rights reserved.
Estimates Of The Orbiter RSI Thermal Protection System Thermal Reliability
NASA Technical Reports Server (NTRS)
Kolodziej, P.; Rasky, D. J.
2002-01-01
In support of the Space Shuttle Orbiter post-flight inspection, structure temperatures are recorded at selected positions on the windward, leeward, starboard and port surfaces. Statistical analysis of this flight data and a non-dimensional load interference (NDLI) method are used to estimate the thermal reliability at positions where reusable surface insulation (RSI) is installed. In this analysis, structure temperatures that exceed the design limit define the critical failure mode. At thirty-three positions the RSI thermal reliability is greater than 0.999999 for the missions studied. This is not the overall system level reliability of the thermal protection system installed on an Orbiter. The results from two Orbiters, OV-102 and OV-105, are in good agreement. The original RSI designs on the OV-102 Orbital Maneuvering System pods, which had low reliability, were significantly improved on OV-105. The NDLI method was also used to estimate thermal reliability from an assessment of TPS uncertainties that was completed shortly before the first Orbiter flight. Results from the flight data analysis and the pre-flight assessment agree at several positions near each other. The NDLI method is also effective for optimizing RSI designs to provide uniform thermal reliability on the acreage surface of reusable launch vehicles.
Dynamic one-dimensional modeling of secondary settling tanks and system robustness evaluation.
Li, Ben; Stenstrom, M K
2014-01-01
One-dimensional secondary settling tank models are widely used in current engineering practice for design and optimization, and usually can be expressed as a nonlinear hyperbolic or nonlinear strongly degenerate parabolic partial differential equation (PDE). Reliable numerical methods are needed to produce approximate solutions that converge to the exact analytical solutions. In this study, we introduced a reliable numerical technique, the Yee-Roe-Davis (YRD) method, as the governing PDE solver, and compared its reliability with the prevalent Stenstrom-Vitasovic-Takács (SVT) method by assessing their simulation results at various operating conditions. The YRD method also produced a similar solution to the previously developed Method G and Engquist-Osher method. The YRD and SVT methods were also used for a time-to-failure evaluation, and the results show that the choice of numerical method can greatly impact the solution. Reliable numerical methods, such as the YRD method, are strongly recommended.
Almaqrami, Bushra-Sufyan; Alhammadi, Maged-Sultan
2018-01-01
Background The objective of this study was to analyse three-dimensionally the reliability and correlation of angular and linear measurements in the assessment of anteroposterior skeletal discrepancy. Material and Methods In this retrospective cross-sectional study, a sample of 213 subjects was three-dimensionally analysed from cone-beam computed tomography scans. The sample was divided according to the three-dimensional measurement of the anteroposterior relation (ANB angle) into three groups (skeletal Class I, Class II and Class III). The anterior-posterior cephalometric indicators were measured on volumetric images using Anatomage software (InVivo5.2). These measurements included three angular and seven linear measurements. Cross tabulations were performed to correlate the ANB angle with each method. The Intra-class Correlation Coefficient (ICC) test was applied for the difference between the two reliability measurements. A P value of < 0.05 was considered significant. Results There was a statistically significant (P<0.05) agreement between all methods used, with variability in the assessment of different anteroposterior relations. The highest correlation was between ANB and DSOJ (0.913), strong correlation with AB/FH, AB/SN/, MM bisector, AB/PP, Wits appraisal (0.896, 0.890, 0.878, 0.867, and 0.858, respectively), moderate with AD/SN and Beta angle (0.787 and 0.760), and weak correlation with corrected ANB angle (0.550). Conclusions Conjunctive usage of the ANB angle with DSOJ, AB/FH, AB/SN/, MM bisector, AB/PP and Wits appraisal in 3D cephalometric analysis provides a more reliable and valid indicator of the skeletal anteroposterior relationship.
Clinical relevance: Most of the orthodontic literature depends on a single method (ANB), with its drawbacks, in the assessment of skeletal discrepancy, which is a cardinal factor for proper treatment planning. This study assessed three-dimensionally the degree of correlation between all available methods to make clinical judgement more accurate, based on more than one method of assessment. Key words: Anteroposterior relationships, ANB angle, Three-dimension, CBCT. PMID:29750096
The reliability of the pass/fail decision for assessments comprised of multiple components.
Möltner, Andreas; Tımbıl, Sevgi; Jünger, Jana
2015-01-01
The decision having the most serious consequences for a student taking an assessment is the one to pass or fail that student. For this reason, the reliability of the pass/fail decision must be determined for high quality assessments, just as the measurement reliability of the point values. Assessments in a particular subject (graded course credit) are often composed of multiple components that must be passed independently of each other. When "conjunctively" combining separate pass/fail decisions, as with other complex decision rules for passing, adequate methods of analysis are necessary for estimating the accuracy and consistency of these classifications. To date, very few papers have addressed this issue; a generally applicable procedure was published by Douglas and Mislevy in 2010. Using the example of an assessment comprised of several parts that must be passed separately, this study analyzes the reliability underlying the decision to pass or fail students and discusses the impact of an improved method for identifying those who do not fulfill the minimum requirements. The accuracy and consistency of the decision to pass or fail an examinee in the subject cluster Internal Medicine/General Medicine/Clinical Chemistry at the University of Heidelberg's Faculty of Medicine was investigated. This cluster requires students to separately pass three components (two written exams and an OSCE), whereby students may reattempt to pass each component twice. Our analysis was carried out using the method described by Douglas and Mislevy. Frequently, when complex logical connections exist between the individual pass/fail decisions in the case of low failure rates, only a very low reliability for the overall decision to grant graded course credit can be achieved, even if high reliabilities exist for the various components. 
For the example analyzed here, the classification accuracy and consistency when conjunctively combining the three individual parts is relatively low with κ=0.49 or κ=0.47, despite the good reliability of over 0.75 for each of the three components. The option to repeat each component twice leads to a situation in which only about half of the candidates who do not satisfy the minimum requirements would fail the overall assessment, while the other half is able to continue their studies despite having deficient knowledge and skills. The method put forth by Douglas and Mislevy allows the analysis of the decision accuracy and consistency for complex combinations of scores from different components. Even in the case of highly reliable components, it is not necessarily so that a reliable pass/fail decision has been reached - for instance in the case of low failure rates. Assessments must be administered with the explicit goal of identifying examinees that do not fulfill the minimum requirements.
Benjamin, Sara E; Neelon, Brian; Ball, Sarah C; Bangdiwala, Shrikant I; Ammerman, Alice S; Ward, Dianne S
2007-01-01
Background Few assessment instruments have examined the nutrition and physical activity environments in child care, and none are self-administered. Given the emerging focus on child care settings as a target for intervention, a valid and reliable measure of the nutrition and physical activity environment is needed. Methods To measure inter-rater reliability, 59 child care center directors and 109 staff completed the self-assessment concurrently, but independently. Three weeks later, a repeat self-assessment was completed by a sub-sample of 38 directors to assess test-retest reliability. To assess criterion validity, a researcher-administered environmental assessment was conducted at 69 centers and was compared to a self-assessment completed by the director. A weighted kappa test statistic and percent agreement were calculated to assess agreement for each question on the self-assessment. Results For inter-rater reliability, kappa statistics ranged from 0.20 to 1.00 across all questions. Test-retest reliability of the self-assessment yielded kappa statistics that ranged from 0.07 to 1.00. The inter-quartile kappa statistic ranges for inter-rater and test-retest reliability were 0.45 to 0.63 and 0.27 to 0.45, respectively. When percent agreement was calculated, questions ranged from 52.6% to 100% for inter-rater reliability and 34.3% to 100% for test-retest reliability. Kappa statistics for validity ranged from -0.01 to 0.79, with an inter-quartile range of 0.08 to 0.34. Percent agreement for validity ranged from 12.9% to 93.7%. Conclusion This study provides estimates of criterion validity, inter-rater reliability and test-retest reliability for an environmental nutrition and physical activity self-assessment instrument for child care. Results indicate that the self-assessment is a stable and reasonably accurate instrument for use with child care interventions. 
We therefore recommend the Nutrition and Physical Activity Self-Assessment for Child Care (NAP SACC) instrument to researchers and practitioners interested in conducting healthy weight intervention in child care. However, a more robust, less subjective measure would be more appropriate for researchers seeking an outcome measure to assess intervention impact. PMID:17615078
Towards early software reliability prediction for computer forensic tools (case study).
Abu Talib, Manar
2016-01-01
Versatility, flexibility and robustness are essential requirements for software forensic tools. Researchers and practitioners need to put more effort into assessing this type of tool. A Markov model is a robust means for analyzing and anticipating the functioning of an advanced component based system. It is used, for instance, to analyze the reliability of the state machines of real time reactive systems. This research extends the architecture-based software reliability prediction model for computer forensic tools, which is based on Markov chains and COSMIC-FFP. Basically, every part of the computer forensic tool is linked to a discrete time Markov chain. If this can be done, then a probabilistic analysis by Markov chains can be performed to analyze the reliability of the components and of the whole tool. The purposes of the proposed reliability assessment method are to evaluate the tool's reliability in the early phases of its development, to improve the reliability assessment process for large computer forensic tools over time, and to compare alternative tool designs. The reliability analysis can assist designers in choosing the most reliable topology for the components, which can maximize the reliability of the tool and meet the expected reliability level specified by the end-user. The approach of assessing component-based tool reliability in the COSMIC-FFP context is illustrated with the Forensic Toolkit Imager case study.
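The general idea of architecture-based reliability prediction with a discrete time Markov chain can be sketched as follows. This is a generic illustration, not the model extended in the paper: it assumes each component has a fixed probability of executing correctly, that control passes between components with fixed transition probabilities, and that the last component is the exit point. The component layout and all numbers are hypothetical.

```python
def system_reliability(transition, reliability, start=0, tol=1e-12, max_iter=10_000):
    """Probability of a failure-free run from `start` to the last component.

    transition[i][j]: probability that control passes from component i to j.
    reliability[i]:   probability that component i executes without failure.
    The last component is assumed to be the exit point of the tool.
    """
    n = len(reliability)
    exit_state = n - 1
    # success[i] = probability of finishing correctly when starting at i.
    success = [0.0] * n
    success[exit_state] = reliability[exit_state]
    for _ in range(max_iter):
        updated = [
            reliability[i] * sum(transition[i][j] * success[j] for j in range(n))
            for i in range(n)
        ]
        updated[exit_state] = reliability[exit_state]
        if max(abs(x - y) for x, y in zip(updated, success)) < tol:
            return updated[start]
        success = updated
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical three-component tool: an acquisition step that hands over
# either to a hashing step or directly to a reporting step.
transition = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
reliability = [0.90, 0.80, 0.95]
print(round(system_reliability(transition, reliability), 4))
```

Comparing the result for alternative transition structures is one way to see how the choice of component topology affects the predicted reliability of the whole tool.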
Clinical methods to quantify trunk mobility in an elite male surfing population.
Furness, James; Climstein, Mike; Sheppard, Jeremy M; Abbott, Allan; Hing, Wayne
2016-05-01
Thoracic mobility in the sagittal and horizontal planes is a key requirement in the sport of surfing; however, to date the normal values of these movements have not been quantified in a surfing population. To develop a reliable method to quantify thoracic mobility in the sagittal plane; to assess the reliability of an existing thoracic rotation method; and to quantify thoracic mobility in an elite male surfing population. Clinical measurement, reliability and comparative study. A total of 30 subjects were used to determine the reliability component. 15 elite surfers were used as part of a comparative analysis with age and gender matched controls. Intraclass correlation coefficient values ranged between 0.95-0.99 (95% CI; 0.89-0.99) for both thoracic methods. The elite surfing group had significantly (p ≤ 0.05) greater rotation than the comparative group (mean rotation 63.57° versus 40.80°, respectively). This study has illustrated reliable methods to assess the thoracic spine in the sagittal plane and thoracic rotation. It has also quantified ROM in a surfing cohort, identifying thoracic rotation as a key movement. This information may provide clinicians, coaches and athletic trainers with imperative information regarding the importance of maintaining adequate thoracic rotation. Copyright © 2015 Elsevier Ltd. All rights reserved.
Assessment of mesh simplification algorithm quality
NASA Astrophysics Data System (ADS)
Roy, Michael; Nicolier, Frederic; Foufou, S.; Truchetet, Frederic; Koschan, Andreas; Abidi, Mongi A.
2002-03-01
Traditionally, medical geneticists have employed visual inspection (anthroposcopy) to clinically evaluate dysmorphology. In the last 20 years, there has been an increasing trend towards quantitative assessment to render diagnosis of anomalies more objective and reliable. These methods have focused on direct anthropometry, using a combination of classical physical anthropology tools and new instruments tailor-made to describe craniofacial morphometry. These methods are painstaking and require that the patient remain still for extended periods of time. Most recently, semiautomated techniques (e.g., structured light scanning) have been developed to capture the geometry of the face in a matter of seconds. In this paper, we establish that direct anthropometry and structured light scanning yield reliable measurements, with remarkably high levels of inter-rater and intra-rater reliability, as well as validity (contrasting the two methods).
Hama, Yohei; Kanazawa, Manabu; Minakuchi, Shunsuke; Uchida, Tatsuro; Sasaki, Yoshiyuki
2014-03-19
In the present study, we developed a novel color scale for visual assessment, conforming to theoretical color changes of a gum, to evaluate masticatory performance; moreover, we investigated the reliability and validity of this evaluation method using the color scale. Ten participants (aged 26–30 years) with natural dentition chewed the gum at several chewing strokes. Changes in color were measured using a colorimeter, and then linear regression expressions that represented changes in gum color were derived. The color scale was developed using these regression expressions. Thirty-two chewed gums were evaluated using a colorimeter and were assessed three times using the color scale by six dentists aged 25–27 (mean, 25.8) years, six preclinical dental students aged 21–23 (mean, 22.2) years, and six elderly individuals aged 68–84 (mean, 74.0) years. The intrarater and interrater reliability of evaluations was assessed using intraclass correlation coefficients. Validity of the method compared with a colorimeter was assessed using Spearman's rank correlation coefficient. All intraclass correlation coefficients were > 0.90, and Spearman's rank correlation coefficients were > 0.95 in all groups. These results indicated that the evaluation method of the color-changeable chewing gum using the newly developed color scale is reliable and valid.
Trustworthiness and Authenticity: Alternate Ways To Judge Authentic Assessments.
ERIC Educational Resources Information Center
Hipps, Jerome A.
New methods are needed to judge the quality of alternative student assessment, methods which complement the philosophy underlying authentic assessments. This paper examines assumptions underlying validity, reliability, and objectivity, and why they are not matched to authentic assessment, concentrating on the constructivist paradigm of E. Guba and…
Aye, Thanda; Oo, Khin Saw; Khin, Myo Thuzar; Kuramoto-Ahuja, Tsugumi; Maruyama, Hitoshi
2017-01-01
[Purpose] The purpose of this study was to investigate the reliability of the Test of Gross Motor Development, second edition (TGMD-2) for Kindergarten children in Myanmar. [Subjects and Methods] Fifty healthy Kindergarten children (23 males, 27 females) whose parents/guardians had given written consent participated. All 12 gross motor skills of the TGMD-2 were explained and demonstrated to the subjects before the assessment. Each subject individually performed two trials for each gross motor skill and the performance was video recorded. Three raters separately watched the video recordings and rated them for inter-rater reliability. The second assessment was done one month later with 25 of the 50 subjects for test-retest reliability. The video recordings of 12 subjects were randomly selected from the first 50 recordings for intra-rater reliability six weeks after the first assessment. The agreement on the locomotor and object control raw scores and the gross motor quotient (GMQ) was calculated. [Results] All the reliability coefficients for the locomotor and object control raw scores and the GMQ were interpreted as good to excellent reliability. [Conclusion] The results indicated that the TGMD-2 is a highly reliable and appropriate assessment tool for assessing gross motor skill development of Kindergarten children in Myanmar. PMID:29184278
Kainz, Hans; Hajek, Martin; Modenese, Luca; Saxby, David J; Lloyd, David G; Carty, Christopher P
2017-03-01
In human motion analysis, predictive or functional methods are used to estimate the location of the hip joint centre (HJC). It has been shown that the Harrington regression equations (HRE) and geometric sphere fit (GSF) method are the most accurate predictive and functional methods, respectively. To date, the comparative reliability of both approaches has not been assessed. The aims of this study were to (1) compare the reliability of the HRE and the GSF methods, (2) analyse the impact of the number of thigh markers used in the GSF method on the reliability, (3) evaluate how alterations to the movements that comprise the functional trials impact HJC estimations using the GSF method, and (4) assess the influence of the initial guess in the GSF method on the HJC estimation. Fourteen healthy adults were tested on two occasions using a three-dimensional motion capturing system. Skin surface marker positions were acquired while participants performed quiet stance, perturbed and non-perturbed functional trials, and walking trials. Results showed that the HRE were more reliable in locating the HJC than the GSF method. However, comparison of inter-session hip kinematics during gait did not show any significant difference between the approaches. Different initial guesses in the GSF method did not result in significant differences in the final HJC location. The GSF method was sensitive to the functional trial performance and therefore it is important to standardize the functional trial performance to ensure a repeatable estimate of the HJC when using the GSF method. Copyright © 2017 Elsevier B.V. All rights reserved.
Methods for assessing the quality of data in public health information systems: a critical review.
Chen, Hong; Yu, Ping; Hailey, David; Wang, Ning
2014-01-01
The quality of data in public health information systems can be ensured by effective data quality assessment. In order to conduct effective data quality assessment, measurable data attributes have to be precisely defined. Then reliable and valid measurement methods for data attributes have to be used to measure each attribute. We conducted a systematic review of data quality assessment methods for public health using major databases and well-known institutional websites. 35 studies were eligible for inclusion in the study. A total of 49 attributes of data quality were identified from the literature. Completeness, accuracy and timeliness were the three most frequently assessed attributes of data quality. Most studies directly examined data values. This is complemented by exploring either data users' perception or documentation quality. However, there are limitations of current data quality assessment methods: a lack of consensus on attributes measured; inconsistent definition of the data quality attributes; a lack of mixed methods for assessing data quality; and inadequate attention to reliability and validity. Removal of these limitations is an opportunity for further improvement.
Smith, Toby O; Clark, Allan; Neda, Sophia; Arendt, Elizabeth A; Post, William R; Grelsamer, Ronald P; Dejour, David; Almqvist, Karl Fredrik; Donell, Simon T
2012-08-01
An accurate physical examination of patients with patellar instability is an important aspect of the diagnosis and treatment. While previous studies have assessed the diagnostic accuracy of such physical examination tests, little has been undertaken to assess the inter- and intra-tester reliability of such techniques. The purpose of this study was to determine the inter- and intra-tester reliability of the physical examination tests used for patients with patellar instability. Five patients (10 knees) with bilateral recurrent patellar instability were assessed by five members of the International Patellofemoral Study Group. Each surgeon assessed each patient twice using 18 reported physical examination tests. The inter- and intra-observer reliability was assessed using weighted Kappa statistics with 95% confidence intervals. The findings of the study suggested that there were very poor inter-observer reliability for the majority of the physical tests, with only the assessments of patellofemoral crepitus, foot arch position and the J-sign presenting with fair to moderate agreement respectively. The intra-observer reliability indicated largely moderate to substantial agreement between the first and second tests performed by each assessor, with the greatest agreement seen for the assessment of tibial torsion, popliteal angle and the Bassett's sign. For the common physical examination tests used in the management of patients with patellar instability inter-observer reliability is poor, while intra-observer reliability is moderate. Standardization of physical exam assessments and further study of these results among different clinicians and more divergent patient groups is indicated. Copyright © 2011 Elsevier B.V. All rights reserved.
Research on Novel Algorithms for Smart Grid Reliability Assessment and Economic Dispatch
NASA Astrophysics Data System (ADS)
Luo, Wenjin
In this dissertation, several studies of electric power system reliability and economy assessment methods are presented. To be more precise, several algorithms for evaluating power system reliability and economy are studied. Furthermore, two novel algorithms are applied to this field and their simulation results are compared with conventional results. As the electrical power system develops towards extra high voltage, long-distance transmission, large capacity and regional networking, a number of new technologies and equipment have been applied and the electricity market system has been gradually established, and the consequences of power cuts have become more and more serious. The electrical power system needs the highest possible reliability because of its complexity and security requirements. In this dissertation the Boolean logic Driven Markov Process (BDMP) method is studied and applied to evaluate power system reliability. This approach has several benefits. It allows complex dynamic models to be defined while remaining as easy to read as conventional methods. This method has been applied to evaluate the IEEE reliability test system. The simulation results obtained are close to the IEEE experimental data, which means that the method could be used for future study of system reliability. Besides being reliable, the modern power system is expected to be more economic. This dissertation presents a novel evolutionary algorithm named the quantum evolutionary membrane algorithm (QEPS), which combines the concept and theory of the quantum-inspired evolutionary algorithm and membrane computation, to solve the economic dispatch problem in a renewable power system with onshore and offshore wind farms. A case derived from real data is used for simulation tests. Another conventional evolutionary algorithm is also used to solve the same problem for comparison. The experimental results show that the proposed method obtains the optimal solution, which is the minimum cost for electricity supplied by the wind farm system, quickly and accurately.
ERIC Educational Resources Information Center
Rojahn, Johannes; Schroeder, Stephen R.; Mayo-Ortega, Liliana; Oyama-Ganiko, Rosao; LeBlanc, Judith; Marquis, Janet; Berke, Elizabeth
2013-01-01
Reliable and valid assessment of aberrant behaviors is essential in empirically verifying prevention and intervention for individuals with intellectual or developmental disabilities (IDD). Few instruments exist which assess behavior problems in infants. The current longitudinal study examined the performance of three behavior-rating scales for…
Reliability of Volumetry and Perimetry to Assess Knee Volume.
Nunes, Guilherme S; Yamashitafuji, Igor; Wageck, Bruna; Teixeira, Guilherme Garcia; Karloh, Manuela; de Noronha, Marcos
2016-08-24
The treatment of edema after a knee injury is usually 1 of the main objectives during rehabilitation. To assess the success of treatment, 2 methods are commonly used in clinical practice: volumetry and perimetry. To investigate the intra- and interassessor reliability of volumetry and perimetry to assess knee volume. Cross-sectional. Laboratory. 45 healthy participants (26 women) with mean age of 22.4 ± 2.8 y. Knee volume was assessed by 3 assessors (A, B, and C) with 3 methods (lower-limb volumetry [LLV], knee volumetry [KV], and knee perimetry [KP]). Assessor A was the most-experienced assessor, and assessor C, the least experienced. LLV and KV were performed with participants in the orthostatic position, while KP was performed with participants in supine. For the interassessor analysis, the ICC2,1 was high (.82) for KV and very high for LLV (.99) and KP (.99). For the intra-assessor analysis, ICC2,1 ranged from moderate to high for KV (.69-.83) and was very high for LLV (.99) and KP (.97-.99). KV, LLV, and KP are reliable methods, both intra- and interassessor, to measure knee volume.
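The ICC2,1 reported above is conventionally read as the two-way random-effects, absolute-agreement, single-rater coefficient of Shrout and Fleiss. The following is a minimal sketch of that calculation, assuming a complete subjects-by-assessors table with no missing scores. The function name `icc_2_1` and the example scores are illustrative assumptions and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per assessor."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of assessors
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 assessors")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (subjects, assessors, residual).
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: assessor variance stays in the denominator.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical scores: 6 subjects rated by 4 assessors.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(example), 2))
```

Because the assessor mean square appears in the denominator, systematic offsets between assessors lower this coefficient, which is why it suits questions about interchangeable assessors.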
Measurement in Sensory Modulation: The Sensory Processing Scale Assessment
Miller, Lucy J.; Sullivan, Jillian C.
2014-01-01
OBJECTIVE. Sensory modulation issues have a significant impact on participation in daily life. Moreover, understanding phenotypic variation in sensory modulation dysfunction is crucial for research related to defining homogeneous groups and for clinical work in guiding treatment planning. We thus evaluated the new Sensory Processing Scale (SPS) Assessment. METHOD. Research included item development, behavioral scoring system development, test administration, and item analyses to evaluate reliability and validity across sensory domains. RESULTS. Items with adequate reliability (internal reliability >.4) and discriminant validity (p < .01) were retained. Feedback from the expert panel also contributed to decisions about retaining items in the scale. CONCLUSION. The SPS Assessment appears to be a reliable and valid measure of sensory modulation (scale reliability >.90; discrimination between group effect sizes >1.00). This scale has the potential to aid in differential diagnosis of sensory modulation issues. PMID:25184464
Lorencatto, Fabiana; West, Robert; Seymour, Natalie; Michie, Susan
2013-06-01
There is a difference between interventions as planned and as delivered in practice. Unless we know what was actually delivered, we cannot understand "what worked" in effective interventions. This study aimed to (a) assess whether an established taxonomy of 53 smoking cessation behavior change techniques (BCTs) may be applied or adapted as a method for reliably specifying the content of smoking cessation behavioral support consultations and (b) develop an effective method for training researchers and practitioners in the reliable application of the taxonomy. Fifteen transcripts of audio-recorded consultations delivered by England's Stop Smoking Services were coded into component BCTs using the taxonomy. Interrater reliability and potential adaptations to the taxonomy to improve coding were discussed following 3 coding waves. A coding training manual was developed through expert consensus and piloted on 10 trainees, assessing coding reliability and self-perceived competence before and after training. An average of 33 BCTs from the taxonomy were identified at least once across sessions and coding waves. Consultations contained on average 12 BCTs (range = 8-31). Average interrater reliability was high (88% agreement). The taxonomy was adapted to simplify coding by merging co-occurring BCTs and refining BCT definitions. Coding reliability and self-perceived competence significantly improved posttraining for all trainees. It is possible to apply a taxonomy to reliably identify and classify BCTs in smoking cessation behavioral support delivered in practice, and train inexperienced coders to do so reliably. This method can be used to investigate variability in provision of behavioral support across services, monitor fidelity of delivery, and identify training needs.
Reliability of the Colorado Family Support Assessment: A Self-Sufficiency Matrix for Families
ERIC Educational Resources Information Center
Richmond, Melissa K.; Pampel, Fred C.; Zarcula, Flavia; Howey, Virginia; McChesney, Brenda
2017-01-01
Purpose: Family support programs commonly use self-sufficiency matrices (SSMs) to measure family outcomes; however, validation research on SSMs is sparse. This study examined the reliability of the Colorado Family Support Assessment 2.0 (CFSA 2.0) to measure family self-reliance across 14 domains (e.g., employment). Methods: Ten written case…
Interhemispheric Inhibition Measurement Reliability in Stroke: A Pilot Study
Cassidy, Jessica M.; Chu, Haitao; Chen, Mo; Kimberley, Teresa J.; Carey, James R.
2016-01-01
Objective Reliable transcranial magnetic stimulation (TMS) measures for probing corticomotor excitability are important when assessing the physiological effects of non-invasive brain stimulation. The primary objective of this study was to examine test-retest reliability of an interhemispheric inhibition (IHI) index measurement in stroke. Materials and Methods Ten subjects with chronic stroke (≥ 6 months) completed two IHI testing sessions per week for three weeks (six testing sessions total). A single investigator measured IHI in the contra-to-ipsilesional primary motor cortex direction and in the opposite direction using bilateral paired-pulse TMS. Weekly sessions were separated by 24 hours with a 1-week washout period separating testing weeks. To determine if motor-evoked potential (MEP) quantification method affected measurement reliability, IHI indices were computed from both MEP amplitude and area responses. Reliability was assessed with two-way, mixed intraclass correlation coefficients (ICC(3,k)). Standard error of measurement and minimal detectable difference statistics were also determined. Results With the exception of the initial testing week, IHI indices measured in the contra-to-ipsilesional hemisphere direction demonstrated moderate to excellent reliability (ICC = 0.725 – 0.913). Ipsi-to-contralesional IHI indices depicted poor or invalid reliability estimates throughout the three-week testing duration (ICC = −1.153 – 0.105). The overlap of ICC 95% confidence intervals suggested that IHI indices using MEP amplitude vs. area measures did not differ with respect to reliability. Conclusions IHI indices demonstrated varying magnitudes of reliability irrespective of MEP quantification method. Several strategies for improving IHI index measurement reliability are discussed. PMID:27333364
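The abstract above mentions standard error of measurement and minimal detectable difference statistics without giving formulas. The sketch below uses one common convention, SEM = SD × sqrt(1 − ICC) and MDD at 95% confidence = 1.96 × sqrt(2) × SEM. That the study used exactly these forms is an assumption, and the function names and example values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient in [0, 1]."""
    if not 0.0 <= icc <= 1.0:
        # Negative or >1 estimates (such as the invalid ICCs reported above)
        # do not yield a meaningful SEM under this formula.
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_difference(sem, z=1.96):
    """Smallest change between two measurements that exceeds measurement error."""
    # sqrt(2) accounts for error being present in both measurements.
    return z * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical values: SD of 10 units and ICC of 0.75.
    sem = standard_error_of_measurement(10.0, 0.75)
    print(sem, minimal_detectable_difference(sem))
```

The range check is a design choice: it makes the function refuse reliability estimates that fall outside the interval where the formula has a clear interpretation.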
2013-01-01
Background In recent years response rates on telephone surveys have been declining. Rates for the behavioral risk factor surveillance system (BRFSS) have also declined, prompting the use of new methods of weighting and the inclusion of cell phone sampling frames. A number of scholars and researchers have conducted studies of the reliability and validity of the BRFSS estimates in the context of these changes. As the BRFSS makes changes in its methods of sampling and weighting, a review of reliability and validity studies of the BRFSS is needed. Methods In order to assess the reliability and validity of prevalence estimates taken from the BRFSS, scholarship published from 2004–2011 dealing with tests of reliability and validity of BRFSS measures was compiled and presented by topics of health risk behavior. Assessments of the quality of each publication were undertaken using a categorical rubric. Higher rankings were achieved by authors who conducted reliability tests using repeated test/retest measures, or who conducted tests using multiple samples. A similar rubric was used to rank validity assessments. Validity tests which compared the BRFSS to physical measures were ranked higher than those comparing the BRFSS to other self-reported data. Literature which undertook more sophisticated statistical comparisons was also ranked higher. Results Overall findings indicated that BRFSS prevalence rates were comparable to other national surveys which rely on self-reports, although specific differences are noted for some categories of response. BRFSS prevalence rates were less similar to surveys which utilize physical measures in addition to self-reported data. There is very little research on reliability and validity for some health topics, but a great deal of information supporting the validity of the BRFSS data for others. 
Conclusions Limitations of the examination of the BRFSS were due to question differences among surveys used as comparisons, as well as mode of data collection differences. As the BRFSS moves to incorporating cell phone data and changing weighting methods, a review of reliability and validity research indicated that past BRFSS landline only data were reliable and valid as measured against other surveys. New analyses and comparisons of BRFSS data which include the new methodologies and cell phone data will be needed to ascertain the impact of these changes on estimates in the future. PMID:23522349
Mian, Nicholas D.; Carter, Alice S.; Pine, Daniel S.; Wakschlag, Lauren S.; Briggs-Gowan, Margaret J.
2015-01-01
Background Identifying anxiety disorders in preschool-age children represents an important clinical challenge. Observation is essential to clinical assessment and can help differentiate normative variation from clinically significant anxiety. Yet, most anxiety assessment methods for young children rely on parent-reports. The goal of this article is to present and preliminarily test the reliability and validity of a novel observational paradigm for assessing a range of fearful and anxious behaviors in young children, the Anxiety Dimensional Observation Schedule (Anx-DOS). Methods A diverse sample of 403 children, aged 3 to 6 years, and their mothers was studied. Reliability and validity in relation to parent reports (Preschool Age Psychiatric Assessment) and known risk factors, including indicators of behavioral inhibition (latency to touch novel objects) and attention bias to threat (in the dot-probe task) were investigated. Results The Anx-DOS demonstrated good inter-rater reliability and internal consistency. Evidence for convergent validity was demonstrated relative to mother-reported separation anxiety, social anxiety, phobic avoidance, trauma symptoms, and past service use. Finally, fearfulness was associated with observed latency and attention bias toward threat. Conclusions Findings support the Anx-DOS as a method for capturing early manifestations of fearfulness and anxiety in young children. Multimethod assessments incorporating standardized methods for assessing discrete, observable manifestations of anxiety may be beneficial for early identification and clinical intervention efforts. PMID:25773515
Reliability and Validity of the TIMPSI for Infants With Spinal Muscular Atrophy Type I
Krosschell, Kristin J.; Maczulski, Jo Anne; Scott, Charles; King, Wendy; Hartman, Jill T.; Case, Laura E.; Viazzo-Trussell, Donata; Wood, Janine; Roman, Carolyn A.; Hecker, Eva; Meffert, Marianne; Léveillé, Maude; Kienitz, Krista; Swoboda, Kathryn J.
2014-01-01
Purpose This study examined the reliability and validity of the Test of Infant Motor Performance Screening Items (TIMPSI) in infants with type I spinal muscular atrophy (SMA). Methods After training, 12 evaluators scored 4 videos of infants with type I SMA to assess interrater reliability. Intrarater and test-retest reliability was further assessed for 9 evaluators during a SMA type I clinical trial, with 9 evaluators testing a total of 38 infants twice. Relatedness of the TIMPSI score to ability to reach and ventilatory support was also examined. Results Excellent interrater video score reliability was noted (intraclass correlation coefficient, 0.97–0.98). Intrarater reliability was excellent (intraclass correlation coefficient, 0.91–0.98) and test-retest reliability ranged from r = 0.82 to r = 0.95. The TIMPSI score was related to the ability to reach (P ≤ .05). Conclusion The TIMPSI can reliably be used to assess motor function in infants with type I SMA. In addition, the TIMPSI scores are related to the ability to reach, an important functional skill in children with type I SMA. PMID:23542189
Reliability and Validity of 3 Methods of Assessing Orthopedic Resident Skill in Shoulder Surgery.
Bernard, Johnathan A; Dattilo, Jonathan R; Srikumaran, Uma; Zikria, Bashir A; Jain, Amit; LaPorte, Dawn M
Traditional measures for evaluating resident surgical technical skills (e.g., case logs) assess operative volume but not level of surgical proficiency. Our goal was to compare the reliability and validity of 3 tools for measuring surgical skill among orthopedic residents when performing 3 open surgical approaches to the shoulder. A total of 23 residents at different stages of their surgical training were tested for technical skill pertaining to 3 shoulder surgical approaches using the following measures: Objective Structured Assessment of Technical Skills (OSATS) checklists, the Global Rating Scale (GRS), and a final pass/fail assessment determined by 3 upper extremity surgeons. Adverse events were recorded. The Cronbach α coefficient was used to assess reliability of the OSATS checklists and GRS scores. Interrater reliability was calculated with intraclass correlation coefficients. Correlations among OSATS checklist scores, GRS scores, and pass/fail assessment were calculated with Spearman ρ. Validity of OSATS checklists was determined using analysis of variance with postgraduate year (PGY) as a between-subjects factor. Significance was set at p < 0.05 for all tests. Criterion validity was shown between the OSATS checklists and GRS for the 3 open shoulder approaches. Checklist scores showed superior interrater reliability compared with GRS and subjective pass/fail measurements. GRS scores were positively correlated across training years. The incidence of adverse events was significantly higher among PGY-1 and PGY-2 residents compared with more experienced residents. OSATS checklists are a valid and reliable assessment of technical skills across 3 surgical shoulder approaches. However, checklist scores do not measure quality of technique. Documenting adverse events is necessary to assess quality of technique and ultimate pass/fail status. Multiple methods of assessing surgical skill should be considered when evaluating orthopedic resident surgical performance. 
Copyright © 2016 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
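The Cronbach α coefficient used above to assess the OSATS checklists and GRS scores can be computed from a table of item scores. This is a minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), using sample variances. The function name `cronbach_alpha` and the example scores are illustrative assumptions, not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; one row per examinee, one column per item."""
    n_items = len(scores[0])
    if len(scores) < 2 or n_items < 2:
        raise ValueError("need at least 2 examinees and 2 items")

    # Sample variance of each item across examinees.
    item_variances = [
        variance([row[j] for row in scores]) for j in range(n_items)
    ]
    # Sample variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical checklist scores: 3 examinees, 2 items.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```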
Kramp, Kelvin H; van Det, Marc J; Veeger, Nic J G M; Pierie, Jean-Pierre E N
2016-06-01
There is no widely used method to evaluate procedure-specific laparoscopic skills. The first aim of this study was to develop a procedure-based assessment method. The second aim was to compare its validity, reliability and feasibility with currently available global rating scales (GRSs). An independence-scaled procedural assessment was created by linking the procedural key steps of the laparoscopic cholecystectomy to an independence scale. Subtitled and blinded videos of a novice, an intermediate and an almost competent trainee, were evaluated with GRSs (OSATS and GOALS) and the independence-scaled procedural assessment by seven surgeons, three senior trainees and six scrub nurses. Participants received a short introduction to the GRSs and independence-scaled procedural assessment before assessment. The validity was estimated with the Friedman and Wilcoxon test and the reliability with the intra-class correlation coefficient (ICC). A questionnaire was used to evaluate user opinion. Independence-scaled procedural assessment and GRS scores improved significantly with surgical experience (OSATS p = 0.001, GOALS p < 0.001, independence-scaled procedural assessment p < 0.001). The ICCs of the OSATS, GOALS and independence-scaled procedural assessment were 0.78, 0.74 and 0.84, respectively, among surgeons. The ICCs increased when the ratings of scrub nurses were added to those of the surgeons. The independence-scaled procedural assessment was not considered more of an administrative burden than the GRSs (p = 0.692). A procedural assessment created by combining procedural key steps to an independence scale is a valid, reliable and acceptable assessment instrument in surgery. In contrast to the GRSs, the reliability of the independence-scaled procedural assessment exceeded the threshold of 0.8, indicating that it can also be used for summative assessment. It furthermore seems that scrub nurses can assess the operative competence of surgical trainees.
Cramer, Emily
2016-01-01
Abstract Hospital performance reports often include rankings of unit pressure ulcer rates. Differentiating among units on the basis of quality requires reliable measurement. Our objectives were to describe and apply methods for assessing reliability of hospital‐acquired pressure ulcer rates and evaluate a standard signal‐noise reliability measure as an indicator of precision of differentiation among units. Quarterly pressure ulcer data from 8,199 critical care, step‐down, medical, surgical, and medical‐surgical nursing units from 1,299 US hospitals were analyzed. Using beta‐binomial models, we estimated between‐unit variability (signal) and within‐unit variability (noise) in annual unit pressure ulcer rates. Signal‐noise reliability was computed as the ratio of between‐unit variability to the total of between‐ and within‐unit variability. To assess precision of differentiation among units based on ranked pressure ulcer rates, we simulated data to estimate the probabilities of a unit's observed pressure ulcer rate rank in a given sample falling within five and ten percentiles of its true rank, and the probabilities of units with ulcer rates in the highest quartile and highest decile being identified as such. We assessed the signal‐noise measure as an indicator of differentiation precision by computing its correlations with these probabilities. Pressure ulcer rates based on a single year of quarterly or weekly prevalence surveys were too susceptible to noise to allow for precise differentiation among units, and signal‐noise reliability was a poor indicator of precision of differentiation. To ensure precise differentiation on the basis of true differences, alternative methods of assessing reliability should be applied to measures purported to differentiate among providers or units based on quality. © 2016 The Authors. Research in Nursing & Health published by Wiley Periodicals, Inc. PMID:27223598
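The signal-noise reliability described above is the ratio of between-unit variability to the total of between- and within-unit variability. A minimal sketch of that ratio follows. It takes the two variance components as given inputs and does not reproduce the beta-binomial estimation used in the study; the function name and example values are hypothetical.

```python
def signal_noise_reliability(between_unit_var, within_unit_var):
    """Share of total variance in unit rates that reflects true unit differences."""
    if between_unit_var < 0 or within_unit_var < 0:
        raise ValueError("variance components must be non-negative")
    total = between_unit_var + within_unit_var
    if total == 0:
        raise ValueError("total variance is zero")
    return between_unit_var / total


if __name__ == "__main__":
    # Hypothetical components: signal 1.0, noise 3.0.
    print(signal_noise_reliability(1.0, 3.0))
```

A value near 1 means most observed variation between units is signal. As the abstract notes, this ratio alone did not track how precisely units could be ranked.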
Hayashi, Paul H.; Barnhart, Huiman X.; Fontana, Robert J.; Chalasani, Naga; Davern, Timothy J.; Talwalkar, Jayant A.; Reddy, K. Rajender; Stolz, Andrew A.; Hoofnagle, Jay H.; Rockey, Don C.
2014-01-01
Background Due to the lack of objective tests to diagnose drug-induced liver injury (DILI), causality assessment is a matter of debate. Expert opinion is often used in research and industry but its test-retest reliability is unknown. Aims To determine the test-retest reliability of the expert opinion process used by the Drug-Induced Liver Injury Network (DILIN). Methods Three DILIN hepatologists adjudicate suspected hepatotoxicity cases to 1 of 5 categories representing levels of likelihood of DILI. Adjudication is based on retrospective assessment of gathered case data that includes prospective follow-up information. One hundred randomly selected DILIN cases were re-assessed using the same processes as for initial assessment but by 3 different reviewers in 92% of cases. Results The median time between assessments was 938 days (range: 140–2352). Thirty-one cases involved >1 agent. Weighted kappa statistics for overall case and individual agent category agreement were 0.60 (95% CI: 0.50–0.71) and 0.60 (0.52–0.68), respectively. Overall case adjudications were within one category of each other 93% of the time, while 5% differed by 2 categories and 2% differed by 3 categories. Fourteen percent crossed the 50% threshold of likelihood due to competing diagnoses or atypical timing between drug exposure and injury. Conclusions The DILIN expert opinion causality assessment method has moderate inter-observer reliability but very good agreement within 1 category. A small but important proportion of cases could not be reliably diagnosed as ≥ 50% likely to be DILI. PMID:24661785
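Agreement between two rounds of ordinal adjudication, such as the 5-category scale above, is often summarized with a weighted kappa. The sketch below implements Cohen's weighted kappa for two sets of ratings, with linear disagreement weights by default. The abstract does not state which weighting scheme was used, so the weights are an assumption, and the function name and example ratings are hypothetical.

```python
def weighted_kappa(first, second, n_categories, power=1):
    """Cohen's weighted kappa; ratings are integers 0..n_categories-1.

    power=1 gives linear weights, power=2 gives quadratic weights.
    """
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and equal in length")
    n = len(first)
    k = n_categories

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1.0 / n

    row_marginals = [sum(observed[i]) for i in range(k)]
    col_marginals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_marginals[i] * col_marginals[j]

    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Hypothetical initial and repeat adjudications on a 5-category scale.
    initial = [0, 1, 2, 3, 4, 2, 1]
    repeat = [0, 1, 3, 3, 4, 1, 1]
    print(round(weighted_kappa(initial, repeat, 5), 2))
```

With linear weights, a disagreement of one category costs a quarter as much as a disagreement across the full 5-category range, which matches the intuition behind reporting agreement "within one category".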
Sarig Bahat, Hilla; Sprecher, Elliot; Sela, Itamar; Treleaven, Julia
2016-07-01
Virtual reality (VR) has previously been used for assessment and intervention of neck pain and has been shown to be reliable for cervical range of motion measures. Neck VR enables analysis of task-oriented neck movement by stimulating responsive movements to external stimuli. Therefore, the purpose of this study was to establish inter-tester reliability of neck kinematic measures so that neck VR can be used as a reliable assessment and treatment tool between clinicians. This reliability study included 46 asymptomatic participants, who were assessed using the neck VR system, which displayed an interactive VR scenario via a head-mounted device controlled by neck movements. The objective of the interactive assessment was to hit 16 targets, randomly appearing in four directions, as fast as possible. Each participant was tested twice by two different testers. Good reliability was found for neck motion kinematic measures in flexion, extension, and rotation (0.64-0.93 inter-class correlation). High reliability was shown for peak velocity globally (0.93), in left rotation (0.9), right rotation and extension (0.88), and flexion (0.86). Mean velocity had a good global reliability (0.84), except for left rotation directed movement with moderate reliability (0.68). Minimal detectable change for peak velocity ranged from 41 to 53 °/s, while mean velocity ranged from 20 to 25 °/s. The results suggest high reliability for peak and mean velocity as measured by the interactive Neck VR assessment of neck motion kinematics. VR appears to provide a reliable and more ecologically valid method of cervical motion evaluation than previous conventional methodologies.
Digital assessment of the fetal alcohol syndrome facial phenotype: reliability and agreement study.
Tsang, Tracey W; Laing-Aiken, Zoe; Latimer, Jane; Fitzpatrick, James; Oscar, June; Carter, Maureen; Elliott, Elizabeth J
2017-01-01
To examine the three facial features of fetal alcohol syndrome (FAS) in a cohort of Australian Aboriginal children from two-dimensional digital facial photographs to: (1) assess intrarater and inter-rater reliability; (2) identify the racial norms with the best fit for this population; and (3) assess agreement with clinician direct measures. Photographs and clinical data for 106 Aboriginal children (aged 7.4-9.6 years) were sourced from the Lililwan Project. Fifty-eight per cent had a confirmed prenatal alcohol exposure and 13 (12%) met the Canadian 2005 criteria for FAS/partial FAS. Photographs were analysed using the FAS Facial Photographic Analysis Software to generate the mean PFL three-point ABC-Score, five-point lip and philtrum ranks and four-point face rank in accordance with the 4-Digit Diagnostic Code. Intrarater and inter-rater reliability of digital ratings was examined in two assessors. Caucasian or African American racial norms for PFL and lip thickness were assessed for best fit; and agreement between digital and direct measurement methods was assessed. Reliability of digital measures was substantial within (kappa: 0.70-1.00) and between assessors (kappa: 0.64-0.89). Clinician and digital ratings showed moderate agreement (kappa: 0.47-0.58). Caucasian PFL norms and the African American Lip-Philtrum Guide 2 provided the best fit for this cohort. In an Aboriginal cohort with a high rate of FAS, assessment of facial dysmorphology using digital methods showed substantial inter- and intrarater reliability. Digital measurement of features has high reliability and until data are available from a larger population of Aboriginal children, the African American Lip-Philtrum Guide 2 and Caucasian (Strömland) PFL norms provide the best fit for Australian Aboriginal children.
NASA Technical Reports Server (NTRS)
Sargusingh, Miriam J.; Nelson, Jason R.
2014-01-01
NASA has highlighted reliability as critical to future human space exploration, particularly in the area of environmental controls and life support systems. The Advanced Exploration Systems (AES) projects have been encouraged to pursue higher reliability components and systems as part of technology development plans. However, no consensus has been reached on what is meant by improving on reliability, or on how to assess reliability within the AES projects. This became apparent when trying to assess reliability as one of several figures of merit for a regenerable water architecture trade study. In the spring of 2013, the AES Water Recovery Project hosted a series of events at Johnson Space Center with the intended goal of establishing a common language and understanding of NASA's reliability goals, and equipping the projects with acceptable means of assessing the respective systems. This campaign included an educational series in which experts from across the agency and academia provided information on terminology, tools, and techniques associated with evaluating and designing for system reliability. The campaign culminated in a workshop that included members of the Environmental Control and Life Support System and AES communities. The goal of this workshop was to develop a consensus on what reliability means to AES and identify methods for assessing low- to mid-technology readiness level technologies for reliability. This paper details the results of that workshop.
ECLSS Reliability for Long Duration Missions Beyond Lower Earth Orbit
NASA Technical Reports Server (NTRS)
Sargusingh, Miriam J.; Nelson, Jason
2014-01-01
Reliability has been highlighted by NASA as critical to future human space exploration particularly in the area of environmental controls and life support systems. The Advanced Exploration Systems (AES) projects have been encouraged to pursue higher reliability components and systems as part of technology development plans. However, there is no consensus on what is meant by improving on reliability; nor on how to assess reliability within the AES projects. This became apparent when trying to assess reliability as one of several figures of merit for a regenerable water architecture trade study. In the Spring of 2013, the AES Water Recovery Project (WRP) hosted a series of events at the NASA Johnson Space Center (JSC) with the intended goal of establishing a common language and understanding of our reliability goals and equipping the projects with acceptable means of assessing our respective systems. This campaign included an educational series in which experts from across the agency and academia provided information on terminology, tools and techniques associated with evaluating and designing for system reliability. The campaign culminated in a workshop at JSC with members of the ECLSS and AES communities with the goal of developing a consensus on what reliability means to AES and identifying methods for assessing our low to mid-technology readiness level (TRL) technologies for reliability. This paper details the results of the workshop.
Reliability and validity of the Safe Routes to school parent and student surveys
2011-01-01
Background The purpose of this study is to assess the reliability and validity of the U.S. National Center for Safe Routes to School's in-class student travel tallies and written parent surveys. Over 65,000 tallies and 374,000 parent surveys have been completed, but no published studies have examined their measurement properties. Methods Students and parents from two Charlotte, NC (USA) elementary schools participated. Tallies were conducted on two consecutive days using a hand-raising protocol; on day two students were also asked to recall the previous day's travel. The recall from day two was compared with day one to assess 24-hour test-retest reliability. Convergent validity was assessed by comparing parent reports of students' travel mode with student reports of travel mode. Two-week test-retest reliability of the parent survey was assessed by comparing within-parent responses. Reliability and validity were assessed using kappa statistics. Results A total of 542 students participated in the in-class student travel tally reliability assessment and 262 parent-student dyads participated in the validity assessment. Reliability was high for travel to and from school (kappa > 0.8); convergent validity was lower but still high (kappa > 0.75). There were no differences by student grade level. Two-week test-retest reliability of the parent survey (n = 112) ranged from moderate to very high for objective questions on travel mode and travel times (kappa range: 0.62 - 0.97) but was substantially lower for subjective assessments of barriers to walking to school (kappa range: 0.31 - 0.76). Conclusions The in-class student travel tally exhibited high reliability and validity at all elementary grades. The parent survey had high reliability on questions related to student travel mode, but lower reliability for attitudinal questions identifying barriers to walking to school.
Parent survey design should be improved so that responses clearly indicate issues that influence parental decision making with regard to their children's mode of travel to school. PMID:21651794
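The kappa statistic used in the abstract above corrects raw agreement for the agreement expected by chance. The following is a minimal sketch of unweighted Cohen's kappa, not the study's actual analysis code; the function name `cohens_kappa` and the ten travel-mode responses are hypothetical illustrations.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both sequences must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the two sets of marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: travel mode tallied on day one vs. recalled on day two.
day_one = ["walk", "bus", "car", "car", "walk", "bus", "car", "walk", "car", "bus"]
recall = ["walk", "bus", "car", "car", "walk", "car", "car", "walk", "car", "bus"]
kappa = cohens_kappa(day_one, recall)
```

With these made-up responses the raw agreement is 9 out of 10, but kappa is lower because some of that agreement would be expected by chance alone.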
Bedekar, Nilima; Suryawanshi, Mayuri; Rairikar, Savita; Sancheti, Parag; Shyam, Ashok
2014-01-01
Evaluation of range of motion (ROM) is an integral part of assessment of the musculoskeletal system. It is required in health, fitness and pathological conditions, and it is also used as an objective outcome measure. Several methods are described to check spinal flexion range of motion, each with its advantages and disadvantages. Hence, a new device was introduced in this study using the dual inclinometer method to measure lumbar spine flexion ROM. The aim was to determine the intra- and inter-rater reliability of a mobile device goniometer in measuring lumbar flexion range of motion. An iPod mobile device with goniometer software was used. The part being measured, i.e., the back of the subject, was suitably exposed. The subject stood with feet shoulder-width apart. The spinous processes of the second sacral vertebra (S2) and T12 were located and used as the reference points, and readings were taken. Three readings were taken for each of inter-rater and intra-rater reliability. Sufficient rest was given between each flexion movement. Intra-rater reliability using ICC was r=0.920 and inter-rater r=0.812 at CI 95%. Validity r=0.95. The mobile device goniometer has high intra-rater reliability. The inter-rater reliability was moderate. This device can be used to assess range of motion of spine flexion, representing uni-planar movement.
Anderson, Donald D; Segal, Neil A; Kern, Andrew M; Nevitt, Michael C; Torner, James C; Lynch, John A
2012-01-01
Recent findings suggest that contact stress is a potent predictor of subsequent symptomatic osteoarthritis development in the knee. However, much larger numbers of knees (likely on the order of hundreds, if not thousands) need to be reliably analyzed to achieve the statistical power necessary to clarify this relationship. This study assessed the reliability of new semiautomated computational methods for estimating contact stress in knees from large population-based cohorts. Ten knees of subjects from the Multicenter Osteoarthritis Study were included. Bone surfaces were manually segmented from sequential 1.0 Tesla magnetic resonance imaging slices by three individuals on two nonconsecutive days. Four individuals then registered the resulting bone surfaces to corresponding bone edges on weight-bearing radiographs, using a semi-automated algorithm. Discrete element analysis methods were used to estimate contact stress distributions for each knee. Segmentation and registration reliabilities (day-to-day and interrater) for peak and mean medial and lateral tibiofemoral contact stress were assessed with Shrout-Fleiss intraclass correlation coefficients (ICCs). The segmentation and registration steps of the modeling approach were found to have excellent day-to-day (ICC 0.93-0.99) and good inter-rater reliability (0.84-0.97). This approach for estimating compartment-specific tibiofemoral contact stress appears to be sufficiently reliable for use in large population-based cohorts.
NASA Astrophysics Data System (ADS)
Yu, Bo; Ning, Chao-lie; Li, Bing
2017-03-01
A probabilistic framework for durability assessment of concrete structures in marine environments was proposed in terms of reliability and sensitivity analysis, which takes into account the uncertainties under the environmental, material, structural and executional conditions. A time-dependent probabilistic model of chloride ingress was established first to consider the variations in various governing parameters, such as the chloride concentration, chloride diffusion coefficient, and age factor. Then the Nataf transformation was adopted to transform the non-normal random variables from the original physical space into the independent standard Normal space. After that the durability limit state function and its gradient vector with respect to the original physical parameters were derived analytically, based on which the first-order reliability method was adopted to analyze the time-dependent reliability and parametric sensitivity of concrete structures in marine environments. The accuracy of the proposed method was verified by comparing with the second-order reliability method and the Monte Carlo simulation. Finally, the influences of environmental conditions, material properties, structural parameters and execution conditions on the time-dependent reliability of concrete structures in marine environments were also investigated. The proposed probabilistic framework can be implemented in the decision-making algorithm for the maintenance and repair of deteriorating concrete structures in marine environments.
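The first-order reliability method mentioned in the abstract above searches for the point on the limit state surface closest to the origin in standard normal space; that distance is the reliability index. The following is a rough sketch using the Hasofer-Lind/Rackwitz-Fiessler iteration, assuming the variables have already been mapped to independent standard normals (the Nataf step is not shown). The limit state, its numbers, and all function names are hypothetical and are not taken from the paper.

```python
import math


def form_hlrf(g, n_dim, tol=1e-8, max_iter=100, h=1e-6):
    """Return (beta, design_point) for limit state g(u) in standard normal space."""
    u = [0.0] * n_dim
    for _ in range(max_iter):
        g0 = g(u)
        # Central-difference gradient (the paper derives its gradient analytically).
        grad = []
        for i in range(n_dim):
            up, um = u[:], u[:]
            up[i] += h
            um[i] -= h
            grad.append((g(up) - g(um)) / (2.0 * h))
        norm2 = sum(d * d for d in grad)
        if norm2 == 0.0:
            raise ValueError("zero gradient; cannot continue the iteration")
        # HL-RF update: project onto the linearised limit state surface.
        scale = (sum(d * ui for d, ui in zip(grad, u)) - g0) / norm2
        u_new = [scale * d for d in grad]
        converged = max(abs(a - b) for a, b in zip(u_new, u)) < tol
        u = u_new
        if converged:
            break
    beta = math.sqrt(sum(x * x for x in u))
    return beta, u


def failure_probability(beta):
    """First-order estimate Pf = Phi(-beta) for a standard normal CDF Phi."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


def limit_state(u):
    """Hypothetical linear limit state: margin 5 with scatter 3 and 4."""
    return 5.0 + 3.0 * u[0] - 4.0 * u[1]


beta, u_star = form_hlrf(limit_state, 2)
pf = failure_probability(beta)
```

For a linear limit state such as this one the first-order result is exact, which makes it a convenient check before moving to nonlinear, time-dependent cases.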
Park, Hee-Won; Baek, Sora; Kim, Hong Young; Park, Jung-Gyoo; Kang, Eun Kyoung
2017-10-01
To investigate the reliability and validity of a new method for isometric back extensor strength measurement using a portable dynamometer. A chair equipped with a small portable dynamometer was designed (Power Track II Commander Muscle Tester). A total of 15 men (mean age, 34.8±7.5 years) and 15 women (mean age, 33.1±5.5 years) with no current back problems or previous history of back surgery were recruited. Subjects were asked to push the back of the chair while seated, and their isometric back extensor strength was measured by the portable dynamometer. Test-retest reliability was assessed with intraclass correlation coefficient (ICC). For the validity assessment, isometric back extensor strength of all subjects was measured by a widely used physical performance evaluation instrument, BTE PrimusRS system. The limit of agreement (LoA) from the Bland-Altman plot was evaluated between two methods. The test-retest reliability was excellent (ICC=0.82; 95% confidence interval, 0.65-0.91). The Bland-Altman plots demonstrated acceptable agreement between the two methods: the lower 95% LoA was -63.1 N and the upper 95% LoA was 61.1 N. This study shows that isometric back extensor strength measurement using a portable dynamometer has good reliability and validity.
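The Bland-Altman limits of agreement reported above are conventionally the mean difference between two methods plus or minus 1.96 standard deviations of the differences. The sketch below assumes that convention; the function name and the paired readings are hypothetical, and only the two limits (-63.1 N and 61.1 N) come from the abstract.

```python
import statistics


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings from two instruments.
bias, lower, upper = bland_altman_limits([10, 12, 14, 16], [9, 13, 13, 17])

# Working backwards from the limits reported in the abstract (assumes 1.96 * SD).
reported_lower, reported_upper = -63.1, 61.1
implied_bias = (reported_lower + reported_upper) / 2.0
implied_sd = (reported_upper - reported_lower) / (2.0 * 1.96)
```

Under the 1.96 assumption, the reported limits imply a mean difference of about -1.0 N and a standard deviation of the differences of roughly 31.7 N.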
Normative Data for an Instrumental Assessment of the Upper-Limb Functionality
Caimmi, Marco; Guanziroli, Eleonora; Malosio, Matteo; Pedrocchi, Nicola; Vicentini, Federico; Molinari Tosatti, Lorenzo; Molteni, Franco
2015-01-01
Upper-limb movement analysis is important for objectively monitoring rehabilitation interventions, contributing to improved overall treatment outcomes. Simple, fast, easy-to-use, and applicable methods are required to allow routine functional evaluation of patients with different pathologies and clinical conditions. This paper describes the Reaching and Hand-to-Mouth Evaluation Method, a fast procedure to assess upper-limb motor control and functional ability, and provides a set of normative data from 42 healthy subjects of different ages, evaluated for both dominant and nondominant limb motor performance. Sixteen of them were reevaluated after two weeks to perform test-retest reliability analysis. Data were clustered into three subgroups of different ages to test the method's sensitivity to motor control differences. Experimental data show notable test-retest reliability in all tasks. Data from older and younger subjects show significant differences in the measures related to the ability for coordination, thus showing the high sensitivity of the method to motor control differences. The presented method, provided with control data from healthy subjects, appears to be a suitable and reliable tool for upper-limb functional assessment in the clinical environment. PMID:26539500
Rochon, James; Protiva, Petr; Seeff, Leonard B.; Fontana, Robert J.; Liangpunsakul, Suthat; Watkins, Paul B.; Davern, Timothy; McHutchison, John G.
2013-01-01
The Roussel Uclaf Causality Assessment Method (RUCAM) was developed to quantify the strength of association between a liver injury and the medication implicated as causing the injury. However, its reliability in a research setting has never been fully explored. The aim of this study was to determine test-retest and interrater reliabilities of RUCAM in retrospectively-identified cases of drug induced liver injury. The Drug-Induced Liver Injury Network is enrolling well-defined cases of hepatotoxicity caused by isoniazid, phenytoin, clavulanate/amoxicillin, or valproate occurring since 1994. Each case was adjudicated by three reviewers working independently; after an interval of at least 5 months, cases were readjudicated by the same reviewers. A total of 40 drug-induced liver injury cases were enrolled including individuals treated with isoniazid (nine), phenytoin (five), clavulanate/amoxicillin (15), and valproate (11). Mean ± standard deviation age at protocol-defined onset was 44.8 ± 19.5 years; patients were 68% female and 78% Caucasian. Cases were classified as hepatocellular (44%), mixed (28%), or cholestatic (28%). Test-retest differences ranged from −7 to +8 with complete agreement in only 26% of cases. On average, the maximum absolute difference among the three reviewers was 3.1 on the first adjudication and 2.7 on the second, although much of this variability could be attributed to differences between the enrolling investigator and the external reviewers. The test-retest reliability by the same assessors was 0.54 (upper 95% confidence limit = 0.77); the interrater reliability was 0.45 (upper 95% confidence limit = 0.58). Categorizing the RUCAM to a five-category scale improved these reliabilities but only marginally. Conclusion The mediocre reliability of the RUCAM is problematic for future studies of drug-induced liver injury. 
Alternative methods, including modifying the RUCAM, developing drug-specific instruments, or causality assessment based on expert opinion, may be more appropriate. PMID:18798340
Barbado, David; Moreside, Janice; Vera-Garcia, Francisco J
2017-03-01
Although unstable seat methodology has been used to assess trunk postural control, the reliability of the variables that characterize it remains unclear. To analyze reliability and learning effect of center of pressure (COP) and kinematic parameters that characterize trunk postural control performance in unstable seating. The relationships between kinematic and COP parameters also were explored. Test-retest reliability design. Biomechanics laboratory setting. Twenty-three healthy male subjects. Participants volunteered to perform 3 sessions at 1-week intervals, each consisting of five 70-second balancing trials. A force platform and a motion capture system were used to measure COP and pelvis, thorax, and spine displacements. Reliability was assessed through standard error of measurement (SEM) and intraclass correlation coefficients (ICC 2,1 ) using 3 methods: (1) comparing the last trial score of each day; (2) comparing the best trial score of each day; and (3) calculating the average of the three last trial scores of each day. Standard deviation and mean velocity were calculated to assess balance performance. Although analyses of variance showed some differences in balance performance between days, these differences were not significant between days 2 and 3. Best result and average methods showed the greatest reliability. Mean velocity of the COP showed high reliability (0.71 < ICC < 0.86; 10.3 < SEM < 13.0), whereas standard deviation only showed a low to moderate reliability (0.37 < ICC < 0.61; 14.5 < SEM < 23.0). Regarding the kinematic variables, only pelvis displacement mean velocity achieved a high reliability using the average method (0.62 < ICC < 0.83; 18.8 < SEM < 23.1). Correlations between COP and kinematics were high only for mean velocity (0.45
2017-01-01
Objective To investigate the reliability and validity of a new method for isometric back extensor strength measurement using a portable dynamometer. Methods A chair equipped with a small portable dynamometer was designed (Power Track II Commander Muscle Tester). A total of 15 men (mean age, 34.8±7.5 years) and 15 women (mean age, 33.1±5.5 years) with no current back problems or previous history of back surgery were recruited. Subjects were asked to push the back of the chair while seated, and their isometric back extensor strength was measured by the portable dynamometer. Test-retest reliability was assessed with intraclass correlation coefficient (ICC). For the validity assessment, isometric back extensor strength of all subjects was measured by a widely used physical performance evaluation instrument, BTE PrimusRS system. The limit of agreement (LoA) from the Bland-Altman plot was evaluated between two methods. Results The test-retest reliability was excellent (ICC=0.82; 95% confidence interval, 0.65–0.91). The Bland-Altman plots demonstrated acceptable agreement between the two methods: the lower 95% LoA was −63.1 N and the upper 95% LoA was 61.1 N. Conclusion This study shows that isometric back extensor strength measurement using a portable dynamometer has good reliability and validity. PMID:29201818
NASA Technical Reports Server (NTRS)
Singhal, Surendra N.
2003-01-01
The SAE G-11 RMSL Division and Probabilistic Methods Committee meeting, sponsored by the Picatinny Arsenal and held March 1-3, 2004 at the Westin Morristown, will report progress on projects for probabilistic assessment of Army systems and launch an initiative for probabilistic education. The meeting features several Army and industry senior executives and an Ivy League professor, providing an industry/government/academia forum to review RMSL technology, reliability and probabilistic technology, reliability-based design methods, software reliability, and maintainability standards. The G-11 Probabilistic Methods Committee has over 100 members, including members with national and international standing; its mission is to enable and facilitate rapid deployment of probabilistic technology to enhance the competitiveness of our industries through better, faster, greener, smarter, affordable and reliable product development.
Assessing the Reliability of Curriculum-Based Measurement: An Application of Latent Growth Modeling
ERIC Educational Resources Information Center
Yeo, Seungsoo; Kim, Dong-Il; Branum-Martin, Lee; Wayman, Miya Miura; Espin, Christine A.
2012-01-01
The purpose of this study was to demonstrate the use of Latent Growth Modeling (LGM) as a method for estimating reliability of Curriculum-Based Measurement (CBM) progress-monitoring data. The LGM approach permits the error associated with each measure to differ at each time point, thus providing an alternative method for examining of the…
Test-Retest Reliability of the Preschool Age Psychiatric Assessment (PAPA)
ERIC Educational Resources Information Center
Egger, Helen Link; Erkanli, Alaattin; Keeler, Gordon; Potts, Edward; Walter, Barbara Keith; Angold, Adrian
2006-01-01
Objective: To examine the test-retest reliability of a new interviewer-based psychiatric diagnostic measure (the Preschool Age Psychiatric Assessment) for use with parents of preschoolers 2 to 5 years old. Method: A total of 1,073 parents of children attending a large pediatric clinic completed the Child Behavior Checklist 1 1/2-5. For 18 months,…
ERIC Educational Resources Information Center
Wuang, Yee-Pay; Su, Jui-Hsing; Su, Chwen-Yng
2012-01-01
Aim: To examine the internal consistency, test-retest reliability, and responsiveness of the Movement Assessment Battery for Children--Second Edition (MABC-2) Test for children with developmental coordination disorder (DCD). Method: One hundred and forty-four Taiwanese children with DCD aged 6 to 12 years (87 males, 57 females) were tested on…
ERIC Educational Resources Information Center
Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.
2013-01-01
Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…
A new real-time visual assessment method for faulty movement patterns during a jump-landing task.
Rabin, Alon; Levi, Ran; Abramowitz, Shai; Kozol, Zvi
2016-07-01
Determine the interrater reliability of a new real-time assessment of faulty movement patterns during a jump-landing task. Interrater reliability study. Human movement laboratory. 50 healthy females. Assessment included 6 items which were evaluated from a front and a side view. Two Physical Therapy students used a 9-point scale (0-8) to independently rate the quality of movement as good (0-2), moderate (3-5), or poor (6-8). Interrater reliability was expressed by percent agreement and weighted kappa. One examiner rated the quality of movement of 6 subjects as good, 34 subjects as moderate, and 10 subjects as poor. The second examiner rated the quality of movement of 12 subjects as good, 23 subjects as moderate, and 15 subjects as poor. Percent agreement and weighted kappa (95% confidence interval) were 78% and 0.68 (0.51, 0.85), respectively. A new real-time assessment of faulty movement patterns during jump-landing demonstrated adequate interrater reliability. Further study is warranted to validate this method against a motion analysis system, as well as to establish its predictive validity for injury. Copyright © 2015 Elsevier Ltd. All rights reserved.
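The abstract above bands a 9-point score into three ordered categories and summarises agreement with percent agreement and weighted kappa. The sketch below assumes linear disagreement weights, which the abstract does not specify; the function names and the six example ratings are hypothetical, and the cross-tabulation from the study is not available here.

```python
def band(score):
    """Map a 0-8 movement quality score to the bands given in the abstract."""
    if not 0 <= score <= 8:
        raise ValueError("score must be between 0 and 8")
    if score <= 2:
        return "good"
    if score <= 5:
        return "moderate"
    return "poor"


def percent_agreement(r1, r2):
    """Percentage of subjects placed in the same category by both raters."""
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def weighted_kappa(r1, r2, categories):
    """Linearly weighted kappa for two raters over ordered categories."""
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(r1)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight (assumption)
            num += weight * observed[i][j]
            den += weight * row[i] * col[j] / n
    if den == 0.0:
        raise ValueError("weighted kappa is undefined for these marginals")
    return 1.0 - num / den


order = ["good", "moderate", "poor"]
# Hypothetical raw scores from two examiners for six subjects.
rater_1 = [band(s) for s in [1, 2, 4, 5, 7, 8]]
rater_2 = [band(s) for s in [0, 3, 4, 4, 6, 5]]
agreement = percent_agreement(rater_1, rater_2)
kappa_w = weighted_kappa(rater_1, rater_2, order)
```

Weighting means a good-versus-poor disagreement counts twice as much as a good-versus-moderate one, which suits ordered quality bands.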
Appraising the reliability of visual impact assessment methods
Nickolaus R. Feimer; Kenneth H. Craik; Richard C. Smardon; Stephen R.J. Sheppard
1979-01-01
This paper presents the research approach and selected results of an empirical investigation aimed at the evaluation of selected observer-based visual impact assessment (VIA) methods. The VIA methods under examination were chosen to cover a range of VIA methods currently in use in both applied and research settings. Variation in three facets of VIA methods were...
Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor
2013-06-01
Reliability measures precision or the extent to which test results can be replicated. This is the first ever systematic review to identify statistical methods used to measure reliability of equipment measuring continuous variables. This study also aims to highlight inappropriate statistical methods used in reliability analysis and their implications for medical practice. In 2010, five electronic databases were searched for reliability studies from 2007 to 2009. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria. The Intra-class Correlation Coefficient (ICC) is the most popular method, with 25 (60%) studies having used it, followed by comparison of means (8 or 19%). Out of 25 studies using the ICC, only 7 (28%) reported the confidence intervals and types of ICC used. Most studies (71%) also tested the agreement of instruments. This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue, and be able to correctly perform analysis in reliability studies.
ERIC Educational Resources Information Center
Lau, Wilfred W. F.; Yuen, Allan H. K.
2009-01-01
Recent years have seen a shift in focus from assessment of learning to assessment for learning and the emergence of alternative assessment methods. However, the reliability and validity of these methods as assessment tools are still questionable. In this article, we investigated the predictive validity of measures of the Pathfinder Scaling…
Effect of knee angle on neuromuscular assessment of plantar flexor muscles: A reliability study
Cornu, Christophe; Jubeau, Marc
2018-01-01
Introduction This study aimed to determine the intra- and inter-session reliability of neuromuscular assessment of plantar flexor (PF) muscles at three knee angles. Methods Twelve young adults were tested for three knee angles (90°, 30° and 0°) and at three time points separated by 1 hour (intra-session) and 7 days (inter-session). Electrical (H reflex, M wave) and mechanical (evoked and maximal voluntary torque, activation level) parameters were measured on the PF muscles. Intraclass correlation coefficients (ICC) and coefficients of variation were calculated to determine intra- and inter-session reliability. Results The mechanical measurements presented excellent (ICC>0.75) intra- and inter-session reliabilities regardless of the knee angle considered. The reliability of electrical measurements was better for the 90° knee angle compared to the 0° and 30° angles. Conclusions Changes in the knee angle may influence the reliability of neuromuscular assessments, which indicates the importance of considering the knee angle to collect consistent outcomes on the PF muscles. PMID:29596480
Savage, Trevor Nicholas; McIntosh, Andrew Stuart
2017-03-01
It is important to understand factors contributing to and directly causing sports injuries to improve the effectiveness and safety of sports skills. The characteristics of injury events must be evaluated and described meaningfully and reliably. However, many complex skills cannot be effectively investigated quantitatively because of ethical, technological and validity considerations. Increasingly, qualitative methods are being used to investigate human movement for research purposes, but there are concerns about reliability and measurement bias of such methods. Using the tackle in Rugby union as an example, we outline a systematic approach for developing a skill analysis protocol with a focus on improving objectivity, validity and reliability. Characteristics for analysis were selected using qualitative analysis and biomechanical theoretical models and epidemiological and coaching literature. An expert panel comprising subject matter experts provided feedback and the inter-rater reliability of the protocol was assessed using ten trained raters. The inter-rater reliability results were reviewed by the expert panel and the protocol was revised and assessed in a second inter-rater reliability study. Mean agreement in the second study improved and was comparable (52-90% agreement and ICC between 0.6 and 0.9) with other studies that have reported inter-rater reliability of qualitative analysis of human movement.
Kvistgaard Olsen, Jack; Fener, Dilay Kesgin; Waehrens, Eva Elisabet; Wulf Christensen, Anton; Jespersen, Anders; Danneskiold-Samsøe, Bente; Bartels, Else Marie
2017-07-01
Computerized pneumatic cuff pressure algometry (CPA) using the DoloCuff is a new method for pain assessment. Intra- and inter-rater reliabilities have not yet been established. Our aim was to examine the inter- and intrarater reliabilities of DoloCuff measures in healthy subjects. Twenty healthy subjects (ages 20 to 29 years) were assessed three times at 24-hour intervals by two trained raters. Inter-rater reliability was established based on the first and second assessments, whereas intrarater reliability was based on the second and third assessments. Subjects were randomized 1:1 to first assessment at either rater 1 or rater 2. The variables of interest were pressure pain threshold (PT), pressure pain tolerance (PTol), and temporal summation index (TSI). Reliability was estimated by a two-way mixed intraclass correlation coefficient (ICC) absolute agreement analysis. Reliability was considered excellent if ICC > 0.75, fair to good if 0.4 < ICC < 0.75, and poor if ICC < 0.4. Bias and random errors between raters and assessments were evaluated using 95% confidence interval (CI) and Bland-Altman plots. Inter-rater reliability for PT, PTol, and TSI was 0.88 (95% CI: 0.69 to 0.95), 0.86 (95% CI: 0.65 to 0.95), and 0.81 (95% CI: 0.42 to 0.94), respectively. The intrarater reliability for PT, PTol, and TSI was 0.81 (95% CI: 0.53 to 0.92), 0.89 (95% CI: 0.74 to 0.96), and 0.75 (95% CI: 0.28 to 0.91), respectively. Inter-rater reliability was excellent for PT, PTol, and TSI. Similarly, the intrarater reliability for PT and PTol was excellent, while borderline excellent/good for TSI. Therefore, the DoloCuff can be used to obtain reliable measures of pressure pain parameters in healthy subjects. © 2016 World Institute of Pain.
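The absolute-agreement intraclass correlation described above can be computed from the mean squares of a two-way layout of subjects by raters. The following is a sketch of the single-measurement form, ICC = (MSR - MSE) / (MSR + (k - 1) MSE + k (MSC - MSE) / n), together with the interpretation bands quoted in the abstract. The function names and example scores are hypothetical, and how a value of exactly 0.75 or 0.4 is labelled is an assumption, since the abstract leaves those boundaries open.

```python
def icc_absolute_agreement(scores):
    """Single-measure, absolute-agreement ICC for an n-subjects by k-raters table."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def classify_icc(icc):
    """Bands from the abstract; boundary handling is an assumption."""
    if icc > 0.75:
        return "excellent"
    if icc < 0.4:
        return "poor"
    return "fair to good"


# Hypothetical thresholds from two raters; rater 2 reads one unit higher throughout.
offset_scores = [[1, 2], [2, 3], [3, 4]]
icc_offset = icc_absolute_agreement(offset_scores)
```

The constant one-unit offset leaves the subjects perfectly ranked, yet the absolute-agreement ICC drops to two thirds, which is why this form is stricter than a consistency ICC.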
Reliable Multi Method Assessment of Metacognition Use in Chemistry Problem Solving
ERIC Educational Resources Information Center
Cooper, Melanie M.; Sandi-Urena, Santiago; Stevens, Ron
2008-01-01
Metacognition is fundamental to understanding chemistry and developing problem-solving skills. This paper describes an across-method-and-time instrument designed to assess the use of metacognition in chemistry problem solving. This multi method instrument combines a self report, namely the Metacognitive Activities Inventory…
Niedermann, K; Forster, A; Hammond, A; Uebelhart, D; de Bie, R
2007-03-15
Joint protection (JP) is an important part of the treatment concept for patients with rheumatoid arthritis (RA). The Joint Protection Behavior Assessment short form (JPBA-S) assesses the use of hand JP methods by patients with RA while preparing a hot drink. The purpose of this study was to develop a German version of the JPBA-S (D-JPBA-S) and to test its validity and reliability. A manual was developed through consensus with 8 occupational therapist (OT) experts as the reference for assessing patients' JP behavior. Twenty-four patients with RA and 10 healthy individuals were videotaped while performing 10 tasks reflecting the activity of preparing instant coffee. Recordings were repeated after 3 months for test-retest analysis. One rater assessed all available patient recordings (n = 23, recorded twice) for test-retest reliability. The video recordings of 10 randomly selected patients and all healthy individuals were independently assessed for interrater reliability by 6 OTs who were explicitly asked to follow the manual. Rasch analysis was performed to test construct validity and transform ordinal raw data into interval data for reliability calculations. Nine of the 10 tasks fit the Rasch model. The D-JPBA-S, consisting of 9 valid tasks, had an intraclass correlation coefficient of 0.77 for interrater reliability and 0.71 for test-retest reliability. The D-JPBA-S provides a valid and reliable instrument for assessing JP behavior of patients with RA and can be used in German-speaking countries.
Comprehensive classification test of scapular dyskinesis: A reliability study.
Huang, Tsun-Shun; Huang, Han-Yi; Wang, Tyng-Guey; Tsai, Yung-Shen; Lin, Jiu-Jenq
2015-06-01
Assessment of scapular dyskinesis (SD) is of clinical interest, as SD is believed to be related to shoulder pathology. However, no clinical assessment with sufficient reliability to identify SD and provide treatment strategies is available. The purpose of this study was to investigate the reliability of the comprehensive SD classification method. Cross-sectional reliability study. Sixty subjects with unilateral shoulder pain were evaluated by two independent physiotherapists with a visual-based palpation method. SD was classified as single abnormal scapular pattern [inferior angle (pattern I), medial border (pattern II), superior border of scapula prominence or abnormal scapulohumeral rhythm (pattern III)], a mixture of the above abnormal scapular patterns, or normal pattern (pattern IV). The assessment of SD was evaluated as subjects performed bilateral arm raising/lowering movements with a weighted load in the scapular plane. Percentage of agreement and kappa coefficients were calculated to determine reliability. Agreement between the 2 independent physiotherapists was 83% (50/60, 6 subjects as pattern III and 44 subjects as pattern IV) in the raising phase and 68% (41/60, 5 subjects as pattern I, 12 subjects as pattern II, 12 subjects as pattern IV, 12 subjects as mixed patterns I and II) in the lowering phase. The kappa coefficients were 0.49-0.64. We concluded that the visual-based palpation classification method for SD had moderate to substantial inter-rater reliability. The appearance of different types of SD was more pronounced in the lowering phase than in the raising phase of arm movements. Copyright © 2014 Elsevier Ltd. All rights reserved.
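The two agreement statistics reported here, percentage of agreement and kappa, can be sketched as follows. The sketch assumes unweighted Cohen's kappa for two raters; the abstract does not specify the kappa variant, and the pattern labels in the usage note are illustrative only.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of subjects given the same category by both raters."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal category frequencies
    p_chance = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)
```

For example, `cohens_kappa(["I", "I", "II", "II"], ["I", "II", "II", "II"])` combines 75% raw agreement with 50% chance agreement, which shows why kappa values sit well below the raw percentages.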
Casartelli, Nicola; Müller, Roland; Maffiuletti, Nicola A
2010-11-01
The aim of the present study was to verify the validity and reliability of the Myotest accelerometric system (Myotest SA, Sion, Switzerland) for the assessment of vertical jump height. Forty-four male basketball players (age range: 9-25 years) performed series of squat, countermovement and repeated jumps during 2 identical test sessions separated by 2-15 days. Flight height was simultaneously quantified with the Myotest system and validated photoelectric cells (Optojump). Two calculation methods were used to estimate the jump height from Myotest recordings: flight time (Myotest-T) and vertical takeoff velocity (Myotest-V). Concurrent validity was investigated comparing Myotest-T and Myotest-V to the criterion method (Optojump), and test-retest reliability was also examined. As regards validity, Myotest-T overestimated jumping height compared to Optojump (p < 0.001) with a systematic bias of approximately 7 cm, even though random errors were low (2.7 cm) and intraclass correlation coefficients (ICCs) were high (>0.98), that is, excellent validity. Myotest-V overestimated jumping height compared to Optojump (p < 0.001), with high random errors (>12 cm), high limits of agreement ratios (>36%), and low ICCs (<0.75), that is, poor validity. As regards reliability, Myotest-T showed high ICCs (range: 0.92-0.96), whereas Myotest-V showed low ICCs (range: 0.56-0.89), and high random errors (>9 cm). In conclusion, Myotest-T is a valid and reliable method for the assessment of vertical jump height, and its use is legitimate for field-based evaluations, whereas Myotest-V is neither valid nor reliable.
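The two calculation routes named in this abstract, flight time and vertical takeoff velocity, correspond to standard projectile relations. The sketch below uses those textbook formulas; whether the Myotest firmware computes height exactly this way is not stated in the abstract, so this is an assumption, as are the function names.

```python
G = 9.81  # gravitational acceleration in m/s^2


def height_from_flight_time(flight_time_s):
    """Jump height in metres from flight time: h = g * t^2 / 8."""
    return G * flight_time_s ** 2 / 8


def height_from_takeoff_velocity(velocity_ms):
    """Jump height in metres from vertical takeoff velocity: h = v^2 / (2 g)."""
    return velocity_ms ** 2 / (2 * G)
```

The two routes agree when takeoff and landing posture match, since v = g·t/2 in that case. Any difference between the two estimates in practice therefore reflects measurement error in t or v rather than a difference in the physics.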
Test-retest and between-site reliability in a multicenter fMRI study.
Friedman, Lee; Stern, Hal; Brown, Gregory G; Mathalon, Daniel H; Turner, Jessica; Glover, Gary H; Gollub, Randy L; Lauriello, John; Lim, Kelvin O; Cannon, Tyrone; Greve, Douglas N; Bockholt, Henry Jeremy; Belger, Aysenil; Mueller, Bryon; Doty, Michael J; He, Jianchun; Wells, William; Smyth, Padhraic; Pieper, Steve; Kim, Seyoung; Kubicki, Marek; Vangel, Mark; Potkin, Steven G
2008-08-01
In the present report, estimates of test-retest and between-site reliability of fMRI assessments were produced in the context of a multicenter fMRI reliability study (FBIRN Phase 1, www.nbirn.net). Five subjects were scanned on 10 MRI scanners on two occasions. The fMRI task was a simple block design sensorimotor task. The impulse response functions to the stimulation block were derived using an FIR-deconvolution analysis with FMRISTAT. Six functionally-derived ROIs covering the visual, auditory and motor cortices, created from a prior analysis, were used. Two dependent variables were compared: percent signal change and contrast-to-noise-ratio. Reliability was assessed with intraclass correlation coefficients derived from a variance components analysis. Test-retest reliability was high, but initially, between-site reliability was low, indicating a strong contribution from site and site-by-subject variance. However, a number of factors that can markedly improve between-site reliability were uncovered, including increasing the size of the ROIs, adjusting for smoothness differences, and inclusion of additional runs. By employing multiple steps, between-site reliability for 3T scanners was increased by 123%. Dropping one site at a time and assessing reliability can be a useful method of assessing the sensitivity of the results to particular sites. These findings should provide guidance to others on the best practices for future multicenter studies.
Reliability database development for use with an object-oriented fault tree evaluation program
NASA Technical Reports Server (NTRS)
Heger, A. Sharif; Harringtton, Robert J.; Koen, Billy V.; Patterson-Hine, F. Ann
1989-01-01
A description is given of the development of a fault-tree analysis method using object-oriented programming. In addition, the authors discuss the programs that have been developed or are under development to connect a fault-tree analysis routine to a reliability database. To assess the performance of the routines, a relational database simulating one of the nuclear power industry databases has been constructed. For a realistic assessment of the results of this project, the use of one of the existing nuclear power reliability databases is planned.
Petrova, Tatjana; Kavookjian, Jan; Madson, Michael B; Dagley, John; Shannon, David; McDonough, Sharon K
2015-01-01
Motivational interviewing (MI) has demonstrated a significant impact as an intervention strategy for addiction management, change in lifestyle behaviors, and adherence to prescribed medication and other treatments. Key elements to studying MI include training in MI of professionals who will use it, assessment of skills acquisition in trainees, and the use of a validated skills assessment tool. The purpose of this research project was to develop a psychometrically valid and reliable tool that has been designed to assess MI skills competence in health care provider trainees. The goal was to develop an assessment tool that would evaluate the acquisition and use of specific MI skills and principles, as well as the quality of the patient-provider therapeutic alliance in brief health care encounters. To address this purpose, specific steps were followed, beginning with a literature review. This review contributed to the development of relevant conceptual and operational definitions, selecting a scaling technique and response format, and methods for analyzing validity and reliability. Internal consistency reliability was established on 88 video-recorded interactions. The inter-rater and test-retest reliability were established using 18 interactions randomly selected from the 88. The assessment tool Motivational Interviewing Skills for Health Care Encounters (MISHCE) and a manual for use of the tool were developed. Validity and reliability of MISHCE were examined. Face and content validity were supported with well-defined conceptual and operational definitions and feedback from an expert panel. Reliability was established through internal consistency, inter-rater reliability, and test-retest reliability. The overall internal consistency reliability (Cronbach's alpha) for all fifteen items was 0.75. MISHCE demonstrated good inter-rater reliability and good to excellent test-retest reliability.
MISHCE assesses the health provider's level of knowledge and skills in brief disease management encounters. MISHCE also evaluates quality of the patient-provider therapeutic alliance, i.e., the "flow" of the interaction. Copyright © 2015 Elsevier Inc. All rights reserved.
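The internal consistency statistic reported for MISHCE, Cronbach's alpha, can be sketched with the standard formula. The data layout (one list of scores per item, aligned by respondent) and the function name are assumptions for illustration; no MISHCE data are reproduced here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items: list of score lists, one per item; position i in every list
    belongs to the same respondent (or rated interaction).
    """
    k = len(items)
    # Total score per respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))
```

Alpha rises toward 1 as items covary more strongly, because the variance of the total score then exceeds the sum of the item variances by a wider margin.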
Miciak, Jeremy; Taylor, Pat; Denton, Carolyn A.; Fletcher, Jack M.
2014-01-01
Purpose Few empirical investigations have evaluated learning disabilities (LD) identification methods based on a pattern of cognitive strengths and weaknesses (PSW). This study investigated the reliability of LD classification decisions of the concordance/discordance method (C/DM) across different psychoeducational assessment batteries. Methods C/DM criteria were applied to assessment data from 177 second grade students based on two psychoeducational assessment batteries. The achievement tests were different, but were highly correlated and measured the same latent construct. Resulting LD identifications were then evaluated for agreement across batteries on LD status and the academic domain of eligibility. Results The two batteries identified a similar number of participants as having LD (80 and 74). However, indices of agreement for classification decisions were low (kappa = .29), especially for percent positive agreement (62%). The two batteries demonstrated agreement on the academic domain of eligibility for only 25 participants. Conclusions Cognitive discrepancy frameworks for LD identification are inherently unstable because of imperfect reliability and validity at the observed level. Methods premised on identifying a PSW profile may never achieve high reliability because of these underlying psychometric factors. An alternative is to directly assess academic skills to identify students in need of intervention. PMID:25243467
van de Water, A T M; Benjamin, D R
2016-02-01
Systematic literature review. Diastasis of the rectus abdominis muscle (DRAM) has been linked with low back pain, abdominal and pelvic dysfunction. Measurement is used to either screen or to monitor DRAM width. Determining which methods are suitable for screening and monitoring DRAM is of clinical value. To identify the best methods to screen for DRAM presence and monitor DRAM width. AMED, Embase, Medline, PubMed and CINAHL databases were searched for measurement property studies of DRAM measurement methods. Population characteristics, measurement methods/procedures and measurement information were extracted from included studies. Quality of all studies was evaluated using 'quality rating criteria'. When possible, reliability generalisation was conducted to provide combined reliability estimations. Thirteen studies evaluated measurement properties of the 'finger width'-method, tape measure, calipers, ultrasound, CT and MRI. Ultrasound was most evaluated. Methodological quality of these studies varied widely. Pearson's correlations of r = 0.66-0.79 were found between calipers and ultrasound measurements. Calipers and ultrasound had Intraclass Correlation Coefficients (ICC) of 0.78-0.97 for test-retest, inter- and intra-rater reliability. The 'finger width'-method had weighted Kappas of 0.73-0.77 for test-retest reliability, but moderate agreement (63%; weighted Kappa = 0.53) between raters. Comparing calipers and ultrasound, low measurement error was found (above the umbilicus), and the methods had good agreement (83%; weighted Kappa = 0.66) for discriminative purposes. The available information supports ultrasound and calipers as adequate methods to assess DRAM. For other methods, limited measurement information of low to moderate quality is available and further evaluation of their measurement properties is required. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
1991-01-01
The technical effort and computer code enhancements performed during the sixth year of the Probabilistic Structural Analysis Methods program are summarized. Various capabilities are described to probabilistically combine structural response and structural resistance to compute component reliability. A library of structural resistance models is implemented in the Numerical Evaluations of Stochastic Structures Under Stress (NESSUS) code that included fatigue, fracture, creep, multi-factor interaction, and other important effects. In addition, a user interface was developed for user-defined resistance models. An accurate and efficient reliability method was developed and was successfully implemented in the NESSUS code to compute component reliability based on user-selected response and resistance models. A risk module was developed to compute component risk with respect to cost, performance, or user-defined criteria. The new component risk assessment capabilities were validated and demonstrated using several examples. Various supporting methodologies were also developed in support of component risk assessment.
Assessing risk of injury in people with mental retardation living in an intermediate care facility.
Konarski, Edward A; Tassé, Marc
2005-09-01
A brief instrument to assess risk of injury was applied retrospectively for 2 years and prospectively for 1 year to all people living in a large ICF/MR. Results suggest that the percentage of people who experienced an injury significantly increased across the levels of increasing risk indicated by the assessment. Furthermore, people who experienced an injury had significantly higher risk scores than those who did not. Using psychometric analyses, we found a mean correlation of .79 for interrater reliability and .90 for test-retest reliability on individual items and correlations of .91 and .95, respectively, on total score. We conclude that the assessment has promise as a reliable and valid method for predicting injury risk level.
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
In FY01 we learned that hardware reliability models need substantial changes to account for differences in software, thus making software reliability measurements more effective, accurate, and easier to apply. These reliability models are generally based on familiar distributions or parametric methods. An obvious question is "What new statistical and probability models can be developed using non-parametric and distribution-free methods instead of the traditional parametric method?" Two approaches to software reliability engineering appear somewhat promising. The first study, begun in FY01, is based on hardware reliability, a very well established science that has many aspects that can be applied to software. This research effort has investigated mathematical aspects of hardware reliability and has identified those applicable to software. Currently the research effort is applying and testing these approaches to software reliability measurement. These parametric models require much project data that may be difficult to apply and interpret. Projects at GSFC are often complex in both technology and schedules. Assessing and estimating reliability of the final system is extremely difficult when various subsystems are tested and completed long before others. Parametric and distribution-free techniques may offer a new and accurate way of modeling failure time and other project data to provide earlier and more accurate estimates of system reliability.
BAGLIO, MICHELLE L.; BAXTER, SUZANNE DOMEL; GUINN, CAROLINE H.; THOMPSON, WILLIAM O.; SHAFFER, NICOLE M.; FRYE, FRANCESCA H. A.
2005-01-01
This article (a) provides a general review of interobserver reliability (IOR) and (b) describes our method for assessing IOR for items and amounts consumed during school meals for a series of studies regarding the accuracy of fourth-grade children's dietary recalls validated with direct observation of school meals. A widely used validation method for dietary assessment is direct observation of meals. Although many studies utilize several people to conduct direct observations, few published studies indicate whether IOR was assessed. Assessment of IOR is necessary to determine that the information collected does not depend on who conducted the observation. Two strengths of our method for assessing IOR are that IOR was assessed regularly throughout the data collection period and that IOR was assessed for foods at the item and amount level instead of at the nutrient level. Adequate agreement among observers is essential to the reasoning behind using observation as a validation tool. Readers are encouraged to question the results of studies that fail to mention and/or to include the results for assessment of IOR when multiple people have conducted observations. PMID:15354155
ERIC Educational Resources Information Center
Dorn, Fred J.; And Others
1983-01-01
Reviews the inconsistent findings of studies on neurolinguistic programing and recommends some areas that should be examined to verify various claims. Discusses methods of assessing client's primary representational systems, including predicate usage and eye movements, and suggests that more reliable methods of assessing PRS must be found. (JAC)
Taghipour, Morteza; Mohseni-Bandpei, Mohammad Ali; Behtash, Hamid; Abdollahi, Iraj; Rajabzadeh, Fatemeh; Pourahmadi, Mohammad Reza; Emami, Mahnaz
2018-04-24
Rehabilitative ultrasound (US) imaging is one of the popular methods for investigating muscle morphologic characteristics and dimensions in recent years. The reliability of this method has been investigated in different studies. As studies have been performed with different designs and quality, reported values of rehabilitative US have a wide range. The objective of this study was to systematically review the literature conducted on the reliability of rehabilitative US imaging for the assessment of deep abdominal and lumbar trunk muscle dimensions. The PubMed/MEDLINE, Scopus, Google Scholar, Science Direct, Embase, Physiotherapy Evidence, Ovid, and CINAHL databases were searched to identify original research articles conducted on the reliability of rehabilitative US imaging published from June 2007 to August 2017. The articles were qualitatively assessed; reliability data were extracted; and the methodological quality was evaluated by 2 independent reviewers. Of the 26 included studies, 16 were considered of high methodological quality. Except for 2 studies, all high-quality studies reported intraclass correlation coefficients (ICCs) for intra-rater reliability of 0.70 or greater. Also, ICCs reported for inter-rater reliability in high-quality studies were generally greater than 0.70. Among low-quality studies, reported ICCs ranged from 0.26 to 0.99 and 0.68 to 0.97 for intra- and inter-rater reliability, respectively. Also, the reported standard error of measurement and minimal detectable change for rehabilitative US were generally in an acceptable range. Generally, the results of the reviewed studies indicate that rehabilitative US imaging has good levels of both inter- and intra-rater reliability. © 2018 by the American Institute of Ultrasound in Medicine.
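The standard error of measurement (SEM) and minimal detectable change (MDC) mentioned in this review are commonly derived from the ICC. The sketch below uses the widely used formulas SEM = SD·sqrt(1 − ICC) and MDC95 = 1.96·sqrt(2)·SEM; the review does not state which formulas the included studies applied, so this is an assumption, as are the function names and example values.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject standard deviation and the ICC."""
    return sd * sqrt(1 - icc)


def minimal_detectable_change_95(sd, icc):
    """Smallest change exceeding measurement error at 95% confidence."""
    return 1.96 * sqrt(2) * standard_error_of_measurement(sd, icc)
```

With a hypothetical muscle thickness SD of 2.0 mm and an ICC of 0.75, the SEM is 1.0 mm and the MDC95 is about 2.8 mm, so smaller changes could not be distinguished from measurement noise.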
ERIC Educational Resources Information Center
Embiza, Samuel; Hadush, Selamawit
2015-01-01
The purpose of this study was to assess the dimensionality and reliability of Teachers Evaluation Questionnaire in Eastern Zone high school; Tigrai National Regional State which was filled by school principal. To this end: 9 high schools in 7 woredas were selected using the lottery method, in which 459 teachers' rate forms were collected. All…
ERIC Educational Resources Information Center
Kerr, Jacqueline; Sallis, James F.; Bromby, Erica; Glanz, Karen
2012-01-01
Objective: To evaluate reliability and validity of a new tool for assessing the placement and promotional environment in grocery stores. Methods: Trained observers used the "GroPromo" instrument in 40 stores to code the placement of 7 products in 9 locations within a store, along with other promotional characteristics. To test construct validity,…
Reliability of ^1^H NMR analysis for assessment of lipid oxidation at frying temperatures
USDA-ARS?s Scientific Manuscript database
The reliability of a method using ^1^H NMR analysis for assessment of oil oxidation at a frying temperature was examined. During heating and frying at 180 °C, changes of soybean oil signals in the ^1^H NMR spectrum including olefinic (5.16-5.30 ppm), bisallylic (2.70-2.88 ppm), and allylic (1.94-2.1...
ERIC Educational Resources Information Center
Gustafsson, Peik; Svedin, Carl Goran; Ericsson, Ingegerd; Linden, Christian; Karlsson, Magnus K.; Thernlund, Gunilla
2010-01-01
Aim: To study the value and reliability of an examination of neurological soft-signs, often used in Sweden, in the assessment of children with attention-deficit-hyperactivity disorder (ADHD), by examining children with and without ADHD, as diagnosed by an experienced clinician using the DSM-III-R. Method: We have examined interrater reliability…
ERIC Educational Resources Information Center
Furr-Holden, C. D. M.; Campbell, K. D. M.; Milam, A. J.; Smart, M. J.; Ialongo, N. A.; Leaf, P. J.
2010-01-01
Objectives: Establish metric properties of the Neighborhood Inventory for Environmental Typology (NIfETy). Method: A total of 919 residential block faces were assessed by paired raters using the NIfETy. Reliability was evaluated via interrater and internal consistency reliability; validity by comparing NIfETy data with youth self-reported…
Guetterman, Timothy C; Creswell, John W; Wittink, Marsha; Barg, Fran K; Castro, Felipe G; Dahlberg, Britt; Watkins, Daphne C; Deutsch, Charles; Gallo, Joseph J
2017-01-01
Demand for training in mixed methods is high, with little research on faculty development or assessment in mixed methods. We describe the development of a self-rated mixed methods skills assessment and provide validity evidence. The instrument taps six research domains: "Research question," "Design/approach," "Sampling," "Data collection," "Analysis," and "Dissemination." Respondents are asked to rate their ability to define or explain concepts of mixed methods under each domain, their ability to apply the concepts to problems, and the extent to which they need to improve. We administered the questionnaire to 145 faculty and students using an internet survey. We analyzed descriptive statistics and performance characteristics of the questionnaire using the Cronbach alpha to assess reliability and an analysis of variance that compared a mixed methods experience index with assessment scores to assess criterion relatedness. Internal consistency reliability was high for the total set of items (0.95) and adequate (≥0.71) for all but one subscale. Consistent with establishing criterion validity, respondents who had more professional experiences with mixed methods (eg, published a mixed methods article) rated themselves as more skilled, which was statistically significant across the research domains. This self-rated mixed methods assessment instrument may be a useful tool to assess skills in mixed methods for training programs. It can be applied widely at the graduate and faculty level. For the learner, assessment may lead to enhanced motivation to learn and training focused on self-identified needs. For faculty, the assessment may improve curriculum and course content planning.
Reliability and Probabilistic Risk Assessment - How They Play Together
NASA Technical Reports Server (NTRS)
Safie, Fayssal; Stutts, Richard; Huang, Zhaofeng
2015-01-01
Since the Space Shuttle Challenger accident in 1986, NASA has extensively used probabilistic analysis methods to assess, understand, and communicate the risk of space launch vehicles. Probabilistic Risk Assessment (PRA), used in the nuclear industry, is one of the probabilistic analysis methods NASA utilizes to assess Loss of Mission (LOM) and Loss of Crew (LOC) risk for launch vehicles. PRA is a system scenario based risk assessment that uses a combination of fault trees, event trees, event sequence diagrams, and probability distributions to analyze the risk of a system, a process, or an activity. It is a process designed to answer three basic questions: 1) what can go wrong that would lead to loss or degraded performance (i.e., scenarios involving undesired consequences of interest), 2) how likely is it (probabilities), and 3) what is the severity of the degradation (consequences). Since the Challenger accident, PRA has been used in supporting decisions regarding safety upgrades for launch vehicles. Another area that was given a lot of emphasis at NASA after the Challenger accident is reliability engineering. Reliability engineering has been a critical design function at NASA since the early Apollo days. However, after the Challenger accident, quantitative reliability analysis and reliability predictions were given more scrutiny because of their importance in understanding failure mechanism and quantifying the probability of failure, which are key elements in resolving technical issues, performing design trades, and implementing design improvements. Although PRA and reliability are both probabilistic in nature and, in some cases, use the same tools, they are two different activities. Specifically, reliability engineering is a broad design discipline that deals with loss of function and helps understand failure mechanism and improve component and system design. 
PRA is a system scenario based risk assessment process intended to assess the risk scenarios that could lead to a major/top undesirable system event, and to identify those scenarios that are high-risk drivers. PRA output is critical to support risk informed decisions concerning system design. This paper describes the PRA process and the reliability engineering discipline in detail. It discusses their differences and similarities and how they work together as complementary analyses to support the design and risk assessment processes. Lessons learned, applications, and case studies in both areas are also discussed in the paper to demonstrate and explain these differences and similarities.
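The fault-tree element of PRA described above can be illustrated with a minimal sketch of AND and OR gate arithmetic. It assumes statistically independent basic events, which real PRA models often cannot assume, and the event probabilities in the usage note are invented for illustration.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if every input event occurs."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1 - prod(1 - p for p in probabilities)


def top_event_probability(pump_a, pump_b, controller):
    """Hypothetical tree: (pump A fails AND pump B fails) OR controller fails."""
    return or_gate([and_gate([pump_a, pump_b]), controller])
```

For instance, `top_event_probability(0.01, 0.02, 0.001)` is about 0.0012, dominated by the single-point controller failure rather than by the redundant pumps. This is the kind of high-risk driver that the PRA process is meant to surface.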
Non-therapist identification of falling hazards in older adult homes using digital photography.
Ritchey, Katherine C; Meyer, Deborah; Ice, Gillian H
2015-01-01
Evaluation and removal of home hazards is an invaluable method for preventing in-home falls and preserving independent living. Current processes for conducting home hazard assessments are impractical from a whole population standpoint given the substantial resources required for implementation. Digital photography offers an opportunity to remotely evaluate an environment for falling hazards. However, reliability of this method has only been tested under the direction of skilled therapists. Ten community dwelling adults over the age of 65 were recruited from local primary care practices between July, 2009 and February, 2010. In-home (IH) assessments were completed immediately after a photographer, blinded to the assessment form, took digital photographs (DP) of the participant home. A different non-therapist assessor then reviewed the photographs and completed a second assessment of the home. Kappa statistic was used to analyze the reliability between the two independent assessments. Home assessments completed by a non-therapist using digital photographs had a substantial agreement (Kappa = 0.61, p < 0.001) with in-home assessments completed by another non-therapist. Additionally, the DP assessments agreed with the IH assessments on the presence or absence of items 96.8% of the time. This study showed that non-therapists can reliably conduct home hazard evaluations using digital photographs.
Kerosuo, E; Kolehmainen, L
1982-01-01
The susceptibility of a tooth to dental caries has been proposed to depend on tooth color. So far there has, however, been no reliable method for tooth color determination. The aims of this study were to evaluate the reliability of an opto-electronic method and to examine the relationship between tooth color and past caries experience. The color of upper right central incisors of 64 school-children was determined using an opto-electronic tri-stimulus color comparator. The intra- and interexaminer reliability of the method was evaluated in vitro and in vivo, being 85% and 83%, respectively. To assess the past caries experience the DMFS-index was calculated. Oral hygiene and dietary habits were also assessed. No significant difference in DMFS scores was obtained between the 'white teeth' group and the 'yellow teeth' group. The conclusion is that the practical importance of possible color-related differences in caries resistance is negligible due to the multifaceted nature of dental caries.
A low-cost, tablet-based option for prehospital neurologic assessment
Chapman Smith, Sherita N.; Govindarajan, Prasanthi; Padrick, Matthew M.; Lippman, Jason M.; McMurry, Timothy L.; Resler, Brian L.; Keenan, Kevin; Gunnell, Brian S.; Mehndiratta, Prachi; Chee, Christina Y.; Cahill, Elizabeth A.; Dietiker, Cameron; Cattell-Gordon, David C.; Smith, Wade S.; Perina, Debra G.; Solenski, Nina J.; Worrall, Bradford B.
2016-01-01
Objectives: In this 2-center study, we assessed the technical feasibility and reliability of a low cost, tablet-based mobile telestroke option for ambulance transport and hypothesized that the NIH Stroke Scale (NIHSS) could be performed with similar reliability between remote and bedside examinations. Methods: We piloted our mobile telemedicine system in 2 geographic regions, central Virginia and the San Francisco Bay Area, utilizing commercial cellular networks for videoconferencing transmission. Standardized patients portrayed scripted stroke scenarios during ambulance transport and were evaluated by independent raters comparing bedside to remote mobile telestroke assessments. We used a mixed-effects regression model to determine intraclass correlation of the NIHSS between bedside and remote examinations (95% confidence interval). Results: We conducted 27 ambulance runs at both sites and successfully completed the NIHSS for all prehospital assessments without prohibitive technical interruption. The mean difference between bedside (face-to-face) and remote (video) NIHSS scores was 0.25 (1.00 to −0.50). Overall, correlation of the NIHSS between bedside and mobile telestroke assessments was 0.96 (0.92–0.98). In the mixed-effects regression model, there were no statistically significant differences accounting for method of evaluation or differences between sites. Conclusions: Utilizing a low-cost, tablet-based platform and commercial cellular networks, we can reliably perform prehospital neurologic assessments in both rural and urban settings. Further research is needed to establish the reliability and validity of prehospital mobile telestroke assessment in live patients presenting with acute neurologic symptoms. PMID:27281534
Reliability analysis of the objective structured clinical examination using generalizability theory.
Trejo-Mejía, Juan Andrés; Sánchez-Mendiola, Melchor; Méndez-Ramírez, Ignacio; Martínez-González, Adrián
2016-01-01
Background The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies of OSCE reliability measurement with generalizability theory (G-theory) in Latin America. The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and explore its usefulness for quality improvement. Methods An observational cross-sectional study was conducted at National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossover random effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered as a single facet of analysis. Results The exam was applied to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. The sites and the versions of the tests had minimum variance. Conclusions Our study achieved a G coefficient similar to that found in other reports, which is acceptable for summative tests. G-theory allows the estimation of the magnitude of multiple sources of error and helps decision makers to determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
Reliability evaluation of microgrid considering incentive-based demand response
NASA Astrophysics Data System (ADS)
Huang, Ting-Cheng; Zhang, Yong-Jun
2017-07-01
Incentive-based demand response (IBDR) can guide customers to adjust their electricity consumption behaviour and actively curtail load. Meanwhile, distributed generation (DG) and energy storage systems (ESS) can provide time for the implementation of IBDR. This paper focuses on the reliability evaluation of microgrids considering IBDR. Firstly, the mechanism of IBDR and its impact on power supply reliability are analysed. Secondly, the IBDR dispatch model considering customer’s comprehensive assessment and the customer response model are developed. Thirdly, a reliability evaluation method considering IBDR based on Monte Carlo simulation is proposed. Finally, the validity of the above models and method is studied through numerical tests on a modified RBTS Bus6 test system. Simulation results demonstrate that IBDR can improve the reliability of the microgrid.
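As a rough illustration of what a Monte Carlo reliability evaluation involves, the sketch below samples random generator outage states and counts how often supply falls short of load. It is not the authors' IBDR model: the unit capacities, the single availability value, the fixed load, and the choice of loss-of-load probability and expected unserved power as indices are all assumptions made for this example.

```python
import random


def estimate_lolp(unit_capacities_kw, unit_availability, load_kw,
                  n_samples=10000, seed=0):
    """State-sampling Monte Carlo sketch (hypothetical toy system).

    Each unit is independently available with probability
    `unit_availability`. Returns (loss-of-load probability,
    expected unserved power in kW).
    """
    rng = random.Random(seed)
    shortfall_count = 0
    unserved_total = 0.0
    for _ in range(n_samples):
        # Sum the capacity of the units that are up in this sampled state.
        available = sum(cap for cap in unit_capacities_kw
                        if rng.random() < unit_availability)
        if available < load_kw:
            shortfall_count += 1
            unserved_total += load_kw - available
    return shortfall_count / n_samples, unserved_total / n_samples


# Hypothetical microgrid: three 100 kW units, 95% availability, 180 kW load.
lolp, unserved_kw = estimate_lolp([100, 100, 100], 0.95, 180)
```

A demand-response programme could be represented in such a sketch by lowering `load_kw` in sampled states where curtailment is called, which is one simple way to see why curtailment can improve the indices.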
Reliability of proton NMR spectroscopy for the assessment of frying oil oxidation
USDA-ARS's Scientific Manuscript database
Although there are many analytical methods developed to assess oxidation of edible oil, it is still common to see a lack of consistency in results from different methods. This inconsistency is expected since there are numerous oxidation products and any analytical method measuring only one kind of o...
The NMR analysis of frying oil: a very reliable method for assessment of lipid oxidation
USDA-ARS's Scientific Manuscript database
There are many analytical methods developed for the assessment of lipid oxidation. However, one of the most challenging issues in analyzing oil oxidation is that there is lack of consistency in results obtained from different analytical methods. The major reason for the inconsistency is that most me...
Devosa, Iván; Kozinszky, Zoltán; Vanya, Melinda; Szili, Károly; Fáyné Dombi, Alice; Barabás, Katalin
2016-04-03
Promiscuity and lack of use of reliable contraceptive methods increase the probability of sexually transmitted diseases and the risk of unwanted pregnancies, which are quite common among university students. The aim of the study was to assess the knowledge of university students about reliable contraceptive methods and sexually transmitted diseases, and to assess the effectiveness of the sexual health education in secondary schools, with specific focus on the education held by peers. An anonymous, self-administered questionnaire survey was carried out in a randomized sample of students at the University of Szeged (n = 472, 298 women and 174 men, average age 21 years) between 2009 and 2011. 62.1% of the respondents declared that reproductive health education lessons in high schools held by peers were reliable and authentic source of information, 12.3% considered as a less reliable source, and 25.6% defined the school health education as irrelevant source. Among those, who considered the health education held by peers as a reliable source, there were significantly more females (69.3% vs. 46.6%, p = 0.001), significantly fewer lived in cities (83.6% vs. 94.8%, p = 0.025), and significantly more responders knew that Candida infection can be transmitted through sexual intercourse (79.5% versus 63.9%, p = 0.02) as compared to those who did not consider health education held by peers as a reliable source. The majority of respondents obtained knowledge about sexual issues from the mass media. Young people who considered health educating programs reliable were significantly better informed about Candida disease.
Charlton, Paula C; Mentiplay, Benjamin F; Pua, Yong-Hao; Clark, Ross A
2015-05-01
Traditional methods of assessing joint range of motion (ROM) involve specialized tools that may not be widely available to clinicians. This study assesses the reliability and validity of a custom Smartphone application for assessing hip joint range of motion. Intra-tester reliability with concurrent validity. Passive hip joint range of motion was recorded for seven different movements in 20 males on two separate occasions. Data from a Smartphone, bubble inclinometer and a three dimensional motion analysis (3DMA) system were collected simultaneously. Intraclass correlation coefficients (ICCs), coefficients of variation (CV) and standard error of measurement (SEM) were used to assess reliability. To assess validity of the Smartphone application and the bubble inclinometer against the three dimensional motion analysis system, intraclass correlation coefficients and fixed and proportional biases were used. The Smartphone demonstrated good to excellent reliability (ICCs>0.75) for four out of the seven movements, and moderate to good reliability for the remaining three movements (ICC=0.63-0.68). Additionally, the Smartphone application displayed comparable reliability to the bubble inclinometer. The Smartphone application displayed excellent validity when compared to the three dimensional motion analysis system for all movements (ICCs>0.88) except one, which displayed moderate to good validity (ICC=0.71). Smartphones are portable and widely available tools that are mostly reliable and valid for assessing passive hip range of motion, with potential for large-scale use when a bubble inclinometer is not available. However, caution must be taken in its implementation as some movement axes demonstrated only moderate reliability. Copyright © 2014 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
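The abstract above does not say which ICC form was used. As a hedged sketch, the function below computes two common single-measure forms from a subjects-by-sessions table using the two-way ANOVA mean squares (Shrout and Fleiss style): an absolute-agreement ICC and a consistency ICC. The data in the example are invented.

```python
def icc_single(scores):
    """Single-measure ICCs from a table with one row per subject and one
    column per session or rater. Returns (agreement, consistency).
    Which form a given study reports is an assumption to check against
    its methods section.
    """
    n = len(scores)        # subjects
    k = len(scores[0])     # sessions or raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-subject mean square
    msc = ss_cols / (k - 1)                 # between-session mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square

    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return agreement, consistency


# Invented example: session 2 reads exactly 1 degree higher for everyone.
agreement, consistency = icc_single([[1, 2], [2, 3], [3, 4]])
```

In the invented example the constant offset between sessions leaves the consistency ICC at 1 but pulls the agreement ICC down, which is why the choice of form matters when comparing devices.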
Heres, H M; Schoots, T; Tchang, B C Y; Rutten, M C M; Kemps, H M C; van de Vosse, F N; Lopata, R G P
2018-06-01
Assessment of limitations in the perfusion dynamics of skeletal muscle may provide insight in the pathophysiology of exercise intolerance in, e.g., heart failure patients. Power doppler ultrasound (PDUS) has been recognized as a sensitive tool for the detection of muscle blood flow. In this volunteer study (N = 30), a method is demonstrated for perfusion measurements in the vastus lateralis muscle, with PDUS, during standardized cycling exercise protocols, and the test-retest reliability has been investigated. Fixation of the ultrasound probe on the upper leg allowed for continuous PDUS measurements. Cycling exercise protocols included a submaximal and an incremental exercise to maximal power. The relative perfused area (RPA) was determined as a measure of perfusion. Absolute and relative reliability of RPA amplitude and kinetic parameters during exercise (onset, slope, maximum value) and recovery (overshoot, decay time constants) were investigated. A RPA increase during exercise followed by a signal recovery was measured in all volunteers. Amplitudes and kinetic parameters during exercise and recovery showed poor to good relative reliability (ICC ranging from 0.2-0.8), and poor to moderate absolute reliability (coefficient of variation (CV) range 18-60%). A method has been demonstrated which allows for continuous (Power Doppler) ultrasonography and assessment of perfusion dynamics in skeletal muscle during exercise. The reliability of the RPA amplitudes and kinetics ranges from poor to good, while the reliability of the RPA increase in submaximal cycling (ICC = 0.8, CV = 18%) is promising for non-invasive clinical assessment of the muscle perfusion response to daily exercise.
Wu, Yu-Tzu; Nash, Paul; Barnes, Linda E; Minett, Thais; Matthews, Fiona E; Jones, Andy; Brayne, Carol
2014-10-22
An association between depressive symptoms and features of built environment has been reported in the literature. A remaining research challenge is the development of methods to efficiently capture pertinent environmental features in relevant study settings. Visual streetscape images have been used to replace traditional physical audits and directly observe the built environment of communities. The aim of this work is to examine the inter-method reliability of the two audit methods for assessing community environments with a specific focus on physical features related to mental health. Forty-eight postcodes in urban and rural areas of Cambridgeshire, England were randomly selected from an alphabetical list of streets hosted on a UK property website. The assessment was conducted in July and August 2012 by both physical and visual image audits based on the items in Residential Environment Assessment Tool (REAT), an observational instrument targeting the micro-scale environmental features related to mental health in UK postcodes. The assessor used the images of Google Street View and virtually "walked through" the streets to conduct the property and street level assessments. Gwet's AC1 coefficients and Bland-Altman plots were used to compare the concordance of two audits. The results of conducting the REAT by visual image audits generally correspond to direct observations. More variations were found in property level items regarding physical incivilities, with broad limits of agreement which importantly lead to most of the variation in the overall REAT score. Postcodes in urban areas had lower consistency between the two methods than rural areas. Google Street View has the potential to assess environmental features related to mental health with fair reliability and provide a less resource intense method of assessing community environments than physical audits.
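For readers unfamiliar with Gwet's AC1, the following is a minimal sketch for two raters (here, two audit methods) scoring the same items on a categorical scale. The ratings are invented; the function name and the optional `categories` argument are illustrative choices, not part of the REAT instrument.

```python
def gwet_ac1(ratings_a, ratings_b, categories=None):
    """Gwet's AC1 agreement coefficient for two raters."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    if categories is None:
        cats = list(set(ratings_a) | set(ratings_b))
    else:
        cats = list(set(categories))
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")

    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    p_agree = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement based on the average marginal share of each category.
    p_chance = 0.0
    for c in cats:
        share = (ratings_a.count(c) + ratings_b.count(c)) / (2 * n)
        p_chance += share * (1 - share)
    p_chance /= (q - 1)

    return (p_agree - p_chance) / (1 - p_chance)


# Invented item scores: physical audit vs. image-based audit.
ac1 = gwet_ac1([1, 1, 0, 0], [1, 1, 0, 1])
```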
ERIC Educational Resources Information Center
Menold, Natalja; Tausch, Anja
2016-01-01
Effects of rating scale forms on cross-sectional reliability and measurement equivalence were investigated. A randomized experimental design was implemented, varying category labels and number of categories. The participants were 800 students at two German universities. In contrast to previous research, reliability assessment method was used,…
Zhang, Dengke; Pang, Yanxia; Cai, Weixiong; Fazio, Rachel L; Ge, Jianrong; Su, Qiaorong; Xu, Shuiqin; Pan, Yinan; Chen, Sanmei; Zhang, Hongwei
2016-08-01
Impairment of theory of mind (ToM) is a common phenomenon following traumatic brain injury (TBI) that has clear effects on patients' social functioning. A growing body of research has focused on this area, and several methods have been developed to assess ToM deficiency. Although an informant assessment scale would be useful for examining individuals with TBI, very few studies have adopted this approach. The purpose of the present study was to develop an informant assessment scale of ToM for adults with traumatic brain injury (IASToM-aTBI) and to test its reliability and validity with 196 adults with TBI and 80 normal adults. A 44-item scale was developed following a literature review, interviews with patient informants, consultations with experts, item analysis, and exploratory factor analysis (EFA). The following three common factors were extracted: social interaction, understanding of beliefs, and understanding of emotions. The psychometric analyses indicate that the scale has good internal consistency reliability, split-half reliability, test-retest reliability, inter-rater reliability, structural validity, discriminate validity and criterion validity. These results provide preliminary evidence that supports the reliability and validity of the IASToM-aTBI as a ToM assessment tool for adults with TBI.
Reliability of temporal summation and diffuse noxious inhibitory control
Cathcart, Stuart; Winefield, Anthony H; Rolan, Paul; Lushington, Kurt
2009-01-01
BACKGROUND: The test-retest reliability of temporal summation (TS) and diffuse noxious inhibitory control (DNIC) has not been reported to date. Establishing such reliability would support the possibility of future experimental studies examining factors affecting TS and DNIC. Similarly, the use of manual algometry to induce TS, or an occlusion cuff to induce DNIC of TS to mechanical stimuli, has not been reported to date. Such devices may offer a simpler method than current techniques for inducing TS and DNIC, affording assessment at more anatomical locations and in more varied research settings. METHOD: The present study assessed the test-retest reliability of TS and DNIC using the above techniques. Sex differences on these measures were also investigated. RESULTS: Repeated measures ANOVA indicated successful induction of TS and DNIC, with no significant differences across test-retest occasions. Sex effects were not significant for any measure or interaction. Intraclass correlations indicated high test-retest reliability for all measures; however, there was large interindividual variation between test and retest measurements. CONCLUSION: The present results indicate acceptable within-session test-retest reliability of TS and DNIC. The results support the possibility of future experimental studies examining factors affecting TS and DNIC. PMID:20011713
Quinn, Amity E; Rosen, Rochelle K; McGeary, John E; Amoa, Francine; Kranzler, Henry R; Francazio, Sarah; McGarvey, Stephen T; Swift, Robert M
2014-01-01
The aims of this study were to develop a bilingual version of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA) in English and Samoan and determine the reliability of assessments of alcohol dependence in American Samoa. The study consisted of development and reliability-testing phases. In the development phase, the SSADDA alcohol module was translated and the translation was evaluated through cognitive interviews. In the reliability-testing phase, the bilingual SSADDA was administered to 40 ethnic Samoans, including a sub-sample of 26 individuals who were retested. Cognitive interviews indicated the initial translation was culturally and linguistically appropriate except items pertaining to alcohol tolerance, which were modified to reflect Samoan concepts. SSADDA reliability testing indicated diagnoses of DSM-III-R and DSM-IV alcohol dependence were reliable. Reliability varied by language of administration. The English/Samoan version of the SSADDA is appropriate for the diagnosis of DSM-III-R alcohol dependence, which may be useful in advancing research and public health efforts to address alcohol problems in American Samoa and the Western Pacific. The translation methods may inform researchers translating diagnostic and assessment tools into different languages and cultures. © The Author 2014. Medical Council on Alcohol and Oxford University Press. All rights reserved.
Neurology objective structured clinical examination reliability using generalizability theory
Park, Yoon Soo; Lukas, Rimas V.; Brorson, James R.
2015-01-01
Objectives: This study examines factors affecting reliability, or consistency of assessment scores, from an objective structured clinical examination (OSCE) in neurology through generalizability theory (G theory). Methods: Data include assessments from a multistation OSCE taken by 194 medical students at the completion of a neurology clerkship. Facets evaluated in this study include cases, domains, and items. Domains refer to areas of skill (or constructs) that the OSCE measures. G theory is used to estimate variance components associated with each facet, derive reliability, and project the number of cases required to obtain a reliable (consistent, precise) score. Results: Reliability using G theory is moderate (Φ coefficient = 0.61, G coefficient = 0.64). Performance is similar across cases but differs by the particular domain, such that the majority of variance is attributed to the domain. Projections in reliability estimates reveal that students need to participate in 3 OSCE cases in order to increase reliability beyond the 0.70 threshold. Conclusions: This novel use of G theory in evaluating an OSCE in neurology provides meaningful measurement characteristics of the assessment. Differing from prior work in other medical specialties, the cases students were randomly assigned did not influence their OSCE score; rather, scores varied in expected fashion by domain assessed. PMID:26432851
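The projection step described above (a decision study) can be sketched for the simplest possible design, persons crossed with cases. The study itself also models domains and items, so this is a deliberate simplification, and the variance components below are hypothetical rather than the published estimates.

```python
def g_coefficient(var_person, var_residual, n_cases):
    """Relative (norm-referenced) coefficient for a persons x cases design."""
    return var_person / (var_person + var_residual / n_cases)


def phi_coefficient(var_person, var_case, var_residual, n_cases):
    """Absolute (criterion-referenced) coefficient; case difficulty
    variance counts as error."""
    return var_person / (var_person + (var_case + var_residual) / n_cases)


def min_cases_for(target, var_person, var_residual, max_cases=100):
    """Smallest number of cases whose projected G coefficient reaches
    `target`, or None if it is not reached within `max_cases`."""
    for n in range(1, max_cases + 1):
        if g_coefficient(var_person, var_residual, n) >= target:
            return n
    return None


# Hypothetical variance components (not taken from the study).
needed = min_cases_for(0.70, var_person=1.0, var_residual=1.0)
```

With these made-up components the projected coefficient crosses 0.70 at three cases, which mirrors the kind of statement the abstract makes without reproducing its actual numbers.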
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamachi La Commare, Kristina
Metrics for reliability, such as the frequency and duration of power interruptions, have been reported by electric utilities for many years. This study examines current utility practices for collecting and reporting electricity reliability information and discusses challenges that arise in assessing reliability because of differences among these practices. The study is based on reliability information for year 2006 reported by 123 utilities in 37 states representing over 60 percent of total U.S. electricity sales. We quantify the effects that inconsistencies among current utility reporting practices have on comparisons of System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Index (SAIFI) reported by utilities. We recommend immediate adoption of IEEE Std. 1366-2003 as a consistent method for measuring and reporting reliability statistics.
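The two indices named above have simple definitions: SAIDI is total customer-minutes of interruption divided by customers served, and SAIFI is total customer interruptions divided by customers served. A minimal sketch follows; the interruption records and the customer count are invented, and the tuple layout is an assumption for illustration.

```python
def saidi_saifi(interruptions, customers_served):
    """Compute (SAIDI in minutes per customer, SAIFI in interruptions
    per customer).

    `interruptions` is a list of (duration_minutes, customers_interrupted)
    tuples for sustained interruptions in the reporting period.
    """
    customer_minutes = sum(duration * count for duration, count in interruptions)
    customers_interrupted = sum(count for _, count in interruptions)
    saidi = customer_minutes / customers_served
    saifi = customers_interrupted / customers_served
    return saidi, saifi


# Invented year: two outages on a system serving 1000 customers.
saidi, saifi = saidi_saifi([(60, 100), (30, 200)], customers_served=1000)
```

Reporting differences of the kind the study discusses (for example, which outages count as sustained) change the input list, which is why two utilities can report different indices for comparable performance.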
2014-01-01
Background A balance test provides important information such as the standard to judge an individual’s functional recovery or make the prediction of falls. The development of a tool for a balance test that is inexpensive and widely available is needed, especially in clinical settings. The Wii Balance Board (WBB) is designed to test balance, but there is little software used in balance tests, and there are few studies on reliability and validity. Thus, we developed a balance assessment software using the Nintendo Wii Balance Board, investigated its reliability and validity, and compared it with a laboratory-grade force platform. Methods Twenty healthy adults participated in our study. The participants participated in the test for inter-rater reliability, intra-rater reliability, and concurrent validity. The tests were performed with balance assessment software using the Nintendo Wii balance board and a laboratory-grade force platform. Data such as Center of Pressure (COP) path length and COP velocity were acquired from the assessment systems. The inter-rater reliability, the intra-rater reliability, and concurrent validity were analyzed by an intraclass correlation coefficient (ICC) value and a standard error of measurement (SEM). Results The inter-rater reliability (ICC: 0.89-0.79, SEM in path length: 7.14-1.90, SEM in velocity: 0.74-0.07), intra-rater reliability (ICC: 0.92-0.70, SEM in path length: 7.59-2.04, SEM in velocity: 0.80-0.07), and concurrent validity (ICC: 0.87-0.73, SEM in path length: 5.94-0.32, SEM in velocity: 0.62-0.08) were high in terms of COP path length and COP velocity. Conclusion The balance assessment software incorporating the Nintendo Wii balance board was used in our study and was found to be a reliable assessment device. In clinical settings, the device can be remarkably inexpensive, portable, and convenient for the balance assessment. PMID:24912769
Deltombe, T; Jamart, J; Recloux, S; Legrand, C; Vandenbroeck, N; Theys, S; Hanson, P
2007-03-01
We conducted a reliability comparison study to determine the intrarater and inter-rater reliability and the limits of agreement of the volume estimated by circumferential measurements using the frustum sign method and the disk model method, by water displacement volumetry, and by infrared optoelectronic volumetry in the assessment of upper limb lymphedema. Thirty women with lymphedema following axillary lymph node dissection surgery for breast cancer surgery were enrolled. In each patient, the volumes of the upper limbs were estimated by three physical therapists using circumference measurements, water displacement and optoelectronic volumetry. One of the physical therapists performed each measure twice. Intraclass correlation coefficients (ICCs), relative differences, and limits of agreement were determined. Intrarater and interrater reliability ICCs ranged from 0.94 to 1. Intrarater relative differences were 1.9% for the disk model method, 3.2% for the frustum sign model method, 2.9% for water displacement volumetry, and 1.5% for optoelectronic volumetry. Intrarater reliability was always better than interrater, except for the optoelectronic method. Intrarater and interrater limits of agreement were calculated for each technique. The disk model method and optoelectronic volumetry had better reliability than the frustum sign method and water displacement volumetry, which is usually considered to be the gold standard. In terms of low-cost, simplicity, and reliability, we recommend the disk model method as the method of choice in clinical practice. Since intrarater reliability was always better than interrater reliability (except for optoelectronic volumetry), patients should therefore, ideally, always be evaluated by the same therapist. Additionally, the limits of agreement must be taken into account when determining the response of a patient to treatment.
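Limits of agreement, as used above, are conventionally computed in the Bland-Altman manner: the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The sketch below assumes that convention and uses invented limb volumes.

```python
import statistics


def limits_of_agreement(measure_a, measure_b):
    """Return (bias, lower limit, upper limit) for paired measurements,
    using mean difference +/- 1.96 sample standard deviations."""
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented volumes (arbitrary units) from two repeated measurements.
bias, lower, upper = limits_of_agreement([10, 12, 14, 16], [9, 12, 15, 16])
```

A change in a patient's limb volume that stays inside these limits cannot be distinguished from measurement noise, which is the point of the abstract's final sentence.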
Elinder, L S; Brunosson, A; Bergström, H; Hagströmer, M; Patterson, E
2012-02-01
Dietary assessment is a challenge in general, and specifically in individuals with intellectual disabilities (ID). This study aimed to evaluate personal digital photography as a method of assessing different aspects of dietary quality in this target group. Eighteen adults with ID were recruited from community residences and activity centres in Stockholm County. Participants were instructed to photograph all foods and beverages consumed during 1 day, while observed. Photographs were coded by two raters. Observations and photographs of meal frequency, intake occasions of four specific food and beverage items, meal quality and dietary diversity were compared. Evaluation of inter-rater reliability and validity of the method was performed by intra-class correlation analysis. With reminders from staff, 85% of all observed eating or drinking occasions were photographed. The inter-rater reliability was excellent for all assessed variables (ICC ≥ 0.88), except for meal quality where ICC was 0.66. The correlations between items assessed in photos and observations were strong to almost perfect with ICC values ranging from 0.71 to 0.92 and all were statistically significant. Personal digital photography appears to be a feasible, reliable and valid method for assessing dietary quality in people with mild to moderate ID, who have daily staff support. © 2011 The Authors. Journal of Intellectual Disability Research © 2011 Blackwell Publishing Ltd.
Satoh, Masayuki; Mori, Chika; Matsuda, Kana; Ueda, Yukito; Tabei, Ken-ichi; Kida, Hirotaka; Tomimoto, Hidekazu
2016-01-01
Background/Aims Constructional apraxia (CA) is usually diagnosed by having patients draw figures; however, the reported assessments only evaluate the drawn figure. We designed a new assessment battery for CA (the Mie Constructional Apraxia Scale, MCAS) which includes both the shape and drawing process, and investigated its utility against other assessment methods. Methods We designed the MCAS, and evaluated inter- and intrarater reliability. We also investigated the sensitivity, specificity, and positive and negative predictive values in dementia patients, and compared MCAS assessment with other reported batteries in the same subjects. Results Moderate interrater reliability was shown for speech therapists with limited experience. Moderate to substantial intrarater reliability was shown several weeks after initial assessment. When cutoff scores and times were set at 2/3 points and 39/40 s, sensitivity and specificity were 77.1 and 70.4%, respectively, with positive and negative predictive values of 80.0 and 66.7%, respectively. Dementia patients had significantly worse scores and times for Necker cube drawing than an elderly control group on the MCAS, and on other assessments. Conclusions We conclude that the MCAS, which includes both the assessment of the drawn Necker cube shape and the drawing process, is useful for detecting even mild CA. PMID:27790241
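The four diagnostic accuracy figures quoted above all derive from a 2x2 table of test result against true status. The sketch below shows the arithmetic with hypothetical counts; they are not the study's counts, which the abstract does not give.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from 2x2 counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),   # positives correctly detected
        "specificity": tn / (tn + fp),   # negatives correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=8, fp=3, fn=2, tn=7)
```

Unlike sensitivity and specificity, the predictive values depend on how common the condition is in the sample, so they do not transfer directly to populations with a different prevalence.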
One-year test-retest reliability of intrinsic connectivity network fMRI in older adults
Guo, Cong C.; Kurth, Florian; Zhou, Juan; Mayer, Emeran A.; Eickhoff, Simon B; Kramer, Joel H.; Seeley, William W.
2014-01-01
“Resting-state” or task-free fMRI can assess intrinsic connectivity network (ICN) integrity in health and disease, suggesting a potential for use of these methods as disease-monitoring biomarkers. Numerous analytical options are available, including model-driven ROI-based correlation analysis and model-free, independent component analysis (ICA). High test-retest reliability will be a necessary feature of a successful ICN biomarker, yet available reliability data remains limited. Here, we examined ICN fMRI test-retest reliability in 24 healthy older subjects scanned roughly one year apart. We focused on the salience network, a disease-relevant ICN not previously subjected to reliability analysis. Most ICN analytical methods proved reliable (intraclass coefficients > 0.4) and could be further improved by wavelet analysis. Seed-based ROI correlation analysis showed high map-wise reliability, whereas graph theoretical measures and temporal concatenation group ICA produced the most reliable individual unit-wise outcomes. Including global signal regression in ROI-based correlation analyses reduced reliability. Our study provides a direct comparison between the most commonly used ICN fMRI methods and potential guidelines for measuring intrinsic connectivity in aging control and patient populations over time. PMID:22446491
Reliability of System Identification Techniques to Assess Standing Balance in Healthy Elderly
Maier, Andrea B.; Aarts, Ronald G. K. M.; van Gerven, Joop M. A.; Arendzen, J. Hans; Schouten, Alfred C.; Meskers, Carel G. M.; van der Kooij, Herman
2016-01-01
Objectives System identification techniques have the potential to assess the contribution of the underlying systems involved in standing balance by applying well-known disturbances. We investigated the reliability of standing balance parameters obtained with multivariate closed loop system identification techniques. Methods In twelve healthy elderly participants, balance tests were performed twice a day for three days. Body sway was measured during two minutes of standing with eyes closed and the Balance test Room (BalRoom) was used to apply four disturbances simultaneously: two sensory disturbances, to the proprioceptive and the visual system, and two mechanical disturbances applied at the leg and trunk segment. Using system identification techniques, sensitivity functions of the sensory disturbances and the neuromuscular controller were estimated. Based on the generalizability theory (G theory), systematic errors and sources of variability were assessed using linear mixed models and reliability was assessed by computing indexes of dependability (ID), standard error of measurement (SEM) and minimal detectable change (MDC). Results A systematic error was found between the first and second trial in the sensitivity functions. No systematic error was found in the neuromuscular controller and body sway. The reliability of 15 of 25 parameters and body sway was moderate to excellent when the results of two trials on three days were averaged. To reach an excellent reliability on one day in 7 out of 25 parameters, it was predicted that at least seven trials must be averaged. Conclusion This study shows that system identification techniques are a promising method to assess the underlying systems involved in standing balance in the elderly. However, most of the parameters do not appear to be reliable unless a large number of trials are collected across multiple days. To reach an excellent reliability in one third of the parameters, a training session for participants is needed and at least seven trials of two minutes must be performed on one day. PMID:26953694
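SEM and MDC are linked by a standard relation: MDC at the 95% level is 1.96 times the square root of 2 times the SEM. The sketch below also derives SEM from a reliability coefficient using the classical formula SD x sqrt(1 - R); the study obtains its SEM from G-theory variance components instead, so that first step is an assumption, and the numbers are invented.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - R)."""
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a difference
    between two measurements."""
    return 1.96 * math.sqrt(2.0) * sem


# Invented values: between-subject SD of 10 units, reliability 0.75.
sem = sem_from_reliability(10.0, 0.75)
change_needed = mdc95(sem)
```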
Relevance and reliability of experimental data in human health risk assessment of pesticides.
Kaltenhäuser, Johanna; Kneuer, Carsten; Marx-Stoelting, Philip; Niemann, Lars; Schubert, Jens; Stein, Bernd; Solecki, Roland
2017-08-01
Evaluation of data relevance, reliability and contribution to uncertainty is crucial in regulatory health risk assessment if robust conclusions are to be drawn. Whether a specific study is used as key study, as additional information or not accepted depends in part on the criteria according to which its relevance and reliability are judged. In addition to GLP-compliant regulatory studies following OECD Test Guidelines, data from peer-reviewed scientific literature have to be evaluated in regulatory risk assessment of pesticide active substances. Publications should be taken into account if they are of acceptable relevance and reliability. Their contribution to the overall weight of evidence is influenced by factors including test organism, study design and statistical methods, as well as test item identification, documentation and reporting of results. Various reports make recommendations for improving the quality of risk assessments and different criteria catalogues have been published to support evaluation of data relevance and reliability. Their intention was to guide transparent decision making on the integration of the respective information into the regulatory process. This article describes an approach to assess the relevance and reliability of experimental data from guideline-compliant studies as well as from non-guideline studies published in the scientific literature in the specific context of uncertainty and risk assessment of pesticides. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Asgari, Fatemeh; Haghdoost, Faraidoon; Masjedi, Samaneh Sadat; Manouchehri, Navid; Banihashemi, Mahboobeh; Ghorbani, Abbas; Najafi, Mohammad Reza; Saadatnia, Mohammad; Lipton, Richard B.
2014-01-01
Introduction. MIDAS is a valid and reliable short questionnaire for assessment of headache-related disability. Linguistic validation of the Persian MIDAS and assessment of its psychometric properties in tension-type headache (TTH) and migraine were the aims of this study. Methods. Patients with migraine or TTH were included. At the first visit, we administered a headache symptom questionnaire, MIDAS, and SF-36. Patients filled out MIDAS at the second and third visits, within three and eight weeks after the baseline visit. Internal consistency (Cronbach α) and test-retest reproducibility (Spearman correlation coefficient) were used to assess reliability. Convergent validity and MIDAS capability to differentiate between chronic and episodic headaches (migraine and TTH) were also assessed. Results. The 267 participants had episodic migraine (EM-64%), chronic migraine (CM-13.5%), episodic TTH (ETTH-13.5%), and chronic TTH (CTTH-9%). Internal consistency reliability was 0.8 for the entire sample, 0.72 for TTH, and 0.82 for migraine. Test-retest reliability for all questions between visit 1 and visit 2 varied from 0.54 to 0.71. Convergent validity was assessed using SF-36 as an external referent. Patients with episodic headaches (EM and ETTH) had significantly lower MIDAS scores than those with chronic headaches (CM and CTTH). Conclusion. Persian MIDAS is a valid and reliable questionnaire for migraine and TTH that can differentiate between episodic headache and chronic headache. PMID:24527462
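Internal consistency in the sense used above is Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. A minimal sketch with invented questionnaire responses:

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per respondent and one
    column per questionnaire item (sample variances throughout)."""
    k = len(responses[0])
    item_variances = [
        statistics.variance([row[j] for row in responses]) for j in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented two-item responses from three respondents.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```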
Evaluating the Performance of the IEEE Standard 1366 Method for Identifying Major Event Days
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eto, Joseph H.; LaCommare, Kristina Hamachi; Sohn, Michael D.
IEEE Standard 1366 offers a method for segmenting reliability performance data to isolate the effects of major events from the underlying year-to-year trends in reliability. Recent analysis by the IEEE Distribution Reliability Working Group (DRWG) has found that reliability performance of some utilities differs from the expectations that helped guide the development of the Standard 1366 method. This paper proposes quantitative metrics to evaluate the performance of the Standard 1366 method in identifying major events and in reducing year-to-year variability in utility reliability. The metrics are applied to a large sample of utility-reported reliability data to assess performance of the method with alternative specifications that have been considered by the DRWG. We find that none of the alternatives perform uniformly 'better' than the current Standard 1366 method. That is, none of the modifications uniformly lowers the year-to-year variability in System Average Interruption Duration Index without major events. Instead, for any given alternative, while it may lower the value of this metric for some utilities, it also increases it for other utilities (sometimes dramatically). Thus, we illustrate some of the trade-offs that must be considered in using the Standard 1366 method and highlight the usefulness of the metrics we have proposed in conducting these evaluations.
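The abstract does not restate the Standard 1366 segmentation rule. As it is commonly described (the "2.5 beta" method), a threshold T_MED = exp(α + 2.5β) is computed from the mean α and standard deviation β of the natural logarithms of historical daily SAIDI values, and any day whose SAIDI exceeds T_MED is classed as a major event day. The sketch below follows that description under simplifying assumptions: zero-SAIDI days are simply skipped, the sample standard deviation is used, and the function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def med_threshold(daily_saidi, k=2.5):
    """Major-event-day threshold exp(alpha + k * beta).

    Assumption: days with zero SAIDI are dropped before taking logs;
    the standard's own treatment of such days may differ.
    """
    logs = [math.log(x) for x in daily_saidi if x > 0]
    alpha = mean(logs)   # mean of ln(daily SAIDI)
    beta = stdev(logs)   # sample standard deviation of ln(daily SAIDI)
    return math.exp(alpha + k * beta)


def major_event_days(daily_saidi, threshold):
    """Indices of days whose SAIDI exceeds the threshold."""
    return [i for i, x in enumerate(daily_saidi) if x > threshold]


# Hypothetical history whose logs are exactly 0, 1 and 2.
history = [1.0, math.e, math.e ** 2]
t_med = med_threshold(history)
flagged = major_event_days([0.5, 40.0, 2.0], t_med)
```

The alternative specifications evaluated in the paper could be explored in a sketch like this by varying `k` or the preprocessing of the history, but the abstract does not list them.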
FLiGS Score: A New Method of Outcome Assessment for Lip Carcinoma–Treated Patients
Grassi, Rita; Toia, Francesca; Di Rosa, Luigi; Cordova, Adriana
2015-01-01
Background: Lip cancer and its treatment have considerable functional and cosmetic effects with resultant nutritional and physical detriments. As we continue to investigate new treatment regimens, we are simultaneously required to assess postoperative outcomes to design interventions that lessen the adverse impact of this disease process. We wish to introduce the Functional Lip Glasgow Scale (FLiGS) score as a new method of outcome assessment to measure the effect of lip cancer and its treatment on patients’ daily functioning. Methods: Fifty patients affected by lip squamous cell carcinoma were recruited between 2009 and 2013. Patients were asked to fill out the FLiGS questionnaire before surgery and at 1 month, 6 months, and 1 year after surgery. The subscores were used to calculate a total FLiGS score of global oral disability. Statistical analysis was performed to test validity and reliability. Results: FLiGS scores improved significantly from preoperative to 12 months postoperative values (P = 0.000). Statistical evidence of validity was provided by the Spearman correlation coefficient (rs), which was >0.30 for all surveys, with P < 0.001. FLiGS score reliability was shown through examination of internal consistency and test-retest reliability. Conclusions: FLiGS score is a simple way of assessing functional impairment related to lip cancer before and after surgery; it is sensitive, valid, reliable, and clinically relevant: it provides useful information to orient the physician in the postoperative management and in the rehabilitation program. PMID:26034652
Manosudprasit, Montian; Wangsrimongkol, Tasanee; Pisek, Poonsak; Chantaramungkorn, Melissa
2013-09-01
To test the agreement between the Skeletal Maturation Index (SMI) method of Fishman, using hand-wrist radiographs, and the Cervical Vertebral Maturation Index (CVMI) method for assessing skeletal maturity of cleft patients. Hand-wrist and lateral cephalometric radiographs of 60 cleft subjects (35 females and 25 males, age range: 7-16 years) were used. Skeletal age was assessed using an adjustment to the SMI method of Fishman to compare with the CVMI method of Hassel and Farman. Agreement between skeletal age assessed by both methods and the intra- and inter-examiner reliability of both methods were tested by weighted kappa analysis. There was good agreement between the two methods with a kappa value of 0.80 (95% CI = 0.66-0.88, p-value <0.001). Intra- and inter-examiner reliability of both methods was very good, with kappa values ranging from 0.91 to 0.99. The CVMI method can be used as an alternative to the SMI method in skeletal age assessment in cleft patients, with the benefit of needing no additional radiograph and avoiding extra radiation exposure. Comparing the two methods, the present study found better agreement from peak of adolescence onwards.
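Weighted kappa, the agreement statistic used above, penalizes disagreements between ordered categories in proportion to how far apart the two ratings are. The sketch below is one common formulation, 1 − (weighted observed disagreement) / (weighted chance disagreement). The abstract does not say whether linear or quadratic weights were used, so the `power` parameter, the function name, and the stage data are all assumptions for illustration.

```python
def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Weighted kappa for two raters scoring the same subjects.

    Ratings are integers 0 .. n_categories - 1.
    power=1 gives linear weights, power=2 quadratic weights.
    Undefined (division by zero) if chance disagreement is zero,
    e.g. when every rating falls in a single category.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("rating lists must be non-empty and equal length")
    # Observed joint proportions.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(row[j] for row in observed) for j in range(n_categories)]
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (abs(i - j) / (n_categories - 1)) ** power
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * row_totals[i] * col_totals[j]
    return 1.0 - disagreement_observed / disagreement_expected


# Hypothetical maturation stages (0-5) assigned by two methods.
stages_method_1 = [0, 1, 2, 3, 4, 5, 2, 3]
stages_method_2 = [0, 1, 2, 3, 5, 5, 1, 3]
kappa = weighted_kappa(stages_method_1, stages_method_2, n_categories=6)
```

With only two categories the weights reduce to 0 and 1, and the statistic equals the unweighted Cohen kappa.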
NASA Astrophysics Data System (ADS)
Lamour, B. G.; Harris, R. T.; Roberts, A. G.
2010-06-01
Power system reliability problems are very difficult to solve because the power systems are complex and geographically widely distributed and influenced by numerous unexpected events. It is therefore imperative to employ the most efficient optimization methods in solving the problems relating to reliability of the power system. This paper presents a reliability analysis and study of the power interruptions resulting from severe power outages in the Nelson Mandela Bay Municipality (NMBM), South Africa and includes an overview of the important factors influencing reliability, and methods to improve the reliability. The Blue Horizon Bay 22 kV overhead line, supplying a 6.6 kV residential sector has been selected. It has been established that 70% of the outages, recorded at the source, originate on this feeder.
ERIC Educational Resources Information Center
Ling, Guangming
2012-01-01
To assess the value of individual students' subscores on the Major Field Test in Business (MFT Business), I examined the test's internal structure with factor analysis and structural equation model methods, and analyzed the subscore reliabilities using the augmented scores method. Analyses of the internal structure suggested that the MFT Business…
ERIC Educational Resources Information Center
Schoemaker, Marina M.; Niemeijer, Anuschka S.; Flapper, Boudien C. T.; Smits-Engelsman, Bouwien C. M.
2012-01-01
Aim: The aim of this study was to investigate the validity and reliability of the Movement Assessment Battery for Children-2 Checklist (MABC-2). Method: Teachers completed the Checklist for 383 children (age range 5-8y; mean age 6y 9mo; 190 males; 193 females) and the parents of 130 of these children completed the Developmental…
Clothing Protection from Ultraviolet Radiation: A New Method for Assessment.
Gage, Ryan; Leung, William; Stanley, James; Reeder, Anthony; Barr, Michelle; Chambers, Tim; Smith, Moira; Signal, Louise
2017-11-01
Clothing modifies ultraviolet radiation (UVR) exposure from the sun and has an impact on skin cancer risk and the endogenous synthesis of vitamin D. There is no standardized method available for assessing body surface area (BSA) covered by clothing, which limits generalizability between study findings. We calculated the body cover provided by 38 clothing items using diagrams of BSA, adjusting the values to account for differences in BSA by age. Diagrams displaying each clothing item were developed and incorporated into a coverage assessment procedure (CAP). Five assessors used the CAP and Lund & Browder chart, an existing method for estimating BSA, to calculate the clothing coverage of an image sample of 100 schoolchildren. Values of clothing coverage, inter-rater reliability and assessment time were compared between CAP and Lund & Browder methods. Both methods had excellent inter-rater reliability (>0.90) and returned comparable results, although the CAP method was significantly faster in determining a person's clothing coverage. On balance, the CAP method appears to be a feasible method for calculating clothing coverage. Its use could improve comparability between sun-safety studies and aid in quantifying the health effects of UVR exposure. © 2017 The American Society of Photobiology.
Detecting long-term growth trends using tree rings: a critical evaluation of methods.
Peters, Richard L; Groenendijk, Peter; Vlam, Mart; Zuidema, Pieter A
2015-05-01
Tree-ring analysis is often used to assess long-term trends in tree growth. A variety of growth-trend detection methods (GDMs) exist to disentangle age/size trends in growth from long-term growth changes. However, these detrending methods strongly differ in approach, with possible implications for their output. Here, we critically evaluate the consistency, sensitivity, reliability and accuracy of the four most widely used GDMs: conservative detrending (CD) applies mathematical functions to correct for decreasing ring widths with age; basal area correction (BAC) transforms diameter into basal area growth; regional curve standardization (RCS) detrends individual tree-ring series using average age/size trends; and size class isolation (SCI) calculates growth trends within separate size classes. First, we evaluated whether these GDMs produce consistent results applied to an empirical tree-ring data set of Melia azedarach, a tropical tree species from Thailand. Three GDMs yielded similar results - a growth decline over time - but the widely used CD method did not detect any change. Second, we assessed the sensitivity (probability of correct growth-trend detection), reliability (100% minus probability of detecting false trends) and accuracy (whether the strength of imposed trends is correctly detected) of these GDMs, by applying them to simulated growth trajectories with different imposed trends: no trend, strong trends (-6% and +6% change per decade) and weak trends (-2%, +2%). All methods except CD showed high sensitivity, reliability and accuracy in detecting strong imposed trends. However, these were considerably lower in the weak or no-trend scenarios. BAC showed good sensitivity and accuracy, but low reliability, indicating uncertainty of trend detection using this method. Our study reveals that the choice of GDM influences results of growth-trend studies.
We recommend applying multiple methods when analysing trends and encourage performing sensitivity and reliability analysis. Finally, we recommend SCI and RCS, as these methods showed highest reliability to detect long-term growth trends. © 2014 John Wiley & Sons Ltd.
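To illustrate the basal area correction (BAC) mentioned above, which transforms diameter into basal area growth, the sketch below converts a series of ring widths into annual basal area increments, BAI_t = π(R_t² − R_{t−1}²). It assumes a circular stem cross-section and rings measured outward from the pith; the function name and the ring widths are hypothetical and are not the authors' implementation.

```python
import math


def basal_area_increments(ring_widths, start_radius=0.0):
    """Annual basal area increments from ring widths.

    ring_widths: widths ordered from the innermost ring outward.
    start_radius: radius already present inside the first measured
    ring (0.0 if the series starts at the pith).
    Assumes a circular stem cross-section.
    """
    increments = []
    previous_radius = start_radius
    for width in ring_widths:
        radius = previous_radius + width
        increments.append(math.pi * (radius ** 2 - previous_radius ** 2))
        previous_radius = radius
    return increments


# Hypothetical series with constant ring width (arbitrary length units).
bai = basal_area_increments([1.0, 1.0, 1.0])
```

With constant ring widths the increments grow each year (π, 3π, 5π in this example), so constant basal area growth implies narrowing rings. This is the geometric age/size trend that the growth-trend detection methods try to separate from genuine long-term changes.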
Saito, Rintaro; Suzuki, Harukazu; Hayashizaki, Yoshihide
2003-04-12
Recent screening techniques have made large amounts of protein-protein interaction data available, from which biologically important information such as the function of uncharacterized proteins, the existence of novel protein complexes, and novel signal-transduction pathways can be discovered. However, experimental data on protein interactions contain many false positives, making these discoveries difficult. Therefore computational methods of assessing the reliability of each candidate protein-protein interaction are urgently needed. We developed a new 'interaction generality' measure (IG2) to assess the reliability of protein-protein interactions using only the topological properties of their interaction-network structure. Using yeast protein-protein interaction data, we showed that reliable protein-protein interactions had significantly lower IG2 values than less-reliable interactions, suggesting that IG2 values can be used to evaluate and filter interaction data to enable the construction of reliable protein-protein interaction networks.
ERIC Educational Resources Information Center
Howard, Steven J.; Melhuish, Edward
2017-01-01
Several methods of assessing executive function (EF), self-regulation, language development, and social development in young children have been developed over previous decades. Yet new technologies make available methods of assessment not previously considered. In resolving conceptual and pragmatic limitations of existing tools, the Early Years…
Amann, Michael; Pezold, Simon; Naegelin, Yvonne; Fundana, Ketut; Andělová, Michaela; Weier, Katrin; Stippich, Christoph; Kappos, Ludwig; Radue, Ernst-Wilhelm; Cattin, Philippe; Sprenger, Till
2016-07-01
Spinal cord (SC) atrophy is an important contributor to the development of disability in many neurological disorders including multiple sclerosis (MS). To assess the spinal cord atrophy in clinical trials and clinical practice, largely automated methods are needed due to the sheer amount of data. Moreover, using these methods in longitudinal trials requires them to deliver highly reliable measurements, enabling comparisons of multiple data sets of the same subject over time. We present a method for SC volumetry using 3D MRI data providing volume measurements for SC sections of fixed length and location. The segmentation combines a continuous max flow approach with SC surface reconstruction that locates the SC boundary based on image voxel intensities. Two cutting planes perpendicular to the SC centerline are determined based on predefined distances to an anatomical landmark, and the cervical SC volume (CSCV) is then calculated in-between these boundaries. The development of the method focused on its application in MRI follow-up studies; the method provides a high scan-rescan reliability, which was tested on healthy subject data. Scan-rescan reliability coefficients of variation (COV) were below 1 %, intra- and interrater COV were even lower (0.1-0.2 %). To show the applicability in longitudinal trials, 3-year follow-up data of 48 patients with a progressive course of MS were assessed. In this cohort, CSCV loss was the only significant predictor of disability progression (p = 0.02). We are, therefore, confident that our method provides a reliable tool for SC volumetry in longitudinal clinical trials.
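The scan-rescan figures above are coefficients of variation (COV). The abstract does not say how the COV was aggregated across subjects, so the following is only one plausible sketch: for each subject, the standard deviation of the repeated volume measurements is divided by their mean, and the per-subject values are averaged and expressed in percent. The function name and the volumes are hypothetical.

```python
import math
from statistics import mean, stdev


def scan_rescan_cov(repeated_volumes):
    """Mean within-subject coefficient of variation, in percent.

    repeated_volumes: one tuple of repeated measurements per subject
    (at least two measurements each).
    """
    per_subject = [stdev(v) / mean(v) for v in repeated_volumes]
    return 100.0 * mean(per_subject)


# Hypothetical cervical cord volumes (mm^3), scan and rescan.
volumes = [(2410.0, 2422.0), (2675.0, 2668.0), (2290.0, 2301.0)]
cov_percent = scan_rescan_cov(volumes)
```

Averaging per-subject COVs is a design choice; a root-mean-square of the per-subject values is another common convention and generally gives a slightly different number.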
Guetterman, Timothy C.; Creswell, John W.; Wittink, Marsha; Barg, Fran K.; Castro, Felipe G.; Dahlberg, Britt; Watkins, Daphne C.; Deutsch, Charles; Gallo, Joseph J.
2017-01-01
Introduction Demand for training in mixed methods is high, with little research on faculty development or assessment in mixed methods. We describe the development of a Self-Rated Mixed Methods Skills Assessment and provide validity evidence. The instrument taps six research domains: “Research question,” “Design/approach,” “Sampling,” “Data collection,” “Analysis,” and “Dissemination.” Respondents are asked to rate their ability to define or explain concepts of mixed methods under each domain, their ability to apply the concepts to problems, and the extent to which they need to improve. Methods We administered the questionnaire to 145 faculty and students using an internet survey. We analyzed descriptive statistics and performance characteristics of the questionnaire using Cronbach’s alpha to assess reliability and an ANOVA that compared a mixed methods experience index with assessment scores to assess criterion-relatedness. Results Internal consistency reliability was high for the total set of items (.95) and adequate (>=.71) for all but one subscale. Consistent with establishing criterion validity, respondents who had more professional experiences with mixed methods (e.g., published a mixed methods paper) rated themselves as more skilled, which was statistically significant across the research domains. Discussion This Self-Rated Mixed Methods Assessment instrument may be a useful tool to assess skills in mixed methods for training programs. It can be applied widely at the graduate and faculty level. For the learner, assessment may lead to enhanced motivation to learn and training focused on self-identified needs. For faculty, the assessment may improve curriculum and course content planning. PMID:28562495
Pneumothorax size measurements on digital chest radiographs: Intra- and inter- rater reliability.
Thelle, Andreas; Gjerdevik, Miriam; Grydeland, Thomas; Skorge, Trude D; Wentzel-Larsen, Tore; Bakke, Per S
2015-10-01
Detailed and reliable methods may be important for discussions on the importance of pneumothorax size in clinical decision-making. Rhea's method is widely used to estimate pneumothorax size in percent based on chest X-rays (CXRs) from three measure points. Choi's addendum is used for anteroposterior projections. The aim of this study was to examine the intrarater and interrater reliability of the Rhea and Choi method using digital CXR on ward-based PACS monitors. Three physicians examined a retrospective series of 80 digital CXRs showing pneumothorax, using Rhea and Choi's method, then repeated the estimations in a random order two weeks later. We used the analysis of variance technique by Eliasziw et al. to assess the intrarater and interrater reliability in altogether 480 estimations of pneumothorax size. Estimated pneumothorax sizes ranged between 5% and 100%. The intrarater reliability coefficient was 0.98 (95% one-sided lower-limit confidence interval 0.96), and the interrater reliability coefficient was 0.95 (95% one-sided lower-limit confidence interval 0.93). This study has shown that the Rhea and Choi method for calculating pneumothorax size has high intrarater and interrater reliability. These results are valid across gender, side of pneumothorax and whether the patient is diagnosed with primary or secondary pneumothorax. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Assessing disease stress and modeling yield losses in alfalfa
NASA Astrophysics Data System (ADS)
Guan, Jie
Alfalfa is the most important forage crop in the U.S. and worldwide. Fungal foliar diseases are believed to cause significant yield losses in alfalfa, yet, little quantitative information exists regarding the amount of crop loss. Different fungicides and application frequencies were used as tools to generate a range of foliar disease intensities in Ames and Nashua, IA. Visual disease assessments (disease incidence, disease severity, and percentage defoliation) were obtained weekly for each alfalfa growth cycle (two to three growing cycles per season). Remote sensing assessments were performed using a hand-held, multispectral radiometer to measure the amount and quality of sunlight reflected from alfalfa canopies. Factors such as incident radiation, sun angle, sensor height, and leaf wetness were all found to significantly affect the percentage reflectance of sunlight reflected from alfalfa canopies. The precision of visual and remote sensing assessment methods was quantified. Precision was defined as the intra-rater repeatability and inter-rater reliability of assessment methods. F-tests, slopes, intercepts, and coefficients of determination (R2) were used to compare assessment methods for precision. Results showed that among the three visual disease assessment methods (disease incidence, disease severity, and percentage defoliation), percentage defoliation had the highest intra-rater repeatability and inter-rater reliability. Remote sensing assessment method had better precision than the percentage defoliation assessment method based upon higher intra-rater repeatability and inter-rater reliability. Significant linear relationships between canopy reflectance (810 nm), percentage defoliation and yield were detected using linear regression and percentage reflectance (810 nm) assessments were found to have a stronger relationship with yield than percentage defoliation assessments. 
There were also significant linear relationships between percentage defoliation, dry weight, percentage reflectance (810 nm), and green leaf area index (GLAI). Percentage reflectance (810 nm) assessments had a stronger relationship with dry weight and green leaf area index than percentage defoliation assessments. Our research conclusively demonstrates that percentage reflectance measurements can be used to nondestructively assess green leaf area index which is a direct measure of plant health and an indirect measure of productivity. This research conclusively demonstrates that remote sensing is superior to visual assessment method to assess alfalfa stress and to model yield and GLAI in the alfalfa foliar disease pathosystem.
Sekiyama, Juliana Y; Camargo, Cintia Z; Eduardo, Luís; Andrade, C; Kayser, Cristiane
2013-11-01
To compare the diagnostic performance and reliability of different parameters evaluated by widefield nailfold capillaroscopy (NFC) with those obtained by video capillaroscopy in patients with Raynaud’s phenomenon (RP). Two hundred fifty-two individuals were assessed, including 101 systemic sclerosis (SSc; scleroderma) patients, 61 patients with undifferentiated connective tissue disease, 37 patients with primary RP, and 53 controls. Widefield NFC was performed using a stereomicroscope under 10–25 x magnification and direct measurement of all parameters. Video capillaroscopy was performed under 200 x magnification, with the acquisition of 32 images per individual (4 fields per finger in 8 fingers). The following parameters were analyzed in 8 fingers of the hands (excluding thumbs) by both methods: number of capillaries/mm, number of enlarged and giant capillaries, microhemorrhages, and avascular score. Intra- and interobserver reliability was evaluated by performing both examinations in 20 individuals on 2 different days and by 2 long-term experienced observers. There was a significant correlation (P < 0.000) between widefield NFC and video capillaroscopy in the comparison of all parameters. Kappa values and intraclass correlation coefficient analysis showed excellent intra- and interobserver reproducibility for all parameters evaluated by widefield NFC and video capillaroscopy. Bland-Altman analysis showed high agreement of all parameters evaluated in both methods. According to receiver operating characteristic curve analysis, both methods showed a similar performance in discriminating SSc patients from controls. Widefield NFC and video capillaroscopy are reliable and accurate methods and can be used equally for assessing peripheral microangiopathy in RP and SSc patients. Nonetheless, the high reliability obtained may not be similar for less experienced examiners.
Ho, Chester H; Cheung, Amanda; Southern, Danielle; Ocampo, Wrechelle; Kaufman, Jaime; Hogan, David B; Baylis, Barry; Conly, John M; Stelfox, Henry T; Ghali, William A
2016-12-01
Research regarding the reliability of the Braden Scale and nurses' perspectives on the instrument for predicting pressure ulcer (PU) risk in acute care settings is limited. A mixed-methods study was conducted in a tertiary acute care facility to examine interrater reliability (IRR) of the Braden Scale and its subscales, and a qualitative survey using semi-structured interviews was conducted among nurses caring for patients in acute care units to gain nurse perspective regarding scale usability. Data were extracted from a previous retrospective, randomized, controlled trial involving adult patients with compromised mobility receiving care in a tertiary acute care hospital in Canada. One-way, intraclass correlation coefficients (ICCs) were calculated on item and total scores, and kappa statistics were used to determine reliability of categorizing patients on their risk. Interview results were categorized by common themes. Reliability was assessed on 64 patients, where nurses and research staff independently assessed enrolled participants at baseline and after 72 hours using the Braden Scale as it appeared on an electronic medical record. IRR for the total score was high (ICC = 0.807). The friction and shear item had the lowest reliability (ICC = 0.266). Reliability of categorizing patients' level of risk had moderate agreement (κ = 0.408). Three (3) major and 12 subthemes emerged from the 14 nurse interviews; nurses were aware of the scale's purpose but were uncertain of its effectiveness, some items were difficult to rate, and questions were raised as to whether using the scale enhanced patient care. Aspects identified by nurses to enhance usability included: 1) changes to the electronic version (incorporating the scale into daily assessment documents with readily available item descriptions), 2) additional training, and 3) easily available resource material to improve reliability and usability of scale. 
These findings need to be considered when using the Braden Scale in clinical practice. Further study of the value of the total Braden Scale and its subscales is warranted.
Staggs, Vincent S; Cramer, Emily
2016-08-01
Hospital performance reports often include rankings of unit pressure ulcer rates. Differentiating among units on the basis of quality requires reliable measurement. Our objectives were to describe and apply methods for assessing reliability of hospital-acquired pressure ulcer rates and evaluate a standard signal-noise reliability measure as an indicator of precision of differentiation among units. Quarterly pressure ulcer data from 8,199 critical care, step-down, medical, surgical, and medical-surgical nursing units from 1,299 US hospitals were analyzed. Using beta-binomial models, we estimated between-unit variability (signal) and within-unit variability (noise) in annual unit pressure ulcer rates. Signal-noise reliability was computed as the ratio of between-unit variability to the total of between- and within-unit variability. To assess precision of differentiation among units based on ranked pressure ulcer rates, we simulated data to estimate the probabilities of a unit's observed pressure ulcer rate rank in a given sample falling within five and ten percentiles of its true rank, and the probabilities of units with ulcer rates in the highest quartile and highest decile being identified as such. We assessed the signal-noise measure as an indicator of differentiation precision by computing its correlations with these probabilities. Pressure ulcer rates based on a single year of quarterly or weekly prevalence surveys were too susceptible to noise to allow for precise differentiation among units, and signal-noise reliability was a poor indicator of precision of differentiation. To ensure precise differentiation on the basis of true differences, alternative methods of assessing reliability should be applied to measures purported to differentiate among providers or units based on quality. © 2016 The Authors. Research in Nursing & Health published by Wiley Periodicals, Inc.
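The signal-noise reliability described above is the ratio of between-unit variance to the sum of between- and within-unit variance. The sketch below implements that ratio directly. The two variance helpers are assumptions added for illustration (the variance of a Beta(a, b) distribution for the signal, and a simple binomial approximation p(1 − p)/n for the noise) and do not reproduce the authors' beta-binomial estimation; all parameter values are hypothetical.

```python
def signal_noise_reliability(between_var, within_var):
    """Between-unit variance as a share of total variance."""
    return between_var / (between_var + within_var)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution of true unit rates."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def binomial_rate_variance(p, n):
    """Sampling variance of an observed rate with n patients surveyed."""
    return p * (1 - p) / n


# Hypothetical: true unit rates ~ Beta(2, 38) (mean 5%), 100 patients per unit.
signal = beta_variance(2, 38)
noise = binomial_rate_variance(0.05, 100)
reliability = signal_noise_reliability(signal, noise)
print(round(reliability, 3))
```

In this approximation, surveying fewer patients per unit increases the noise term and lowers the ratio, which is consistent with the abstract's concern about rates based on a single year of prevalence surveys.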
Donnan, Peter T; Symon, Andrew G; Kellett, Gillian; Monteith-Hodge, Ewa; Rauchhaus, Petra; Wyatt, Jeremy C
2012-01-01
Objective To test the reliability, validity, acceptability, and practicality of short message service (SMS) messaging for collection of research data. Materials and methods The studies were carried out in a cohort of recently delivered women in Tayside, Scotland, UK, who were asked about their current infant feeding method and future feeding plans. Reliability was assessed by comparison of their responses to two SMS messages sent 1 day apart. Validity was assessed by comparison of their responses to text questions and the same question administered by phone 1 day later, by comparison with the same data collected from other sources, and by correlation with other related measures. Acceptability was evaluated using quantitative and qualitative questions, and practicality by analysis of a researcher log. Results Reliability of the factual SMS message gave perfect agreement. Reliabilities for the numerical question were reasonable, with κ between 0.76 (95% CI 0.56 to 0.96) and 0.80 (95% CI 0.59 to 1.00). Validity for data compared with that collected by phone within 24 h (κ =0.92 (95% CI 0.84 to 1.00)) and with health visitor data (κ =0.85 (95% CI 0.73 to 0.97)) was excellent. Correlation validity between the text responses and other related demographic and clinical measures was as expected. Participants found the method a convenient and acceptable way of providing data. For researchers, SMS text messaging provided an easy and functional method of gathering a large volume of data. Conclusion In this sample and for these questions, SMS was a reliable and valid method for capturing research data. PMID:22539081
Reliably detectable flaw size for NDE methods that use calibration
NASA Astrophysics Data System (ADS)
Koshti, Ajay M.
2017-04-01
Probability of detection (POD) analysis is used in assessing reliably detectable flaw size in nondestructive evaluation (NDE). MIL-HDBK-1823 and the associated mh1823 POD software give the most common methods of POD analysis. In this paper, POD analysis is applied to an NDE method, such as eddy current testing, where calibration is used. NDE calibration standards have known size artificial flaws such as electro-discharge machined (EDM) notches and flat bottom hole (FBH) reflectors which are used to set instrument sensitivity for detection of real flaws. Real flaws such as cracks and crack-like flaws are desired to be detected using these NDE methods. A reliably detectable crack size is required for safe life analysis of fracture critical parts. Therefore, it is important to correlate signal responses from real flaws with signal responses from artificial flaws used in the calibration process to determine reliably detectable flaw size.
Reliably Detectable Flaw Size for NDE Methods that Use Calibration
NASA Technical Reports Server (NTRS)
Koshti, Ajay M.
2017-01-01
Probability of detection (POD) analysis is used in assessing reliably detectable flaw size in nondestructive evaluation (NDE). MIL-HDBK-1823 and the associated mh1823 POD software give the most common methods of POD analysis. In this paper, POD analysis is applied to an NDE method, such as eddy current testing, where calibration is used. NDE calibration standards have known size artificial flaws such as electro-discharge machined (EDM) notches and flat bottom hole (FBH) reflectors which are used to set instrument sensitivity for detection of real flaws. Real flaws such as cracks and crack-like flaws are desired to be detected using these NDE methods. A reliably detectable crack size is required for safe life analysis of fracture critical parts. Therefore, it is important to correlate signal responses from real flaws with signal responses from artificial flaws used in the calibration process to determine reliably detectable flaw size.
First Order Reliability Application and Verification Methods for Semistatic Structures
NASA Technical Reports Server (NTRS)
Verderaime, Vincent
1994-01-01
Escalating risks of aerostructures stimulated by increasing size, complexity, and cost should no longer be ignored by conventional deterministic safety design methods. The deterministic pass-fail concept is incompatible with probability and risk assessments, its stress audits are shown to be arbitrary and incomplete, and it compromises high strength materials performance. A reliability method is proposed which combines first order reliability principles with deterministic design variables and conventional test technique to surmount current deterministic stress design and audit deficiencies. Accumulative and propagation design uncertainty errors are defined and appropriately implemented into the classical safety index expression. The application is reduced to solving for a factor that satisfies the specified reliability and compensates for uncertainty errors, and then using this factor as, and instead of, the conventional safety factor in stress analyses. The resulting method is consistent with current analytical skills and verification practices, the culture of most designers, and with the pace of semistatic structural designs.
Developing Confidence Limits For Reliability Of Software
NASA Technical Reports Server (NTRS)
Hayhurst, Kelly J.
1991-01-01
Technique developed for estimating reliability of software by use of Moranda geometric de-eutrophication model. Pivotal method enables straightforward construction of exact bounds with associated degree of statistical confidence about reliability of software. Confidence limits thus derived provide precise means of assessing quality of software. Limits take into account number of bugs found while testing and effects of sampling variation associated with random order of discovering bugs.
ERIC Educational Resources Information Center
Neubauer, Anna; Gawrilow, Caterina; Hasselhorn, Marcus
2012-01-01
A preschooler's ability to delay gratification in the waiting task is predictive of several developmental outcomes, despite this task's relatively low reliability level. Success in this task depends on the use of distraction strategies. The new Watch-and-Wait Task (WWT) has been developed to enhance reliability and to investigate whether the…
Uncertainties in obtaining high reliability from stress-strength models
NASA Technical Reports Server (NTRS)
Neal, Donald M.; Matthews, William T.; Vangel, Mark G.
1992-01-01
There has been a recent interest in determining high statistical reliability in risk assessment of aircraft components. The potential consequences are identified of incorrectly assuming a particular statistical distribution for stress or strength data used in obtaining the high reliability values. The computation of the reliability is defined as the probability of the strength being greater than the stress over the range of stress values. This method is often referred to as the stress-strength model. A sensitivity analysis was performed involving a comparison of reliability results in order to evaluate the effects of assuming specific statistical distributions. Both known population distributions, and those that differed slightly from the known, were considered. Results showed substantial differences in reliability estimates even for almost nondetectable differences in the assumed distributions. These differences represent a potential problem in using the stress-strength model for high reliability computations, since in practice it is impossible to ever know the exact (population) distribution. An alternative reliability computation procedure is examined involving determination of a lower bound on the reliability values using extreme value distributions. This procedure reduces the possibility of obtaining nonconservative reliability estimates. Results indicated the method can provide conservative bounds when computing high reliability.
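The sensitivity described in this abstract can be illustrated with a minimal sketch. It assumes independent, normally distributed strength and stress, which is only one common textbook form of the stress-strength model and is not necessarily the set of distributions the authors studied. The function name and all numeric values are hypothetical.

```python
import math

def failure_probability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength <= stress) for independent normal strength and stress.

    Reliability is 1 minus this value. erfc is used so that very small
    tail probabilities are not lost to rounding.
    """
    z = (mu_strength - mu_stress) / math.hypot(sd_strength, sd_stress)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical numbers: a 10% change in the assumed strength scatter.
pf_base = failure_probability(100.0, 5.0, 70.0, 5.0)
pf_shifted = failure_probability(100.0, 5.5, 70.0, 5.0)
print(pf_base, pf_shifted, pf_shifted / pf_base)
```

Under these assumptions, a small change in one assumed standard deviation more than doubles the computed failure probability, which is the kind of tail sensitivity the abstract warns about.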
Evaluation of the Validity and Reliability of the Waterlow Pressure Ulcer Risk Assessment Scale
Charalambous, Charalambos; Koulori, Agoritsa; Vasilopoulos, Aristidis; Roupa, Zoe
2018-01-01
Introduction Prevention is the ideal strategy to tackle the problem of pressure ulcers. Pressure ulcer risk assessment scales are one of the most pivotal measures applied to tackle the problem, but much criticism has been raised regarding the validity and reliability of these scales. Objective To investigate the validity and reliability of the Waterlow pressure ulcer risk assessment scale. Method The methodology used is a narrative literature review. The bibliography was reviewed through Cinahl, Pubmed, EBSCO, Medline and Google scholar, and 26 scientific articles were identified. The articles were chosen due to their direct correlation with the objective under study and their scientific relevance. Results The construct and face validity of the Waterlow appears adequate, but with regards to content validity changes in the category age and gender can be beneficial. The concurrent validity cannot be assessed. The predictive validity of the Waterlow is characterized by high specificity and low sensitivity. The inter-rater reliability has been demonstrated to be inadequate; this may be due to a lack of clear definitions within the categories and differing levels of knowledge between the users. Conclusion Due to the limitations presented regarding the validity and reliability of the Waterlow pressure ulcer risk assessment scale, the scale should be used in conjunction with clinical assessment to provide optimum results. PMID:29736104
2014-01-01
Background Premarital sexual behaviors are an important issue for women’s health. The present study was designed to develop and examine the psychometric properties of a scale in order to identify young women who are at greater risk of premarital sexual behavior. Method This was an exploratory mixed method investigation conducted in two phases. In the first phase, qualitative methods (focus group discussion and individual interview) were applied to generate items and develop the questionnaire. In the second phase, psychometric properties (validity and reliability) of the questionnaire were assessed. Results In the first phase an item pool containing 53 statements related to premarital sexual behavior was generated. In the second phase item reduction was applied and the final version of the questionnaire containing 26 items was developed. The psychometric properties of this final version were assessed and the results showed that the instrument has a good structure and reliability. The results from exploratory factor analysis indicated a 5-factor solution for the instrument that jointly accounted for 57.4% of the variance observed. The Cronbach’s alpha coefficient for the instrument was found to be 0.87. Conclusion This study provided a valid and reliable scale to identify premarital sexual behavior in young women. Assessment of premarital sexual behavior might help to improve women’s sexual abstinence. PMID:24924696
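Cronbach's alpha, the internal-consistency statistic reported above, can be computed as in the following minimal sketch. It uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents, 4 Likert-type items.
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The sketch assumes complete data and at least two items; it raises an error if all total scores are identical, since the variance of totals is then zero.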
Organizational readiness for implementing change: a psychometric assessment of a new measure
2014-01-01
Background Organizational readiness for change in healthcare settings is an important factor in successful implementation of new policies, programs, and practices. However, research on the topic is hindered by the absence of a brief, reliable, and valid measure. Until such a measure is developed, we cannot advance scientific knowledge about readiness or provide evidence-based guidance to organizational leaders about how to increase readiness. This article presents results of a psychometric assessment of a new measure called Organizational Readiness for Implementing Change (ORIC), which we developed based on Weiner’s theory of organizational readiness for change. Methods We conducted four studies to assess the psychometric properties of ORIC. In study one, we assessed the content adequacy of the new measure using quantitative methods. In study two, we examined the measure’s factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study utilizing the same analytic methods as in study three. Results Content adequacy assessment indicated that the items developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished the facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual level readiness perceptions to the organizational level of analysis. 
Conclusions This article provides evidence in support of the ORIC measure. We believe this measure will enable testing of theories about determinants and consequences of organizational readiness and, ultimately, assist healthcare leaders to reduce the number of health organization change efforts that do not achieve desired benefits. Although ORIC shows promise, further assessment is needed to test for convergent, discriminant, and predictive validity. PMID:24410955
Myer, Gregory D; Wordeman, Samuel C; Sugimoto, Dai; Bates, Nathaniel A; Roewer, Benjamin D; Medina McKeon, Jennifer M; DiCesare, Christopher A; Di Stasi, Stephanie L; Barber Foss, Kim D; Thomas, Staci M; Hewett, Timothy E
2014-05-01
Multi-center collaborations provide a powerful alternative to overcome the inherent limitations to single-center investigations. Specifically, multi-center projects can support large-scale prospective, longitudinal studies that investigate relatively uncommon outcomes, such as anterior cruciate ligament injury. This project was conceived to assess within- and between-center reliability of an affordable, clinical nomogram utilizing two-dimensional video methods to screen for risk of knee injury. The authors hypothesized that the two-dimensional screening methods would provide good-to-excellent reliability within and between institutions for assessment of frontal and sagittal plane biomechanics. Nineteen female, high school athletes participated. Two-dimensional video kinematics of the lower extremity during a drop vertical jump task were collected on all 19 study participants at each of the three facilities. Within-center and between-center reliability were assessed with intra- and inter-class correlation coefficients. Within-center reliability of the clinical nomogram variables was consistently excellent, but between-center reliability was fair-to-good. Within-center intra-class correlation coefficient for all nomogram variables combined was 0.98, while combined between-center inter-class correlation coefficient was 0.63. Injury risk screening protocols were reliable within and repeatable between centers. These results demonstrate the feasibility of multi-site biomechanical studies and establish a framework for further dissemination of injury risk screening algorithms. Specifically, multi-center studies may allow for further validation and optimization of two-dimensional video screening tools. 2b.
Aasvang, E K; Werner, M U; Kehlet, H
2014-09-01
Deep pain complaints are more frequent than cutaneous in post-surgical patients, and a prevalent finding in quantitative sensory testing studies. However, the preferred assessment method - pressure algometry - is indirect and tissue unspecific, hindering advances in treatment and preventive strategies. Thus, there is a need for development of methods with direct stimulation of suspected hyperalgesic tissues to identify the peripheral origin of nociceptive input. We compared the reliability of an ultrasound-guided needle stimulation protocol of electrical detection and pain thresholds to pressure algometry, by performing identical test-retest sequences 10 days apart, in deep tissues in the groin region. Electrical stimulation was performed by five up-and-down staircase series of single impulses of 0.04 ms duration, starting from 0 mA in increments of 0.2 mA until a threshold was reached and descending until sensation was lost. Method reliability was assessed by Bland-Altman plots, descriptive statistics, coefficients of variance and intraclass correlation coefficients. The electrical stimulation method was comparable to pressure algometry regarding 10 days test-retest repeatability, but with superior same-day reliability for electrical stimulation (P < 0.05). Between-subject variance rather than within-subject variance was the main source for test variation. There were no systematic differences in electrical thresholds across tissues and locations (P > 0.05). The presented tissue-specific direct deep tissue electrical stimulation technique has equal or superior reliability compared with the indirect tissue-unspecific stimulation by pressure algometry. This method may facilitate advances in mechanism based preventive and treatment strategies in acute and chronic post-surgical pain states. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
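A Bland-Altman analysis of test-retest data, one of the reliability methods named above, reduces to the mean difference (bias) and its 95% limits of agreement. The following is a minimal sketch under the usual assumption of approximately normal differences. The function name and threshold values are hypothetical and are not the study's data.

```python
from statistics import mean, stdev

def bland_altman(session_1, session_2):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(session_1, session_2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical thresholds (mA) for four subjects, measured in two sessions.
test = [1.0, 2.0, 3.0, 4.0]
retest = [0.8, 2.1, 2.9, 4.4]
bias, lower, upper = bland_altman(test, retest)
print(bias, lower, upper)
```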
Chang, Wen-Dien; Chang, Wan-Yi; Lee, Chia-Lun; Feng, Chi-Yen
2013-10-01
[Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86-0.99) for the elderly people and positive correlations (r = 0.58-0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability.
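The abstract does not say which form of the intraclass correlation was used. As an illustration only, the sketch below computes the one-way random-effects form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), from a subjects-by-trials table. The function name and data are hypothetical.

```python
def icc_oneway(ratings):
    """ICC(1,1) for n subjects (rows), each with k repeated measurements."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical balance scores: 4 subjects, 3 trials each.
trials = [
    [10.1, 10.3, 10.0],
    [12.4, 12.1, 12.6],
    [8.9, 9.2, 9.0],
    [11.0, 11.3, 10.8],
]
print(round(icc_oneway(trials), 3))
```

Two-way forms such as ICC(2,1) or ICC(3,1) separate out a systematic trial or rater effect and can give different values on the same data, so the form should be reported alongside the coefficient.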
Patient simulation: a literary synthesis of assessment tools in anesthesiology.
Edler, Alice A; Fanning, Ruth G; Chen, Michael I; Claure, Rebecca; Almazan, Dondee; Struyk, Brain; Seiden, Samuel C
2009-12-20
High-fidelity patient simulation (HFPS) has been hypothesized as a modality for assessing competency of knowledge and skill in patient simulation, but uniform methods for HFPS performance assessment (PA) have not yet been completely achieved. Anesthesiology as a field founded the HFPS discipline and also leads in its PA. This project reviews the types, quality, and designated purpose of HFPS PA tools in anesthesiology. We used the systematic review method and systematically reviewed anesthesiology literature referenced in PubMed to assess the quality and reliability of available PA tools in HFPS. Of 412 articles identified, 50 met our inclusion criteria. Seventy seven percent of studies have been published since 2000; more recent studies demonstrated higher quality. Investigators reported a variety of test construction and validation methods. The most commonly reported test construction methods included "modified Delphi Techniques" for item selection, reliability measurement using inter-rater agreement, and intra-class correlations between test items or subtests. Modern test theory, in particular generalizability theory, was used in nine (18%) of studies. Test score validity has been addressed in multiple investigations and shown a significant improvement in reporting accuracy. However, the assessment of predictive validity has been low across the majority of studies. Usability and practicality of testing occasions and tools was only anecdotally reported. To more completely comply with the gold standards for PA design, both shared experience of experts and recognition of test construction standards, including reliability and validity measurements, instrument piloting, rater training, and explicit identification of the purpose and proposed use of the assessment tool, are required.
Wind Power Error Estimation in Resource Assessments
Rodríguez, Osvaldo; del Río, Jesús A.; Jaramillo, Oscar A.; Martínez, Manuel
2015-01-01
Estimating the power output is one of the elements that determine the techno-economic feasibility of a renewable project. At present, there is a need to develop reliable methods that achieve this goal, thereby contributing to wind power penetration. In this study, we propose a method for wind power error estimation based on the wind speed measurement error, probability density function, and wind turbine power curves. This method uses the actual wind speed data without prior statistical treatment based on 28 wind turbine power curves, which were fitted by Lagrange's method, to calculate the estimate wind power output and the corresponding error propagation. We found that wind speed percentage errors of 10% were propagated into the power output estimates, thereby yielding an error of 5%. The proposed error propagation complements the traditional power resource assessments. The wind power estimation error also allows us to estimate intervals for the power production leveled cost or the investment time return. The implementation of this method increases the reliability of techno-economic resource assessment studies. PMID:26000444
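The idea of propagating a wind speed measurement error through a turbine power curve can be sketched with first-order error propagation, dP ≈ |dP/dv| dv. The power curve below is a hypothetical piecewise cubic with assumed cut-in, rated, and cut-out speeds; it is not one of the 28 fitted curves from the study and will not reproduce the reported 5% figure.

```python
def power_curve(v, cut_in=3.0, rated_v=12.0, cut_out=25.0, rated_p=2000.0):
    """Hypothetical turbine power curve in kW for wind speed v in m/s."""
    if v < cut_in or v > cut_out:
        return 0.0
    if v >= rated_v:
        return rated_p
    # Cubic ramp between cut-in and rated speed.
    return rated_p * (v ** 3 - cut_in ** 3) / (rated_v ** 3 - cut_in ** 3)

def propagated_error(v, rel_err=0.10, h=1e-3):
    """First-order power error (kW) for a relative wind speed error rel_err."""
    dv = rel_err * v
    slope = (power_curve(v + h) - power_curve(v - h)) / (2.0 * h)
    return abs(slope) * dv

for speed in (8.0, 15.0):
    print(speed, power_curve(speed), propagated_error(speed))
```

In this sketch the propagated error vanishes on the flat, rated part of the curve and is largest on the steep ramp, so the overall error in an energy estimate depends on how the measured wind speeds are distributed across the curve.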
Towards an operational definition of pharmacy clinical competency
NASA Astrophysics Data System (ADS)
Douglas, Charles Allen
The scope of pharmacy practice and the training of future pharmacists have undergone a strategic shift over the last few decades. The pharmacy profession recognizes greater pharmacist involvement in patient care activities. Towards this strategic objective, pharmacy schools are training future pharmacists to meet these new clinical demands. Pharmacy students have clerkships called Advanced Pharmacy Practice Experiences (APPEs), and these clerkships account for 30% of the professional curriculum. APPEs provide the only opportunity for students to refine clinical skills under the guidance of an experienced pharmacist. Nationwide, schools of pharmacy need to evaluate whether students have successfully completed APPEs and are ready to treat patients. Schools are left to their own devices to develop assessment programs that demonstrate to the public and regulatory agencies that students are clinically competent prior to graduation. There is no widely accepted method to evaluate whether these assessment programs actually discriminate between the competent and non-competent students. The central purpose of this study is to demonstrate a rigorous method to evaluate the validity and reliability of APPE assessment programs. The method introduced in this study is applicable to a wide variety of assessment programs. To illustrate this method, the study evaluated new performance criteria with a novel rating scale. The study had two main phases. In the first phase, a Delphi panel was created to bring together expert opinions. Pharmacy schools nominated exceptional preceptors to join a Delphi panel. Delphi is a method to achieve agreement on complex issues among experts. The principal researcher recruited preceptors representing a variety of practice settings and geographical regions. The Delphi panel evaluated and refined the new performance criteria.
In the second phase, the study produced a novel set of video vignettes that portrayed student performances based on recommendations of an expert panel. Pharmacy preceptors assessed the performances with the new performance criteria. Estimates of reliability and accuracy from preceptors' assessments can be used to establish benchmarks for future comparisons. Findings from the first phase suggested preceptors held a unique perspective, where APPE assessments are based in relevance to clinical activities. The second phase analyzed assessment results from pharmacy preceptors who watched the video simulations. Reliability results were higher for non-randomized compared to randomized video simulations. Accuracy results showed preceptors more readily identified high and low student performances compared to average students. These results indicated the need for pharmacy preceptor training in performance assessment. The study illustrated a rigorous method to evaluate the validity and reliability of APPE assessment instruments.
NDE detectability of fatigue type cracks in high strength alloys
NASA Technical Reports Server (NTRS)
Christner, B. K.; Rummel, W. D.
1983-01-01
Specimens suitable for investigating the reliability of production nondestructive evaluation (NDE) to detect tightly closed fatigue cracks in high strength alloys representative of those materials used in spacecraft engine/booster construction were produced. Inconel 718 was selected as representative of nickel base alloys and Haynes 188 was selected as representative of cobalt base alloys used in this application. Cleaning procedures were developed to insure the reusability of the test specimens and a flaw detection reliability assessment of the fluorescent penetrant inspection method was performed using the test specimens produced to characterize their use for future reliability assessments and to provide additional NDE flaw detection reliability data for high strength alloys. The statistical analysis of the fluorescent penetrant inspection data was performed to determine the detection reliabilities for each inspection at a 90% probability/95% confidence level.
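A 90% probability/95% confidence criterion is often illustrated with a zero-miss binomial demonstration: if the true POD were only 0.90, the chance of detecting all n flaws would be 0.90^n, so n must be large enough that this chance is at most 5%. The abstract does not state that this particular procedure was used, so the sketch below is an illustration of the criterion only. The function name is hypothetical.

```python
def min_trials_zero_miss(pod=0.90, confidence=0.95):
    """Smallest n such that n detections in n trials demonstrates `pod`
    at the given confidence level (one-sided binomial argument)."""
    n = 1
    while pod ** n > 1.0 - confidence:
        n += 1
    return n

print(min_trials_zero_miss())
```

With the default arguments this gives the familiar 29-of-29 rule, because 0.9^28 is still above 0.05 while 0.9^29 falls below it.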
Karbalaie, Abdolamir; Abtahi, Farhad; Fatemi, Alimohammad; Etehadtavakol, Mahnaz; Emrani, Zahra; Erlandsson, Björn-Erik
2017-09-01
Nailfold capillaroscopy is a practical method for identifying and obtaining morphological changes in capillaries which might reveal relevant information about diseases and health. Capillaroscopy is harmless, and seems simple and repeatable. However, there is a lack of established guidelines and instructions for acquisition as well as the interpretation of the obtained images, which might lead to various ambiguities. In addition, assessment and interpretation of the acquired images are very subjective. In an attempt to overcome some of these problems, in this study a new modified technique for assessment of nailfold capillary density is introduced. The new method is named elliptic broken line (EBL), which is an extension of the two previously known methods by defining clear criteria for finding the apex of capillaries in different scenarios by using a fitted ellipse. A graphical user interface (GUI) is developed for pre-processing, manual assessment of capillary apexes and automatic correction of selected apexes based on the 90° rule. Intra- and inter-observer reliability of EBL and corrected EBL is evaluated in this study. Four independent observers familiar with capillaroscopy performed the assessment for 200 nailfold videocapillaroscopy images, from healthy subjects and systemic lupus erythematosus patients, in two different sessions. The results show elevation from moderate (ICC=0.691) and good (ICC=0.753) agreements to good (ICC=0.750) and good (ICC=0.801) for intra- and inter-observer reliability after automatic correction of EBL. This clearly shows the potential of this method to improve the reliability and repeatability of assessment, which motivates us for further development of an automatic tool for the EBL method. Copyright © 2017 Elsevier Inc. All rights reserved.
Reliability and Validity of the Research Methods Skills Assessment
ERIC Educational Resources Information Center
Smith, Tamarah; Smith, Samantha
2018-01-01
The Research Methods Skills Assessment (RMSA) was created to measure psychology majors' statistics knowledge and skills. The American Psychological Association's Guidelines for the Undergraduate Major in Psychology (APA, 2007, 2013) served as a framework for development. Results from a Rasch analysis with data from n = 330 undergraduates showed…
METHOD FOR MEASURING BASE/NEUTRAL AND CARBAMATE PESTICIDES IN PERSONAL DIETARY SAMPLES
Dietary uptake may be a significant pathway of exposure to contaminants. As such, dietary exposure assessments should be considered an important part of the total exposure assessment process. The objective of this work was to develop reliable methods that are applicable to a wide ...
Lempereur, Mathieu; Lelievre, Mathieu; Burdin, Valérie; Ben Salem, Douraied; Brochard, Sylvain
2017-01-01
Purpose To report evidence for the concurrent validity and reliability of dynamic MRI techniques to evaluate in vivo joint and muscle mechanics, and to propose recommendations for their use in the assessment of normal and impaired musculoskeletal function. Materials and methods The search was conducted on articles published in Web of science, PubMed, Scopus, Academic search Premier, and Cochrane Library between 1990 and August 2017. Studies that reported the concurrent validity and/or reliability of dynamic MRI techniques for in vivo evaluation of joint or muscle mechanics were included after assessment by two independent reviewers. Selected articles were assessed using an adapted quality assessment tool and a data extraction process. Results for concurrent validity and reliability were categorized as poor, moderate, or excellent. Results Twenty articles fulfilled the inclusion criteria with a mean quality assessment score of 66% (±10.4%). Concurrent validity and/or reliability of eight dynamic MRI techniques were reported, with the knee being the most evaluated joint (seven studies). Moderate to excellent concurrent validity and reliability were reported for seven out of eight dynamic MRI techniques. Cine phase contrast and real-time MRI appeared to be the most valid and reliable techniques to evaluate joint motion, and spin tag for muscle motion. Conclusion Dynamic MRI techniques are promising for the in vivo evaluation of musculoskeletal mechanics; however results should be evaluated with caution since validity and reliability have not been determined for all joints and muscles, nor for many pathological conditions. PMID:29232401
Alzyoud, Sukaina; Veeranki, Sreenivas P.; Kheirallah, Khalid A.; Shotar, Ali M.; Pbert, Lori
2016-01-01
Introduction: Waterpipe use among adolescents has been increasing progressively. Yet no studies have been reported that assess the validity and reliability of a nicotine dependence scale. The current study aims to assess the validity and reliability of an Arabic version of the modified Waterpipe Tolerance Questionnaire (WTQ) among school-going adolescent waterpipe users. Methods: In a cross-sectional study conducted in Jordan, information on waterpipe use among 333 school-going adolescents aged 11-18 years was obtained using the Arabic version of the WTQ. An exploratory factor analysis and correlation matrices were conducted to assess validity and reliability of the WTQ. Results: The WTQ had a 0.73 alpha of internal consistency, indicating a moderate level of reliability. The scale showed multidimensionality with items loading on two factors, namely waterpipe consumption and morning smoking. Conclusion: This study reports the nicotine dependence level among school-going adolescents who identify themselves as waterpipe users using the WTQ. PMID:26383198
Assessment and risk classification protocol for patients in emergency units
Silva, Michele de Freitas Neves; Oliveira, Gabriela Novelli; Pergola-Marconato, Aline Maino; Marconato, Rafael Silva; Bargas, Eliete Boaventura; Araujo, Izilda Esmenia Muglia
2014-01-01
Objective to develop a risk classification protocol for an Emergency Unit, validate its contents and verify its reliability. Method the content validation was developed in a University Hospital in a country town located in the state of Sao Paulo and was carried out in two stages: the first with the individual assessment of specialists and the second with the meeting between the researchers and the specialists. The use of the protocol followed a specific guide. Concerning reliability, the concordance or equivalent method among observers was used. Results the protocol developed was shown to have content validity and, after the suggested changes were made, there were excellent results concerning reliability. Conclusion the assistance flow chart was shown to be easy to use and to facilitate the search for the complaint in each assistance priority. PMID:26107828
Mousazadeh, Somayeh; Rakhshan, Mahnaz; Mohammadi, Fateme
2017-01-01
Objective: This study aimed to determine the psychometric properties of the sociocultural attitude towards appearance questionnaire in female adolescents. Method: This was a methodological study. The English version of the questionnaire was translated into Persian, using the forward-backward method. Then the face validity, content validity and reliability were checked. To ensure face validity, the questionnaire was given to 25 female adolescents, a psychologist and three nurses, who were required to evaluate the items with respect to problems, ambiguity, relativity, proper terms and grammar, and understandability. For content validity, 15 experts in psychology and nursing, who met the inclusion criteria, were recruited. They were asked to assess the content validity qualitatively. To determine the quantitative content validity, content validity index and content validity ratio were calculated. At the end, internal consistency of the items was assessed, using Cronbach’s alpha method. Results: According to the expert judgments, content validity ratio was 0.81 and content validity index was 0.91. Besides, the reliability of the questionnaire was confirmed with Cronbach’s alpha = 0.91, and physical and developmental areas showed the highest reliability indices. Conclusion: The aforementioned questionnaire could be used in research to assess female adolescents’ self-concept. This can be a stepping-stone towards identification of problems and improvement of adolescents’ body image. PMID:28496497
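The content validity ratio and content validity index mentioned above are commonly computed per item from an expert panel. The sketch below assumes Lawshe's CVR, (n_e - N/2) / (N/2), where n_e is the number of experts rating the item essential, and an item-level CVI defined as the share of experts giving a relevance rating of 3 or 4 on a 4-point scale. The abstract does not state which definitions the authors used, and the function names and ratings here are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for one item; ranges from -1 to 1."""
    half = n_experts / 2.0
    return (n_essential - half) / half

def item_cvi(relevance_ratings):
    """Item-level CVI: share of experts rating the item 3 or 4 (4-point scale)."""
    relevant = sum(1 for r in relevance_ratings if r >= 3)
    return relevant / len(relevance_ratings)

# Hypothetical panel of 15 experts, 14 of whom rate an item "essential".
print(round(content_validity_ratio(14, 15), 3))
print(item_cvi([4, 3, 4, 2]))
```

Scale-level values such as those reported in the abstract are typically averages of these item-level quantities over all retained items.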
R, Cuesta-Barriuso; A, Torres-Ortuño; S, Pérez-Alenda; J, Carrasco Juan; F, Querol; J, Nieto-Munuera; Ja, López-Pina
2018-02-27
Numerous measuring instruments for the evaluation of hemophilic arthropathy have been developed. One of the most used systems is the Hemophilia Joint Health Score (HJHS) given its sensitivity to clinical changes appearing in the joints because of recurrent hemarthrosis. Assessing the interrater reliability, using the Spanish version of the HJHS (version 2.1) in children with hemophilia. Reliability study to assess the interrater reliability of the Spanish version of HJHS. A sample of 36 children aged 7-13 years diagnosed with hemophilia A or B was used. Two physiotherapists performed physical assessments with the Spanish version of the HJHS. Descriptive statistics (range, mean, standard deviation) and the analysis of interrater reliability were calculated. The interrater reliability was heterogeneous since the Kappa coefficient (ĸ), although significant (p < 0.001), ranged 0.31-1.00 in the variables of HJHS (swelling, duration of swelling, muscle atrophy, crepitus on motion, flexion loss, extension loss, joint pain, strength, and global gait). In assessing the bias of observers with the Bland and Altman method, observer 1 scored 0.41 (CI [-0.67, 1.49]) units above observer 2, and the difference between the two was significant (t(36) = 4.48, p < 0.001). The interrater reliability of the Spanish population version of the HJHS is high. This scale should be used generically in evaluating musculoskeletal pediatric patients with hemophilia.
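The Kappa coefficient reported for each HJHS variable corrects raw agreement between two raters for agreement expected by chance. The sketch below implements unweighted Cohen's kappa, (p_o - p_e) / (1 - p_e). Whether the study used a weighted or unweighted form is not stated, and the function name and ratings are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_1, rater_2):
    """Unweighted Cohen's kappa for two raters scoring the same subjects."""
    n = len(rater_1)
    # Observed proportion of exact agreement.
    p_o = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement from each rater's marginal category frequencies.
    c1, c2 = Counter(rater_1), Counter(rater_2)
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical swelling scores (0-3) from two physiotherapists.
obs_1 = [0, 1, 1, 2, 0, 3, 1, 0]
obs_2 = [0, 1, 2, 2, 0, 3, 1, 1]
print(round(cohen_kappa(obs_1, obs_2), 3))
```

The sketch divides by zero when both raters use a single identical category for every subject, since chance agreement is then 1; production code would need to handle that case explicitly.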
Technology for Online Portfolio Assessment Programs
ERIC Educational Resources Information Center
Ferrara, Victoria M.
2010-01-01
Portfolio assessment is a valid and reliable method to assess experiential learning. Developing a fully online portfolio assessment program is neither easy nor inexpensive. The institution seeking to take its portfolio assessment program online must make a commitment to its students by offering the technologies most suited to meet students' needs.…
Climate Change Impacts at Department of Defense Installations
2017-06-16
locations. The ease of use of this method and its flexibility have led to a wide variety of applications for assessing impacts of climate change 4...versions of these statistical methods to provide the basis for regional climate assessments for various states, regions, and government agencies...averaging (REA) method proposed by Giorgi and Mearns (2002). This method assigns reliability classifications for the multi-model ensemble simulation by
Gorgos, Kara S; Wasylyk, Nicole T; Van Lunen, Bonnie L; Hoch, Matthew C
2014-04-01
Joint mobilizations are commonly used by clinicians to decrease pain and restore joint arthrokinematics following musculoskeletal injury. The force applied during a joint mobilization treatment is subjective to the individual clinician but may have an effect on patient outcomes. The purpose of this systematic review was to critically appraise and synthesize the studies which examined the reliability of clinicians' force application during joint mobilization. A systematic search of PubMed and EBSCO Host databases from inception to March 1, 2013 was conducted to identify studies assessing the reliability of force application during joint mobilizations. Two reviewers utilized the Quality Appraisal of Reliability Studies (QAREL) assessment tool to determine the quality of included studies. The relative reliability of the included studies was examined through intraclass correlation coefficients (ICC) to synthesize study findings. All results were collated qualitatively with a level of evidence approach. A total of seven studies met the eligibility criteria and were included. Five studies were included that assessed inter-clinician reliability, and six studies were included that assessed intra-clinician reliability. The overall level of evidence for inter-clinician reliability was strong for poor-to-moderate reliability (ICC = -0.04 to 0.70). The overall level of evidence for intra-clinician reliability was strong for good reliability (ICC = 0.75-0.99). This systematic review indicates there is variability in force application between clinicians but individual clinicians apply forces consistently. The results of this systematic review suggest innovative instructional methods are needed to improve consistency and validate the forces applied during joint mobilization treatments. This is particularly evident for improving the consistency of force application across clinicians. Copyright © 2014 Elsevier Ltd. All rights reserved.
Gunnarsson, U; Johansson, M; Strigård, K
2011-08-01
The decrease in recurrence rates in ventral hernia surgery has led to a redirection of focus towards other important patient-related endpoints. One such endpoint is abdominal wall function. The aim of the present study was to evaluate the reliability and external validity of abdominal wall strength measurement using the Biodex System-4 with a back abdomen unit. Ten healthy volunteers and ten patients with ventral hernias exceeding 10 cm were recruited. Test-retest reliability, both with and without girdle, was evaluated by comparison of measurements at two test occasions 1 week apart. Reliability was calculated by the intraclass correlation coefficient (ICC) method. Validity was evaluated by correlation with the well-established International Physical Activity Questionnaire (IPAQ) and a self-assessment of abdominal wall strength. One person in the healthy group was excluded after the first test due to neck problems following minor trauma. The reliability was excellent (>0.75), with ICC values between 0.92 and 0.97 for the different modalities tested. No differences were seen between testing with and without a girdle. Validity was also excellent both when calculated as correlation to self-assessment of abdominal wall strength, and to IPAQ, giving Kendall tau values of 0.51 and 0.47, respectively, and corresponding P values of 0.002 and 0.004. Measurement of abdominal muscle function using the Biodex System-4 is a reliable and valid method to assess this important patient-related endpoint. Further investigations will be made to explore the potential of this technique in the evaluation of the results of ventral hernia surgery, and to compare muscle function after different abdominal wall reconstruction techniques.
Computerized Analysis of Digital Photographs for Evaluation of Tooth Movement.
Toodehzaeim, Mohammad Hossein; Karandish, Maryam; Karandish, Mohammad Nabi
2015-03-01
Various methods have been introduced for evaluation of tooth movement in orthodontics. The challenge is to adopt the most accurate and most beneficial method for patients. This study was designed to introduce analysis of digital photographs with AutoCAD software as a method to evaluate tooth movement and assess the reliability of this method. Eighteen patients were evaluated in this study. Three intraoral digital images from the buccal view were captured from each patient at half-hour intervals. All the photos were imported into AutoCAD 2011 software and calibrated, and the distance between canine and molar hooks was measured. The data were analyzed using the intraclass correlation coefficient. Photographs were found to have high reliability coefficient (P > 0.05). The introduced method is an accurate, efficient and reliable method for evaluation of tooth movement.
ERIC Educational Resources Information Center
Caballero, Marcos D.; Doughty, Leanne; Turnbull, Anna M.; Pepper, Rachel E.; Pollock, Steven J.
2017-01-01
Reliable and validated assessments of introductory physics have been instrumental in driving curricular and pedagogical reforms that lead to improved student learning. As part of an effort to systematically improve our sophomore-level classical mechanics and math methods course (CM 1) at CU Boulder, we have developed a tool to assess student…
Hong, Tran Thi; Phuong Hoa, Nguyen; Walker, Sue M; Hill, Peter S; Rao, Chalapati
2018-01-01
Mortality statistics form a crucial component of national Health Management Information Systems (HMIS). However, there are limitations in the availability and quality of mortality data at national level in Viet Nam. This study assessed the completeness of recorded deaths and the reliability of recorded causes of death (COD) in the A6 death registers in the national routine HMIS in Viet Nam. 1477 identified deaths in 2014 were reviewed in two provinces. A capture-recapture method was applied to assess the completeness of the A6 death registers. 1365 household verbal autopsy (VA) interviews were successfully conducted, and these were reviewed by physicians who assigned multiple and underlying causes of death (UCOD). These UCODs from VA were then compared with the CODs recorded in the A6 death registers, using kappa scores to assess the reliability of the A6 death register diagnoses. The overall completeness of the A6 death registers in the two provinces was 89.3% (95%CI: 87.8-90.8). No COD recorded in the A6 death registers demonstrated good reliability. There is very low reliability in recording of cardiovascular deaths (kappa for stroke = 0.47 and kappa for ischaemic heart diseases = 0.42) and diabetes (kappa = 0.33). The reporting of deaths due to road traffic accidents, HIV and some cancers is at a moderate level of reliability with kappa scores ranging between 0.57-0.69 (p<0.01). VA methods identify more specific COD than the A6 death registers, and also allow identification of multiple CODs. The study results suggest that data completeness in HMIS A6 death registers in the study sample of communes was relatively high (nearly 90%), but triangulation with death records from other sources would improve the completeness of this system. Further, there is an urgent need to enhance the reliability of COD recorded in the A6 death registers, for which VA methods could be effective.
Focussed consultation among stakeholders is needed to develop a suitable mechanism and process for integrating VA methods into the national routine HMIS A6 death registers in Viet Nam.
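The capture-recapture idea used for the completeness estimate in this abstract can be outlined with a minimal sketch. It assumes the simple two-source Lincoln-Petersen estimator, which may differ from the estimator the authors applied. The counts are hypothetical and are not the study's data.

```python
def lincoln_petersen(n_source_1, n_source_2, n_matched):
    """Estimate the true number of deaths from two overlapping lists.

    n_source_1: deaths found in the first source (e.g. a register)
    n_source_2: deaths found in the second, independent source
    n_matched:  deaths found in both sources
    """
    if n_matched <= 0:
        raise ValueError("at least one matched record is required")
    return n_source_1 * n_source_2 / n_matched


def completeness(n_source_1, n_source_2, n_matched):
    """Share of the estimated true total that the first source captured."""
    total = lincoln_petersen(n_source_1, n_source_2, n_matched)
    return n_source_1 / total


# Hypothetical counts: 900 register deaths, 800 survey deaths, 720 in both.
estimated_total = lincoln_petersen(900, 800, 720)
register_completeness = completeness(900, 800, 720)
```

The estimator relies on the two sources being independent and on records being matched correctly, so a real analysis would need to examine both assumptions.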
The reliability of the Glasgow Coma Scale: a systematic review.
Reith, Florence C M; Van den Brande, Ruben; Synnot, Anneliese; Gruen, Russell; Maas, Andrew I R
2016-01-01
The Glasgow Coma Scale (GCS) provides a structured method for assessment of the level of consciousness. Its derived sum score is applied in research and adopted in intensive care unit scoring systems. Controversy exists on the reliability of the GCS. The aim of this systematic review was to summarize evidence on the reliability of the GCS. A literature search was undertaken in MEDLINE, EMBASE and CINAHL. Observational studies that assessed the reliability of the GCS, expressed by a statistical measure, were included. Methodological quality was evaluated with the consensus-based standards for the selection of health measurement instruments checklist and its influence on results considered. Reliability estimates were synthesized narratively. We identified 52 relevant studies that showed significant heterogeneity in the type of reliability estimates used, patients studied, setting and characteristics of observers. Methodological quality was good (n = 7), fair (n = 18) or poor (n = 27). In good quality studies, kappa values were ≥0.6 in 85%, and all intraclass correlation coefficients indicated excellent reliability. Poor quality studies showed lower reliability estimates. Reliability for the GCS components was higher than for the sum score. Factors that may influence reliability include education and training, the level of consciousness and type of stimuli used. Only 13% of studies were of good quality and inconsistency in reported reliability estimates was found. Although the reliability was adequate in good quality studies, further improvement is desirable. From a methodological perspective, the quality of reliability studies needs to be improved. From a clinical perspective, a renewed focus on training/education and standardization of assessment is required.
Validation of the self-assessment teamwork tool (SATT) in a cohort of nursing and medical students.
Roper, Lucinda; Shulruf, Boaz; Jorm, Christine; Currie, Jane; Gordon, Christopher J
2018-02-09
Poor teamwork has been implicated in medical error and teamwork training has been shown to improve patient care. Simulation is an effective educational method for teamwork training. Post-simulation reflection aims to promote learning and we have previously developed a self-assessment teamwork tool (SATT) for health students to measure teamwork performance. This study aimed to evaluate the psychometric properties of a revised self-assessment teamwork tool. The tool was tested in 257 medical and nursing students after their participation in one of several mass casualty simulations. Using exploratory and confirmatory factor analysis, the revised self-assessment teamwork tool was shown to have strong construct validity, high reliability, and the construct demonstrated invariance across groups (Medicine & Nursing). The modified SATT was shown to be a reliable and valid student self-assessment tool. The SATT is a quick and practical method of guiding students' reflection on important teamwork skills.
A method for evaluating competency in assessment and management of suicide risk.
Hung, Erick K; Binder, Renée L; Fordwood, Samantha R; Hall, Stephen E; Cramer, Robert J; McNiel, Dale E
2012-01-01
Although health professionals increasingly are expected to be able to assess and manage patients' risk for suicide, few methods are available to evaluate this competency. This report describes development of a competency-assessment instrument for suicide risk-assessment (CAI-S), and evaluates its use in an objective structured clinical examination (OSCE). The authors developed the CAI-S on the basis of the literature on suicide risk-assessment and management, and consultation with faculty focus groups from three sites in a large academic psychiatry department. The CAI-S structures faculty ratings regarding interviewing and data collection, case formulation and presentation, treatment-planning, and documentation. To evaluate the CAI-S, 31 faculty members used it to rate the performance of 31 learners (26 psychiatric residents and 5 clinical psychology interns) who participated in an OSCE. After interviewing a standardized patient, learners presented their risk-assessment findings and treatment plans. Faculty used the CAI-S to structure feedback to the learners. In a subsidiary study of interrater reliability, six faculty members rated video-recorded suicide risk-assessments. The CAI-S showed good internal consistency, reliability, and interrater reliability. Concurrent validity was supported by the finding that CAI-S ratings were higher for senior learners than junior learners, and were higher for learners with more clinical experience with suicidal patients than learners with less clinical experience. Faculty and learners rated the method as helpful for structuring feedback and supervision. The findings support the usefulness of the CAI-S for evaluating competency in suicide risk-assessment and management.
Internet addiction assessment tools: dimensional structure and methodological status.
Lortie, Catherine L; Guitton, Matthieu J
2013-07-01
Excessive internet use is becoming a concern, and some have proposed that it may involve addiction. We evaluated the dimensions assessed by, and psychometric properties of, a range of questionnaires purporting to assess internet addiction. Fourteen questionnaires were identified purporting to assess internet addiction among adolescents and adults published between January 1993 and October 2011. Their reported dimensional structure, construct, discriminant and convergent validity and reliability were assessed, as well as the methods used to derive these. Methods used to evaluate internet addiction questionnaires varied considerably. Three dimensions of addiction predominated: compulsive use (79%), negative outcomes (86%) and salience (71%). Less common were escapism (21%), withdrawal symptoms (36%) and other dimensions. Measures of validity and reliability were found to be within normally acceptable limits. There is a broad convergence of questionnaires purporting to assess internet addiction suggesting that compulsive use, negative outcome and salience should be covered and the questionnaires show adequate psychometric properties. However, the methods used to evaluate the questionnaires vary widely and possible factors contributing to excessive use such as social motivation do not appear to be covered. © 2013 Society for the Study of Addiction.
Gage, R; Wilson, N; Signal, L; Barr, M; Mackay, C; Reeder, A; Thomson, G
2018-05-16
Shade in public spaces can lower the risk of sunburn and skin cancer. However, existing methods of auditing shade require travel between sites, and sunny weather conditions. This study aimed to evaluate the feasibility of free computer software (Google Earth) for assessing shade in urban open spaces. A shade projection method was developed that uses Google Earth street view and aerial images to estimate shade at solar noon on the summer solstice, irrespective of the date of image capture. Three researchers used the method to separately estimate shade cover over pre-defined activity areas in a sample of 45 New Zealand urban open spaces, including 24 playgrounds, 12 beaches and 9 outdoor pools. Outcome measures included method accuracy (assessed by comparison with a subsample of field observations of 10 of the settings) and inter-rater reliability. Of the 164 activity areas identified in the 45 settings, most (83%) had no shade cover. The method identified most activity areas in playgrounds (85%) and beaches (93%) and was accurate for assessing shade over these areas (predictive values of 100%). Only 8% of activity areas at outdoor pools were identified, due to a lack of street view images. Reliability for shade cover estimates was excellent (intraclass correlation coefficient of 0.97, 95% CI 0.97-0.98). Google Earth appears to be a reasonably accurate and reliable shade audit tool for playgrounds and beaches. The findings are relevant for programmes focused on supporting the development of healthy urban open spaces.
An overview of the mathematical and statistical analysis component of RICIS
NASA Technical Reports Server (NTRS)
Hallum, Cecil R.
1987-01-01
Mathematical and statistical analysis components of RICIS (Research Institute for Computing and Information Systems) can be used in the following problem areas: (1) quantification and measurement of software reliability; (2) assessment of changes in software reliability over time (reliability growth); (3) analysis of software-failure data; and (4) decision logic for whether to continue or stop testing software. Other areas of interest to NASA/JSC where mathematical and statistical analysis can be successfully employed include: math modeling of physical systems, simulation, statistical data reduction, evaluation methods, optimization, algorithm development, and mathematical methods in signal processing.
Dibai-Filho, Almir V.; Guirro, Elaine C. O.; Ferreira, Vânia T. K.; Brandino, Hugo E.; Vaz, Maíta M. O. L. L.; Guirro, Rinaldo R. J.
2015-01-01
BACKGROUND: Infrared thermography is recognized as a viable method for evaluation of subjects with myofascial pain. OBJECTIVE: The aim of the present study was to assess the intra- and inter-rater reliability of infrared image analysis of myofascial trigger points in the upper trapezius muscle. METHOD: A reliability study was conducted with 24 volunteers of both genders (23 females) between 18 and 30 years of age (22.12±2.54), all having cervical pain and presence of active myofascial trigger point in the upper trapezius muscle. Two trained examiners performed analysis of point, line, and area of the infrared images at two different periods with a 1-week interval. The intra-class correlation coefficient (ICC2,1) was used to assess the intra- and inter-rater reliability. RESULTS: With regard to the intra-rater reliability, ICC values were between 0.591 and 0.993, with temperatures between 0.13 and 1.57 °C for values of standard error of measurement (SEM) and between 0.36 and 4.35 °C for the minimal detectable change (MDC). For the inter-rater reliability, ICC ranged from 0.615 to 0.918, with temperatures between 0.43 and 1.22 °C for the SEM and between 1.19 and 3.38 °C for the MDC. CONCLUSION: The methods of infrared image analyses of myofascial trigger points in the upper trapezius muscle employed in the present study are suitable for clinical and research practices. PMID:25993626
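The relationship between ICC, SEM and MDC reported in this abstract can be illustrated with a short sketch. The abstract does not state its formulas, so this assumes the common definitions SEM = SD x sqrt(1 - ICC) and MDC95 = 1.96 x sqrt(2) x SEM. The reported SEM and MDC pairs are numerically consistent with the second relation. The function names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and the ICC."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem


# SEM bounds quoted in the abstract (degrees Celsius).
intra_rater_mdc = (mdc95(0.13), mdc95(1.57))
inter_rater_mdc = (mdc95(0.43), mdc95(1.22))
```

Rounded to two decimals, these values match the MDC ranges quoted above (0.36 to 4.35 °C and 1.19 to 3.38 °C), which suggests the assumed MDC formula is the one the authors used.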
Reliability of a smartphone-based goniometer for knee joint goniometry.
Ferriero, Giorgio; Vercelli, Stefano; Sartorio, Francesco; Muñoz Lasa, Susana; Ilieva, Elena; Brigatti, Elisa; Ruella, Carolina; Foti, Calogero
2013-06-01
The aim of this study was to assess the reliability of a smartphone-based application developed for photographic-based goniometry, DrGoniometer (DrG), by comparing its measurement of the knee joint angle with that made by a universal goniometer (UG). Joint goniometry is a common mode of clinical assessment used in many disciplines, in particular in rehabilitation. One validated method is photographic-based goniometry, but the procedure is usually complex: the image has to be downloaded from the camera to a computer and then edited using dedicated software. This disadvantage may be overcome by the new generation of mobile phones (smartphones) that have computer-like functionality and an integrated digital camera. This validation study was carried out under two different controlled conditions: (i) with the participant to measure in a fixed position and (ii) with a battery of pictures to assess. In the first part, four raters performed repeated measurements with DrG and UG at different knee joint angles. Then, 10 other raters measured the knee at different flexion angles ranging 20-145° on a battery of 35 pictures taken in a clinical setting. The results showed that inter-rater and intra-rater correlations were always more than 0.958. Agreement with the UG showed a width of 18.2° [95% limits of agreement (LoA)=-7.5/+10.7°] and 14.1° (LoA=-6.6/+7.5°). In conclusion, DrG seems to be a reliable method for measuring knee joint angle. This mHealth application can be an alternative/additional method of goniometry, easier to use than other photographic-based goniometric assessments. Further studies are required to assess its reliability for the measurement of other joints.
Cheng, Shu-Fen; Rose, Susan
2009-01-01
This study investigated the technical adequacy of curriculum-based measures of written expression (CBM-W) in terms of writing prompts and scoring methods for deaf and hard-of-hearing students. Twenty-two students at the secondary school-level completed 3-min essays within two weeks, which were scored for nine existing and alternative curriculum-based measurement (CBM) scoring methods. The technical features of the nine scoring methods were examined for interrater reliability, alternate-form reliability, and criterion-related validity. The existing CBM scoring method--number of correct minus incorrect word sequences--yielded the highest reliability and validity coefficients. The findings from this study support the use of the CBM-W as a reliable and valid tool for assessing general writing proficiency with secondary students who are deaf or hard of hearing. The CBM alternative scoring methods that may serve as additional indicators of written expression include correct subject-verb agreements, correct clauses, and correct morphemes.
Fatigue Reliability of Gas Turbine Engine Structures
NASA Technical Reports Server (NTRS)
Cruse, Thomas A.; Mahadevan, Sankaran; Tryon, Robert G.
1997-01-01
The results of an investigation are described for fatigue reliability in engine structures. The description consists of two parts. Part 1 is for method development. Part 2 is a specific case study. In Part 1, the essential concepts and practical approaches to damage tolerance design in the gas turbine industry are summarized. These have evolved over the years in response to flight safety certification requirements. The effect of Non-Destructive Evaluation (NDE) methods on these methods is also reviewed. Assessment methods based on probabilistic fracture mechanics, with regard to both crack initiation and crack growth, are outlined. Limit state modeling techniques from structural reliability theory are shown to be appropriate for application to this problem, for both individual failure mode and system-level assessment. In Part 2, the results of a case study for the high pressure turbine of a turboprop engine are described. The response surface approach is used to construct a fatigue performance function. This performance function is used with the First Order Reliability Method (FORM) to determine the probability of failure and the sensitivity of the fatigue life to the engine parameters for the first stage disk rim of the two stage turbine. A hybrid combination of regression and Monte Carlo simulation is used to incorporate time dependent random variables. System reliability is used to determine the system probability of failure, and the sensitivity of the system fatigue life to the engine parameters of the high pressure turbine. The variation in the primary hot gas and secondary cooling air, the uncertainty of the complex mission loading, and the scatter in the material data are considered.
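The FORM step mentioned in this abstract can be illustrated in its simplest setting. The sketch below assumes a linear limit state g = R - S with independent, normally distributed capacity R and demand S, a case in which the first-order result is exact. The report's actual performance function comes from a response surface and is not reproduced here. All names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def form_linear_normal(mu_r, sigma_r, mu_s, sigma_s):
    """Reliability index and failure probability for g = R - S.

    R (capacity) and S (demand) are assumed independent and normal,
    so beta = (mu_r - mu_s) / sqrt(sigma_r^2 + sigma_s^2) and
    P(failure) = P(g < 0) = Phi(-beta).
    """
    beta = (mu_r - mu_s) / math.sqrt(sigma_r**2 + sigma_s**2)
    p_failure = NormalDist().cdf(-beta)
    return beta, p_failure


# Hypothetical capacity and demand statistics in arbitrary units.
beta, p_failure = form_linear_normal(mu_r=150.0, sigma_r=3.0,
                                     mu_s=140.0, sigma_s=4.0)
```

For nonlinear limit states or non-normal variables, FORM instead searches iteratively for the most probable failure point in standard normal space. That search is beyond this sketch.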
ShahAli, Shabnam; Arab, Amir Massoud; Talebian, Saeed; Ebrahimi, Esmaeil; Bahmani, Andia; Karimi, Noureddin; Nabavi, Hoda
2015-07-01
The study was designed to evaluate the intra-examiner reliability of ultrasound (US) thickness measurement of abdominal muscles activity when supine lying and during two isometric endurance tests in subjects with and without Low back pain (LBP). A total of 19 women (9 with LBP, 10 without LBP) participated in the study. Within-day reliability of the US thickness measurements at supine lying and the two isometric endurance tests were assessed in all subjects. The intra-class correlation coefficient (ICC) was used to assess the relative reliability of thickness measurement. The standard error of measurement (SEM), minimal detectable change (MDC) and the coefficient of variation (CV) were used to evaluate the absolute reliability. Results indicated high ICC scores (0.73-0.99) and also small SEM and MDC scores for within-day reliability assessment. The Bland-Altman plots of agreement in US measurement of the abdominal muscles during the two isometric endurance tests demonstrated that 95% of the observations fall between the limits of agreement for test and retest measurements. Together the results indicate high intra-tester reliability for the US measurement of the thickness of abdominal muscles in all the positions tested. According to the study's findings, US imaging can be used as a reliable method for assessment of abdominal muscles activity in supine lying and the two isometric endurance tests employed, in participants with and without LBP. Copyright © 2014 Elsevier Ltd. All rights reserved.
Reliability of cervical vertebral maturation staging.
Rainey, Billie-Jean; Burnside, Girvan; Harrison, Jayne E
2016-07-01
Growth and its prediction are important for the success of many orthodontic treatments. The aim of this study was to determine the reliability of the cervical vertebral maturation (CVM) method for the assessment of mandibular growth. A group of 20 orthodontic clinicians, inexperienced in CVM staging, was trained to use the improved version of the CVM method for the assessment of mandibular growth with a teaching program. They independently assessed 72 consecutive lateral cephalograms, taken at Liverpool University Dental Hospital, on 2 occasions. The cephalograms were presented in 2 different random orders and interspersed with 11 additional images for standardization. The intraobserver and interobserver agreement values were evaluated using the weighted kappa statistic. The intraobserver and interobserver agreement values were substantial (weighted kappa, 0.6-0.8). The overall intraobserver agreement was 0.70 (SE, 0.01), with average agreement of 89%. The interobserver agreement values were 0.68 (SE, 0.03) for phase 1 and 0.66 (SE, 0.03) for phase 2, with average interobserver agreement of 88%. The intraobserver and interobserver agreement values of classifying the vertebral stages with the CVM method were substantial. These findings demonstrate that this method of CVM classification is reproducible and reliable. Copyright © 2016 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
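The weighted kappa statistic used in this abstract can be sketched for two sets of ordinal stage ratings. The abstract does not say whether linear or quadratic weights were used. The version below assumes linear disagreement weights, and the stage data are hypothetical.

```python
def weighted_kappa(ratings_1, ratings_2, n_categories):
    """Linearly weighted kappa for ordinal categories coded 0..n_categories-1."""
    n = len(ratings_1)
    k = n_categories
    # Cross-tabulate the two sets of ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted disagreement, observed versus expected by chance.
    observed_dis = sum(abs(i - j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(abs(i - j) * row_totals[i] * col_totals[j] / n
                       for i in range(k) for j in range(k))
    return 1 - observed_dis / expected_dis


# Hypothetical staging of eight radiographs on two occasions (stages 0-5).
first_pass = [0, 1, 2, 3, 4, 5, 2, 3]
second_pass = [0, 1, 2, 3, 4, 5, 3, 3]
kappa = weighted_kappa(first_pass, second_pass, n_categories=6)
```

Weighting penalises a one-stage disagreement less than a disagreement of several stages, which is why it suits ordered maturation stages better than unweighted kappa.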
Behrangrad, Shabnam; Kordi Yoosefinejad, Amin
2018-03-01
The purpose of this study is to investigate the validity and reliability of the Persian version of the Multidimensional Assessment of Fatigue Scale (MAFS) in an Iranian population with multiple sclerosis. A self-reported survey on fatigue including the MAFS, Fatigue Impact Scale and demographic measures was completed by 130 patients with multiple sclerosis and 60 healthy persons sampled with a convenience method. Test-retest reliability and validity were evaluated 3 days apart. Construct validity of the MAFS was assessed with the Fatigue Impact Scale. The MAFS had high internal consistency (Cronbach's alpha >0.9) and 3-d test-retest reliability (intraclass correlation coefficient = 0.99). Correlation between the Fatigue Impact Scale and MAFS was high (r = 0.99). Correlation between MAFS scores and the Expanded Disability Status Scale was also strong (r = 0.85). Questionnaire items showed acceptable item-scale correlation (0.968-0.993). The Persian version of the MAFS appears to be a valid and reliable questionnaire. It is an appropriate short multidimensional instrument to assess fatigue in patients with multiple sclerosis in clinical practice and research. Implications for Rehabilitation The Persian version of Multidimensional Assessment of Fatigue is a valid and reliable instrument for the assessment and monitoring the fatigue in Persian-language patients with multiple sclerosis. It is very easy to administer and a time efficient scale in comparison to other instruments evaluating fatigue in patients with multiple sclerosis.
Assessment of four midcarpal radiologic determinations.
Cho, Mickey S; Battista, Vincent; Dubin, Norman H; Pirela-Cruz, Miguel
2006-03-01
Several radiologic measurement methods have been described for determining static carpal alignment of the wrist. These include the scapholunate, radiolunate, and capitolunate angles. The triangulation method is an alternative radiologic measurement which we believe is easier to use and more reproducible and reliable than the above mentioned methods. The purpose of this study is to assess the intraobserver reproducibility and interobserver reliability of the triangulation method, scapholunate, radiolunate, and capitolunate angles. Twenty orthopaedic residents and staff at varying levels of training made four radiologic measurements including the scapholunate, radiolunate and capitolunate angles as well as the triangulation method on five different lateral, digitized radiographs of the wrist and forearm in neutral radioulnar deviation. Thirty days after the initial measurements, the participants repeated the four radiologic measurements using the same radiographs. The triangulation method had the best intra-and-interobserver agreement of the four methods tested. This agreement was significantly better than the capitolunate and radiolunate angles. The scapholunate angle had the next best intraobserver reproducibility and interobserver reliability. The triangulation method has the best overall observer agreement when compared to the scapholunate, radiolunate, and capitolunate angles in determining static midcarpal alignment. No comment can be made on the validity of the measurements since there is no radiographic gold standard in determining static carpal alignment.
Automated reliability assessment for spectroscopic redshift measurements
NASA Astrophysics Data System (ADS)
Jamal, S.; Le Brun, V.; Le Fèvre, O.; Vibert, D.; Schmitt, A.; Surace, C.; Copin, Y.; Garilli, B.; Moresco, M.; Pozzetti, L.
2018-03-01
Context. Future large-scale surveys, such as the ESA Euclid mission, will produce a large set of galaxy redshifts (≥10^6) that will require fully automated data-processing pipelines to analyze the data, extract crucial information and ensure that all requirements are met. A fundamental element in these pipelines is to associate to each galaxy redshift measurement a quality, or reliability, estimate. Aim. In this work, we introduce a new approach to automate the spectroscopic redshift reliability assessment based on machine learning (ML) and characteristics of the redshift probability density function. Methods: We propose to rephrase the spectroscopic redshift estimation into a Bayesian framework, in order to incorporate all sources of information and uncertainties related to the redshift estimation process and produce a redshift posterior probability density function (PDF). To automate the assessment of a reliability flag, we exploit key features in the redshift posterior PDF and machine learning algorithms. Results: As a working example, public data from the VIMOS VLT Deep Survey is exploited to present and test this new methodology. We first tried to reproduce the existing reliability flags using supervised classification in order to describe different types of redshift PDFs, but due to the subjective definition of these flags (classification accuracy 58%), we soon opted for a new homogeneous partitioning of the data into distinct clusters via unsupervised classification. After assessing the accuracy of the new clusters via resubstitution and test predictions (classification accuracy 98%), we projected unlabeled data from preliminary mock simulations for the Euclid space mission into this mapping to predict their redshift reliability labels.
Conclusions: Through the development of a methodology in which a system can build its own experience to assess the quality of a parameter, we are able to set a preliminary basis of an automated reliability assessment for spectroscopic redshift measurements. This newly-defined method is very promising for next-generation large spectroscopic surveys from the ground and in space, such as Euclid and WFIRST. A table of the reclassified VVDS redshifts and reliability is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/611/A53
Between-day reliability of time-to-contact measures used to assess postural stability.
Wheat, Jonathan S; Haddad, Jeffrey M; Scaife, Robert
2012-02-01
Traditional measures of postural stability consider movement of the center of pressure (COP) or the center of mass (COM) without regard to the boundary of support (BOS). A potentially more appropriate measure is postural time-to-contact (TtC) which defines the spatio-temporal proximity of the COM or COP to the BOS. Given the increasing popularity of TtC measures, it is important to determine their reliability. Therefore, the purpose of this study was to determine the effects of the number of trials and trial duration on the reliability of postural TtC measures. COP data were collected (100 Hz) in 16 young healthy participants during 10 trials (60-s duration) of quiet standing with eyes open on two occasions - seven days apart. Postural TtC of each trial was calculated using two different methods. The intersession reliability of the TtC measures was assessed by calculating between session intraclass correlation coefficients (ICC(2,1)) using different combinations of the number of trials (1-10) and trial duration (10, 20, 30, 40, 50 and 60s). Both TtC methods were very reliable. Additionally, both measures of TtC were more reliable than the standard deviation of the anterior-posterior COP and slightly more reliable than path length. This difference was most pronounced when fewer and shorter trials were used. Copyright © 2011 Elsevier B.V. All rights reserved.
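The between-session ICC(2,1) named in this abstract can be sketched as follows. This is a minimal illustration of the standard two-way random-effects, absolute-agreement, single-measurement formula, not the authors' analysis code; the function name `icc_2_1` and the score table are hypothetical and do not come from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is a list of rows, one per participant, each holding one
    score per session (or rater).
    """
    n = len(ratings)       # number of participants
    k = len(ratings[0])    # number of sessions or raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores only (6 participants x 4 sessions), not study data.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))
```

Because ICC(2,1) penalizes systematic shifts between sessions, a large column (session) effect such as the one in this toy table pulls the coefficient down even when participants keep their rank order.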
Collender, Philip A.; Kirby, Amy E.; Addiss, David G.; Freeman, Matthew C.; Remais, Justin V.
2015-01-01
Limiting the environmental transmission of soil-transmitted helminths (STH), which infect 1.5 billion people worldwide, will require sensitive, reliable, and cost effective methods to detect and quantify STH in the environment. We review the state of the art of STH quantification in soil, biosolids, water, produce, and vegetation with respect to four major methodological issues: environmental sampling; recovery of STH from environmental matrices; quantification of recovered STH; and viability assessment of STH ova. We conclude that methods for sampling and recovering STH require substantial advances to provide reliable measurements for STH control. Recent innovations in the use of automated image identification and developments in molecular genetic assays offer considerable promise for improving quantification and viability assessment. PMID:26440788
Computerized Analysis of Digital Photographs for Evaluation of Tooth Movement
Toodehzaeim, Mohammad Hossein; Karandish, Maryam; Karandish, Mohammad Nabi
2015-01-01
Objectives: Various methods have been introduced for evaluation of tooth movement in orthodontics. The challenge is to adopt the most accurate and most beneficial method for patients. This study was designed to introduce analysis of digital photographs with AutoCAD software as a method to evaluate tooth movement and to assess the reliability of this method. Materials and Methods: Eighteen patients were evaluated in this study. Three intraoral digital images from the buccal view were captured from each patient at half-hour intervals. All the photos were sent to AutoCAD software 2011 and calibrated, and the distance between the canine and molar hooks was measured. The data were analyzed using the intraclass correlation coefficient. Results: Photographs were found to have a high reliability coefficient (P > 0.05). Conclusion: The introduced method is an accurate, efficient and reliable method for evaluation of tooth movement. PMID:26622272
A method for assessment of watershed health is developed by employing measures of reliability, resilience and vulnerability (R-R-V) using stream water quality data. Observed water quality data are usually sparse, so that a water quality time series is often reconstructed using s...
The Validation of a Food Label Literacy Questionnaire for Elementary School Children
ERIC Educational Resources Information Center
Reynolds, Jesse S.; Treu, Judith A.; Njike, Valentine; Walker, Jennifer; Smith, Erica; Katz, Catherine S.; Katz, David L.
2012-01-01
Objective: To determine the reliability and validity of a 10-item questionnaire, the Food Label Literacy for Applied Nutrition Knowledge questionnaire. Methods: Participants were elementary school children exposed to a 90-minute school-based nutrition program. Reliability was assessed via Cronbach alpha and intraclass correlation coefficient…
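Cronbach's alpha, one of the two reliability statistics named here, can be sketched as below. The abstract is truncated and gives no data, so the function name `cronbach_alpha` and the response table are hypothetical; the formula is the usual one based on item variances and the variance of the total score.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per respondent, one score per item."""
    k = len(responses[0])  # number of items
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) answers of 4 children on 3 items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(answers))
```

Alpha rises when items vary together, that is, when the variance of the total score is large relative to the sum of the individual item variances.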
Dwyer, Tim; Martin, C Ryan; Kendra, Rita; Sermer, Corey; Chahal, Jaskarndip; Ogilvie-Harris, Darrell; Whelan, Daniel; Murnaghan, Lucas; Nauth, Aaron; Theodoropoulos, John
2017-06-01
To determine the interobserver reliability of the International Cartilage Repair Society (ICRS) grading system of chondral lesions in cadavers, to determine the intraobserver reliability of the ICRS grading system comparing arthroscopy and video assessment, and to compare the arthroscopic ICRS grading system with histological grading of lesion depth. Eighteen lesions in 5 cadaveric knee specimens were arthroscopically graded by 7 fellowship-trained arthroscopic surgeons using the ICRS classification system. The arthroscopic video of each lesion was sent to the surgeons 6 weeks later for repeat grading and determination of intraobserver reliability. Lesions were biopsied, and the depth of the cartilage lesion was assessed. Reliability was calculated using intraclass correlations. The interobserver reliability was 0.67 (95% confidence interval, 0.5-0.89) for the arthroscopic grading, and the intraobserver reliability with the video grading was 0.8 (95% confidence interval, 0.67-0.9). A high correlation was seen between the arthroscopic grading of depth and the histological grading of depth (0.91); on average, surgeons graded lesions using arthroscopy a mean of 0.37 (range, 0-0.86) deeper than the histological grade. The arthroscopic ICRS classification system has good interobserver and intraobserver reliability. A high correlation with histological assessment of depth provides evidence of validity for this classification system. As cartilage lesions are treated on the basis of the arthroscopic ICRS classification, it is important to ascertain the reliability and validity of this method. Copyright © 2016 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.
Gwynne, Craig R; Curran, Sarah A
2014-12-01
Clinical assessment of lower limb kinematics during dynamic tasks may identify individuals who demonstrate abnormal movement patterns that may lead to etiology or exacerbation of knee conditions such as patellofemoral joint (PFJt) pain. The purpose of this study was to determine the reliability, validity and associated measurement error of a clinically appropriate two-dimensional (2-D) procedure of quantifying frontal plane knee alignment during single limb squats. Nine female and nine male recreationally active subjects with no history of PFJt pain had frontal plane limb alignment assessed using three-dimensional (3-D) motion analysis and digital video cameras (2-D analysis) while performing single limb squats. The association between 2-D and 3-D measures was quantified using Pearson's product correlation coefficients. Intraclass correlation coefficients (ICCs) were determined for within- and between-session reliability of 2-D data and standard error of measurement (SEM) was used to establish measurement error. Frontal plane limb alignment assessed with 2-D analysis demonstrated good correlation compared with 3-D methods (r = 0.64 to 0.78, p < 0.001). Within-session (0.86) and between-session ICCs (0.74) demonstrated good reliability for 2-D measures and SEM scores ranged from 2° to 4°. 2-D measures have good consistency and may provide a valid measure of lower limb alignment when compared to existing 3-D methods. Assessment of lower limb kinematics using 2-D methods may be an accurate and clinically useful alternative to 3-D motion analysis when identifying individuals who demonstrate abnormal movement patterns associated with PFJt pain. 2b.
Feasibility of a Semi-computerized Line Bisection Test for Unilateral Visual Neglect Assessment.
Jee, H; Kim, J; Kim, C; Kim, T; Park, J
2015-01-01
Commonly used paper-and-pencil based test modalities for assessing the degree of unilateral visual neglect (ULN) in patients with hemispheric cerebral lesions consume human resources with a significant inter- and intra-rater variability. To explore the feasibility of a semi-computerized electronic-pen based ULN assessment system (e-system) to improve assessment quality without altering the conventional user interface. Thirty cognitively healthy participants (HG) and 11 participants diagnosed with right-hemispheric lesion and unilateral visual neglect (NG) were recruited to evaluate the e-system. Line bisection tests (LBT) were conducted twice for the inter-rater and intra-rater (reliability) comparisons. The LBT results were assessed by the e-system and the gold standard method (manual rater assessment). The percent deviation (%), assessment duration (sec), and number of neglected lines (each) were evaluated. The inter-rater comparisons of the assessed deviation (%) variable showed excellent interrater reliabilities (CCCs) ranging from .84 (.59 to .95 (p < .001)) to .99 (.90 to .99 (p < .001)) for HG and NG. The Bland Altman mean difference (B-A) plots with bias (95% LOA (limits of agreement)) showed similar agreements between the e-system and the raters ranging from -.04 % (-2.10 to 1.97) to 1.30 % (-2.23 to 4.84) for HG and NG. The effect sizes (ES), which show similarities between the assessment methods, yielded smaller ranges from .01 to .30 for HG and NG. The reliability (test-retest) comparisons showed similar assessment results between the e-system, rater 1, and rater 2. The manual rater assessment time, ranging from 5.85 to 6.00 minutes, and inter- and intra-assessment variations were virtually eliminated with the e-system. The semi-computerized system with the conventional paper-and-pencil user interface showed valid and reliable assessment results.
It may be a feasible replacement for the manual rater assessment modality even in a clinical setting.
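The Bland-Altman bias and 95% limits of agreement reported in this abstract can be sketched as follows. This assumes the conventional definition (mean difference plus or minus 1.96 sample standard deviations of the paired differences); the function name `bland_altman` and the paired values are hypothetical, not the study's measurements.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical percent deviations from an automated system and a manual rater.
system = [1.0, 2.0, 3.0, 4.0]
rater = [1.5, 1.5, 3.5, 3.5]
bias, lower, upper = bland_altman(system, rater)
print(bias, lower, upper)
```

A bias near zero with narrow limits indicates that the two methods can be used interchangeably for that variable; the limits, not the bias alone, show how far any single pair of readings may disagree.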
Figueroa, José; Guarachi, Juan Pablo; Matas, José; Arnander, Magnus; Orrego, Mario
2016-04-01
Computed tomography (CT) is widely used to assess component rotation in patients with poor results after total knee arthroplasty (TKA). The purpose of this study was to simultaneously determine the accuracy and reliability of CT in measuring TKA component rotation. TKA components were implanted in dry-bone models and assigned to two groups. The first group (n = 7) had variable femoral component rotations, and the second group (n = 6) had variable tibial tray rotations. CT images were then used to assess component rotation. Accuracy of CT rotational assessment was determined by mean difference, in degrees, between implanted component rotation and CT-measured rotation. Intraclass correlation coefficient (ICC) was applied to determine intra-observer and inter-observer reliability. Femoral component accuracy showed a mean difference of 2.5° and the tibial tray a mean difference of 3.2°. There was good intra- and inter-observer reliability for both components, with a femoral ICC of 0.8 and 0.76, and tibial ICC of 0.68 and 0.65, respectively. CT rotational assessment accuracy can differ from true component rotation by approximately 3° for each component. It does, however, have good inter- and intra-observer reliability.
Probabilistic Assessment of Fracture Progression in Composite Structures
NASA Technical Reports Server (NTRS)
Chamis, Christos C.; Minnetyan, Levon; Mauget, Bertrand; Huang, Dade; Addi, Frank
1999-01-01
This report describes methods and corresponding computer codes that are used to evaluate progressive damage and fracture and to perform probabilistic assessment in built-up composite structures. Structural response is assessed probabilistically, during progressive fracture. The effects of design variable uncertainties on structural fracture progression are quantified. The fast probability integrator (FPI) is used to assess the response scatter in the composite structure at damage initiation. The sensitivity of the damage response to design variables is computed. The methods are general purpose and are applicable to stitched and unstitched composites in all types of structures and fracture processes starting from damage initiation to unstable propagation and to global structure collapse. The methods are demonstrated for a polymer matrix composite stiffened panel subjected to pressure. The results indicated that composite constituent properties, fabrication parameters, and respective uncertainties have a significant effect on structural durability and reliability. Design implications with regard to damage progression, damage tolerance, and reliability of composite structures are examined.
Reliability of cervical lordosis measurement techniques on long-cassette radiographs.
Janusz, Piotr; Tyrakowski, Marcin; Yu, Hailong; Siemionow, Kris
2016-11-01
Lateral radiographs are commonly used to assess cervical sagittal alignment. Three assessment methods have been described and are commonly utilized in clinical practice. These methods are described for perfect lateral cervical radiographs; however, in everyday practice radiograph quality varies. The aim of this study was to compare the reliability and reproducibility of 3 cervical lordosis (CL) measurement methods. Forty-four standing lateral radiographs were randomly chosen from a lateral long-cassette radiograph database. Measurements of CL were performed with: Cobb method C2-C7 (CM), C2-C7 posterior tangent method (PTM), sum of posterior tangent method for each segment (SPTM). Three independent orthopaedic surgeons measured CL using the three methods on 44 lateral radiographs. One researcher used the three methods to measure CL three times at 4-week time intervals. Agreement between the methods as well as their intra- and interobserver reliability were tested and quantified by intraclass correlation coefficient (ICC) and median error for a single measurement (SEM). ICC of 0.75 or more reflected an excellent agreement/reliability. The results were compared with repeated ANOVA test, with p < 0.05 considered as significant. All methods revealed excellent intra- and interobserver reliability. Agreement (ICC, SEM) between three methods was (0.89°, 3.44°), between CM and SPTM was (0.82°, 4.42°), between CM and PTM was (0.80°, 4.80°) and between PTM and SPTM was (0.99°, 1.10°). Mean CL values for CM, PTM and SPTM were 10.5° ± 13.9°, 17.5° ± 15.6° and 17.7° ± 15.9° (p < 0.0001), respectively. The significant difference was between CM vs PTM (p < 0.0001) and CM vs SPTM (p < 0.0001), but not between PTM vs SPTM (p > 0.05). All three methods appeared to be highly reliable.
Although high agreement between all measurement methods was shown, we do not recommend using the Cobb measurement method interchangeably with PTM or SPTM within a single study, as this could lead to error, whereas such a comparison between the tangent methods can be considered.
Huang, X N; Zhang, Y; Feng, W W; Wang, H S; Cao, B; Zhang, B; Yang, Y F; Wang, H M; Zheng, Y; Jin, X M; Jia, M X; Zou, X B; Zhao, C X; Robert, J; Jing, Jin
2017-06-02
Objective: To evaluate the reliability and validity of the warning signs checklist developed by the National Health and Family Planning Commission of the People's Republic of China (NHFPC), so as to determine the screening effectiveness of warning signs on developmental problems of early childhood. Method: A stratified random sampling method was used to assess the reliability and validity of the warning signs checklist, and 2 110 children 0 to 6 years of age (1 513 low-risk subjects and 597 high-risk subjects) were recruited from 11 provinces of China. The reliability evaluation for the warning signs included the test-retest reliability and interrater reliability. With the use of Age and Stage Questionnaire (ASQ) and Gesell Development Diagnosis Scale (GESELL) as the criterion scales, criterion validity was assessed by determining the correlation and consistency between the screening results of warning signs and the criterion scales. Result: In terms of the warning signs, the screening positive rates at different ages ranged from 10.8% (21/141) to 26.2% (51/137). The median (interquartile) testing time for each subject was 1 (0.6) minute. Both the test-retest reliability and interrater reliability of warning signs reached 0.7 or above, indicating that the stability was good. In terms of validity assessment, there was remarkable consistency between ASQ and warning signs, with the Kappa value of 0.63. With the use of GESELL as criterion, it was determined that the sensitivity of warning signs in children with suspected developmental delay was 82.2%, and the specificity was 77.7%. The overall Youden index was 0.6. Conclusion: The reliability and validity of the warning signs checklist for screening early childhood developmental problems have met the basic requirements of psychological screening scales, with the characteristics of short testing time and easy operation.
Thus, this warning signs checklist can be used for screening psychological and behavioral problems of early childhood, especially in community settings.
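The Youden index quoted in this abstract follows from the reported sensitivity and specificity. A small arithmetic check, assuming the standard definition J = sensitivity + specificity - 1 (the function name `youden_index` is illustrative):

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: 0 means no better than chance, 1 means a perfect test."""
    return sensitivity + specificity - 1.0


# Values taken from the abstract: sensitivity 82.2%, specificity 77.7%.
j = youden_index(0.822, 0.777)
print(round(j, 3))  # 0.599, consistent with the reported overall index of 0.6
```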
Whitfield, Richard H; Newcombe, Robert G; Woollard, Malcolm
2003-12-01
The introduction of the European Resuscitation Guidelines (2000) for cardiopulmonary resuscitation (CPR) and automated external defibrillation (AED) prompted the development of an up-to-date and reliable method of assessing the quality of performance of CPR in combination with the use of an AED. The Cardiff Test of basic life support (BLS) and AED version 3.1 was developed to meet this need and uses standardised checklists to retrospectively evaluate performance from analyses of video recordings and data drawn from a laptop computer attached to a training manikin. This paper reports the inter- and intra-observer reliability of this test. Data used to assess reliability were obtained from an investigation of CPR and AED skill acquisition in a lay responder AED training programme. Six observers were recruited to evaluate performance in 33 data sets, repeating their evaluation after a minimum interval of 3 weeks. More than 70% of the 42 variables considered in this study had a kappa score of 0.70 or above for inter-observer reliability or were drawn from computer data and therefore not subject to evaluator variability. 85% of the 42 variables had kappa scores for intra-observer reliability of 0.70 or above or were drawn from computer data. The standard deviations for inter- and intra-observer measures of time to first shock were 11.6 and 7.7 s, respectively. The inter- and intra-observer reliability for the majority of the variables in the Cardiff Test of BLS and AED version 3.1 is satisfactory. However, reliability is less acceptable with respect to shaking when checking for responsiveness, initial check/clearing of the airway, checks for signs of circulation, time to first shock and performance of interventions in the correct sequence. Further research is required to determine if modifications to the method of assessing these variables can increase reliability.
Monheit, Gary D; Gendler, Ellen C; Poff, Bradley; Fleming, Laura; Bachtell, Nathan; Garcia, Emily; Burkholder, David
2010-11-01
Various scoring techniques prone to subjective interpretation have been used to evaluate soft tissue augmentation of nasolabial folds (NLFs). To design and validate a reliable wrinkle assessment scoring scale. Six photographed wrinkles of varying severity were electronically copied onto the same facial image to become a 6-point grading scale (GGS). A pilot training program (13 investigators) determined reliability, and a 12-week multicenter survey study validated the GGS scoring method. Pilot study inter- and intrarater scoring reliability were high (weighted kappa scores of 0.85 and 0.86, respectively). Seventy-five percent of survey investigators and independent review panel (IRP) members considered a GGS score difference of 0.5 to be a minimally perceivable difference. Interrater weighted kappa scores were 0.91 for the IRP and 0.80 for investigators. Intrarater agreements after repeat testing were 0.91 and 0.89, respectively. The baseline "live" assessment GGS mean score was 3.34, and the baseline blinded photographic assessment GGS mean score was 2.00 for the IRP and 2.16 for the investigators. The GGS is a reproducible method of grading the severity of NLF wrinkles. Treatment effectiveness of a dermal filler can be reliably evaluated using the GGS by comparing "live" assessments with the standard GGS photographic panel. © 2010 by the American Society for Dermatologic Surgery, Inc.
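The weighted kappa used for the ordinal GGS scores can be sketched as below. The abstract does not state which weighting scheme was applied, so linear weights are an assumption here; the function name `weighted_kappa` and the example grades are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for ordinal codes 0..n_categories-1.

    Assumes both raters use at least two different categories overall,
    otherwise the expected disagreement is zero and kappa is undefined.
    """
    n = len(rater_a)
    # Joint proportions: observed[i][j] = share of items graded i by A and j by B.
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(n_categories))
              for j in range(n_categories)]

    d_obs = d_exp = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) / (n_categories - 1)  # linear disagreement weight
            d_obs += w * observed[i][j]
            d_exp += w * marg_a[i] * marg_b[j]
    return 1.0 - d_obs / d_exp


# Hypothetical grades from two raters on a 3-point scale.
grades_a = [0, 1, 2, 1, 0, 2]
grades_b = [0, 1, 2, 2, 0, 1]
print(weighted_kappa(grades_a, grades_b, 3))
```

Unlike unweighted kappa, this statistic gives partial credit when two raters land on adjacent grades, which suits a severity scale where a one-step disagreement is less serious than a five-step one.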
Sayer, R Drew; Tamer, Gregory G; Chen, Ningning; Tregellas, Jason R; Cornier, Marc-Andre; Kareken, David A; Talavage, Thomas M; McCrory, Megan A; Campbell, Wayne W
2016-01-01
Objective The brain’s reward system influences ingestive behavior and subsequently, obesity risk. Functional magnetic resonance imaging (fMRI) is a common method for investigating brain reward function. We sought to assess the reproducibility of fasting-state brain responses to visual food stimuli using BOLD fMRI. Methods A priori brain regions of interest included bilateral insula, amygdala, orbitofrontal cortex, caudate, and putamen. Fasting-state fMRI and appetite assessments were completed by 28 women (n=16) and men (n=12) with overweight or obesity on 2 days. Reproducibility was assessed by comparing mean fasting-state brain responses and measuring test-retest reliability of these responses on the 2 testing days. Results Mean fasting-state brain responses on Day 2 were reduced compared to Day 1 in the left insula and right amygdala, but mean Day 1 and Day 2 responses were not different in the other regions of interest. With the exception of the left orbitofrontal cortex response (fair reliability), test-retest reliabilities of brain responses were poor or unreliable. Conclusion fMRI-measured responses to visual food cues in adults with overweight or obesity show relatively good mean-level reproducibility, but considerable within-subject variability. Poor test-retest reliability reduces the likelihood of observing true correlations and increases the necessary sample sizes for studies. PMID:27542906
Methodology for Developing a New EFNEP Food and Physical Activity Behaviors Questionnaire.
Murray, Erin K; Auld, Garry; Baker, Susan S; Barale, Karen; Franck, Karen; Khan, Tarana; Palmer-Keenan, Debra; Walsh, Jennifer
2017-10-01
Research methods are described for developing a food and physical activity behaviors questionnaire for the Expanded Food and Nutrition Education Program (EFNEP), a US Department of Agriculture nutrition education program serving low-income families. Mixed-methods observational study. The questionnaire will include 5 domains: (1) diet quality, (2) physical activity, (3) food safety, (4) food security, and (5) food resource management. A 5-stage process will be used to assess the questionnaire's test-retest reliability and content, face, and construct validity. Research teams across the US will coordinate questionnaire development and testing nationally. Convenience samples of low-income EFNEP, or EFNEP-eligible, adult participants across the US. A 5-stage process: (1) prioritize domain concepts to evaluate (2) question generation and content analysis panel, (3) question pretesting using cognitive interviews, (4) test-retest reliability assessment, and (5) construct validity testing. A nationally tested valid and reliable food and physical activity behaviors questionnaire for low-income adults to evaluate EFNEP's effectiveness. Cognitive interviews will be summarized to identify themes and dominant trends. Paired t tests (P ≤ .05) and Spearman and intra-class correlation coefficients (r > .5) will be conducted to assess reliability. Construct validity will be assessed using Wilcoxon t test (P ≤ .05), Spearman correlations, and Bland-Altman plots. Copyright © 2017 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Inter-Observer Reliability of DSM-5 Substance Use Disorders
Denis, Cécile M.; Gelernter, Joel; Hart, Amy B.; Kranzler, Henry R.
2015-01-01
Aims Although studies have examined the impact of changes made in DSM-5 on the estimated prevalence of substance use disorder (SUD) diagnoses, there is limited evidence of the reliability of DSM-5 SUDs. We evaluated the inter-observer reliability of four DSM-5 SUDs in a sample in which we had previously evaluated the reliability of DSM-IV diagnoses, allowing us to compare the two systems. Methods Two different interviewers each assessed 173 subjects over a 2-week period using the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). Using the percent agreement and kappa (κ) coefficient, we examined the reliability of DSM-5 lifetime alcohol, opioid, cocaine, and cannabis use disorders, which we compared to that of SSADDA-derived DSM-IV SUD diagnoses. We also assessed the effect of additional lifetime SUD and lifetime mood or anxiety disorder diagnoses on the reliability of the DSM-5 SUD diagnoses. Results Reliability was good to excellent for the four disorders, with κ values ranging from 0.65 to 0.94. Agreement was consistently lower for SUDs of mild severity than for moderate or severe disorders. DSM-5 SUD diagnoses showed greater reliability than DSM-IV diagnoses of abuse or dependence or dependence only. Co-occurring SUD and lifetime mood or anxiety disorders exerted a modest effect on the reliability of the DSM-5 SUD diagnoses. Conclusions For alcohol, opioid, cocaine and cannabis use disorders, DSM-5 criteria and diagnoses are at least as reliable as those of DSM-IV. PMID:26048641
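Percent agreement and the kappa coefficient, the two statistics used in this study, can be sketched for a pair of interviewers as follows. The function names and the diagnosis vectors are hypothetical; the formula is the standard unweighted Cohen's kappa.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of subjects on whom the two raters give the same code."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    p_obs = percent_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical lifetime diagnoses (1 = disorder present) for 10 subjects.
interviewer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
interviewer_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
print(percent_agreement(interviewer_1, interviewer_2))
print(cohen_kappa(interviewer_1, interviewer_2))
```

In this toy example the raters agree on 8 of 10 subjects, yet kappa is noticeably lower than the raw agreement because half of that agreement would be expected by chance with these marginals.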
Methods to Improve Reliability of Video Recorded Behavioral Data
Haidet, Kim Kopenhaver; Tate, Judith; Divirgilio-Thomas, Dana; Kolanowski, Ann; Happ, Mary Beth
2009-01-01
Behavioral observation is a fundamental component of nursing practice and a primary source of clinical research data. The use of video technology in behavioral research offers important advantages to nurse scientists in assessing complex behaviors and relationships between behaviors. The appeal of using this method should be balanced, however, by an informed approach to reliability issues. In this paper, we focus on factors that influence reliability, such as the use of sensitizing sessions to minimize participant reactivity and the importance of training protocols for video coders. In addition, we discuss data quality, the selection and use of observational tools, calculating reliability coefficients, and coding considerations for special populations based on our collective experiences across three different populations and settings. PMID:19434651
Assessment of NDE reliability data
NASA Technical Reports Server (NTRS)
Yee, B. G. W.; Couchman, J. C.; Chang, F. H.; Packman, D. F.
1975-01-01
Twenty sets of relevant nondestructive test (NDT) reliability data were identified, collected, compiled, and categorized. A criterion for the selection of data for statistical analysis considerations was formulated, and a model to grade the quality and validity of the data sets was developed. Data input formats, which record the pertinent parameters of the defect/specimen and inspection procedures, were formulated for each NDE method. A comprehensive computer program was written and debugged to calculate the probability of flaw detection at several confidence limits by the binomial distribution. This program also selects the desired data sets for pooling and tests the statistical pooling criteria before calculating the composite detection reliability. An example of the calculated reliability of crack detection in bolt holes by an automatic eddy current method is presented.
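A binomial calculation of the kind described in this abstract can be sketched as below. The report's own program is not reproduced in the abstract, so this is an assumption about the approach: a one-sided exact (Clopper-Pearson style) lower confidence bound on the probability of detection, found by bisection. The function name `pod_lower_bound` is hypothetical.

```python
from math import comb


def pod_lower_bound(detected, inspected, confidence=0.95):
    """One-sided lower confidence bound on the probability of flaw detection."""
    if detected == 0:
        return 0.0
    alpha = 1.0 - confidence

    def tail(p):
        # P(X >= detected) for X ~ Binomial(inspected, p).
        return sum(
            comb(inspected, i) * p**i * (1 - p) ** (inspected - i)
            for i in range(detected, inspected + 1)
        )

    # Bisection: tail(p) increases with p, so find the p where it equals alpha.
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if tail(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Illustrative case: 29 flaws inspected, all 29 detected.
print(round(pod_lower_bound(29, 29, 0.95), 3))
```

With every flaw found in 29 trials, the bound sits just above 0.90, which is why a 29-of-29 demonstration is often quoted as showing 90% detection probability at 95% confidence.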
Assessment of College and University Campus Tobacco-Free Policies in North Carolina
ERIC Educational Resources Information Center
Lee, Joseph G. L.; Goldstein, Adam O.; Klein, Elizabeth G.; Ranney, Leah M.; Carver, Ashlea M.
2012-01-01
Objective: To develop a reliable and efficient method for assessing prevalence and strength of college/university tobacco-related policies. Participants: North Carolina (NC) public universities, community colleges, and private colleges/universities (N = 110). Methods: A census of policies using campus handbooks and Web sites was conducted in March…
RELIABILITY OF ANKLE-FOOT MORPHOLOGY, MOBILITY, STRENGTH, AND MOTOR PERFORMANCE MEASURES.
Fraser, John J; Koldenhoven, Rachel M; Saliba, Susan A; Hertel, Jay
2017-12-01
Assessment of foot posture, morphology, intersegmental mobility, strength and motor control of the ankle-foot complex are commonly used clinically, but measurement properties of many assessments are unclear. To determine test-retest and inter-rater reliability, standard error of measurement, and minimal detectable change of morphology, joint excursion and play, strength, and motor control of the ankle-foot complex. Reliability study. 24 healthy, recreationally-active young adults without history of ankle-foot injury were assessed by two clinicians on two occasions, three to ten days apart. Measurement properties were assessed for foot morphology (foot posture index, total and truncated length, width, arch height), joint excursion (weight-bearing dorsiflexion, rearfoot and hallux goniometry, forefoot inclinometry, 1st metatarsal displacement) and joint play, strength (handheld dynamometry), and motor control rating during intrinsic foot muscle (IFM) exercises. Clinician order was randomized using a Latin Square. The clinicians performed independent examinations and did not confer on the findings for the duration of the study. Test-retest and inter-tester reliability and agreement was assessed using intraclass correlation coefficients (ICC2,k) and weighted kappa (Kw). Test-retest reliability ICC were as follows: morphology: .80-1.00, joint excursion: .58-.97, joint play: -.67-.84, strength: .67-.92, IFM motor rating: Kw -.01-.71. Inter-rater reliability ICC were as follows: morphology: .81-1.00, joint excursion: .32-.97, joint play: -1.06-1.00, strength: .53-.90, and IFM motor rating: Kw .02-.56. Measures of ankle-foot posture, morphology, joint excursion, and strength demonstrated fair to excellent test-retest and inter-rater reliability. Test-retest reliability for rating of perceived difficulty and motor performance was good to excellent for short-foot, toe-spread-out, and hallux exercises and poor to fair for lesser toe extension.
Joint play measures had poor to fair reliability overall. The findings of this study should be considered when choosing methods of clinical assessment and outcome measures in practice and research. 3.
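The standard error of measurement and minimal detectable change named in this abstract are commonly derived from the ICC. A minimal sketch, assuming the widely used formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; the abstract does not state which formulas the authors applied, and the function names and input values are hypothetical.

```python
from math import sqrt


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    return sd * sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence for a test-retest difference."""
    return 1.96 * sqrt(2.0) * sem


# Hypothetical example: SD of 10 degrees and ICC of 0.75.
sem = sem_from_icc(10.0, 0.75)
print(sem, round(mdc95(sem), 2))
```

The factor sqrt(2) reflects that a change score combines the measurement error of two sessions, so a difference smaller than the MDC cannot be distinguished from noise.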
Item Response Theory for Peer Assessment
ERIC Educational Resources Information Center
Uto, Masaki; Ueno, Maomi
2016-01-01
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
ERIC Educational Resources Information Center
Kim, Ho Sung
2013-01-01
A quantitative method for estimating an expected uncertainty (reliability and validity) in assessment results arising from the relativity between four variables, viz examiner's expertise, examinee's expertise achieved, assessment task difficulty and examinee's performance, was developed for the complex assessment applicable to final…
Whitford, Heather M; Donnan, Peter T; Symon, Andrew G; Kellett, Gillian; Monteith-Hodge, Ewa; Rauchhaus, Petra; Wyatt, Jeremy C
2012-01-01
To test the reliability, validity, acceptability, and practicality of short message service (SMS) messaging for collection of research data. The studies were carried out in a cohort of recently delivered women in Tayside, Scotland, UK, who were asked about their current infant feeding method and future feeding plans. Reliability was assessed by comparison of their responses to two SMS messages sent 1 day apart. Validity was assessed by comparison of their responses to text questions and the same question administered by phone 1 day later, by comparison with the same data collected from other sources, and by correlation with other related measures. Acceptability was evaluated using quantitative and qualitative questions, and practicality by analysis of a researcher log. Reliability of the factual SMS message gave perfect agreement. Reliabilities for the numerical question were reasonable, with κ between 0.76 (95% CI 0.56 to 0.96) and 0.80 (95% CI 0.59 to 1.00). Validity for data compared with that collected by phone within 24 h (κ =0.92 (95% CI 0.84 to 1.00)) and with health visitor data (κ =0.85 (95% CI 0.73 to 0.97)) was excellent. Correlation validity between the text responses and other related demographic and clinical measures was as expected. Participants found the method a convenient and acceptable way of providing data. For researchers, SMS text messaging provided an easy and functional method of gathering a large volume of data. In this sample and for these questions, SMS was a reliable and valid method for capturing research data.
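The agreement statistic reported above is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. A minimal sketch of the unweighted statistic follows; the category labels and responses are invented for illustration and do not come from the SMS study.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal category frequencies.
    counts_first, counts_second = Counter(first), Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical answers to the same question on two consecutive days.
    day_one = ["breast"] * 4 + ["formula"] * 4 + ["mixed"] * 2
    day_two = ["breast"] * 3 + ["formula"] * 4 + ["mixed"] * 3
    print(round(cohens_kappa(day_one, day_two), 3))
```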
Park, Ji Eun; Han, Kyunghwa; Sung, Yu Sub; Chung, Mi Sun; Koo, Hyun Jung; Yoon, Hee Mang; Choi, Young Jun; Lee, Seung Soo; Kim, Kyung Won; Shin, Youngbin; An, Suah; Cho, Hyo-Min
2017-01-01
Objective To evaluate the frequency and adequacy of statistical analyses in a general radiology journal when reporting a reliability analysis for a diagnostic test. Materials and Methods Sixty-three studies of diagnostic test accuracy (DTA) and 36 studies reporting reliability analyses published in the Korean Journal of Radiology between 2012 and 2016 were analyzed. Studies were judged using the methodological guidelines of the Radiological Society of North America-Quantitative Imaging Biomarkers Alliance (RSNA-QIBA), and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. DTA studies were evaluated by nine editorial board members of the journal. Reliability studies were evaluated by study reviewers experienced with reliability analysis. Results Thirty-one (49.2%) of the 63 DTA studies did not include a reliability analysis when deemed necessary. Among the 36 reliability studies, proper statistical methods were used in all (5/5) studies dealing with dichotomous/nominal data, 46.7% (7/15) of studies dealing with ordinal data, and 95.2% (20/21) of studies dealing with continuous data. Statistical methods were described in sufficient detail regarding weighted kappa in 28.6% (2/7) of studies and regarding the model and assumptions of intraclass correlation coefficient in 35.3% (6/17) and 29.4% (5/17) of studies, respectively. Reliability parameters were used as if they were agreement parameters in 23.1% (3/13) of studies. Reproducibility and repeatability were used incorrectly in 20% (3/15) of studies. Conclusion Greater attention to the importance of reporting reliability, thorough description of the related statistical methods, efforts not to neglect agreement parameters, and better use of relevant terminology is necessary. PMID:29089821
Measurement properties of gingival biotype evaluation methods.
Alves, Patrick Henry Machado; Alves, Thereza Cristina Lira Pacheco; Pegoraro, Thiago Amadei; Costa, Yuri Martins; Bonfante, Estevam Augusto; de Almeida, Ana Lúcia Pompéia Fraga
2018-06-01
There are numerous methods to measure the dimensions of the gingival tissue, but few studies have compared the effectiveness of one method over another. This study aimed to describe a new method and to estimate the validity of gingival biotype assessment with the aid of computed tomography scanning (CTS). In each patient, different methods of evaluating gingival thickness were used: transparency of the periodontal probe, transgingival probing, photography, and a new method using CTS. Intrarater and interrater reliability considering the categorical classification of the gingival biotype were estimated with Cohen's kappa coefficient, intraclass correlation coefficient (ICC), and ANOVA (P < .05). The criterion validity of the CTS was determined using the transgingival method as the reference standard. Sensitivity and specificity values were computed along with their 95% CIs. Twelve patients were subjected to assessment of their gingival thickness. The highest agreement was found between transgingival and CTS (86.1%). The comparison between the categorical classifications of CTS and the transgingival method (reference standard) showed high specificity (94.92%) and low sensitivity (53.85%) for definition of a thin biotype. The new method of CTS assessment to classify gingival tissue thickness can be considered reliable and clinically useful to diagnose thick biotype. © 2018 Wiley Periodicals, Inc.
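Sensitivity and specificity against a reference standard, with 95% confidence intervals, can be sketched as below. The abstract does not say which interval method was used; the Wilson score interval here is an assumption, and the counts in the example are hypothetical rather than the study's data.

```python
import math


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of reference-positive cases that the index test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of reference-negative cases that the index test rules out."""
    return true_neg / (true_neg + false_pos)


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical 2x2 table: 8 TP, 2 FN, 18 TN, 2 FP.
    print(sensitivity(8, 2), wilson_interval(8, 10))
    print(specificity(18, 2), wilson_interval(18, 20))
```

With small samples such as the twelve patients described above, the intervals are wide, which is worth keeping in mind when reading point estimates.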
Home Lighting Assessment for Clients With Low Vision
Bhorade, Anjali; Gordon, Mae; Hollingsworth, Holly; Engsberg, Jack E.; Baum, M. Carolyn
2013-01-01
OBJECTIVE. The goal was to develop an objective, comprehensive, near-task home lighting assessment for older adults with low vision. METHOD. A home lighting assessment was developed and tested with older adults with low vision. Interrater and test–retest reliability studies were conducted. Clinical utility was assessed by occupational therapists with expertise in low vision rehabilitation. RESULTS. Interrater reliability was high (intraclass correlation coefficient [ICC] = .83–1.0). Test–retest reliability was moderate (ICC = .67). Responses to a Clinical Utility Feedback Form developed for this study indicated that the Home Environment Lighting Assessment (HELA) has strong clinical utility. CONCLUSION. The HELA provides a structured tool to describe the quantitative and qualitative aspects of home lighting environments where near tasks are performed and can be used to plan lighting interventions. The HELA has the potential to affect assessment and intervention practices of rehabilitation professionals in the area of low vision and improve near-task performance of people with low vision. PMID:24195901
Clinical assessment of effusion in knee osteoarthritis—A systematic review
Maricar, Nasimah; Callaghan, Michael J.; Parkes, Matthew J.; Felson, David T.; O'Neill, Terence W.
2016-01-01
Objective The aim of this systematic review was to determine the validity and inter- and intra-observer reliability of the assessment of knee joint effusion in osteoarthritis (OA) of the knee. Methods MEDLINE, Web of Knowledge, CINAHL, EMBASE, and AMED were searched from their inception to February 2015. Articles were included according to a priori defined criteria: samples containing participants with knee OA; prospective evaluation of clinical tests and assessments of knee effusion that included reliability, sensitivity, and specificity of these tests. Results A total of 10 publications were reviewed. Eight of these considered reliability and four considered the validity of clinical assessments against effusion detected on ultrasound. It was not possible to undertake a meta-analysis of reliability or validity because of differences in study designs and the clinical tests. Intra-observer kappa agreement for visible swelling ranged from 0.37 (suprapatellar) to 1.0 (prepatellar); it was 0.47 for the bulge sign and 0.37 for the balloon sign. Inter-observer kappa agreement for visible swelling ranged from −0.02 (prepatellar) to 0.65 (infrapatellar), the balloon sign −0.11 to 0.82, patellar tap −0.02 to 0.75 and bulge sign kappa −0.04 to 0.14 or reliability coefficient 0.97. Reliability and diagnostic accuracy tended to be better in experienced observers. Very few studies examined the performance of individual clinical tests; sensitivity ranged from 18.2% to 85.7% and specificity from 35.3% to 93.3%, both higher with larger effusions. Conclusion The majority of unstandardized clinical tests to assess joint effusion in knee OA had relatively low intra- and inter-observer reliability. There is some evidence that experience improved the reliability and diagnostic accuracy of tests. Currently there is insufficient evidence to recommend any particular test in clinical practice. PMID:26581486
Rollover risk prediction of heavy vehicles by reliability index and empirical modelling
NASA Astrophysics Data System (ADS)
Sellami, Yamine; Imine, Hocine; Boubezoul, Abderrahmane; Cadiou, Jean-Charles
2018-03-01
This paper focuses on a combination of a reliability-based approach and an empirical modelling approach for rollover risk assessment of heavy vehicles. A reliability-based warning system is developed to alert the driver to a potential rollover before entering a bend. The idea behind the proposed methodology is to estimate the rollover risk by the probability that the vehicle load transfer ratio (LTR) exceeds a critical threshold. Accordingly, a so-called reliability index may be used as a measure to assess the safe functioning of the vehicle. In the reliability method, computing the maximum of the LTR requires predicting the vehicle dynamics over the bend, which can in some cases be intractable or time-consuming. With the aim of reducing the reliability computation time, an empirical model is developed to substitute for the vehicle dynamics and rollover models. This is done by using the SVM (Support Vector Machines) algorithm. The preliminary results demonstrate the effectiveness of the proposed approach.
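The link between a reliability index and the probability that the LTR exceeds a threshold can be illustrated with the simplest possible case. This is only a sketch under a strong assumption that the maximum LTR over the bend is normally distributed; the paper's actual reliability method and SVM surrogate are not reproduced here, and all names and numbers are hypothetical.

```python
import math


def reliability_index(mean: float, std: float, threshold: float) -> float:
    """Distance from the mean response to the threshold, in standard deviations."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (threshold - mean) / std


def exceedance_probability(beta: float) -> float:
    """P(X > threshold) = Phi(-beta) for a normally distributed response X."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical predicted maximum LTR (mean 0.6, SD 0.1) and critical value 0.8.
    beta = reliability_index(mean=0.6, std=0.1, threshold=0.8)
    print(f"beta = {beta:.2f}, risk = {exceedance_probability(beta):.4f}")
```

A larger index corresponds to a smaller exceedance probability, which is why a single scalar can serve as a warning criterion.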
Koch, Michael S; DeSesso, John M; Williams, Amy Lavin; Michalek, Suzanne; Hammond, Bruce
2016-01-01
To determine the reliability of food safety studies carried out in rodents with genetically modified (GM) crops, a Food Safety Study Reliability Tool (FSSRTool) was adapted from the European Centre for the Validation of Alternative Methods' (ECVAM) ToxRTool. Reliability was defined as the inherent quality of the study with regard to use of standardized testing methodology, full documentation of experimental procedures and results, and the plausibility of the findings. Codex guidelines for GM crop safety evaluations indicate toxicology studies are not needed when comparability of the GM crop to its conventional counterpart has been demonstrated. This guidance notwithstanding, animal feeding studies have routinely been conducted with GM crops, but their conclusions on safety are not always consistent. To accurately evaluate potential risks from GM crops, risk assessors need clearly interpretable results from reliable studies. The development of the FSSRTool, which provides the user with a means of assessing the reliability of a toxicology study to inform risk assessment, is discussed. Its application to the body of literature on GM crop food safety studies demonstrates that reliable studies report no toxicologically relevant differences between rodents fed GM crops or their non-GM comparators.
Golden angle based scanning for robust corneal topography with OCT
Wagner, Joerg; Goldblum, David; Cattin, Philippe C.
2017-01-01
Corneal topography allows the assessment of the cornea’s refractive power which is crucial for diagnostics and surgical planning. The use of optical coherence tomography (OCT) for corneal topography is still limited. One limitation is the susceptibility to disturbances like blinking of the eye. This can result in partially corrupted scans that cannot be evaluated using common methods. We present a new scanning method for reliable corneal topography from partial scans. Based on the golden angle, the method features a balanced scan point distribution which refines over measurement time and remains balanced when part of the scan is removed. The performance of the method is assessed numerically and by measurements of test surfaces. The results confirm that the method enables numerically well-conditioned and reliable corneal topography from partially corrupted scans and reduces the need for repeated measurements in case of abrupt disturbances. PMID:28270961
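The property that makes the golden angle attractive here is that directions spaced by it stay evenly spread around the circle for any number of scans. The abstract does not describe the actual scan pattern, so the sketch below only illustrates that angular property; the function names are hypothetical.

```python
import math

# Golden angle in radians, about 137.5 degrees.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def golden_angle_directions(n: int):
    """First n scan directions, each rotated by the golden angle from the last."""
    return [(k * GOLDEN_ANGLE) % (2.0 * math.pi) for k in range(n)]


def largest_gap(angles):
    """Largest angular gap (radians) between neighbouring directions on the circle."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(2.0 * math.pi - ordered[-1] + ordered[0])  # wrap-around gap
    return max(gaps)


if __name__ == "__main__":
    for n in (5, 8, 13, 21):
        gap = math.degrees(largest_gap(golden_angle_directions(n)))
        print(f"{n:2d} directions: largest gap {gap:.1f} degrees")
```

Because every prefix of the sequence is already well spread, stopping early or discarding some scans does not leave one large unsampled sector, which matches the behaviour the abstract describes.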
Assessing Technical Competence in Surgical Trainees: A Systematic Review.
Szasz, Peter; Louridas, Marisa; Harris, Kenneth A; Aggarwal, Rajesh; Grantcharov, Teodor P
2015-06-01
To systematically examine the literature describing the methods by which technical competence is assessed in surgical trainees. The last decade has witnessed an evolution away from time-based surgical education. In response, governing bodies worldwide have implemented competency-based education paradigms. The definition of competence, however, remains elusive, and the impact of these education initiatives in terms of assessment methods remains unclear. A systematic review examining the methods by which technical competence is assessed was conducted by searching MEDLINE, EMBASE, PsychINFO, and the Cochrane database of systematic reviews. Abstracts of retrieved studies were reviewed and those meeting inclusion criteria were selected for full review. Data were retrieved in a systematic manner, the validity and reliability of the assessment methods were evaluated, and quality was assessed using the Grading of Recommendations Assessment, Development and Evaluation classification. Of the 6814 studies identified, 85 studies involving 2369 surgical residents were included in this review. The methods used to assess technical competence were categorized into 5 groups: Likert scales (37), benchmarks (31), binary outcomes (11), novel tools (4), and surrogate outcomes (2). Their validity and reliability were mostly previously established. The overall Grading of Recommendations Assessment, Development and Evaluation rating was high for the randomized controlled trials and low for the observational studies. The definition of technical competence continues to be debated within the medical literature. The methods used to evaluate technical competence predominantly include instruments that were originally created to assess technical skill. Very few studies identify standard setting approaches that differentiate competent versus noncompetent performers; subsequently, this has been identified as an area with great research potential.
Azari, Nadia; Soleimani, Farin; Vameghi, Roshanak; Sajedi, Firoozeh; Shahshahani, Soheila; Karimi, Hossein; Kraskian, Adis; Shahrokhi, Amin; Teymouri, Robab; Gharib, Masoud
2017-01-01
The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1-42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. The method was descriptive-analytic. Translation, back-translation and cultural adaptation were carried out. Content and face validity of the translated scale were determined by experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive and expressive) and motor (fine and gross) domains. Reliability of the scale was calculated through three methods: internal consistency using Cronbach's alpha coefficient, test-retest and interrater methods. Construct validity was assessed using factor analysis and comparison of mean scores. Cultural and linguistic changes were made in items of all domains, especially in the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥0.982 for the test-retest method and ≥0.993 for the inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, statistically significant differences were observed between the mean scores of the different age groups, which confirms the validity of the test. The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian-speaking children.
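Internal consistency in the abstract above is summarised with Cronbach's alpha. A minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), is given below; the score matrix is invented for illustration and has nothing to do with the Bayley data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical scores: four respondents, three items.
    data = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
    print(round(cronbach_alpha(data), 3))
```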
Structured assessment of microsurgery skills in the clinical setting.
Chan, WoanYi; Niranjan, Niri; Ramakrishnan, Venkat
2010-08-01
Microsurgery is an essential component in plastic surgery training. Competence has become an important issue in current surgical practice and training. The complexity of microsurgery requires detailed assessment and feedback on skills components. This article proposes a method of Structured Assessment of Microsurgery Skills (SAMS) in a clinical setting. Three types of assessment (i.e., modified Global Rating Score, errors list and summative rating) were incorporated to develop the SAMS method. Clinical anastomoses were recorded on videos using a digital microscope system and were rated by three consultants independently and in a blinded fashion. Fifteen clinical cases of microvascular anastomoses performed by trainees and a consultant microsurgeon were assessed using SAMS. The consultant had consistently the highest scores. Construct validity was also demonstrated by improvement of SAMS scores of microsurgery trainees. The overall inter-rater reliability was strong (alpha=0.78). The SAMS method provides both formative and summative assessment of microsurgery skills. It is demonstrated to be a valid, reliable and feasible assessment tool of operating room performance to provide systematic and comprehensive feedback as part of the learning cycle. Copyright 2009 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.
Tomita, Andrew; Kandolo, Ka Muzombo; Susser, Ezra; Burns, Jonathan K
2016-01-01
Few studies in developing nations have assessed the use of short messaging services (SMS) to identify psychological challenges in refugee populations. This study aimed to assess the feasibility of SMS-based methods to screen for depression risk among refugees in South Africa attending mental health services, and to compare its reliability and acceptability with face-to-face consultation. Of the 153 refugees enrolled at baseline, 135 were available for follow-up assessments in our cohort study. Depression symptomatology was assessed using the 16-item Quick Inventory of Depressive Symptomatology (QIDS) instrument. Nearly everyone possessed a mobile phone and utilized SMS. Furthermore, low incomplete item response in QIDS and high perceived ease of interacting via SMS with service providers supported the feasibility of this method. There was a fair level of reliability between face-to-face and SMS-based screening methods, but no significant difference in preference rating between the two methods. Despite potential implementation barriers (network delay/phone theft), depression screening using SMS may be viable for refugee mental health services in low-resource settings. PMID:26407989
Adigozali, Hakimeh; Shadmehr, Azadeh; Ebrahimi, Esmail; Rezasoltani, Asghar; Naderi, Farrokh
2017-01-01
In the present study, the intra-rater reliability of ultrasonographic measures of upper trapezius morphology, mechanical properties and intramuscular blood circulation was assessed in females with myofascial pain syndrome. A total of 37 patients (31.05 ± 10 years old) participated in this study. The ultrasonography procedure was set up in three stages: a) Gray-scale: to measure muscle thickness, size and area of trigger points; b) Ultrasound elastography: to measure muscle stiffness; and c) Doppler imaging: to assess blood flow indices. According to data analysis, all variables, except End Diastolic Velocity (EDV), had excellent reliability (>0.806). The Intra-class Correlation Coefficient (ICC) for EDV was 0.738, which was considered poor to good reliability. The results of this study introduced a reliable method for describing upper trapezius features in detail using muscular ultrasonography in female patients. These variables could be used for objective examination and provide guidelines for treatment plans in clinical settings. Copyright © 2016 Elsevier Ltd. All rights reserved.
Developing and validating a nutrition knowledge questionnaire: key methods and considerations.
Trakman, Gina Louise; Forsyth, Adrienne; Hoye, Russell; Belski, Regina
2017-10-01
To outline key statistical considerations and detailed methodologies for the development and evaluation of a valid and reliable nutrition knowledge questionnaire. Literature on questionnaire development in a range of fields was reviewed and a set of evidence-based guidelines specific to the creation of a nutrition knowledge questionnaire have been developed. The recommendations describe key qualitative methods and statistical considerations, and include relevant examples from previous papers and existing nutrition knowledge questionnaires. Where details have been omitted for the sake of brevity, the reader has been directed to suitable references. We recommend an eight-step methodology for nutrition knowledge questionnaire development as follows: (i) definition of the construct and development of a test plan; (ii) generation of the item pool; (iii) choice of the scoring system and response format; (iv) assessment of content validity; (v) assessment of face validity; (vi) purification of the scale using item analysis, including item characteristics, difficulty and discrimination; (vii) evaluation of the scale including its factor structure and internal reliability, or Rasch analysis, including assessment of dimensionality and internal reliability; and (viii) gathering of data to re-examine the questionnaire's properties, assess temporal stability and confirm construct validity. Several of these methods have previously been overlooked. The measurement of nutrition knowledge is an important consideration for individuals working in the nutrition field. Improved methods in the development of nutrition knowledge questionnaires, such as the use of factor analysis or Rasch analysis, will enable more confidence in reported measures of nutrition knowledge.
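Step (vi) above, item analysis, usually starts from two per-item numbers: difficulty (the proportion answering correctly) and a discrimination index (how much better high scorers do on the item than low scorers). The sketch below uses the upper-lower group method with a configurable group fraction; this choice, the function names and the response matrix are assumptions for illustration, not the authors' procedure.

```python
def item_difficulty(responses):
    """Proportion of respondents answering each item correctly (1 = correct)."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]


def item_discrimination(responses, fraction=0.27):
    """Upper-lower discrimination index per item.

    Respondents are ranked by total score; the index is the proportion correct
    in the top group minus the proportion correct in the bottom group.
    """
    group = max(1, round(fraction * len(responses)))
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:group], ranked[-group:]
    n_items = len(responses[0])
    return [
        (sum(row[i] for row in upper) - sum(row[i] for row in lower)) / group
        for i in range(n_items)
    ]


if __name__ == "__main__":
    # Hypothetical answers: four respondents, three items.
    answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(item_difficulty(answers))
    print(item_discrimination(answers, fraction=0.5))
```

Items that nearly everyone answers correctly, or that fail to separate high from low scorers, are the usual candidates for removal during scale purification.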
Reliability and validity of the Microsoft Kinect for evaluating static foot posture
2013-01-01
Background The evaluation of foot posture in a clinical setting is useful to screen for potential injury, however disagreement remains as to which method has the greatest clinical utility. An inexpensive and widely available imaging system, the Microsoft Kinect™, may possess the characteristics to objectively evaluate static foot posture in a clinical setting with high accuracy. The aim of this study was to assess the intra-rater reliability and validity of this system for assessing static foot posture. Methods Three measures were used to assess static foot posture; traditional visual observation using the Foot Posture Index (FPI), a 3D motion analysis (3DMA) system and software designed to collect and analyse image and depth data from the Kinect. Spearman’s rho was used to assess intra-rater reliability and concurrent validity of the Kinect to evaluate foot posture, and a linear regression was used to examine the ability of the Kinect to predict total visual FPI score. Results The Kinect demonstrated moderate to good intra-rater reliability for four FPI items of foot posture (ρ = 0.62 to 0.78) and moderate to good correlations with the 3DMA system for four items of foot posture (ρ = 0.51 to 0.85). In contrast, intra-rater reliability of visual FPI items was poor to moderate (ρ = 0.17 to 0.63), and correlations with the Kinect and 3DMA systems were poor (absolute ρ = 0.01 to 0.44). Kinect FPI items with moderate to good reliability predicted 61% of the variance in total visual FPI score. Conclusions The majority of the foot posture items derived using the Kinect were more reliable than the traditional visual assessment of FPI, and were valid when compared to a 3DMA system. Individual foot posture items recorded using the Kinect were also shown to predict a moderate degree of variance in the total visual FPI score. Combined, these results support the future potential of the Kinect to accurately evaluate static foot posture in a clinical setting. PMID:23566934
Reliability of pubertal maturation self-assessment in a school-based survey.
Jaruratanasirikul, Somchit; Kreetapirom, Piyawut; Tassanakijpanich, Nattaporn; Sriplung, Hutcha
2015-03-01
To assess the reliability of pubertal self-assessment of Thai adolescents. Participants were 927 girls and 997 boys, aged 8-18 years, from nine schools in Hat-Yai municipality. The adolescents evaluated their pubertal status after being shown a line drawing of the five Tanner stages with a short description. Girls assessed their breast and pubic hair development, and boys assessed their pubic hair development. The pubertal self-assessments were compared to pubertal assessments made by a pediatrician who examined the children after their self-assessment. Kappa coefficient and percent agreement were used for statistical analysis. The percent agreement of breast and pubic hair development between the girls' self-assessments and the assessments by the pediatrician was 60.8% and 78%, respectively. Kappa coefficient for breast assessment was 0.50 (95% confidence interval, CI 0.47-0.53) and for pubic hair 0.68 (95% CI 0.65-0.72). Nearly 30% of girls aged younger than 10 years overestimated their breast development status while 45% of girls aged over 14 years underestimated their breast development (p<0.001). For boys, the percent agreement of pubic hair development between the adolescents and the pediatrician was 76.4%, with a weighted kappa coefficient of 0.68 (95% CI 0.65-0.72). Pubertal self-assessment using line drawings with a short description can be used as a reliable method to assess pubic hair maturation in boys and girls, but can be used with less reliability to assess the breast maturation in young girls.
Chang, Jasper O; Levy, Susan S; Seay, Seth W; Goble, Daniel J
2014-05-01
Recent guidelines advocate sports medicine professionals to use balance tests to assess sensorimotor status in the management of concussions. The present study sought to determine whether a low-cost balance board could provide a valid, reliable, and objective means of performing this balance testing. Criterion validity testing relative to a gold standard and 7 day test-retest reliability. University biomechanics laboratory. Thirty healthy young adults. Balance ability was assessed on 2 days separated by 1 week using (1) a gold standard measure (ie, scientific grade force plate), (2) a low-cost Nintendo Wii Balance Board (WBB), and (3) the Balance Error Scoring System (BESS). Validity of the WBB center of pressure path length and BESS scores were determined relative to the force plate data. Test-retest reliability was established based on intraclass correlation coefficients. Composite scores for the WBB had excellent validity (r = 0.99) and test-retest reliability (R = 0.88). Both the validity (r = 0.10-0.52) and test-retest reliability (r = 0.61-0.78) were lower for the BESS. These findings demonstrate that a low-cost balance board can provide improved balance testing accuracy/reliability compared with the BESS. This approach provides a potentially more valid/reliable, yet affordable, means of assessing sports-related concussion compared with current methods.
Evidence-based dentistry: analysis of dental anxiety scales for children.
Al-Namankany, A; de Souza, M; Ashley, P
2012-03-09
To review paediatric dental anxiety measures (DAMs) and assess the statistical methods used for validation and their clinical implications. Four computerised databases were searched for publications on DAMs between 1960 and January 2011, using pre-specified search terms, to assess the method of validation, including reliability as intra-observer agreement ('repeatability or stability') and inter-observer agreement ('reproducibility'), and all types of validity. Fourteen paediatric DAMs were predominantly validated in schools and not in the clinical setting, while five of the DAMs were not validated at all. The DAMs that were validated were validated against other paediatric DAMs which may not have been validated previously. Reliability was not assessed in four of the DAMs. However, all of the validated studies assessed reliability, which was usually 'good' or 'acceptable'. None of the current DAMs used a formal sample size technique. Diversity was seen between the studies, ranging from a few simple pictograms to lists of questions reported by either the individual or an observer. To date there is no scale that can be considered a gold standard, and there is a need to further develop an anxiety scale with a cognitive component for children and adolescents.
Scott, Tannath J; Black, Cameron R; Quinn, John; Coutts, Aaron J
2013-01-01
The purpose of this study was to examine and compare the criterion validity and test-retest reliability of the CR10 and CR100 rating of perceived exertion (RPE) scales for team sport athletes that undertake high-intensity, intermittent exercise. Twenty-one male Australian football (AF) players (age: 19.0 ± 1.8 years, body mass: 83.92 ± 7.88 kg) participated in the first part (part A) of this study, which examined the construct validity of the session-RPE (sRPE) method for quantifying training load in AF. Ten male athletes (age: 16.1 ± 0.5 years) participated in the second part of the study (part B), which compared the test-retest reliability of the CR10 and CR100 RPE scales. In part A, the validity of the sRPE method was assessed by examining the relationships between sRPE and objective measures of internal (i.e., heart rate) and external training load (i.e., distance traveled), collected from AF training sessions. Part B of the study assessed the reliability of sRPE through examining the test-retest reliability of sRPE during 3 different intensities of controlled intermittent running (10, 11.5, and 13 km·h⁻¹). Results from part A demonstrated strong correlations for CR10- and CR100-derived sRPE with measures of internal training load (Banister's TRIMP and Edwards' TRIMP) (CR10: r = 0.83 and 0.83, and CR100: r = 0.80 and 0.81, p < 0.05). Correlations between sRPE and external training load (distance, higher speed running and player load) for both the CR10 (r = 0.81, 0.71, and 0.83) and CR100 (r = 0.78, 0.69, and 0.80) were significant (p < 0.05). Results from part B demonstrated poor reliability for both the CR10 (31.9% CV) and CR100 (38.6% CV) RPE scales after short bouts of intermittent running. Collectively, these results suggest both CR10- and CR100-derived sRPE methods have good construct validity for assessing training load in AF.
The poor levels of reliability revealed under field testing indicate that the sRPE method may not be sensitive enough to detect small changes in exercise intensity during brief intermittent running bouts. Despite this limitation, the sRPE remains a valid method to quantify training loads in high-intensity, intermittent team sport.
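Test-retest reliability is reported above as a coefficient of variation (CV). The abstract does not state how the CV was derived, so the sketch below makes one simple assumption: each athlete's CV is the sample standard deviation of their repeated ratings divided by their mean, and the group value is the average of those. The function name and the ratings are hypothetical.

```python
from statistics import mean, stdev


def within_subject_cv(trials_per_subject):
    """Mean within-subject coefficient of variation, in percent."""
    cvs = []
    for trials in trials_per_subject:
        if len(trials) < 2:
            raise ValueError("each subject needs at least two trials")
        subject_mean = mean(trials)
        if subject_mean == 0:
            raise ValueError("CV is undefined for a zero mean")
        cvs.append(100.0 * stdev(trials) / subject_mean)
    return mean(cvs)


if __name__ == "__main__":
    # Hypothetical repeated RPE ratings for three athletes at one running speed.
    ratings = [(4, 6), (5, 5), (3, 5)]
    print(f"CV = {within_subject_cv(ratings):.1f}%")
```

A high CV means repeated ratings of the same bout differ widely, so small real changes in intensity are hard to tell apart from rating noise.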
Values of a Patient and Observer Scar Assessment Scale to Evaluate the Facial Skin Graft Scar
Chae, Jin Kyung; Kim, Eun Jung; Park, Kun
2016-01-01
Background The patient and observer scar assessment scale (POSAS) recently emerged as a promising method, reflecting both the observer's and the patient's opinions in evaluating scars. This tool was shown to be consistent and reliable in burn scar assessment, but it has not been tested in the setting of skin graft scars in skin cancer patients. Objective To evaluate facial skin graft scars using the POSAS and to compare it with objective scar assessment tools. Methods Twenty-three patients who were diagnosed with facial cutaneous malignancy and received skin grafts after Mohs micrographic surgery were recruited. Observer assessment was performed by three independent raters using the observer component of the POSAS and the Vancouver scar scale (VSS). Patient self-assessment was performed using the patient component of the POSAS. To quantify scar color and scar thickness more objectively, a spectrophotometer and ultrasonography were applied. Results Inter-observer reliability was substantial with both the VSS and the observer component of the POSAS (average measure intraclass correlation coefficient, 0.76 and 0.80, respectively). The observer component consistently showed significant correlations with patients' ratings for the parameters of the POSAS (all p-values<0.05). The correlation between subjective assessment using the POSAS and objective assessment using the spectrophotometer and ultrasonography was low. Conclusion In facial skin graft scar assessment in skin cancer patients, the POSAS showed acceptable inter-observer reliability. This tool was more comprehensive and had higher correlation with the patient's opinion. PMID:27746642
NASA Astrophysics Data System (ADS)
Sil, Arjun; Longmailai, Thaihamdau
2017-09-01
The lateral displacement of a Reinforced Concrete (RC) frame building during an earthquake has an important impact on structural stability and integrity. However, seismic analysis and design of RC buildings need more attention because of their complex behavior: the performance of the structure depends on system features with many influencing parameters and other inherent uncertainties. The reliability approach takes into account the factors and design uncertainties that influence the performance or response of the structure, so that the safety level or the probability of failure can be ascertained. The present study aims to assess the reliability of the seismic performance of a four-storey residential RC building located in seismic Zone-V as per the code provisions given in the Indian Standards IS: 1893-2002. The reliability assessment was performed by deriving, by regression, an explicit expression for the maximum lateral roof displacement as a failure function. A total of 319 four-storey RC buildings were analyzed by the linear static method using SAP2000, and the change in the lateral roof displacement with the variation of the parameters (column dimension, beam dimension, grade of concrete, floor height and total weight of the structure) was observed. A generalized relation was established by regression that can be used to estimate the expected lateral displacement owing to those selected parameters. A comparison was made between the displacements obtained from the analysis and those from the equation so formed; it shows that the proposed relation could be used directly to determine the expected maximum lateral displacement. The data obtained from the statistical computations were then used to obtain the probability of failure and the reliability.
ERIC Educational Resources Information Center
O'Hare, Thomas; Sherrer, Margaret V.; LaButti, Annamaria; Emrick, Kelly
2004-01-01
Objective/Method: The use of brief, reliable, valid, and practical measures of substance use is critical for conducting individual assessments and program evaluation for integrated mental health-substance abuse services for persons with serious mental illness. This investigation examines the internal consistency reliability, concurrent validity,…
NASA Astrophysics Data System (ADS)
Hancock, G. R.; Webb, A. A.; Turner, L.
2017-11-01
Sediment transport and soil erosion can be determined by a variety of field and modelling approaches. Computer based soil erosion and landscape evolution models (LEMs) offer the potential to be reliable assessment and prediction tools. An advantage of such models is that they provide both erosion and deposition patterns as well as total catchment sediment output. However, before use, like all models they require calibration and validation. In recent years LEMs have been used for a variety of both natural and disturbed landscape assessment. However, these models have not been evaluated for their reliability in steep forested catchments. Here, the SIBERIA LEM is calibrated and evaluated for its reliability for two steep forested catchments in south-eastern Australia. The model is independently calibrated using two methods. Firstly, hydrology and sediment transport parameters are inferred from catchment geomorphology and soil properties and secondly from catchment sediment transport and discharge data. The results demonstrate that both calibration methods provide similar parameters and reliable modelled sediment transport output. A sensitivity study of the input parameters demonstrates the model's sensitivity to correct parameterisation and also how the model could be used to assess potential timber harvesting as well as the removal of vegetation by fire.
Krejsa, Martin; Janas, Petr; Yilmaz, Işık; Marschalko, Marian; Bouchal, Tomas
2013-01-01
The load-carrying system of each construction should fulfill several conditions which represent reliability criteria in the assessment procedure. It is the theory of structural reliability which determines the probability of keeping the required properties of constructions. Using this theory, it is possible to apply probabilistic computations based on probability theory and mathematical statistics. Such methods have become more and more popular; they are used, in particular, in designs of load-carrying structures with the required level of reliability when at least some input variables in the design are random. The objective of this paper is to indicate the current scope which might be covered by the new method, Direct Optimized Probabilistic Calculation (DOProC), in assessments of reliability of load-carrying structures. DOProC uses a purely numerical approach without any simulation techniques. This provides more accurate solutions to probabilistic tasks, and, in some cases, such an approach results in considerably faster completion of computations. DOProC can be used to solve efficiently a number of probabilistic computations. A very good sphere of application for DOProC is the assessment of the bolt reinforcement in underground and mining workings. For the purposes above, a special software application, “Anchor”, has been developed. PMID:23935412
Kang, Edith; Fields, Henry W; Cornett, Sandy; Beck, F Michael
2005-01-01
The purpose of this study was to determine the appropriateness of nationally available dental information materials according to the suitability assessment of materials (SAM) method. Clinically related, professionally produced patient dental health education materials (N=22) provided by the American Academy of Pediatric Dentistry (AAPD) were evaluated using the SAM method that had previously been judged valid and reliable. A rater was trained by an experienced health literacy evaluator to establish validity. The rater then rated all materials for 5 categories of assessment (content, literacy demand, graphics, layout and typography, and learning stimulation/motivation) and an overall assessment, and repeated 5 materials to establish intrarater reliability. When compared to the experienced rater, the validity was K=0.43. The reliability was established for all ratings as K=0.52. The consistently weakest categories were content, graphics, and learning stimulation, while reading level as part of literacy demand was often not suitable. The overall suitability of the AAPD materials was generally classified as superior. Reliable and valid evaluation of available dental patient information materials can be accomplished. The materials were largely superior. There is great variability within the categories of evaluation. The categories of content, graphics, and learning stimulation require attention and could raise the overall quality of the materials.
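The agreement statistics in the abstract above (K=0.43 and K=0.52) can be illustrated with a short computation. This is a sketch only, assuming that K denotes unweighted Cohen's kappa between two sets of categorical ratings; the abstract does not say whether a weighted variant was used. The function name `cohens_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed proportion of items on which the two ratings agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical SAM-style category ratings for eight materials.
    trainee = ["superior", "adequate", "adequate", "superior",
               "not suitable", "superior", "adequate", "superior"]
    expert = ["superior", "adequate", "superior", "superior",
              "adequate", "superior", "adequate", "adequate"]
    print(f"kappa = {cohens_kappa(trainee, expert):.2f}")
```

Because kappa corrects for chance agreement, it can be modest even when raw percent agreement looks high, which is one reason values near 0.4 to 0.5 are usually described as moderate rather than strong agreement.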
The development and testing of a qualitative instrument designed to assess critical thinking
NASA Astrophysics Data System (ADS)
Clauson, Cynthia Louisa
This study examined a qualitative approach to assess critical thinking. An instrument was developed that incorporates an assessment process based on Dewey's (1933) concepts of self-reflection and critical thinking as problem solving. The study was designed to pilot test the critical thinking assessment process with writing samples collected from a heterogeneous group of students. The pilot test included two phases. Phase 1 was designed to determine the validity and inter-rater reliability of the instrument using two experts in critical thinking, problem solving, and literacy development. Validity of the instrument was addressed by requesting both experts to respond to ten questions in an interview. The inter-rater reliability was assessed by analyzing the consistency of the two experts' scorings of the 20 writing samples to each other, as well as to my scoring of the same 20 writing samples. Statistical analyses included the Spearman Rho and the Kuder-Richardson (Formula 20). Phase 2 was designed to determine the validity and reliability of the critical thinking assessment process with seven science teachers. Validity was addressed by requesting the teachers to respond to ten questions in a survey and interview. Inter-rater reliability was addressed by comparing the seven teachers' scoring of five writing samples with my scoring of the same five writing samples. Again, the Spearman Rho and the Kuder-Richardson (Formula 20) were used to determine the inter-rater reliability. The validity results suggest that the instrument is helpful as a guide for instruction and provides a systematic method to teach and assess critical thinking while problem solving with students in the classroom. The reliability results show the critical thinking assessment instrument to possess fairly high reliability when used by the experts, but weak reliability when used by classroom teachers. 
A major conclusion was drawn that teachers, as well as students, would need to receive instruction in critical thinking and in how to use the assessment process in order to gain more consistent interpretations of the six problem-solving steps. Specific changes needing to be made in the instrument to improve the quality are included.
Ashnagar, Zinat; Hadian, Mohammad Reza; Olyaei, Gholamreza; Talebian Moghadam, Saeed; Rezasoltani, Asghar; Saeedi, Hassan; Yekaninejad, Mir Saeed; Mahmoodi, Rahimeh
2017-07-01
The aim of this study was to investigate the intratester reliability of a digital photographic method for quantifying static lower extremity alignment in individuals with flatfeet and normal feet types. Thirteen females with flexible flatfeet and nine females with normal feet types were recruited from university communities. Reflective markers were attached over the participants' body landmarks. Frontal and sagittal plane photographs were taken while the participants were in a standardized standing position. The markers were removed and after 30 min the same procedure was repeated. Pelvic angle, quadriceps angle, tibiofemoral angle, genu recurvatum, femur length and tibia length were measured from photographs using the ImageJ software. All measured variables demonstrated good to excellent intratester reliability using digital photography in both flatfeet (ICC: 0.79-0.93) and normal feet type (ICC: 0.84-0.97) groups. The findings of the current study indicate that digital photography is a highly reliable method of measurement for assessing lower extremity alignment in both flatfeet and normal feet type groups. Copyright © 2016. Published by Elsevier Ltd.
First-order reliability application and verification methods for semistatic structures
NASA Astrophysics Data System (ADS)
Verderaime, V.
1994-11-01
Escalating risks of aerostructures stimulated by increasing size, complexity, and cost should no longer be ignored in conventional deterministic safety design methods. The deterministic pass-fail concept is incompatible with probability and risk assessments; stress audits are shown to be arbitrary and incomplete, and the concept compromises the performance of high-strength materials. A reliability method is proposed that combines first-order reliability principles with deterministic design variables and conventional test techniques to surmount current deterministic stress design and audit deficiencies. Accumulative and propagation design uncertainty errors are defined and appropriately implemented into the classical safety-index expression. The application is reduced to solving for a design factor that satisfies the specified reliability and compensates for uncertainty errors, and then using this design factor as, and instead of, the conventional safety factor in stress analyses. The resulting method is consistent with current analytical skills and verification practices, the culture of most designers, and the development of semistatic structural designs.
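The classical safety-index expression mentioned in the abstract above can be written out for the simplest case. The sketch below is not the paper's design-factor procedure: it assumes independent, normally distributed resistance (strength) and applied stress, and it omits the uncertainty-error terms the author adds. All function names and numbers are hypothetical.

```python
from math import erf, sqrt


def safety_index(mean_resistance, sd_resistance, mean_stress, sd_stress):
    """First-order safety index: mean margin divided by the standard
    deviation of the margin, for independent resistance and stress."""
    return (mean_resistance - mean_stress) / sqrt(sd_resistance ** 2 + sd_stress ** 2)


def reliability_from_index(beta):
    """Standard normal CDF evaluated at beta, i.e. P(resistance > stress)
    under the normality assumption stated above."""
    return 0.5 * (1.0 + erf(beta / sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical values in MPa: strength 400 +/- 20, applied stress 300 +/- 25.
    beta = safety_index(400.0, 20.0, 300.0, 25.0)
    reliability = reliability_from_index(beta)
    print(f"safety index = {beta:.2f}")
    print(f"reliability  = {reliability:.6f}")
    print(f"P(failure)   = {1.0 - reliability:.2e}")
```

Read in the other direction, a specified reliability fixes the required safety index, and the gap between mean resistance and mean stress that achieves it plays the role a deterministic safety factor would otherwise play.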
Foster, J D; Miskovic, D; Allison, A S; Conti, J A; Ockrim, J; Cooper, E J; Hanna, G B; Francis, N K
2016-06-01
Laparoscopic rectal resection is technically challenging, with outcomes dependent upon technical performance. No robust objective assessment tool exists for laparoscopic rectal resection surgery. This study aimed to investigate the application of the objective clinical human reliability analysis (OCHRA) technique for assessing technical performance of laparoscopic rectal surgery and explore the validity and reliability of this technique. Laparoscopic rectal cancer resection operations were described in the format of a hierarchical task analysis. Potential technical errors were defined. The OCHRA technique was used to identify technical errors enacted in videos of twenty consecutive laparoscopic rectal cancer resection operations from a single site. The procedural task, spatial location, and circumstances of all identified errors were logged. Clinical validity was assessed through correlation with clinical outcomes; reliability was assessed by test-retest. A total of 335 execution errors were identified, with a median of 15 per operation. More errors were observed during pelvic tasks compared with abdominal tasks (p < 0.001). Within the pelvis, more errors were observed during dissection on the right side than the left (p = 0.03). Test-retest confirmed reliability (r = 0.97, p < 0.001). A significant correlation was observed between error frequency and mesorectal specimen quality (r_s = 0.52, p = 0.02) and with blood loss (r_s = 0.609, p = 0.004). OCHRA offers a valid and reliable method for evaluating technical performance of laparoscopic rectal surgery.
Developing a tool for assessing competency in root cause analysis.
Gupta, Priyanka; Varkey, Prathibha
2009-01-01
Root cause analysis (RCA) is a tool for identifying the key cause(s) contributing to a sentinel event or near miss. Although training in RCA is gaining popularity in medical education, there is no published literature on valid or reliable methods for assessing competency in the same. A tool for assessing competency in RCA was pilot tested as part of an eight-station Objective Structured Clinical Examination that was conducted at the completion of a three-week quality improvement (QI) curriculum for the Mayo Clinic Preventive Medicine and Endocrinology fellowship programs. As part of the curriculum, fellows completed a QI project to enhance physician communication of the diagnosis and treatment plan at the end of a patient visit. They had a didactic session on RCA, followed by process mapping of the information flow at the project clinic, after which fellows conducted an actual RCA using the Ishikawa fishbone diagram. For the RCA competency assessment, fellows performed an RCA regarding a scenario describing an adverse medication event and provided possible solutions to prevent such errors in the future. All faculty strongly agreed or agreed that they were able to accurately assess competency in RCA using the tool. Interrater reliability for the global competency rating and checklist scoring were 0.96 and 0.85, respectively. Internal consistency (Cronbach's alpha) was 0.76. Six of eight of the fellows found the difficulty level of the test to be optimal. Assessment methods must accompany education programs to ensure that graduates are competent in QI methodologies and are able to apply them effectively in the workplace. The RCA assessment tool was found to be a valid, reliable, feasible, and acceptable method for assessing competency in RCA. Further research is needed to examine its predictive validity and generalizability.
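The internal-consistency figure in the abstract above (Cronbach's alpha of 0.76) comes from a respondents-by-items score table. The following is a minimal sketch of the usual formula, assuming sample variances and complete data; the function name `cronbach_alpha` and the checklist scores are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent
    and one column per item (at least two of each)."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical checklist scores: 6 fellows, 4 checklist items.
    checklist = [
        [3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
        [1, 2, 2, 1],
        [3, 3, 4, 3],
        [5, 4, 5, 5],
    ]
    print(f"alpha = {cronbach_alpha(checklist):.2f}")
```

With only eight examinees, as in the pilot described, an alpha estimate has a wide sampling error, so a single value is better read as an indication than as a precise property of the tool.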
Stolinski, L; Kozinoga, M; Czaprowski, D; Tyrakowski, M; Cerny, P; Suzuki, N; Kotwicki, T
2017-01-01
Digital photogrammetry provides measurements of body angles or distances which allow for quantitative posture assessment with or without the use of external markers. It is becoming an increasingly popular tool for the assessment of the musculoskeletal system. The aim of this paper is to present a structured method for the analysis of posture and its changes using a standardized digital photography technique. The study had two parts. The first comprised 91 children (44 girls and 47 boys) aged 7-10 (8.2 ± 1.0), i.e., students of primary school, and its aim was to develop the photographic method, choose the quantitative parameters, and determine the intraobserver reliability (repeatability) along with the interobserver reliability (reproducibility) of measurements in the sagittal plane using digital photography, as well as to compare the Rippstein plurimeter and digital photography measurements. The second involved 7782 children (3804 girls, 3978 boys) aged 7-10 (8.4 ± 0.5), who underwent digital photography postural screening. The methods consisted of measuring and calculating selected parameters, establishing the normal ranges of photographic parameters, presenting percentile charts, as well as noting common pitfalls and possible sources of errors in digital photography. A standardized procedure for the photographic evaluation of child body posture was presented. The photographic measurements revealed very good intra- and inter-rater reliability regarding the five sagittal parameters and good reliability when compared against Rippstein plurimeter measurements. The parameters displayed insignificant variability over time. Normative data were calculated based on photographic assessment, while the percentile charts were provided to serve as reference values. The technical errors observed during photogrammetry are carefully discussed in this article. Technical developments allow for the regular use of digital photogrammetry in body posture assessment.
Specific child positioning (described above) enables us to avoid incidentally modified posture. Image registration is simple, quick, harmless, and cost-effective. The semi-automatic image analysis, together with the normal values and percentile charts, makes the technique reliable in terms of child's posture documentation and corrective therapy effects' monitoring.
Jabbour, Noel; Sidman, James
2011-10-01
There has been an increasing interest in assessment of technical skills in most medical and surgical disciplines. Many of these assessments involve microscopy or endoscopy and are thus amenable to video recording for post hoc review. An ideal skills assessment video would provide the reviewer with a simultaneous view of the examinee's instrument handling and the operative field. Ideally, a reviewer should be blinded to the identity of the examinee and whether the assessment was performed as a pretest or posttest examination, when given in conjunction with an educational intervention. We describe a simple method for reliably creating deidentified, multicamera, time-synced videos, which may be used in technical skills assessments. We pilot tested this method in a pediatric airway endoscopy Objective Structured Assessment of Technical Skills (OSATS). Total video length was compared with the OSATS administration time. Thirty-nine OSATS were administered. There were no errors encountered in time-syncing the videos using this method. Mean duration of OSATS videos was 11 minutes and 20 seconds, which was significantly less than the time needed for an expert to be present at the administration of each 30-minute OSATS (P < 0.001). The described method for creating time-synced, multicamera skills assessment videos is reliable and may be used in endosurgical or microsurgical skills assessments. Compared with live review, post hoc video review using this method can save valuable expert reviewer time. Most importantly, this method allows a reviewer to simultaneously evaluate an examinee's instrument handling and the operative field while being blinded to the examinee's identity and timing of examination administration.
Ball, Sarah C; Benjamin, Sara E; Ward, Dianne S
2007-04-01
To our knowledge, a direct observation protocol for assessing dietary intake among young children in child care has not been published. This article reviews the development and testing of a diet observation system for child care facilities that occurred during a larger intervention trial. Development of this system was divided into five phases, done in conjunction with a larger intervention study: (a) protocol development, (b) training of field staff, (c) certification of field staff in a laboratory setting, (d) implementation in a child-care setting, and (e) certification of field staff in a child-care setting. During the certification phases, methods were used to assess the accuracy and reliability of all observers at estimating types and amounts of food and beverages commonly served in child care. Tests of agreement show strong agreement among five observers, as well as strong accuracy between the observers and 20 measured portions of foods and beverages, with a mean intraclass correlation coefficient value of 0.99. This structured observation system shows promise as a valid and reliable approach for assessing dietary intake of children in child care and makes a valuable contribution to the growing body of literature on the dietary assessment of young children.
Chang, Wen-Dien; Chang, Wan-Yi; Lee, Chia-Lun; Feng, Chi-Yen
2013-01-01
[Purpose] Balance is an integral part of human ability. The smart balance master system (SBM) is a balance test instrument with good reliability and validity, but it is expensive. Therefore, we modified a Wii Fit balance board, which is a convenient balance assessment tool, and analyzed its reliability and validity. [Subjects and Methods] We recruited 20 healthy young adults and 20 elderly people, and administered 3 balance tests. The correlation coefficient and intraclass correlation of both instruments were analyzed. [Results] There were no statistically significant differences in the 3 tests between the Wii Fit balance board and the SBM. The Wii Fit balance board had a good intraclass correlation (0.86–0.99) for the elderly people and positive correlations (r = 0.58–0.86) with the SBM. [Conclusions] The Wii Fit balance board is a balance assessment tool with good reliability and high validity for elderly people, and we recommend it as an alternative tool for assessing balance ability. PMID:24259769
Teaching acute care nurses cognitive assessment using LOCFAS: what's the best method?
Flannery, J; Land, K
2001-02-01
The Levels of Cognitive Functioning Assessment Scale (LOCFAS) is a behavioral checklist used by nurses in the acute care setting to assess the level of cognitive functioning in severely brain-injured patients in the early post-trauma period. Previous research studies have supported the reliability and validity of LOCFAS. For LOCFAS to become a more firmly established method of cognitive assessment, nurses must become familiar with and proficient in the use of this instrument. The purpose of this study was to find the most effective method of instruction by comparing three methods: a self-directed manual, a teaching video, and a classroom presentation. Videotaped vignettes of actual brain-injured patients were presented at the end of each training session, and participants were required to categorize these videotaped patients by using LOCFAS. High levels of reliability were observed for both the self-directed manual group and the teaching video group, but an overall lower level of reliability was observed for the classroom presentation group. Examination of the accuracy of overall LOCFAS ratings revealed a significant difference for instructional groups; the accuracy of the classroom presentation group was significantly lower than that of either the self-directed manual group or the teaching video group. The three instructional groups also differed on the average accuracy of ratings of the individual behaviors; the accuracy of the classroom presentation group was significantly lower than that of the teaching video group, whereas the self-directed manual group fell in between. Nurses also rated the instructional methods across a number of evaluative dimensions on a 5-point Likert-type scale. Evaluative statements ranged from average to good, with no significant differences among instructional methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, Tsong-Lun; Varuttamaseni, Athi; Baek, Joo-Seok
The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).
Electronic Quality of Life Assessment Using Computer-Adaptive Testing
2016-01-01
Background Quality of life (QoL) questionnaires are desirable for clinical practice but can be time-consuming to administer and interpret, making their widespread adoption difficult. Objective Our aim was to assess the performance of the World Health Organization Quality of Life (WHOQOL)-100 questionnaire as four item banks to facilitate adaptive testing using simulated computer adaptive tests (CATs) for physical, psychological, social, and environmental QoL. Methods We used data from the UK WHOQOL-100 questionnaire (N=320) to calibrate item banks using item response theory, which included psychometric assessments of differential item functioning, local dependency, unidimensionality, and reliability. We simulated CATs to assess the number of items administered before prespecified levels of reliability were met. Results The item banks (40 items) all displayed good model fit (P>.01) and were unidimensional (fewer than 5% of t tests significant), reliable (Person Separation Index>.70), and free from differential item functioning (no significant analysis of variance interaction) or local dependency (residual correlations < +.20). When matched for reliability, the item banks were between 45% and 75% shorter than paper-based WHOQOL measures. Across the four domains, a high standard of reliability (alpha>.90) could be gained with a median of 9 items. Conclusions Using CAT, simulated assessments were as reliable as paper-based forms of the WHOQOL with a fraction of the number of items. These properties suggest that these item banks are suitable for computerized adaptive assessment. These item banks have the potential for international development using existing alternative language versions of the WHOQOL items. PMID:27694100
Collender, Philip A; Kirby, Amy E; Addiss, David G; Freeman, Matthew C; Remais, Justin V
2015-12-01
Limiting the environmental transmission of soil-transmitted helminths (STHs), which infect 1.5 billion people worldwide, will require sensitive, reliable, and cost-effective methods to detect and quantify STHs in the environment. We review the state-of-the-art of STH quantification in soil, biosolids, water, produce, and vegetation with regard to four major methodological issues: environmental sampling; recovery of STHs from environmental matrices; quantification of recovered STHs; and viability assessment of STH ova. We conclude that methods for sampling and recovering STHs require substantial advances to provide reliable measurements for STH control. Recent innovations in the use of automated image identification and developments in molecular genetic assays offer considerable promise for improving quantification and viability assessment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Lindsley, Kristina; Li, Tianjing; Ssemanda, Elizabeth; Virgili, Gianni; Dickersin, Kay
2016-01-01
Topic Are existing systematic reviews of interventions for age-related macular degeneration incorporated into clinical practice guidelines? Clinical relevance High-quality systematic reviews should be used to underpin evidence-based clinical practice guidelines and clinical care. We have examined the reliability of systematic reviews of interventions for age-related macular degeneration (AMD) and described the main findings of reliable reviews in relation to clinical practice guidelines. Methods Eligible publications are systematic reviews of the effectiveness of treatment interventions for AMD. We searched a database of systematic reviews in eyes and vision and employed no language or date restrictions; the database is up-to-date as of May 6, 2014. Two authors independently screened records for eligibility and abstracted and assessed the characteristics and methods of each review. We classified reviews as “reliable” when they reported eligibility criteria, comprehensive searches, appraisal of methodological quality of included studies, appropriate statistical methods for meta-analysis, and conclusions based on results. We mapped treatment recommendations from the American Academy of Ophthalmology Preferred Practice Patterns (AAO PPP) for AMD to the identified systematic reviews and assessed whether any reliable systematic review was cited or could have been cited to support each treatment recommendation. Results Of 1,570 systematic reviews in our database, 47 met our inclusion criteria. Most of the systematic reviews targeted neovascular AMD and investigated anti-vascular endothelial growth factor (anti-VEGF) interventions, dietary supplements or photodynamic therapy. We classified over two-thirds (33/47) of the reports as reliable. The quality of reporting varied, with criteria for reliable reporting met more often for Cochrane reviews and for reviews whose authors disclosed conflicts of interest. 
Although most systematic reviews were reliable, anti-VEGF agents and photodynamic therapy were the only interventions identified as effective by reliable reviews. Of 35 treatment recommendations extracted from the AAO PPP, 15 could have been supported with reliable systematic reviews; however, only one recommendation had an accompanying intervention systematic review citation, which we assessed as a reliable systematic review. No reliable systematic review was identified for 20 treatment recommendations, highlighting areas of evidence gaps. Conclusions For AMD, reliable systematic reviews exist for many treatment recommendations in the AAO PPP and should be used to support these recommendations. We also identified areas where no high-level evidence exists. Mapping clinical practice guidelines to existing systematic reviews is one way to highlight areas where evidence generation or evidence synthesis is either available or needed. PMID:26804762
Rahmani, Azam; Merghati-Khoei, Effat; Moghadam-Banaem, Lida; Hajizadeh, Ebrahim; Hamdieh, Mostafa; Montazeri, Ali
2014-06-13
Premarital sexual behaviors are an important issue for women's health. The present study was designed to develop and examine the psychometric properties of a scale in order to identify young women who are at greater risk of premarital sexual behavior. This was an exploratory mixed-method investigation conducted in two phases. In the first phase, qualitative methods (focus group discussion and individual interview) were applied to generate items and develop the questionnaire. In the second phase, psychometric properties (validity and reliability) of the questionnaire were assessed. In the first phase an item pool containing 53 statements related to premarital sexual behavior was generated. In the second phase item reduction was applied and the final version of the questionnaire containing 26 items was developed. The psychometric properties of this final version were assessed and the results showed that the instrument has a good structure and reliability. The results from exploratory factor analysis indicated a 5-factor solution for the instrument that jointly accounted for 57.4% of the variance observed. The Cronbach's alpha coefficient for the instrument was found to be 0.87. This study provided a valid and reliable scale to identify premarital sexual behavior in young women. Assessment of premarital sexual behavior might help to improve women's sexual abstinence.
Alizadehkhaiyat, O; Fisher, A C; Kemp, G J; Frostick, S P
2007-08-01
The aetiology of tennis elbow has remained uncertain for more than a century. To examine muscle imbalance as a possible pathophysiological factor requires a reliable method of assessment. This paper describes the development of such a method and its performance in healthy subjects. We propose a combination of surface and fine-wire EMG of shoulder and forearm muscles and wrist strength measurements as a reliable tool for assessing muscle imbalance relevant to the pathophysiology of tennis elbow. Six healthy volunteers participated. EMG data were acquired at 50% maximal voluntary isometric contraction from five forearm muscles during grip and three shoulder muscles during external rotation and abduction, and analysed using normalized median frequency slope as a fatigue index. Wrist extension/flexion strength was measured using a purpose-built dynamometer. Significant negative slope of median frequency was found for all muscles, with good reproducibility, and no significant difference in slope between the different muscles of the shoulder and the wrist. (Amplitude slope showed high variability and was therefore unsuitable for this purpose.) Wrist flexion was 27+/-8% stronger than extension (mean+/-SEM, p=0.006). This is a reliable method for measuring muscle fatigue in forearm and shoulder. EMG and wrist strength studies together can be used for assessing and identifying the muscle balance in the wrist-forearm-shoulder chain.
ERIC Educational Resources Information Center
Han, Turgay; Huang, Jinyan
2017-01-01
Using generalizability (G-) theory and rater interviews as both quantitative and qualitative approaches, this study examined the impact of scoring methods (i.e., holistic versus analytic scoring) on the scoring variability and reliability of an EFL institutional writing assessment at a Turkish university. Ten raters were invited to rate 36…
The Q-Sort method: use in landscape assessment research and landscape planning
David G. Pitt; Ervin H. Zube
1979-01-01
The assessment of visual quality inherently involves the measurement of perceptual response to landscape. The Q-Sort Method is a psychometric technique which produces reliable and valid interval measurements of people's perceptions of landscape visual quality as depicted in photographs. It is readily understood by participants across a wide range of age groups and...
Al-Abassi, Abdulla Ahmed; Al Saadi, Azan Saleh; Ahmed, Faisal
2018-06-19
Intra-abdominal pressure (IAP) can be measured by several indirect methods; however, the urinary bladder is largely preferred. The aim of this study was to compare intra-bladder pressure (IBP) at different levels of IAPs and assess its reliability as an indirect method for IAP measurement. We compared IBP with IAP in twenty-one patients undergoing laparoscopic cholecystectomy under general anesthesia. Measurements were recorded at increasing levels of insufflation pressures to approximately 22 mmHg. Pearson's correlation coefficient was calculated to establish the relationship between the two pressure measurements and Bland-Altman analysis was used to assess the limits of agreement between the two methods of measurements. The urinary bladder pressures reflected well the pressures in the abdominal cavity. Pearson correlation coefficient showed a good correlation between the two measurement techniques (r = 0.966, p < 0.0001) and Bland-Altman analysis indicated that the 95% limits of agreement between the two methods ranged from -2.83 to 2.64. This range is accepted both clinically and according to the recommendations of the World Society of Abdominal Compartment Syndrome (WSACS). Our study showed that IBP measurement is a simple, minimally invasive method that may reliably estimate IAP in patients placed in supine position. Measurements for pressures higher than 12 mmHg may be less reliable. When applied clinically, this should alert the clinician to take safety measures to avoid abdominal compartment syndrome (ACS).
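The two statistics named in this abstract, Pearson's r and the Bland-Altman 95% limits of agreement, can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names and the paired readings are hypothetical, and the limits use the common bias ± 1.96 × SD convention, which the abstract does not spell out.

```python
import math
import statistics


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired readings in mmHg (illustrative only, not study data).
iap = [5.0, 8.0, 10.0, 12.0, 15.0, 18.0, 20.0, 22.0]
ibp = [6.0, 7.0, 11.0, 12.0, 14.0, 19.0, 21.0, 21.0]

bias, lower, upper = bland_altman_limits(ibp, iap)
print(f"r = {pearson_r(iap, ibp):.3f}")
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

A high r alone does not show that two methods agree, which is presumably why the abstract reports the limits of agreement alongside it.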
AZARI, Nadia; SOLEIMANI, Farin; VAMEGHI, Roshanak; SAJEDI, Firoozeh; SHAHSHAHANI, Soheila; KARIMI, Hossein; KRASKIAN, Adis; SHAHROKHI, Amin; TEYMOURI, Robab; GHARIB, Masoud
2017-01-01
Objective: The Bayley Scales of Infant and Toddler Development is a well-known diagnostic developmental assessment tool for children aged 1–42 months. Our aim was to investigate the validity and reliability of this scale in Persian-speaking children. Materials & Methods: The method was descriptive-analytic. Translation, back-translation and cultural adaptation were done. Content and face validity of the translated scale were determined by experts' opinions. Overall, 403 children aged 1 to 42 months were recruited from health centers of Tehran during 2013-2014 for developmental assessment in cognitive, communicative (receptive & expressive) and motor (fine & gross) domains. Reliability of the scale was calculated through three methods: internal consistency using Cronbach's alpha coefficient, test-retest and inter-rater methods. Construct validity was calculated using factor analysis and comparison of the mean scores. Results: Cultural and linguistic changes were made in items of all domains, especially on the communication subscale. Content and face validity of the test were approved by experts' opinions. Cronbach's alpha coefficient was above 0.74 in all domains. Pearson correlation coefficients in the various domains were ≥ 0.982 in the test-retest method and ≥ 0.993 in the inter-rater method. Construct validity of the test was approved by factor analysis. Moreover, the mean scores for the different age groups were compared and statistically significant differences were observed between mean scores of different age groups, which confirms the validity of the test. Conclusion: The Bayley Scales of Infant and Toddler Development is a valid and reliable tool for child developmental assessment in Persian-speaking children. PMID:28277556
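Internal consistency as reported here is usually computed with the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). The sketch below assumes that textbook definition; the function name and the response matrix are hypothetical and are not taken from the study.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variances = [statistics.variance(col) for col in columns]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 3 items (illustrative only).
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [4, 5, 4],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Sample variances (n − 1 denominator) are used for both the items and the totals; using population variances throughout gives the same alpha because the ratio is unchanged.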
Reliability analysis of the objective structured clinical examination using generalizability theory.
Trejo-Mejía, Juan Andrés; Sánchez-Mendiola, Melchor; Méndez-Ramírez, Ignacio; Martínez-González, Adrián
2016-01-01
The objective structured clinical examination (OSCE) is a widely used method for assessing clinical competence in health sciences education. Studies using this method have shown evidence of validity and reliability. There are no published studies of OSCE reliability measurement with generalizability theory (G-theory) in Latin America. The aims of this study were to assess the reliability of an OSCE in medical students using G-theory and explore its usefulness for quality improvement. An observational cross-sectional study was conducted at National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City. A total of 278 fifth-year medical students were assessed with an 18-station OSCE in a summative end-of-career final examination. There were four exam versions. G-theory with a crossover random effects design was used to identify the main sources of variance. Examiners, standardized patients, and cases were considered as a single facet of analysis. The exam was applied to 278 medical students. The OSCE had a generalizability coefficient of 0.93. The major components of variance were stations, students, and residual error. The sites and the versions of the tests had minimum variance. Our study achieved a G coefficient similar to that found in other reports, which is acceptable for summative tests. G-theory allows the estimation of the magnitude of multiple sources of error and helps decision makers to determine the number of stations, test versions, and examiners needed to obtain reliable measurements.
Validation of the breast evaluation questionnaire for breast hypertrophy and breast reduction.
Lewin, Richard; Elander, Anna; Lundberg, Jonas; Hansson, Emma; Thorarinsson, Andri; Claudelin, Malin; Bladh, Helena; Lidén, Mattias
2018-06-13
There is a lack of published, validated questionnaires for evaluating psychosocial morbidity in patients with breast hypertrophy undergoing breast reduction surgery. To validate the breast evaluation questionnaire (BEQ), originally developed for the assessment of breast augmentation patients, for the assessment of psychosocial morbidity in patients with breast hypertrophy undergoing breast reduction surgery. Validation study. Subjects: Women with macromastia. Methods: The validation of the BEQ, adapted to breast reduction, was performed in several steps. Content validity, reliability, construct validity and responsiveness were assessed. The original version was adjusted according to the results for content validity and resulted in item reduction and a modified BEQ (mBEQ) that was then assessed for reliability, construct validity and responsiveness. Internal and external validation was performed for the modified BEQ. Convergent validity was tested against Breast-Q (reduction) and discriminant validity was tested against the SF-36. Known-groups validation revealed significant differences between the normal population and patients undergoing breast reduction surgery. The BEQ showed good reliability by test-retest analysis and high responsiveness. The modified BEQ may be a reliable, valid and responsive instrument for assessing women who undergo breast reduction.
The Outcome and Assessment Information Set (OASIS): A Review of Validity and Reliability
O’CONNOR, MELISSA; DAVITT, JOAN K.
2015-01-01
The Outcome and Assessment Information Set (OASIS) is the patient-specific, standardized assessment used in Medicare home health care to plan care, determine reimbursement, and measure quality. Since its inception in 1999, there has been debate over the reliability and validity of the OASIS as a research tool and outcome measure. A systematic literature review of English-language articles identified 12 studies published in the last 10 years examining the validity and reliability of the OASIS. Empirical findings indicate the validity and reliability of the OASIS range from low to moderate but vary depending on the item studied. Limitations in the existing research include: nonrepresentative samples; inconsistencies in methods used, items tested, measurement, and statistical procedures; and the changes to the OASIS itself over time. The inconsistencies suggest that these results are tentative at best; additional research is needed to confirm the value of the OASIS for measuring patient outcomes, research, and quality improvement. PMID:23216513
Barthassat, Emilienne; Afifi, Faik; Konala, Praveen; Rasch, Helmut; Hirschmann, Michael T
2017-05-08
It was the primary purpose of our study to evaluate the inter- and intra-observer reliability of a standardized SPECT/CT algorithm for evaluating patients with painful primary total hip arthroplasty (THA). The secondary purpose was a comparison of semi-quantitative and 3D volumetric quantification methods for assessment of bone tracer uptake (BTU) in those patients. A novel SPECT/CT localization scheme consisting of 14 femoral and 4 acetabular regions on standardized axial and coronal slices was introduced and evaluated in terms of inter- and intra-observer reliability in 37 consecutive patients with hip pain after THA. BTU for each anatomical region was assessed semi-quantitatively using a color-coded Likert type scale (0-10) and volumetrically quantified using a validated software. Two observers interpreted the SPECT/CT findings in all patients two times with six weeks interval between interpretations in random order. Semi-quantitative and quantitative measurements were compared in terms of reliability. In addition, the values were correlated using Pearson's correlation. A factorial cluster analysis of BTU was performed to identify clinically relevant regions, which should be grouped and analysed together. The localization scheme showed high inter- and intra-observer reliabilities for all femoral and acetabular regions independent of the measurement method used (semiquantitative versus 3D volumetric quantitative measurements). A high to moderate correlation between both measurement methods was shown for the distal femur, the proximal femur and the acetabular cup. The factorial cluster analysis showed that the anatomical regions might be summarized into three distinct anatomical regions. These were the proximal femur, the distal femur and the acetabular cup region. The SPECT/CT algorithm for assessment of patients with pain after THA is highly reliable independent of the measurement method used. Three clinically relevant anatomical regions (proximal femoral, distal femoral, acetabular) were identified.
Crockford, Christopher; Newton, Judith; Lonergan, Katie; Madden, Caoifa; Mays, Iain; O'Sullivan, Meabhdh; Costello, Emmet; Pinto-Grau, Marta; Vajda, Alice; Heverin, Mark; Pender, Niall; Al-Chalabi, Ammar; Hardiman, Orla; Abrahams, Sharon
2018-02-01
Cognitive impairment affects approximately 50% of people with amyotrophic lateral sclerosis (ALS). Research has indicated that impairment may worsen with disease progression. The Edinburgh Cognitive and Behavioural ALS Screen (ECAS) was designed to measure neuropsychological functioning in ALS, with its alternate forms (ECAS-A, B, and C) allowing for serial assessment over time. The aim of the present study was to establish reliable change scores for the alternate forms of the ECAS, and to explore practice effects and test-retest reliability of the ECAS's alternate forms. Eighty healthy participants were recruited, with 57 completing two and 51 completing three assessments. Participants were administered alternate versions of the ECAS serially (A-B-C) at four-month intervals. Intra-class correlation analysis was employed to explore test-retest reliability, while analysis of variance was used to examine the presence of practice effects. Reliable change indices (RCI) and regression-based methods were utilized to establish change scores for the ECAS alternate forms. Test-retest reliability was excellent for ALS Specific, ALS Non-Specific, and ECAS Total scores of the combined ECAS A, B, and C (all > .90). No significant practice effects were observed over the three testing sessions. RCI and regression-based methods produced similar change scores. The alternate forms of the ECAS possess excellent test-retest reliability in a healthy control sample, with no significant practice effects. The use of conservative RCI scores is recommended. Therefore, a change of ≥8, ≥4, and ≥9 for ALS Specific, ALS Non-Specific, and ECAS Total score is required for reliable change.
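A reliable change index of the kind mentioned above is often computed with the Jacobson-Truax formulation, RCI = (x2 − x1) / (SD × sqrt(2) × sqrt(1 − r)). The sketch below assumes that formulation; the abstract does not state which variant the authors used, and the function names and input values are hypothetical.

```python
import math


def standard_error_of_difference(baseline_sd, test_retest_r):
    """S_diff = sqrt(2) * SEM, with SEM = SD * sqrt(1 - r) (Jacobson-Truax form)."""
    sem = baseline_sd * math.sqrt(1.0 - test_retest_r)
    return math.sqrt(2.0) * sem


def reliable_change_index(score_1, score_2, baseline_sd, test_retest_r):
    """Change between two assessments in units of the standard error of difference."""
    return (score_2 - score_1) / standard_error_of_difference(baseline_sd, test_retest_r)


def reliable_change_threshold(baseline_sd, test_retest_r, z=1.96):
    """Smallest raw score change that exceeds measurement error at the given z."""
    return z * standard_error_of_difference(baseline_sd, test_retest_r)


# Hypothetical inputs (illustrative only): SD of 10 points, test-retest r of 0.92.
print(f"threshold = {reliable_change_threshold(10.0, 0.92):.1f} points")
print(f"RCI for a 12-point drop = {reliable_change_index(110, 98, 10.0, 0.92):.2f}")
```

An absolute RCI above 1.96 is conventionally read as change unlikely to be due to measurement error alone; higher test-retest reliability shrinks the threshold.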
Subject-level reliability analysis of fast fMRI with application to epilepsy.
Hao, Yongfu; Khoo, Hui Ming; von Ellenrieder, Nicolas; Gotman, Jean
2017-07-01
Recent studies have applied the new magnetic resonance encephalography (MREG) sequence to the study of interictal epileptic discharges (IEDs) in the electroencephalogram (EEG) of epileptic patients. However, there are no criteria to quantitatively evaluate different processing methods, to properly use the new sequence. We evaluated different processing steps of this new sequence under the common generalized linear model (GLM) framework by assessing the reliability of results. A bootstrap sampling technique was first used to generate multiple replicated data sets; a GLM with different processing steps was then applied to obtain activation maps, and the reliability of these maps was assessed. We applied our analysis in an event-related GLM related to IEDs. A higher reliability was achieved by using a GLM with head motion confound regressor with 24 components rather than the usual 6, with an autoregressive model of order 5 and with a canonical hemodynamic response function (HRF) rather than variable latency or patient-specific HRFs. Comparison of activation with IED field also favored the canonical HRF, consistent with the reliability analysis. The reliability analysis helps to optimize the processing methods for this fast fMRI sequence, in a context in which we do not know the ground truth of activation areas. Magn Reson Med 78:370-382, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
Short assessment of the Big Five: robust across survey methods except telephone interviewing.
Lang, Frieder R; John, Dennis; Lüdtke, Oliver; Schupp, Jürgen; Wagner, Gert G
2011-06-01
We examined measurement invariance and age-related robustness of a short 15-item Big Five Inventory (BFI-S) of personality dimensions, which is well suited for applications in large-scale multidisciplinary surveys. The BFI-S was assessed in three different interviewing conditions: computer-assisted or paper-assisted face-to-face interviewing, computer-assisted telephone interviewing, and a self-administered questionnaire. Randomized probability samples from a large-scale German panel survey and a related probability telephone study were used in order to test method effects on self-report measures of personality characteristics across early, middle, and late adulthood. Exploratory structural equation modeling was used in order to test for measurement invariance of the five-factor model of personality trait domains across different assessment methods. For the short inventory, findings suggest strong robustness of self-report measures of personality dimensions among young and middle-aged adults. In old age, telephone interviewing was associated with greater distortions in reliable personality assessment. It is concluded that the greater mental workload of telephone interviewing limits the reliability of self-report personality assessment. Face-to-face surveys and self-administered questionnaire completion are clearly better suited than phone surveys when personality traits in age-heterogeneous samples are assessed.
Objective measurements of excess skin in post bariatric patients--inter-rater reliability.
Biörserud, Christina; Fagevik Olsén, Monika; Elander, Anna; Wiklund, Malin
2016-01-01
An ability to reliably assess excess skin after massive weight loss using well-described and transferrable methods is important. The aim of this trial was to evaluate inter-rater reliability of ptosis and circumference measurements in patients with excess skin after bariatric surgery. Twenty-five postbariatric patients were included in the study, and their excess skin was measured 18 months after surgery. A protocol was designed to measure excess skin in a standardised way. To evaluate the inter-rater reliability in the measuring protocol, all patients were measured twice, by a specialist nurse and a specialist physiotherapist. All circumference measurements on different body parts had an ICC > 0.9, indicating high reliability. Furthermore, all breast and abdominal ptosis measurements had high reliability. In contrast, visual evaluation of abdominal ptosis had poor reliability. Measurements of ptoses on different body parts had an ICC > 0.6. There were no systematic differences between the results of the two testers, except for measurements of the buttocks and maximal knee circumference. The measuring protocol presented in this study has high reliability and, therefore, represents a useful instrument to provide a consistent and objective assessment of excess skin in the postbariatric patient.
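Inter-rater ICC values like those reported above can be computed from a two-way ANOVA decomposition. The abstract does not say which ICC form was used, so the sketch below assumes the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function name and the measurements are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of subjects, each a list of k ratings (one per rater)."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical circumference measurements in cm by two raters (illustrative only).
measurements = [
    [98.0, 98.5],
    [110.2, 109.8],
    [87.5, 88.0],
    [102.3, 102.0],
    [120.1, 121.0],
]
print(f"ICC(2,1) = {icc_2_1(measurements):.3f}")
```

Because ICC(2,1) penalizes systematic offsets between raters, it would be sensitive to the kind of systematic differences the abstract notes for buttock and maximal knee circumference.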
A probability-based approach for assessment of roadway safety hardware.
DOT National Transportation Integrated Search
2017-03-14
This report presents a general probability-based approach for assessment of roadway safety hardware (RSH). It was achieved using a reliability analysis method and computational techniques. With the development of high-fidelity finite element (FE) m...
HUMAN EXPOSURE ASSESSMENT USING IMMUNOASSAY
The National Exposure Research Laboratory-Las Vegas is developing analytical methods for human exposure assessment studies. Critical exposure studies generate a large number of samples which must be analyzed in a reliable, cost-effective and timely manner. TCP (3,5,6-trichlor...
A Protocol for Advanced Psychometric Assessment of Surveys
Squires, Janet E.; Hayduk, Leslie; Hutchinson, Alison M.; Cranley, Lisa A.; Gierl, Mark; Cummings, Greta G.; Norton, Peter G.; Estabrooks, Carole A.
2013-01-01
Background and Purpose. In this paper, we present a protocol for advanced psychometric assessments of surveys based on the Standards for Educational and Psychological Testing. We use the Alberta Context Tool (ACT) as an exemplar survey to which this protocol can be applied. Methods. Data mapping, acceptability, reliability, and validity are addressed. Acceptability is assessed with missing data frequencies and the time required to complete the survey. Reliability is assessed with internal consistency coefficients and information functions. A unitary approach to validity consisting of accumulating evidence based on instrument content, response processes, internal structure, and relations to other variables is taken. We also address assessing performance of survey data when aggregated to higher levels (e.g., nursing unit). Discussion. In this paper we present a protocol for advanced psychometric assessment of survey data using the Alberta Context Tool (ACT) as an exemplar survey; application of the protocol to the ACT survey is underway. Psychometric assessment of any survey is essential to obtaining reliable and valid research findings. This protocol can be adapted for use with any nursing survey. PMID:23401759
Cervical motion assessment using virtual reality.
Sarig-Bahat, Hilla; Weiss, Patrice L; Laufer, Yocheved
2009-05-01
Repeated measures of cervical motion in asymptomatic subjects. To introduce a virtual reality (VR)-based assessment of cervical range of motion (ROM); to establish inter- and intratester reliability of the VR-based assessment in comparison with conventional assessment in asymptomatic individuals; and to evaluate the effect of a single VR session on cervical ROM. Cervical ROM and clinical issues related to neck pain are frequently studied. A wide variety of methods is available for evaluation of cervical motion. To date, most methods rely on voluntary responses to an assessor's instructions. However, in day-to-day life, head movement is generally an involuntary response to multiple stimuli. Therefore, there is a need for a more functional assessment method, using sensory stimuli to elicit spontaneous neck motion. VR attributes may provide a methodology for achieving this goal. A novel method was developed for cervical motion assessment utilizing an electromagnetic tracking system and a VR game scenario displayed via a head mounted device. Thirty asymptomatic participants were assessed by both conventional and VR-based methods. Inter- and intratester repeatability analyses were performed. The effect of a single VR session on ROM was evaluated. Both assessments showed non-biased results between tests and between testers (P > 0.1). Full-cycle repeatability coefficients ranged between 15.0 degrees and 29.2 degrees with smaller values for rotation and for the VR assessment. A single VR session significantly increased ROM, with the largest effect found in the rotation direction. Inter- and intratester reliability was supported for both the VR-based and the conventional methods. Results suggest better repeatability for the VR method, with rotation being more precise than flexion/extension. A single VR session was found to be effective in increasing cervical motion, possibly due to its motivating effect.
Structured implicit review: a new method for monitoring nursing care quality.
Pearson, M L; Lee, J L; Chang, B L; Elliott, M; Kahn, K L; Rubenstein, L V
2000-11-01
Nurses' independent decisions about assessment, treatment, and nursing interventions for hospitalized patients are important determinants of quality of care. Physician peer implicit review of medical records has been central to Medicare quality management and is considered the gold standard for reviewing physician care, but peer implicit review of nursing processes of care has not received similar attention. The objective of this study was to develop and evaluate nurse structured implicit review (SIR) methods. We developed SIR instruments for rating the quality of inpatient nursing care for congestive heart failure (CHF) and cerebrovascular accident (CVA). Nurse reviewers used the SIR form to rate a nationally representative sample of randomly selected medical records for each disease from 297 acute care hospitals in 5 states (collected by the RAND-HCFA Prospective Payment System study). The study subjects were elderly Medicare inpatients with CHF (n = 291) or CVA (n = 283). We developed and tested scales reflecting domains of nursing process, evaluated interrater and interitem reliability, and assessed the extent to which items and scales predicted overall ratings of the quality of nursing care. Interrater reliability for 14 of 16 scales (CHF) or 10 of 16 scales (CVA) was ≥ 0.40. Interitem reliability was > 0.80 for all but 1 scale (both diseases). Functional Assessment, Physical Assessment, and Medication Tracking ratings were the strongest predictors of overall nursing quality ratings (P < 0.001 for each). Nurse peer review with SIR has adequate interrater and excellent scale reliabilities and can be a valuable tool for assessing nurse performance.
Reliability Assessment for COTS Components in Space Flight Applications
NASA Technical Reports Server (NTRS)
Krishnan, G. S.; Mazzuchi, Thomas A.
2001-01-01
Systems built for space flight applications usually demand a very high degree of performance and a very high level of accuracy. Hence, the design engineers are often prone to selecting state-of-the-art technologies for inclusion in their system design. Shrinking budgets also necessitate the use of COTS (Commercial Off-The-Shelf) components, which are construed as being less expensive. The performance and accuracy requirements for space flight applications are much more stringent than those for commercial applications. The quantity of systems designed and developed for space applications is much lower than that produced for commercial applications. With a given set of requirements, are these COTS components reliable? This paper presents a model for assessing the reliability of COTS components in space applications and the associated effect on the system reliability. We illustrate the method with a real application.
Karakuła-Juchnowicz, Hanna; Stecka, Mariola
2017-08-29
In view of the unavailability in Poland of standardized methods to measure PIQ, the aim of the work was to develop a Polish test to assess the premorbid level of intelligence, PART (Polish Adult Reading Test), and to measure its psychometric properties, such as validity and reliability, as well as standardization in a group of schizophrenia patients. The principles of PART construction were based on the idea of the National Adult Reading Test by Hazel Nelson, which is popular worldwide. The research comprised a group of 122 subjects (65 schizophrenia patients and 57 healthy people), aged 18-60 years, matched for age and gender. PART appears to be a method with high internal consistency, high test-retest and inter-rater reliability, and acceptable diagnostic and prognostic validity. The standardized procedures of PART have been investigated and described. Considering the psychometric values of PART and the short time needed to administer it, the test may be a useful diagnostic instrument in the assessment of premorbid level of intelligence in a group of schizophrenic patients.
NASA Astrophysics Data System (ADS)
Xu, Jun; Kong, Fan
2018-05-01
Extreme value distribution (EVD) evaluation is a critical topic in reliability analysis of nonlinear structural dynamic systems. In this paper, a new method is proposed to obtain the EVD. The maximum entropy method (MEM) with fractional moments as constraints is employed to derive the entire range of EVD. Then, an adaptive cubature formula is proposed for fractional moments assessment involved in MEM, which is closely related to the efficiency and accuracy for reliability analysis. Three point sets, which include a total of 2d² + 1 integration points in the dimension d, are generated in the proposed formula. In this regard, the efficiency of the proposed formula is ensured. Besides, a "free" parameter is introduced, which makes the proposed formula adaptive with the dimension. The "free" parameter is determined by arranging one point set adjacent to the boundary of the hyper-sphere which contains the bulk of total probability. In this regard, the tail distribution may be better reproduced and the fractional moments could be evaluated with accuracy. Finally, the proposed method is applied to a ten-storey shear frame structure under seismic excitations, which exhibits strong nonlinearity. The numerical results demonstrate the efficacy of the proposed method.
An Assessment Instrument to Measure Geospatial Thinking Expertise
ERIC Educational Resources Information Center
Huynh, Niem Tu; Sharpe, Bob
2013-01-01
Spatial thinking is fundamental to the practice and theory of geography, however there are few valid and reliable assessment methods in geography to measure student performance in spatial thinking. This article presents the development and evaluation of a geospatial thinking assessment instrument to measure participant understanding of spatial…
ERIC Educational Resources Information Center
Dirlikov, Benjamin; Younes, Laurent; Nebel, Mary Beth; Martinelli, Mary Katherine; Tiedemann, Alyssa Nicole; Koch, Carolyn A.; Fiorilli, Diana; Bastian, Amy J.; Denckla, Martha Bridge; Miller, Michael I.; Mostofsky, Stewart H.
2017-01-01
This study presents construct validity for a novel automated morphometric and kinematic handwriting assessment, including (1) convergent validity, establishing reliability of automated measures with traditional manual-derived Minnesota Handwriting Assessment (MHA), and (2) discriminant validity, establishing that the automated methods distinguish…
Smith, Heidi A.B.; Gangopadhyay, Maalobeeka; Goben, Christina M.; Jacobowski, Natalie L.; Chestnut, Mary Hamilton; Savage, Shane; Rutherford, Michael T.; Denton, Danica; Thompson, Jennifer L.; Chandrasekhar, Rameela; Acton, Michelle; Newman, Jessica; Noori, Hannah P.; Terrell, Michelle K.; Williams, Stacey R.; Griffith, Katherine; Cooper, Timothy J.; Ely, E. Wesley; Fuchs, D. Catherine; Pandharipande, Pratik P.
2015-01-01
RATIONALE and OBJECTIVE Delirium assessments in critically ill infants and young children pose unique challenges due to evolution of cognitive and language skills. The objectives of this study were to determine the validity and reliability of a fundamentally objective and developmentally appropriate delirium assessment tool for critically ill infants and preschool-aged children, and to determine delirium prevalence. DESIGN and SETTING Prospective, observational cohort validation study of the PreSchool Confusion Assessment Method for the ICU (psCAM-ICU) in a tertiary medical center pediatric ICU. PATIENTS Participants aged 6 months to 5 years and admitted to the pediatric ICU regardless of admission diagnosis were enrolled. INTERVENTIONS, MEASUREMENTS and MAIN RESULTS An interdisciplinary team created the psCAM-ICU for pediatric delirium monitoring. To assess validity, patients were independently assessed for delirium daily by the research team using the psCAM-ICU and by a child psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders criteria. Reliability was assessed using blinded, concurrent psCAM-ICU evaluations by research staff. A total of 530-paired delirium assessments were completed among 300 patients, with a median age of 20 months (IQR 11, 37) and 43% requiring mechanical ventilation. The psCAM-ICU demonstrated a specificity of 91% (95%CI 90, 93), sensitivity of 75% (72, 78), negative predictive value of 86% (84, 88), positive predictive value of 84% (81, 87), and a reliability kappa statistic of 0.79 (0.76, 0.83). Delirium prevalence was 44% using the psCAM-ICU and 47% by the reference-rater. The rates of delirium were 53% vs. 56% in patients < 2 years of age and 33% vs. 35% in patients ≥ 2 - 5 years of age using the psCAM-ICU and reference-rater respectively. The short-form psCAM-ICU maintained a high specificity (87%) and sensitivity (78%) in post-hoc analysis. 
CONCLUSIONS The psCAM-ICU is a highly valid and reliable delirium instrument for critically ill infants and preschool-aged children, in whom delirium is extremely prevalent. PMID:26565631
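To illustrate how the accuracy measures reported above relate to a 2×2 table of index-test versus reference-rater results, the following minimal Python sketch computes sensitivity, specificity, and the predictive values. The function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp/fn: reference-positive cases the index test called positive/negative.
    tn/fp: reference-negative cases the index test called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts chosen only for illustration.
m = diagnostic_accuracy(tp=75, fp=9, fn=25, tn=91)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Note that the predictive values depend on the prevalence in the sample, so they would differ in a population where delirium is less common.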
Methods for assessing the preventability of adverse drug events: a systematic review.
Hakkarainen, Katja Marja; Andersson Sundell, Karolina; Petzold, Max; Hägg, Staffan
2012-02-01
Preventable adverse drug events (ADEs) are common in both outpatient and inpatient settings. However, the proportion of preventable ADEs varies considerably in different studies, even when conducted in the same setting, and methods for assessing the preventability of ADEs are diverse. The aim of this article is to identify and systematically evaluate methods for assessing the preventability of ADEs. Seven databases (Cochrane, CINAHL, EMBASE, IPA, MEDLINE, PsycINFO and Web of Science) were searched in September 2010 utilizing the databases' index terms and other common terminology on preventable ADEs. No limits for the years of publication were set. Reference lists of included original articles and relevant review articles were also screened. After applying predetermined inclusion and exclusion criteria on 4161 unique citations, 142 (3.4%) original research articles were included in the review. One additional article was included from reference lists. Outcome measures of included studies had to include the frequency of ADEs and the assessment of their preventability. Studies were excluded if they focused on individuals with one specific type of treatment, medical condition, medical procedure or ADE. Measurement instruments for determining the preventability of ADEs in each article were extracted and unique instruments were compared. The process of assessing the preventability of ADEs was described based on reported actions taken to standardize and conduct the assessment, and on information about the reliability and validity of the assessment. Eighteen unique instruments for determining the preventability of ADEs were identified. 
They fell under the following four groups: (i) instruments using a definition of preventability only (n = 3); (ii) instruments with a definition of preventability and an assessment scale for determining preventability (n = 5); (iii) instruments with specific criteria for each preventability category (n = 3); and (iv) instruments with an algorithm for determining preventability (n = 7). Of actions to standardize the assessment process, performing a pilot study was reported in 21 (15%), and use of a standardized protocol was reported in 18 (13%), of the included 143 articles. Preventability was assessed by physicians in 86 (60%) articles and by pharmacists in 41 (29%) articles. In 29 (20%) articles, persons conducting the assessment were described as trained for or experienced in preventability assessment. In 94 (66%) articles, more than one person assessed the preventability of each case. Among these 94 articles, assessment was done independently in 73 (51%) articles. Procedures for managing conflicting assessments were diverse. The reliability of the preventability assessment was tested in 39 (27%) articles, and 16 (11%) articles referred to a previous reliability assessment. Reliability ranged from poor to excellent (kappa 0.19-0.98; overall agreement 26-97%). Four (3%) articles mentioned assessing validity, but no sensitivity or specificity analyses or negative or positive predictive values were presented. Instruments for assessing the preventability of ADEs vary from implicit instruments to explicit algorithms. There is limited evidence for the validity of the identified instruments, and instrument reliability varied significantly. The process of assessing the preventability of ADEs is also commonly imprecisely described, which hinders the interpretation and comparison of studies. 
For measuring the preventability of ADEs more accurately and precisely in future, we believe that existing instruments should be further studied and developed, or that one or more new instruments should be developed, and the validity and reliability of the existing and new instruments be established.
Phuong Hoa, Nguyen; Walker, Sue M.; Hill, Peter S.; Rao, Chalapati
2018-01-01
Background Mortality statistics form a crucial component of national Health Management Information Systems (HMIS). However, there are limitations in the availability and quality of mortality data at national level in Viet Nam. This study assessed the completeness of recorded deaths and the reliability of recorded causes of death (COD) in the A6 death registers in the national routine HMIS in Viet Nam. Methodology and findings 1477 identified deaths in 2014 were reviewed in two provinces. A capture-recapture method was applied to assess the completeness of the A6 death registers. 1365 household verbal autopsy (VA) interviews were successfully conducted, and these were reviewed by physicians who assigned multiple and underlying cause of death (UCOD). These UCODs from VA were then compared with the CODs recorded in the A6 death registers, using kappa scores to assess the reliability of the A6 death register diagnoses. The overall completeness of the A6 death registers in the two provinces was 89.3% (95%CI: 87.8–90.8). No COD recorded in the A6 death registers demonstrated good reliability. There is very low reliability in recording of cardiovascular deaths (kappa for stroke = 0.47 and kappa for ischaemic heart diseases = 0.42) and diabetes (kappa = 0.33). The reporting of deaths due to road traffic accidents, HIV and some cancers is at a moderate level of reliability, with kappa scores ranging between 0.57 and 0.69 (p<0.01). VA methods identify more specific COD than the A6 death registers, and also allow identification of multiple CODs. Conclusions The study results suggest that data completeness in HMIS A6 death registers in the study sample of communes was relatively high (nearly 90%), but triangulation with death records from other sources would improve the completeness of this system. Further, there is an urgent need to enhance the reliability of COD recorded in the A6 death registers, for which VA methods could be effective. 
Focussed consultation among stakeholders is needed to develop a suitable mechanism and process for integrating VA methods into the national routine HMIS A6 death registers in Viet Nam. PMID:29370191
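The abstract does not state which capture-recapture estimator was used. As a sketch under that caveat, the two-source Lincoln-Petersen estimator with Chapman's correction shows how register completeness can be derived from two overlapping death lists. The function names and all counts below are hypothetical.

```python
def chapman_estimate(n_register, n_second, n_both):
    """Chapman-corrected Lincoln-Petersen estimate of the true number of deaths.

    n_register: deaths found in the register under evaluation
    n_second:   deaths found by an independent second source
    n_both:     deaths found in both sources
    """
    return (n_register + 1) * (n_second + 1) / (n_both + 1) - 1


def completeness(n_register, n_second, n_both):
    """Share of the estimated true deaths that the register captured."""
    return n_register / chapman_estimate(n_register, n_second, n_both)


# Hypothetical counts, not taken from the study.
n_hat = chapman_estimate(900, 800, 750)
share = completeness(900, 800, 750)
print(f"estimated deaths: {n_hat:.0f}, completeness: {share:.1%}")
```

The estimator assumes the two sources are independent and that records can be matched without error; violations of either assumption bias the completeness figure.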
NASA Astrophysics Data System (ADS)
Bogachkov, I. V.; Lutchenko, S. S.
2018-05-01
The article presents a method for assessing the reliability of fiber optic communication lines (FOCL) that takes into account the effects of optical fiber tension and temperature and the errors of the first kind made by the built-in diagnostic equipment. Reliability is assessed in terms of the availability factor using the theory of Markov chains and probabilistic mathematical modeling. To obtain the mathematical model, the following steps are performed: the FOCL states are defined and validated; the state graph and the system transitions are described; the state transitions that occur at a given point in time are specified; and the real and observed times that the system spends in the considered states are identified. From the permissible value of the availability factor, the limiting frequency of FOCL maintenance can be determined.
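The abstract does not give the state graph, so the following is a deliberately simplified illustration rather than the article's model. It assumes a two-state Markov chain (working, under repair) with constant failure rate λ and repair rate μ, for which the steady-state availability factor is μ / (λ + μ). The rates are hypothetical, and the diagnostic-error states that the article considers are ignored.

```python
def availability(failure_rate, repair_rate):
    """Steady-state availability of a two-state (up/down) Markov model.

    Both rates must use the same time unit, e.g. events per hour.
    """
    return repair_rate / (failure_rate + repair_rate)


# Hypothetical rates: one failure per 10,000 h, mean repair time of 10 h.
a = availability(failure_rate=1e-4, repair_rate=0.1)
print(f"availability factor: {a:.6f}")
```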
Gamado, Kokouvi; Marion, Glenn; Porphyre, Thibaud
2017-01-01
Livestock epidemics have the potential to give rise to significant economic, welfare, and social costs. Incursions of emerging and re-emerging pathogens may lead to small and repeated outbreaks. Analysis of the resulting data is statistically challenging but can inform disease preparedness reducing potential future losses. We present a framework for spatial risk assessment of disease incursions based on data from small localized historic outbreaks. We focus on between-farm spread of livestock pathogens and illustrate our methods by application to data on the small outbreak of Classical Swine Fever (CSF) that occurred in 2000 in East Anglia, UK. We apply models based on continuous time semi-Markov processes, using data-augmentation Markov Chain Monte Carlo techniques within a Bayesian framework to infer disease dynamics and detection from incompletely observed outbreaks. The spatial transmission kernel describing pathogen spread between farms, and the distribution of times between infection and detection, is estimated alongside unobserved exposure times. Our results demonstrate inference is reliable even for relatively small outbreaks when the data-generating model is known. However, associated risk assessments depend strongly on the form of the fitted transmission kernel. Therefore, for real applications, methods are needed to select the most appropriate model in light of the data. We assess standard Deviance Information Criteria (DIC) model selection tools and recently introduced latent residual methods of model assessment, in selecting the functional form of the spatial transmission kernel. These methods are applied to the CSF data, and tested in simulated scenarios which represent field data, but assume the data generation mechanism is known. Analysis of simulated scenarios shows that latent residual methods enable reliable selection of the transmission kernel even for small outbreaks whereas the DIC is less reliable. 
Moreover, compared with DIC, model choice based on latent residual assessment correlated better with predicted risk. PMID:28293559
Takasaki, Hiroshi; Okuyama, Kousuke; Rosedale, Richard
2017-02-01
Mechanical Diagnosis and Therapy (MDT) is used in the treatment of extremity problems. Classifying clinical problems is one method of providing effective treatment to a target population. Classification reliability is a key factor to determine the precise clinical problem and to direct an appropriate intervention. To explore inter-examiner reliability of the MDT classification for extremity problems in three reliability designs: 1) vignette reliability using surveys with patient vignettes, 2) concurrent reliability, where multiple assessors decide a classification by observing someone's assessment, 3) successive reliability, where multiple assessors independently assess the same patient at different times. Systematic review with data synthesis in a quantitative format. Agreement of MDT subgroups was examined using the Kappa value, with the operational definition of acceptable reliability set at ≥ 0.6. The level of evidence was determined considering the methodological quality of the studies. Six studies were included and all studies met the criteria for high quality. Kappa values for the vignette reliability design (five studies) were ≥ 0.7. There was data from two cohorts in one study for the concurrent reliability design and the Kappa values ranged from 0.45 to 1.0. Kappa values for the successive reliability design (data from three cohorts in one study) were < 0.6. The current review found strong evidence of acceptable inter-examiner reliability of MDT classification for extremity problems in the vignette reliability design, limited evidence of acceptable reliability in the concurrent reliability design and unacceptable reliability in the successive reliability design. Copyright © 2017 Elsevier Ltd. All rights reserved.
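As an illustration of the agreement statistic used in this review, the sketch below computes Cohen's kappa for two examiners and compares it with the ≥ 0.6 threshold the review treats as acceptable. The function name, category labels, and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the two raters' marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    # Undefined (division by zero) if both raters use a single category only.
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of six patients by two examiners.
a = ["derangement", "derangement", "dysfunction", "other", "derangement", "dysfunction"]
b = ["derangement", "dysfunction", "dysfunction", "other", "derangement", "dysfunction"]
k = cohens_kappa(a, b)
print(f"kappa = {k:.2f}, acceptable: {k >= 0.6}")
```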
Gorbett, Gregory E; Morris, Sarah M; Meacham, Brian J; Wood, Christopher B
2015-01-01
A new method to characterize the degree of fire damage to gypsum wallboard is introduced, implemented, and tested to determine the efficacy of its application among novices. The method was evaluated by comparing degree of fire damage assessments of novices with and without the method. Thirty-nine "novice" raters assessed damage to a gypsum wallboard surface, completing 66 ratings, first without the method, and then again using the method. The inter-rater reliability was evaluated for ratings of damage without and with the method. For novice fire investigators rating degree of damage without the aid of the method, ICC(1,2) = 0.277 with 95% CI (0.211, 0.365), and with the method, ICC(2,1) = 0.593 with 95% CI (0.509, 0.684). Results indicate that the raters were more reliable in their analysis of the degree of fire damage when using the method, which supports the use of standardized processes to decrease the variability in data collection and interpretation. © 2014 American Academy of Forensic Sciences.
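For readers unfamiliar with the notation, the sketch below shows one common way to compute ICC(2,1), the two-way random-effects, single-rater, absolute-agreement coefficient, from the mean squares of a targets × raters table. It is a generic illustration, not the authors' analysis; the function name and the toy ratings are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table of ratings: one row per target, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy ratings: six damaged areas (rows) scored by four raters (columns).
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(toy)
print(f"ICC(2,1) = {icc:.3f}")
```

Because ICC(2,1) measures absolute agreement, systematic differences between raters (the column effect) lower the coefficient even when raters rank the targets similarly.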
ERIC Educational Resources Information Center
Ding, Ding; Sallis, James F.; Norman, Gregory J.; Saelens, Brian E.; Harris, Sion Kim; Kerr, Jacqueline; Rosenberg, Dori; Durant, Nefertiti; Glanz, Karen
2012-01-01
Objectives: To determine (1) reliability of new food environment measures; (2) association between home food environment and fruit and vegetable (FV) intake; and (3) association between community and home food environment. Methods: In 2005, a cross-sectional survey was conducted with readministration to assess test-retest reliability. Adolescents,…
A Comparison of Three Methods for the Analysis of Skin Flap Viability: Reliability and Validity.
Tim, Carla Roberta; Martignago, Cintia Cristina Santi; da Silva, Viviane Ribeiro; Dos Santos, Estefany Camila Bonfim; Vieira, Fabiana Nascimento; Parizotto, Nivaldo Antonio; Liebano, Richard Eloin
2018-05-01
Objective: Technological advances have provided new alternatives to the analysis of skin flap viability in animal models; however, the interrater validity and reliability of these techniques have yet to be analyzed. The present study aimed to evaluate the interrater validity and reliability of three different methods: weight of paper template (WPT), paper template area (PTA), and photographic analysis. Approach: Sixteen male Wistar rats had their cranially based dorsal skin flap elevated. On the seventh postoperative day, the viable tissue area and the necrotic area of the skin flap were recorded using the paper template method and photo image. The evaluation of the percentage of viable tissue was performed using three methods, simultaneously and independently by two raters. The analysis of interrater reliability and viability was performed using the intraclass correlation coefficient and Bland Altman Plot Analysis was used to visualize the presence or absence of systematic bias in the evaluations of data validity. Results: The results showed that interrater reliability for WPT, measurement of PTA, and photographic analysis were 0.995, 0.990, and 0.982, respectively. For data validity, a correlation >0.90 was observed for all comparisons made between the three methods. In addition, Bland Altman Plot Analysis showed agreement between the comparisons of the methods and the presence of systematic bias was not observed. Innovation: Digital methods are an excellent choice for assessing skin flap viability; moreover, they make data use and storage easier. Conclusion: Independently from the method used, the interrater reliability and validity proved to be excellent for the analysis of skin flaps' viability.
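As a sketch of the Bland-Altman analysis mentioned above, the code below computes the mean difference (systematic bias) between two measurement methods and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The function name and the percentages of viable tissue are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical % viable tissue from two methods on the same four flaps.
template = [60.0, 72.0, 55.0, 80.0]
photo = [62.0, 70.0, 56.0, 79.0]
bias, lower, upper = bland_altman(template, photo)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

A bias close to zero with narrow limits suggests the methods can be used interchangeably; the plot itself charts each difference against the pair's mean.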
Terashima, Taiko; Yoshimura, Sadako
2018-03-01
To determine whether nurses can accurately assess the skin colour of replanted fingers displayed as digital images on a computer screen. Colour measurement and clinical diagnostic methods for medical digital images have been studied, but reproducing skin colour on a computer screen remains difficult. The inter-rater reliability of skin colour assessment scores was evaluated. In May 2014, 21 nurses who worked on a trauma ward in Japan participated in testing. Six digital images with different skin colours were used. Colours were scored from both digital images and direct patient's observation. The score from a digital image was defined as the test score, and its difference from the direct assessment score as the difference score. Intraclass correlation coefficients were calculated. Nurses' opinions were classified and summarised. The intraclass correlation coefficients for the test scores were fair. Although the intraclass correlation coefficients for the difference scores were poor, they improved to good when three images that might have contributed to poor reliability were excluded. Most nurses stated that it is difficult to assess skin colour in digital images; they did not think it could be a substitute for direct visual assessment. However, most nurses were in favour of including images in nursing progress notes. Although the inter-rater reliability was fairly high, the reliability of colour reproduction in digital images as indicated by the difference scores was poor. Nevertheless, nurses expect the incorporation of digital images in nursing progress notes to be useful. This gap between the reliability of digital colour reproduction and nurses' expectations towards it must be addressed. High inter-rater reliability for digital images in nursing progress notes was not observed. Assessments of future improvements in colour reproduction technologies are required. Further digitisation and visualisation of nursing records might pose challenges. 
© 2017 John Wiley & Sons Ltd.
Delatour, Vincent; Lalere, Beatrice; Saint-Albin, Karène; Peignaux, Maryline; Hattchouel, Jean-Marc; Dumont, Gilles; De Graeve, Jacques; Vaslin-Reimann, Sophie; Gillery, Philippe
2012-11-20
The reliability of biological tests is a major issue for patient care in terms of public health that involves high economic stakes. Reference methods, as well as regular external quality assessment schemes (EQAS), are needed to monitor the analytical performance of field methods. However, control material commutability is a major concern to assess method accuracy. To overcome material non-commutability, we investigated the possibility of using lyophilized serum samples together with a limited number of frozen serum samples to assign matrix-corrected target values, taking the example of glucose assays. Trueness of the current glucose assays was first measured against a primary reference method by using human frozen sera. Methods using hexokinase and glucose oxidase with spectroreflectometric detection proved very accurate, with bias ranging between -2.2% and +2.3%. Bias of methods using glucose oxidase with spectrophotometric detection was +4.5%. Matrix-related bias of the lyophilized materials was then determined and ranged from +2.5% to -14.4%. Matrix-corrected target values were assigned and used to assess trueness of 22 sub-peer groups. We demonstrated that matrix-corrected target values can be a valuable tool to assess field method accuracy in large scale surveys where commutable materials are not available in sufficient amount with acceptable costs. Copyright © 2012 Elsevier B.V. All rights reserved.
Metrics for Assessing the Reliability of a Telemedicine Remote Monitoring System
Fox, Mark; Papadopoulos, Amy; Crump, Cindy
2013-01-01
Abstract Objective: The goal of this study was to assess using new metrics the reliability of a real-time health monitoring system in homes of older adults. Materials and Methods: The “MobileCare Monitor” system was installed into the homes of nine older adults >75 years of age for a 2-week period. The system consisted of a wireless wristwatch-based monitoring system containing sensors for location, temperature, and impacts and a “panic” button that was connected through a mesh network to third-party wireless devices (blood pressure cuff, pulse oximeter, weight scale, and a survey-administering device). To assess system reliability, daily phone calls instructed participants to conduct system tests and reminded them to fill out surveys and daily diaries. Phone reports and participant diary entries were checked against data received at a secure server. Results: Reliability metrics assessed overall system reliability, data concurrence, study effectiveness, and system usability. Except for the pulse oximeter, system reliability metrics varied between 73% and 92%. Data concurrence for proximal and distal readings exceeded 88%. System usability following the pulse oximeter firmware update varied between 82% and 97%. An estimate of watch-wearing adherence within the home was quite high, about 80%, although given the inability to assess watch-wearing when a participant left the house, adherence likely exceeded the 10 h/day requested time. In total, 3,436 of 3,906 potential measurements were obtained, indicating a study effectiveness of 88%. Conclusions: The system was quite effective in providing accurate remote health data. The different system reliability measures identify important error sources in remote monitoring systems. PMID:23611640
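The study-effectiveness figure follows directly from the counts given in the abstract, as the short worked example below shows. Only the function name is an assumption.

```python
def study_effectiveness(obtained, potential):
    """Percentage of potential measurements that were actually obtained."""
    return 100.0 * obtained / potential


# Counts reported in the abstract: 3,436 of 3,906 potential measurements.
pct = study_effectiveness(3436, 3906)
print(f"{pct:.0f}%")  # prints 88%
```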
Validity and Reliability of the Turkish Chronic Pain Acceptance Questionnaire
Akmaz, Hazel Ekin; Uyar, Meltem; Kuzeyli Yıldırım, Yasemin; Akın Korhan, Esra
2018-01-01
Background: Pain acceptance is the process of giving up the struggle with pain and learning to live a worthwhile life despite it. In assessing patients with chronic pain in Turkey, making a diagnosis and tracking the effectiveness of treatment is done with scales that have been translated into Turkish. However, there is as yet no valid and reliable scale in Turkish to assess the acceptance of pain. Aims: To validate a Turkish version of the Chronic Pain Acceptance Questionnaire developed by McCracken and colleagues. Study Design: Methodological and cross sectional study. Methods: A simple randomized sampling method was used in selecting the study sample. The sample was composed of 201 patients, more than 10 times the number of items examined for validity and reliability in the study, which totaled 20. A patient identification form, the Chronic Pain Acceptance Questionnaire, and the Brief Pain Inventory were used to collect data. Data were collected by face-to-face interviews. In the validity testing, the content validity index was used to evaluate linguistic equivalence, content validity, construct validity, and expert views. In reliability testing of the scale, Cronbach’s α coefficient was calculated, and item analysis and split-test reliability methods were used. Principal component analysis and varimax rotation were used in factor analysis and to examine factor structure for construct concept validity. Results: The item analysis established that the scale, all items, and item-total correlations were satisfactory. The mean total score of the scale was 21.78. The internal consistency coefficient was 0.94, and the correlation between the two halves of the scale was 0.89. Conclusion: The Chronic Pain Acceptance Questionnaire, which is intended to be used in Turkey upon confirmation of its validity and reliability, is an evaluation instrument with sufficient validity and reliability, and it can be reliably used to examine patients’ acceptance of chronic pain. 
PMID:29843496
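To make the internal-consistency measure concrete, the following sketch computes Cronbach's α from a respondents × items score table using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the toy responses are hypothetical and unrelated to the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores has one row per respondent, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: four respondents answering three items.
toy = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.3f}")
```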
Development and application of basis database for materials life cycle assessment in China
NASA Astrophysics Data System (ADS)
Li, Xiaoqing; Gong, Xianzheng; Liu, Yu
2017-03-01
Materials life cycle assessment (MLCA) is a data-intensive method, so high-quality environmental burden data are an essential prerequisite, and the reliability of the data directly influences the reliability of the assessment results and their practical application. Building a Chinese MLCA database therefore provides the basic data and technical support needed to carry out and improve LCA practice. First, recent progress on databases related to MLCA research and development is introduced. Second, the database framework and main datasets for MLCA are studied in accordance with the requirements of the ISO 14040 series of standards. Third, an MLCA data platform based on big data is developed. Finally, future research work is proposed and discussed.
Hofmann, Elisabeth; Robold, Matthias; Proff, Peter; Kirschneck, Christian
2017-03-01
The method published in 1973 by Demirjian et al. to assess age based on the mineralisation stage of permanent teeth is standard practice in forensic and orthodontic diagnostics. From age 14 onwards, however, this method is only applicable to third molars. No current epidemiological data on third molar mineralisation are available for Caucasian Central-Europeans. Thus, a method for assessing age in this population based on third molar mineralisation is presented, taking into account possible topographic and gender-specific differences. The study included 486 Caucasian Central-European orthodontic patients (9-24 years) with unaffected dental development. In an anonymized, randomized, and blinded manner, one orthopantomogram of each patient at either start, mid or end of treatment was visually analysed regarding the mineralisation stage of the third molars according to the method by Demirjian et al. Corresponding topographic and gender-specific point scores were determined and added to form a dental maturity score. Prediction equations for age assessment were derived by linear regression analysis with chronological age and checked for reliability within the study population. Mineralisation of the lower third molars was slower than mineralisation of the upper third molars, whereas no jaw-side-specific differences were detected. Gender-specific differences were relatively small, but girls reached mineralisation stage C earlier than boys, whereas boys showed an accelerated mineralisation between the ages of 15 and 16. The global equation generated by regression analysis (age = -1.103 + 0.268 × dental maturity score 18 + 28 + 38 + 48) is sufficiently accurate and reliable for clinical use. Age assessment only based on either maxilla or mandible also shows good prognostic reliability.
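The global regression equation quoted above can be applied as in the sketch below. The intercept and slope come from the abstract; the per-tooth point scores are hypothetical placeholders, because the abstract does not reproduce the scoring tables that convert mineralisation stages into points.

```python
INTERCEPT = -1.103  # from the abstract's global equation
SLOPE = 0.268       # from the abstract's global equation


def predict_age(point_scores):
    """Estimate age in years from third-molar point scores.

    point_scores maps tooth number (18, 28, 38, 48) to its point score;
    the scores are summed to form the dental maturity score.
    """
    dental_maturity_score = sum(point_scores.values())
    return INTERCEPT + SLOPE * dental_maturity_score


# Hypothetical point scores, not taken from the study's tables.
example = {"18": 14.0, "28": 14.5, "38": 13.0, "48": 13.5}
age = predict_age(example)
print(f"estimated age: {age:.1f} years")
```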
Baylis, Adriane; Chapman, Kathy; Whitehill, Tara L; Group, The Americleft Speech
2015-11-01
To investigate the validity and reliability of multiple listener judgments of hypernasality and audible nasal emission, in children with repaired cleft palate, using visual analog scaling (VAS) and equal-appearing interval (EAI) scaling. Prospective comparative study of multiple listener ratings of hypernasality and audible nasal emission. Multisite institutional. Five trained and experienced speech-language pathologist listeners from the Americleft Speech Project. Average VAS and EAI ratings of hypernasality and audible nasal emission/turbulence for 12 video-recorded speech samples from the Americleft Speech Project. Intrarater and interrater reliability was computed, as well as linear and polynomial models of best fit. Intrarater and interrater reliability was acceptable for both rating methods; however, reliability was higher for VAS as compared to EAI ratings. When VAS ratings were plotted against EAI ratings, results revealed a stronger curvilinear relationship. The results of this study provide additional evidence that alternate rating methods such as VAS may offer improved validity and reliability over EAI ratings of speech. VAS should be considered a viable method for rating hypernasality and nasal emission in speech in children with repaired cleft palate.
New methods for analyzing semantic graph based assessments in science education
NASA Astrophysics Data System (ADS)
Vikaros, Lance Steven
This research investigated how the scoring of semantic graphs (known by many as concept maps) could be improved and automated in order to address issues of inter-rater reliability and scalability. As part of the NSF funded SENSE-IT project to introduce secondary school science students to sensor networks (NSF Grant No. 0833440), semantic graphs illustrating how temperature change affects water ecology were collected from 221 students across 16 schools. The graphing task did not constrain students' use of terms, as is often done with semantic graph based assessment due to coding and scoring concerns. The graphing software used provided real-time feedback to help students learn how to construct graphs, stay on topic and effectively communicate ideas. The collected graphs were scored by human raters using assessment methods expected to boost reliability, which included adaptations of traditional holistic and propositional scoring methods, use of expert raters, topical rubrics, and criterion graphs. High levels of inter-rater reliability were achieved, demonstrating that vocabulary constraints may not be necessary after all. To investigate a new approach to automating the scoring of graphs, thirty-two different graph features characterizing graphs' structure, semantics, configuration and process of construction were then used to predict human raters' scoring of graphs in order to identify feature patterns correlated to raters' evaluations of graphs' topical accuracy and complexity. Results led to the development of a regression model able to predict raters' scoring with 77% accuracy, with 46% accuracy expected when used to score new sets of graphs, as estimated via cross-validation tests. Although such performance is comparable to other graph and essay based scoring systems, cross-context testing of the model and methods used to develop it would be needed before it could be recommended for widespread use. 
Still, the findings suggest techniques for improving the reliability and scalability of semantic graph based assessments without requiring constraint of how ideas are expressed.
Assessing local instrument reliability and validity: a field-based example from northern Uganda.
Betancourt, Theresa S; Bass, Judith; Borisova, Ivelina; Neugebauer, Richard; Speelman, Liesbeth; Onyango, Grace; Bolton, Paul
2009-08-01
This paper presents an approach for evaluating the reliability and validity of mental health measures in non-Western field settings. We describe this approach using the example of our development of the Acholi psychosocial assessment instrument (APAI), which is designed to assess depression-like (two tam, par and kumu), anxiety-like (ma lwor) and conduct problems (kwo maraco) among war-affected adolescents in northern Uganda. To examine the criterion validity of this measure in the absence of a traditional gold standard, we derived local syndrome terms from qualitative data and used self reports of these syndromes by indigenous people as a reference point for determining caseness. Reliability was examined using standard test-retest and inter-rater methods. Each of the subscale scores for the depression-like syndromes exhibited strong internal reliability ranging from alpha = 0.84-0.87. Internal reliability was good for anxiety (0.70), conduct problems (0.83), and the pro-social attitudes and behaviors (0.70) subscales. Combined inter-rater reliability and test-retest reliability were good for most subscales except for the conduct problem scale and prosocial scales. The pattern of significant mean differences in the corresponding APAI problem scale score between self-reported cases vs. noncases on local syndrome terms was confirmed in the data for all of the three depression-like syndromes, but not for the anxiety-like syndrome ma lwor or the conduct problem kwo maraco.
Lee, Hoe C.; Yanting Chee, Derserri; Selander, Helena; Falkmer, Torbjorn
2012-01-01
Background Licence retention or cancellation is currently determined through on-road driving tests. Previous research has shown that occupational therapists frequently assess drivers’ visual attention while sitting in the back seat on the side opposite the driver. Since the eyes of the driver are not always visible, assessment by eye contact becomes problematic. Such procedural drawbacks may challenge the validity and reliability of visual attention assessments. The aim of the study was to establish the accuracy, in terms of correctly classified attention, and the inter-rater reliability of driving assessments of visual attention made from the back seat. Furthermore, by establishing eye contact between the assessor and the driver through an additional mirror on the windscreen, the present study aimed to establish how much such an intervention would enhance the accuracy of the visual attention assessment. Methods Two drivers with Parkinson's disease (PD) and six control drivers drove a fixed route in a driving simulator while wearing a head-mounted eye tracker. The eye tracker data showed where foveal visual attention was actually directed. These data were time stamped and compared with the simultaneous manual scoring of the drivers' visual attention. For four of the drivers, one of whom had Parkinson's disease, a mirror on the windscreen was set up to allow eye contact between the driver and the assessor. Inter-rater reliability was assessed with one of the drivers with Parkinson's disease driving, without the mirror. Results Without the mirror, the overall accuracy was 56% when assessing the three control drivers, and with the mirror it was 83%. However, for the PD driver without the mirror the accuracy was 94%, whereas for the PD driver with the mirror the accuracy was 90%. With respect to inter-rater reliability, 73% agreement was found. 
Conclusion If the final outcome of a driving assessment is dependent on the subcategory of a protocol assessing visual attention, we suggest the use of an additional mirror to establish eye contact between the assessor and the driver. The clinicians’ observations on-road should not be a standalone assessment in driving assessments. Instead, eye trackers should be employed for further analyses and correlation in cases where there is doubt about a driver's attention. PMID:22461850
Reliability of a survey tool for measuring consumer nutrition environment in urban food stores.
Hosler, Akiko S; Dharssi, Aliza
2011-01-01
Despite the increase in the volume and importance of food environment research, there is a general lack of reliable measurement tools. This study presents the development and reliability assessment of a tool for measuring consumer nutrition environment in urban food stores. Cross-sectional design. A racially diverse downtown portion (6 ZIP code areas) in Albany, New York. A sample of 39 food stores was visited by our research team in 2009 to 2010. These stores were randomly selected from 123 eligible food stores identified through multiple government lists and ground-truthing. The Food Retail Outlet Survey Tool was developed to assess the presence of selected food and nonfood items, placement, milk prices, physical characteristics of the store, policy implementation, and advertisements on outside windows. For in-store items, agreement of observations between experienced and lightly trained surveyors was assessed. For window advertisement assessments, inter-method agreement (on-site sketch vs digital photo), and inter-rater agreement (both on-site) among lightly trained surveyors were evaluated. Percent agreement, Kappa, and prevalence-adjusted bias-adjusted kappa were calculated for in-store observations. Intraclass correlation coefficients were calculated for window observations. Twenty-seven of the 47 in-store items had 100% agreement. The prevalence-adjusted bias-adjusted kappa indicated excellent agreement (≥0.90) on all items, except aisle width (0.74) and dark-green/orange colored fresh vegetables (0.85). The store type (nonconvenience store), the order of visits (first half), and the time to complete survey (>10 minutes) were associated with lower reliability in these 2 items. Both the inter-method and inter-rater agreements for window advertisements were uniformly high (intraclass correlation coefficient ranged 0.94-1.00), indicating high reliability. The Food Retail Outlet Survey Tool is a reliable tool for quickly measuring consumer nutrition environment. It can be effectively used by an individual who attended a 30-minute group briefing and practiced with 3 to 4 stores.
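The agreement statistics named in the abstract above (percent agreement, kappa, and prevalence-adjusted bias-adjusted kappa, PABAK) can be illustrated with a minimal sketch for one binary item rated by two surveyors. The function name and the ratings are hypothetical and are not taken from the study; for two categories PABAK reduces to 2 × (observed agreement) − 1.

```python
def agreement_stats(rater_a, rater_b):
    """Return (percent agreement, Cohen's kappa, PABAK) for two raters' 0/1 ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of stores on which the two raters agree
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal "present" rate
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else float("nan")
    # PABAK for two categories depends only on observed agreement
    pabak = 2 * p_obs - 1
    return p_obs, kappa, pabak


# Hypothetical data: item present (1) or absent (0) in 10 stores
experienced = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
lightly_trained = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
p_obs, kappa, pabak = agreement_stats(experienced, lightly_trained)
print(f"agreement={p_obs:.2f} kappa={kappa:.2f} PABAK={pabak:.2f}")
```

With these made-up ratings the item is present in almost every store, so kappa falls slightly below zero despite 80% raw agreement, while PABAK is 0.60. This prevalence effect is one common reason for reporting PABAK alongside kappa.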
Nguyen, Anh-Dung; Boling, Michelle C; Slye, Carrie A; Hartley, Emily M; Parisi, Gina L
2013-01-01
Accurate, efficient, and reliable measurement methods are essential to prospectively identify risk factors for knee injuries in large cohorts. To determine tester reliability using digital photographs for the measurement of static lower extremity alignment (LEA) and whether values quantified with an electromagnetic motion-tracking system are in agreement with those quantified with clinical methods and digital photographs. Descriptive laboratory study. Laboratory. Thirty-three individuals participated and included 17 (10 women, 7 men; age = 21.7 ± 2.7 years, height = 163.4 ± 6.4 cm, mass = 59.7 ± 7.8 kg, body mass index = 23.7 ± 2.6 kg/m2) in study 1, in which we examined the reliability between clinical measures and digital photographs in 1 trained and 1 novice investigator, and 16 (11 women, 5 men; age = 22.3 ± 1.6 years, height = 170.3 ± 6.9 cm, mass = 72.9 ± 16.4 kg, body mass index = 25.2 ± 5.4 kg/m2) in study 2, in which we examined the agreement among clinical measures, digital photographs, and an electromagnetic tracking system. We evaluated measures of pelvic angle, quadriceps angle, tibiofemoral angle, genu recurvatum, femur length, and tibia length. Clinical measures were assessed using clinically accepted methods. Frontal- and sagittal-plane digital images were captured and imported into a computer software program. Anatomic landmarks were digitized using an electromagnetic tracking system to calculate static LEA. Intraclass correlation coefficients and standard errors of measurement were calculated to examine tester reliability. We calculated 95% limits of agreement and used Bland-Altman plots to examine agreement among clinical measures, digital photographs, and an electromagnetic tracking system. 
Using digital photographs, fair to excellent intratester (intraclass correlation coefficient range = 0.70-0.99) and intertester (intraclass correlation coefficient range = 0.75-0.97) reliability were observed for static knee alignment and limb-length measures. An acceptable level of agreement was observed between clinical measures and digital pictures for limb-length measures. When comparing clinical measures and digital photographs with the electromagnetic tracking system, an acceptable level of agreement was observed in measures of static knee angles and limb-length measures. The use of digital photographs and an electromagnetic tracking system appears to be an efficient and reliable method to assess static knee alignment and limb-length measurements.
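The 95% limits of agreement used in the abstract above follow the usual Bland-Altman construction: the mean of the paired differences plus or minus 1.96 standard deviations of those differences. The sketch below assumes paired measurements of the same limbs by two methods; the function name and the values are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)        # systematic difference between the methods
    sd = stdev(diffs)         # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical quadriceps angles (degrees) from two methods
clinical = [12.0, 14.5, 10.0, 16.0, 13.5]
photo = [11.0, 15.0, 10.5, 14.5, 13.0]
bias, lower, upper = limits_of_agreement(clinical, photo)
print(f"bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
```

Whether limits of a given width count as "acceptable" is a clinical judgment made against the size of difference that matters in practice; the code only produces the interval.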
Lo, Wing-Sze; Ho, Sai-Yin; Wong, Bonny Yee-Man; Mak, Kwok-Kei; Lam, Tai-Hing
2011-06-01
The reliability and validity of Stunkard's Figure Rating Scale (FRS) as a measure of current body size (CBS) have been established in Western adolescent girls but not in non-Western populations. We examined the validity and test-retest reliability of Stunkard's FRS in assessing CBS among Chinese adolescents. Methods. In a school-based survey in Hong Kong, 5666 adolescents (boys: 45.1%; mean age 14.7 years) provided data on self-reported height and weight, CBS, perceived weight status, and health-related quality of life using the Medical Outcomes Study Short-Form version 2 (SF-12v2). Height and weight were also objectively measured. Spearman's correlation was used to assess construct validity, concurrent validity and test-retest reliability. Convergent and discriminant validity were good: CBS correlated strongly with weight and self-reported/measured BMI, but only weakly with SF-12v2. CBS correlated strongly with perceived weight status, showing concurrent validity. Spearman's correlation (r) for CBS was 0.78 for girls and 0.72 for boys, indicating good test-retest reliability. Validity and reliability results did not differ significantly between senior and junior grade adolescents. Our findings support the use of Stunkard's FRS to measure body size among Chinese adolescents.
Test-retest reliability of the safe driving behavior measure for community-dwelling elderly drivers.
Song, Chiang-Soon; Lee, Joo-Hyun; Han, Sang-Woo
2016-06-01
[Purpose] The Safe Driving Behavior Measure (SDBM) is a self-report measurement tool that assesses the safe-driving behaviors of the elderly. The purpose of this study was to evaluate the test-retest reliability of the SDBM among community-dwelling elderly drivers. [Subjects and Methods] A total of sixty-one community-dwelling elderly were enrolled to investigate the reliability of the SDBM. The SDBM was assessed in two sessions that were conducted three days apart in a quiet and well-organized assessment room. The test-retest reliability of the overall scores and three domain scores of the SDBM was statistically evaluated using intraclass correlation coefficients [ICC(2,1)]. Pearson correlation coefficients were used to quantify bivariate associations among the three domains of the SDBM. [Results] The SDBM demonstrated excellent test-retest reliability for community-dwelling elderly drivers. The Cronbach alpha coefficients of the three domains of person-vehicle (0.979), person-environment (0.944), and person-vehicle-environment (0.971) of the SDBM indicate high internal consistency. [Conclusion] The results of this study suggest that the SDBM is a reliable measure for evaluating the safe driving of automobiles by community-dwelling elderly, and is adequate for detecting changes in scores in clinical settings.
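The ICC(2,1) reported above is conventionally the two-way random-effects, absolute-agreement, single-measurement coefficient computed from ANOVA mean squares. The following is a minimal sketch of that textbook formula, not the software the authors used; the function name and the ratings are illustrative assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per session/rater."""
    n = len(ratings)          # subjects
    k = len(ratings[0])       # sessions or raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares (no replication)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 subjects scored on 4 occasions
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 2))  # about 0.29 for this table
```

Because ICC(2,1) penalizes systematic shifts between sessions (the column term in the denominator), it can be much lower than a consistency-type ICC on the same data.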
Examining the reliability of ADAS-Cog change scores.
Grochowalski, Joseph H; Liu, Ying; Siedlecki, Karen L
2016-09-01
The purpose of this study was to estimate and examine ways to improve the reliability of change scores on the Alzheimer's Disease Assessment Scale, Cognitive Subtest (ADAS-Cog). The sample, provided by the Alzheimer's Disease Neuroimaging Initiative, included individuals with Alzheimer's disease (AD) (n = 153) and individuals with mild cognitive impairment (MCI) (n = 352). All participants were administered the ADAS-Cog at baseline and 1 year, and change scores were calculated as the difference in scores over the 1-year period. Three types of change score reliabilities were estimated using multivariate generalizability. Two methods to increase change score reliability were evaluated: reweighting the subtests of the scale and adding more subtests. Reliability of ADAS-Cog change scores over 1 year was low for both the AD sample (ranging from .53 to .64) and the MCI sample (.39 to .61). Reweighting the change scores from the AD sample improved reliability (.68 to .76), but lengthening provided no useful improvement for either sample. The MCI change scores had low reliability, even with reweighting and adding additional subtests. The ADAS-Cog scores had low reliability for measuring change. Researchers using the ADAS-Cog should estimate and report reliability for their use of the change scores. The ADAS-Cog change scores are not recommended for assessment of meaningful clinical change.
Reliability analysis and initial requirements for FC systems and stacks
NASA Astrophysics Data System (ADS)
Åström, K.; Fontell, E.; Virtanen, S.
In the year 2000 Wärtsilä Corporation started an R&D program to develop SOFC systems for CHP applications. The program aims to bring to the market highly efficient, clean and cost competitive fuel cell systems with rated power output in the range of 50-250 kW for distributed generation and marine applications. In the program Wärtsilä focuses on system integration and development. System reliability and availability are key issues determining the competitiveness of the SOFC technology. In Wärtsilä, methods have been implemented for analysing the system in respect to reliability and safety as well as for defining reliability requirements for system components. A fault tree representation is used as the basis for reliability prediction analysis. A dynamic simulation technique has been developed to allow for non-static properties in the fault tree logic modelling. Special emphasis has been placed on reliability analysis of the fuel cell stacks in the system. A method for assessing reliability and critical failure predictability requirements for fuel cell stacks in a system consisting of several stacks has been developed. The method is based on a qualitative model of the stack configuration where each stack can be in a functional, partially failed or critically failed state, each of the states having different failure rates and effects on the system behaviour. The main purpose of the method is to understand the effect of stack reliability, critical failure predictability and operating strategy on the system reliability and availability. An example configuration, consisting of 5 × 5 stacks (series of 5 sets of 5 parallel stacks) is analysed in respect to stack reliability requirements as a function of predictability of critical failures and Weibull shape factor of failure rate distributions.
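As a rough illustration of the 5 × 5 configuration mentioned above, the sketch below combines a Weibull survival function with a series-of-parallel-sets structure. It is a deliberately simplified two-state model and not the authors' method: it assumes independent, identical stacks, ignores the partially failed state and failure predictability, and assumes a parallel set survives as long as at least one of its stacks works. All names and parameter values are hypothetical.

```python
import math


def weibull_reliability(t, scale, shape):
    """Probability that a single stack survives to time t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))


def series_of_parallel_sets(r_stack, n_sets=5, n_parallel=5):
    """System reliability for n_sets in series, each of n_parallel redundant stacks."""
    # Assumption: a set fails only if every stack in it has failed
    r_set = 1.0 - (1.0 - r_stack) ** n_parallel
    # Assumption: the system needs every set to work
    return r_set ** n_sets


# Hypothetical parameters: characteristic life 40,000 h, evaluated at 20,000 h
for shape in (0.8, 1.0, 2.0):
    r = weibull_reliability(20_000, scale=40_000, shape=shape)
    print(shape, round(r, 3), round(series_of_parallel_sets(r), 3))
```

Sweeping the shape factor in this way shows how the same characteristic life can give quite different system reliabilities, which is the kind of sensitivity the abstract refers to; a faithful reproduction would need the three-state stack model described there.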
Tooth-size discrepancy: A comparison between manual and digital methods
Correia, Gabriele Dória Cabral; Habib, Fernando Antonio Lima; Vogel, Carlos Jorge
2014-01-01
Introduction Technological advances in Dentistry have emerged primarily in the area of diagnostic tools. One example is the 3D scanner, which can transform plaster models into three-dimensional digital models. Objective This study aimed to assess the reliability of tooth size-arch length discrepancy analysis measurements performed on three-dimensional digital models, and compare these measurements with those obtained from plaster models. Material and Methods To this end, plaster models of lower dental arches and their corresponding three-dimensional digital models acquired with a 3Shape R700T scanner were used. All of them had lower permanent dentition. Four different tooth size-arch length discrepancy calculations were performed on each model, two of which by manual methods using calipers and brass wire, and two by digital methods using linear measurements and parabolas. Results Data were statistically assessed using Friedman test and no statistically significant differences were found between the two methods (P > 0.05), except for values found by the linear digital method which revealed a slight, non-significant statistical difference. Conclusions Based on the results, it is reasonable to assert that any of these resources used by orthodontists to clinically assess tooth size-arch length discrepancy can be considered reliable. PMID:25279529
Intraday and Interday Reliability of Ultra-Short-Term Heart Rate Variability in Rugby Union Players.
Nakamura, Fábio Y; Pereira, Lucas A; Esco, Michael R; Flatt, Andrew A; Moraes, José E; Cal Abad, Cesar C; Loturco, Irineu
2017-02-01
Nakamura, FY, Pereira, LA, Esco, MR, Flatt, AA, Moraes, JE, Cal Abad, CC, and Loturco, I. Intraday and interday reliability of ultra-short-term heart rate variability in rugby union players. J Strength Cond Res 31(2): 548-551, 2017. The aim of this study was to examine the intraday and interday reliability of ultra-short-term vagal-related heart rate variability (HRV) in elite rugby union players. Forty players from the Brazilian National Rugby Team volunteered to participate in this study. The natural log of the root mean square of successive RR interval differences (lnRMSSD) assessments were performed on 4 different days. The HRV was assessed twice (intraday reliability) on the first day and once per day on the following 3 days (interday reliability). The RR interval recordings were obtained from 2-minute recordings using a portable heart rate monitor. The relative reliability of intraday and interday lnRMSSD measures was analyzed using the intraclass correlation coefficient (ICC). The typical error of measurement (absolute reliability) of intraday and interday lnRMSSD assessments was analyzed using the coefficient of variation (CV). Both intraday (ICC = 0.96; CV = 3.99%) and interday (ICC = 0.90; CV = 7.65%) measures were highly reliable. The ultra-short-term lnRMSSD is a consistent measure for evaluating elite rugby union players, in both intraday and interday settings. This study provides further validity to using this shortened method in practical field conditions with highly trained team sports athletes.
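The lnRMSSD index described above can be sketched directly from its definition: take successive differences of the RR intervals, compute their root mean square, and take the natural log. The coefficient of variation shown here is the plain SD-over-mean form applied to repeated readings; the abstract does not state the exact typical-error formula used, so that part is an assumption, as are the function names and the values.

```python
import math
from statistics import mean, stdev


def ln_rmssd(rr_ms):
    """Natural log of the RMS of successive RR-interval differences (intervals in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(mean([d * d for d in diffs]))
    return math.log(rmssd)


def coefficient_of_variation(values):
    """Sample SD as a percentage of the mean of repeated measurements."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical RR intervals (ms) from a short recording
rr = [800, 810, 790, 800]
print(round(ln_rmssd(rr), 3))

# Hypothetical lnRMSSD readings for one player on three days
print(round(coefficient_of_variation([4.0, 4.2, 3.8]), 1))
```

A real 2-minute recording would contain well over a hundred intervals and would normally be screened for ectopic beats and artifacts before this calculation.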
Methods for assessment of keel bone damage in poultry.
Casey-Trott, T; Heerkens, J L T; Petrik, M; Regmi, P; Schrader, L; Toscano, M J; Widowski, T
2015-10-01
Keel bone damage (KBD) is a critical issue facing the laying hen industry today as a result of the likely pain leading to compromised welfare and the potential for reduced productivity. Recent reports suggest that damage, while highly variable and likely dependent on a host of factors, extends to all systems (including battery cages, furnished cages, and non-cage systems), genetic lines, and management styles. Despite the extent of the problem, the research community remains uncertain as to the causes and influencing factors of KBD. Although progress has been made investigating these factors, the overall effort is hindered by several issues related to the assessment of KBD, including quality and variation in the methods used between research groups. These issues prevent effective comparison of studies and create difficulties in identifying the presence of damage, leading to poor accuracy and reliability. The current manuscript seeks to resolve these issues by offering precise definitions for types of KBD, reviewing methods for assessment, and providing recommendations that can improve the accuracy and reliability of those assessments. © 2015 Poultry Science Association Inc.
Assessing the Clinical Skills of Dental Students: A Review of the Literature
ERIC Educational Resources Information Center
Taylor, Carly L.; Grey, Nick; Satterthwaite, Julian D.
2013-01-01
Education, from a student perspective, is largely driven by assessment. An effective assessment tool should be both valid and reliable, yet this is often not achieved. The aim of this literature review is to identify and appraise the evidence base for assessment tools used primarily in evaluating clinical skills of dental students. Methods:…
USDA-ARS's Scientific Manuscript database
Accurate assessment of dietary intake of children can be challenging due to the limited reliability of current dietary assessment methods in children. While plasma carotenoid concentrations have been used to assess fruit and vegetable intake, this testing is rarely conducted in school settings in chi...
North Carolina Family Assessment Scale: Measurement Properties for Youth Mental Health Services
ERIC Educational Resources Information Center
Lee, Bethany R.; Lindsey, Michael A.
2010-01-01
Objective: The purpose of this study is to assess the reliability and validity of the North Carolina Family Assessment Scale (NCFAS) among families involved with youth mental health services. Methods: Using NCFAS data collected by child mental health intake workers with 158 families, factor analysis was conducted to assess factor structure, and…
The Future Value of Serious Games for Assessment: Where Do We Go Now?
ERIC Educational Resources Information Center
de Klerk, Sebastiaan; Kato, Pamela M.
2017-01-01
Game-based assessments will most likely be an increasing part of testing programs in future generations because they provide promising possibilities for more valid and reliable measurement of students' skills as compared to the traditional methods of assessment like paper-and-pencil tests or performance-based assessments. The current status of…
Constructing a Grounded Theory of E-Learning Assessment
ERIC Educational Resources Information Center
Alonso-Díaz, Laura; Yuste-Tosina, Rocío
2015-01-01
This study traces the development of a grounded theory of assessment in e-learning environments, a field in need of research to establish the parameters of an assessment that is both reliable and worthy of higher learning accreditation. Using grounded theory as a research method, we studied an e-assessment model that does not require physical…
AlBarakati, SF; Kula, KS; Ghoneima, AA
2012-01-01
Objective The aim of this study was to assess the reliability and reproducibility of angular and linear measurements of conventional and digital cephalometric methods. Methods A total of 13 landmarks and 16 skeletal and dental parameters were defined and measured on pre-treatment cephalometric radiographs of 30 patients. The conventional and digital tracings and measurements were performed twice by the same examiner with a 6 week interval between measurements. The reliability within the method was determined using Pearson's correlation coefficient (r2). The reproducibility between methods was calculated by paired t-test. The level of statistical significance was set at p < 0.05. Results All measurements for each method were above 0.90 r2 (strong correlation) except maxillary length, which had a correlation of 0.82 for conventional tracing. Significant differences between the two methods were observed in most angular and linear measurements except for ANB angle (p = 0.5), angle of convexity (p = 0.09), anterior cranial base (p = 0.3) and the lower anterior facial height (p = 0.6). Conclusion In general, both methods of conventional and digital cephalometric analysis are highly reliable. Although the reproducibility of the two methods showed some statistically significant differences, most differences were not clinically significant. PMID:22184624
Bijani, Ali; Esmaili, Haleh; Ghadimi, Reza; Babazadeh, Atekeh; Rezaei, Reyhaneh; G Cumming, Robert; Hosseini, Seyed Reza
2018-01-01
Background: The study was conducted to assess the reliability of a modified semi-quantitative food frequency questionnaire (SQFFQ) as a part of the Amirkola Health and Aging Project (AHAP). Methods: The study was carried out in a sample of 200 men and women aged 60 years and older. A 138-item SQFFQ and two 24-hour dietary recalls were completed. The reliability of the SQFFQ was evaluated by comparing eighteen food groups, energy and nutrient intakes derived from both methods using Spearman and Pearson’s correlation coefficients for food groups and nutrients, respectively. Bland-Altman plots and Pitman’s tests were applied to compare the two dietary assessment methods. Results: The mean (SD) age of subjects was 68.16 (6.56) years. The average energy intakes from the 24-hour dietary recalls and the SQFFQ were 1470.2 and 1535.4 kcal/day, respectively. Spearman correlation coefficients comparing food group intakes between the two dietary assessment methods ranged from 0.25 (meat) to 0.62 (tea and coffee) in men and from 0.39 (whole grains) to 0.60 (sugars) in women. Pearson correlation coefficients for energy and nutrients ranged from 0.21 (zinc) to 0.53 (energy) in men and from 0.26 (vitamin C) to 0.71 (energy) in women. The Pitman’s test reflected reasonable agreement between the mean energy and macronutrients of the SQFFQ and 24-hour recalls. Conclusions: The modified SQFFQ that was designed for the AHAP was found to be reliable for assessing the intake of several food groups, energy, micro- and macronutrients. PMID:29387324
Life Cycle Assessment for desalination: a review on methodology feasibility and reliability.
Zhou, Jin; Chang, Victor W-C; Fane, Anthony G
2014-09-15
As concerns about natural resource depletion and environmental degradation caused by desalination increase, research studies of the environmental sustainability of desalination are growing in importance. Life Cycle Assessment (LCA) is an ISO standardized method and is widely applied to evaluate the environmental performance of desalination. This study reviews more than 30 desalination LCA studies since the 2000s and identifies two major issues in need of improvement. The first is feasibility, covering three elements that support the implementation of the LCA to desalination, including accounting methods, supporting databases, and life cycle impact assessment approaches. The second is reliability, addressing three essential aspects that drive uncertainty in results, including the incompleteness of the system boundary, the unrepresentativeness of the database, and the omission of uncertainty analysis. This work can serve as a preliminary LCA reference for desalination specialists, but will also strengthen LCA as an effective method to evaluate the environmental footprint of desalination alternatives. Copyright © 2014 Elsevier Ltd. All rights reserved.
Assessing Reliability of Medical Record Reviews for the Detection of Hospital Adverse Events.
Ock, Minsu; Lee, Sang-il; Jo, Min-Woo; Lee, Jin Yong; Kim, Seon-Ha
2015-09-01
The purpose of this study was to assess the inter-rater reliability and intra-rater reliability of medical record review for the detection of hospital adverse events. We conducted a two-stage retrospective medical record review of a random sample of 96 patients from one acute-care general hospital. The first stage was an explicit patient record review by two nurses to detect the presence of 41 screening criteria (SC). The second stage was an implicit structured review by two physicians to identify the occurrence of adverse events from the positive cases on the SC. The inter-rater reliability of the two nurses and that of the two physicians were assessed. The intra-rater reliability was also evaluated using the test-retest method approximately two weeks later. In 84.2% of the patient medical records, the nurses agreed as to the necessity for the second stage review (kappa, 0.68; 95% confidence interval [CI], 0.54 to 0.83). In 93.0% of the patient medical records screened by nurses, the physicians agreed about the absence or presence of adverse events (kappa, 0.71; 95% CI, 0.44 to 0.97). When assessing intra-rater reliability, the kappa indices of the two nurses were 0.54 (95% CI, 0.31 to 0.77) and 0.67 (95% CI, 0.47 to 0.87), whereas those of the two physicians were 0.87 (95% CI, 0.62 to 1.00) and 0.37 (95% CI, -0.16 to 0.89). In this study, the medical record review for detecting adverse events showed an intermediate to good level of inter-rater and intra-rater reliability. A well-organized training program for reviewers and clearly defined SC are required to obtain more reliable results in hospital adverse event studies.
2011-01-01
Background Insight into children's energy balance-related behaviours (EBRBs) and their determinants is important to inform obesity prevention research. Therefore, reliable and valid tools to measure these variables in large-scale population research are needed. Objective To examine the test-retest reliability and construct validity of the child questionnaire used in the ENERGY-project, measuring EBRBs and their potential determinants among 10-12 year old children. Methods We collected data among 10-12 year old children (n = 730 in the test-retest reliability study; n = 96 in the construct validity study) in six European countries, i.e. Belgium, Greece, Hungary, the Netherlands, Norway, and Spain. Test-retest reliability was assessed using the intra-class correlation coefficient (ICC) and percentage agreement comparing scores from two measurements, administered one week apart. To assess construct validity, the agreement between questionnaire responses and a subsequent face-to-face interview was assessed using ICC and percentage agreement. Results Of the 150 questionnaire items, 115 (77%) showed good to excellent test-retest reliability as indicated by ICCs > .60 or percentage agreement ≥ 75%. Test-retest reliability was moderate for 34 items (23%) and poor for one item. Construct validity appeared to be good to excellent for 70 (47%) of the 150 items, as indicated by ICCs > .60 or percentage agreement ≥ 75%. Of the other 80 items, construct validity was moderate for 39 (26%) and poor for 41 items (27%). Conclusions Our results demonstrate that the ENERGY-child questionnaire, assessing EBRBs of the child as well as personal, family, and school-environmental determinants related to these EBRBs, has good test-retest reliability and moderate to good construct validity for the large majority of items. PMID:22152048
Radiological Determination of Postoperative Cervical Fusion: A Systematic Review.
Rhee, John M; Chapman, Jens R; Norvell, Daniel C; Smith, Justin; Sherry, Ned A; Riew, K Daniel
2015-07-01
Systematic review. To determine the best criteria for radiological determination of postoperative subaxial cervical fusion to be applied to current clinical practice and ongoing future research assessing fusion to standardize assessment and improve comparability. Despite availability of multiple imaging modalities and criteria, there remains no method of determining cervical fusion with absolute certainty, nor clear consensus on specific criteria to be applied. A systematic search in MEDLINE/Cochrane Collaboration Library (through March 2014). Included studies assessed C2 to C7 via anterior or posterior approach, at 12 weeks or more postoperative, with any graft or implant. Overall body of evidence with respect to 6 posited key questions was determined using Grading of Recommendations Assessment, Development and Evaluation and Agency for Healthcare Research and Quality precepts. Of plain radiographical modalities, there is moderate evidence that the interspinous process motion method (<1 mm) is more accurate than the Cobb angle method for assessing anterior cervical fusion. Of the advanced imaging modalities, there is moderate evidence that computed tomography (CT) is more accurate and reliable than magnetic resonance imaging in assessing anterior cervical fusion. There is insufficient evidence regarding the optimal modality and criteria for assessing posterior cervical fusions and insufficient evidence to support a single time point after surgery as being optimal for determining fusion, although some evidence suggests that reliability of radiography and CT improves with increasing time postoperatively. We recommend using less than 1-mm motion as the initial modality for determining anterior cervical arthrodesis for both clinical and research applications.
If further imaging is needed because of indeterminate radiographical evaluation, we recommend CT, which has relatively high accuracy and reliability, but due to greater radiation exposure and cost, it is not routinely suggested. We recommend that plain radiographs also be the initial method of determining posterior cervical fusion but suggest a lower threshold for obtaining CT scans because dynamic radiographs may not be as useful if spinous processes have been removed by laminectomy. 1.
Tomita, Machiko R; Saharan, Sumandeep; Rajendran, Sheela; Nochajski, Susan M; Schweitzer, Jo A
2014-01-01
OBJECTIVE. To identify psychometric properties of the Home Safety Self-Assessment Tool (HSSAT) to prevent falls in community-dwelling older adults. METHOD. We tested content validity, test-retest reliability, interrater reliability, construct validity, convergent and discriminant validity, and responsiveness to change. RESULTS. The content validity index was .98, the intraclass correlation coefficient for test-retest reliability was .97, and the interrater reliability was .89. The difference on identified risk factors between the use and nonuse of the HSSAT was significant (p = .005). Convergent validity with the Centers for Disease Control and Prevention Home Safety Checklist was high (r = .65), and discriminant validity with fear of falling was very low (r = .10). The responsiveness to change was moderate (standardized response mean = 0.57). CONCLUSION. The HSSAT is a reliable and valid instrument to identify fall risks in a home environment, and the HSSAT booklet is effective as educational material leading to improvement in home safety. Copyright © 2014 by the American Occupational Therapy Association, Inc.
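The responsiveness figure quoted above is a standardized response mean (SRM), which is conventionally the mean of the change scores divided by their standard deviation. A minimal sketch under that standard definition follows; the function name and the before/after counts are hypothetical, not HSSAT data.

```python
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """Mean change divided by the sample SD of the change scores."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired scores")
    changes = [post - pre for pre, post in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical numbers of identified home hazards before and after an intervention
before = [10, 12, 9, 14, 11]
after = [8, 11, 9, 10, 10]
print(round(standardized_response_mean(before, after), 2))
```

The sign depends on the direction of scoring (here a negative value means fewer hazards afterwards), so published SRMs are often reported as absolute values.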
Tabrizi, Yousef Moghadas; Zangiabadi, Nasser; Mazhari, Shahrzad; Zolala, Farzaneh
2013-01-01
Objective Motor imagery (MI) has been recently considered as an adjunct to physical rehabilitation in patients with multiple sclerosis (MS). It is necessary to assess MI abilities and benefits in patients with MS by using a reliable tool. The Kinesthetic and Visual Imagery Questionnaire (KVIQ) was recently developed to assess MI ability in patients with stroke and other disabilities. Considering the different underlying pathologies, the present study aimed to examine the validity and reliability of the KVIQ in MS patients. Method Fifteen MS patients were assessed using the KVIQ in 2 sessions (5-14 days apart) by the same examiner. In the second session, the participants also completed a revised MI questionnaire (MIQ-R) as the gold standard. Intra-class correlation coefficients (ICCs) were measured to determine test-retest reliability. Spearman's correlation analysis was performed to assess concurrent validity with the MIQ-R. Furthermore, the internal consistency (Cronbach's alpha) and factorial structure of the KVIQ were studied. Results The test-retest reliability for the KVIQ was good (ICCs: total KVIQ = 0.89, visual KVIQ = 0.85, and kinesthetic KVIQ = 0.93), and the concurrent validity between the KVIQ and MIQ-R was good (r = 0.79). The KVIQ had good internal consistency, with high Cronbach's alpha (alpha = 0.84). Factorial analysis showed the bi-factorial structure of the KVIQ, which was explained by visual = 57.6% and kinesthetic = 32.4%. Conclusions The results of the present study revealed that the KVIQ is a valid and reliable tool for assessing MI in MS patients. PMID:24271091
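Internal consistency in the abstract above is summarized with Cronbach's alpha, which compares the sum of the item variances with the variance of the total score. The sketch below implements the standard formula; the function name and the item scores are hypothetical and unrelated to the KVIQ data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent and one column per item."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least 2 respondents and 2 items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents, 3 items on a 1-5 scale
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 3))  # about 0.898 for this table
```

Alpha rises with the number of items as well as with their intercorrelation, so very high values on long scales can reflect redundancy rather than quality.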
Validity and reliability assessment of a peer evaluation method in team-based learning classes.
Yoon, Hyun Bae; Park, Wan Beom; Myung, Sun-Jung; Moon, Sang Hui; Park, Jun-Bean
2018-03-01
Team-based learning (TBL) is increasingly employed in medical education because of its potential to promote active group learning. In TBL, learners are usually asked to assess the contributions of peers within their group to ensure accountability. The purpose of this study is to assess the validity and reliability of a peer evaluation instrument that was used in TBL classes in a single medical school. A total of 141 students were divided into 18 groups in 11 TBL classes. The students were asked to evaluate their peers in the group based on evaluation criteria that were provided to them. We analyzed the comments that were written for the highest and lowest achievers to assess the validity of the peer evaluation instrument. The reliability of the instrument was assessed by examining the agreement among peer ratings within each group of students via intraclass correlation coefficient (ICC) analysis. Most of the students provided reasonable and understandable comments for the high and low achievers within their group, and most of those comments were compatible with the evaluation criteria. The average ICC of each group ranged from 0.390 to 0.863, and the overall average was 0.659. There was no significant difference in inter-rater reliability according to the number of members in the group or the timing of the evaluation within the course. The peer evaluation instrument that was used in the TBL classes was valid and reliable. Providing evaluation criteria and rules seemed to improve the validity and reliability of the instrument.
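The abstract does not state which form of the ICC was used. As a hedged illustration only, the sketch below computes the one-way random-effects, single-rating form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-target and within-target mean squares and k is the number of ratings per target. The function name and the sample ratings are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of rows, one row per rated person, one column per rating.
    Every row must have the same number of ratings (k >= 2).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-target mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square.
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Perfect agreement between two raters gives an ICC of 1.
perfect = icc_oneway([[1, 1], [2, 2], [3, 3]])
# Systematic disagreement with identical row means gives an ICC of -1.
opposed = icc_oneway([[1, 3], [3, 1]])
```

Other ICC forms (two-way random or mixed, average-measure) use different mean squares and would give different values for the same data.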
Wickstrom, Jordan; Stergiou, Nick; Kyvelidou, Anastasia
2017-07-01
Cerebral palsy (CP) impairs an individual's ability to move and control one's posture. Unfortunately, the signs and symptoms of CP may not be apparent before age two. Evaluating sitting posture is a potential way to assess the developing mechanisms that contribute to CP. The purpose of this project was to determine the reliability of linear and nonlinear measures, including inter- and intrastage reliability, when used to analyze the center of pressure (COP) time series during the stages of sitting development in children with typical development (TD) and with/at-risk for cerebral palsy (CP). We hypothesized that nonlinear tools would be more reliable than linear tools in assessing children's sitting development, and reliability would increase with development. COP data was recorded for three trials at eight sessions. Linear parameters used were root mean square, range of sway for the anterior-posterior (AP) and medial-lateral (ML) directions, and sway path. Nonlinear parameters used were Approximate Entropy, the largest Lyapunov Exponent, and Correlation Dimension for the AP and ML direction. Participants consisted of 33 children with TD and 26 children with/at-risk for CP. Our results determined that COP is a moderately reliable method for assessing the development of sitting postural control in stages in both groups. Thus, clinicians may be able to use measures from COP data across stages to assess the efficacy of therapeutic interventions that are intended to improve sitting postural abilities in children with/at-risk for CP. Copyright © 2017 Elsevier B.V. All rights reserved.
Hawkins, Keith A; Jennings, Danna; Vincent, Andrea S; Gilliland, Kirby; West, Adrienne; Marek, Kenneth
2012-08-01
The automated neuropsychological assessment metrics battery-4 for PD offers the promise of a computerized approach to cognitive assessment. To assess its utility, the ANAM4-PD was administered to 72 PD patients and 24 controls along with a traditional battery. Reliability was assessed by retesting 26 patients. The cognitive efficiency score (CES; a global score) exhibited high reliability (r = 0.86). Constituent variables exhibited lower reliability. The CES correlated strongly with the traditional battery global score, but displayed weaker relationships to UPDRS scores than the traditional score. Multivariate analysis of variance revealed a significant difference between the patient and control groups in ANAM4-PD performance, with three ANAM4-PD tests, math, tower, and pursuit tracking, displaying sizeable differences. In discriminant analyses these variables were as effective as the total ANAM4-PD in classifying cases designated as impaired based on traditional variables. Principal components analyses uncovered fewer factors in the ANAM4-PD relative to the traditional battery. ANAM4-PD variables correlated at higher levels with traditional motor and processing speed variables than with untimed executive, intellectual or memory variables. The ANAM4-PD displays high global reliability, but variable subtest reliability. The battery assesses a narrower range of cognitive functions than traditional tests, and discriminates between patients and controls less effectively. Three ANAM4-PD tests, pursuit tracking, math, and tower performed as well as the total ANAM4-PD in classifying patients as cognitively impaired. These findings could guide the refinement of the ANAM4-PD as an efficient method of screening for mild to moderate cognitive deficits in PD patients. Copyright © 2012 Elsevier Ltd. All rights reserved.
Sekir, U; Yildiz, Y; Hazneci, B; Ors, F; Saka, T; Aydin, T
2008-12-01
In contrast to the single evaluation methods used in the past, the combination of multiple tests allows one to obtain a global assessment of the ankle joint. The aim of this study was to determine the reliability of the different tests in a functional test battery. Twenty-four male recreational athletes with unilateral functional ankle instability (FAI) were recruited for this study. One component of the test battery included five different functional ability tests. These tests included a single limb hopping course, single-legged and triple-legged hop for distance, and six and cross six meter hop for time. The ankle joint position sense and one leg standing test were used for evaluation of proprioception and sensorimotor control. The isokinetic strengths of the ankle invertor and evertor muscles were evaluated at a velocity of 120 degrees/s. The reliability of the test battery was assessed by calculating the intraclass correlation coefficient (ICC). Each subject was tested two times, with an interval of 3-5 days between the test sessions. The ICCs for ankle functional and proprioceptive ability showed high reliability (ICCs ranging from 0.94 to 0.98). Additionally, isokinetic ankle joint inversion and eversion strength measurements represented good to high reliability (ICCs between 0.82 and 0.98). The functional test battery investigated in this study proved to be a reliable tool for the assessment of athletes with functional ankle instability. Therefore, clinicians may obtain reliable information from the functional test battery during the assessment of ankle joint performance in patients with functional ankle instability.
Bois, Aaron J; Fening, Stephen D; Polster, Josh; Jones, Morgan H; Miniaci, Anthony
2012-11-01
Glenoid support is critical for stability of the glenohumeral joint. An accepted noninvasive method of quantifying glenoid bone loss does not exist. To perform independent evaluations of the reliability and accuracy of standard 2-dimensional (2-D) and 3-dimensional (3-D) computed tomography (CT) measurements of glenoid bone deficiency. Descriptive laboratory study. Two sawbone models were used; one served as a model for 2 anterior glenoid defects and the other for 2 anteroinferior defects. For each scapular model, predefect and defect data were collected for a total of 6 data sets. Each sample underwent 3-D laser scanning followed by CT scanning. Six physicians measured linear indicators of bone loss (defect length and width-to-length ratio) on both 2-D and 3-D CT and quantified bone loss using the glenoid index method on 2-D CT and using the glenoid index, ratio, and Pico methods on 3-D CT. The intraclass correlation coefficient (ICC) was used to assess agreement, and percentage error was used to compare radiographic and true measurements. With use of 2-D CT, the glenoid index and defect length measurements had the least percentage error (-4.13% and 7.68%, respectively); agreement was very good (ICC, .81) for defect length only. With use of 3-D CT, defect length (0.29%) and the Pico(1) method (4.93%) had the least percentage error. Agreement was very good for all linear indicators of bone loss (range, .85-.90) and for the ratio linear and Pico surface area methods used to quantify bone loss (range, .84-.98). Overall, 3-D CT results demonstrated better agreement and accuracy compared to 2-D CT. None of the methods assessed in this study using 2-D CT was found to be valid, and therefore, 2-D CT is not recommended for these methods. However, the length of glenoid defects can be reliably and accurately measured on 3-D CT. 
The Pico and ratio techniques are most reliable; however, the Pico(1) method accurately quantifies glenoid bone loss in both the anterior and anteroinferior locations. Future work is required to implement valid imaging techniques of glenoid bone loss into clinical practice. This is one of the only studies to date that has investigated both the reliability and accuracy of multiple indicators and quantification methods that evaluate glenoid bone loss in anterior glenohumeral instability. These data are critical to ensure valid methods are used for preoperative assessment and to determine when a glenoid bone augmentation procedure is indicated.
Griessenauer, Christoph J; Foreman, Paul; Shoja, Mohammadali M; Kicielinski, Kimberly P; Deveikis, John P; Walters, Beverly C; Harrigan, Mark R
2015-04-01
Traumatic aneurysms occur in up to 20% of blunt traumatic extracranial carotid artery injuries. Currently there is no standardized method for characterization of traumatic aneurysms. For the carotid and vertebral injury study (CAVIS), a prospective study of traumatic cerebrovascular injury, we established a method for aneurysm characterization and tested its reliability. Saccular aneurysm size was defined as the greatest linear distance between the expected location of the normal artery wall and the outer edge of the aneurysm lumen ("depth"). Fusiform aneurysm size was defined as the "depth" and longitudinal distance ("length") paralleling the normal artery. The size of the aneurysm relative to the normal artery was also assessed. Reliability measurements were made using four raters who independently reviewed 15 computed tomographic angiograms (CTAs) and 13 digital subtraction angiograms (DSAs) demonstrating a traumatic aneurysm of the internal carotid artery. Raters categorized the aneurysms as either "saccular" or "fusiform" and made measurements. Five scans of each imaging modality were repeated to evaluate intra-rater reliability. Fleiss's free-marginal multi-rater kappa (κ), Cohen's kappa (κ), and intraclass correlation coefficient (ICC) determined inter- and intra-rater reliability. Inter-rater agreement as to the aneurysm "shape" was almost perfect for CTA (κ = 0.82) and DSA (κ = 0.897). Agreements on aneurysm "depth," "length," "aneurysm plus parent artery," and "parent artery" for CTA and DSA were excellent (ICC > 0.75). Intra-rater agreement as to aneurysm "shape" was substantial to almost perfect (κ > 0.60). The CAVIS method of traumatic aneurysm characterization has remarkable inter- and intra-rater reliability and will facilitate further studies of the natural history and management of extracranial cerebrovascular traumatic aneurysms. © The Author(s) 2015.
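The free-marginal multi-rater kappa named above can be sketched from its usual definition: observed agreement is the proportion of agreeing rater pairs averaged over cases, chance agreement is fixed at 1/k for k categories, and kappa = (P_o - 1/k) / (1 - 1/k). The function name and the example ratings below are hypothetical and are not the CAVIS data.

```python
from collections import Counter


def free_marginal_kappa(ratings, categories):
    """Free-marginal multi-rater kappa.

    ratings: one list per case, holding one category label per rater.
    categories: the full set of labels raters could choose from.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    k = len(categories)
    pair_total = n_raters * (n_raters - 1)
    # Proportion of agreeing (ordered) rater pairs, averaged over cases.
    observed = sum(
        sum(c * (c - 1) for c in Counter(case).values()) / pair_total
        for case in ratings
    ) / n_cases
    expected = 1 / k  # chance agreement when marginals are not fixed
    return (observed - expected) / (1 - expected)


# Four raters, two cases: full agreement on one, an even split on the other.
example = [
    ["saccular", "saccular", "saccular", "saccular"],
    ["saccular", "saccular", "fusiform", "fusiform"],
]
kappa = free_marginal_kappa(example, ["saccular", "fusiform"])
```

Because chance agreement depends only on the number of categories, this statistic does not shrink when one category dominates, unlike the fixed-marginal Fleiss kappa.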
Cut set-based risk and reliability analysis for arbitrarily interconnected networks
Wyss, Gregory D.
2000-01-01
Method for computing all-terminal reliability for arbitrarily interconnected networks such as the United States public switched telephone network. The method includes an efficient search algorithm to generate minimal cut sets for nonhierarchical networks directly from the network connectivity diagram. Efficiency of the search algorithm stems in part from its basis on only link failures. The method also includes a novel quantification scheme that likewise reduces computational effort associated with assessing network reliability based on traditional risk importance measures. Vast reductions in computational effort are realized since combinatorial expansion and subsequent Boolean reduction steps are eliminated through analysis of network segmentations using a technique of assuming node failures to occur on only one side of a break in the network, and repeating the technique for all minimal cut sets generated with the search algorithm. The method functions equally well for planar and non-planar networks.
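To make the notion of a link-based minimal cut set concrete, the sketch below enumerates them by brute force for a small undirected network: it removes link subsets in order of increasing size and records each subset that disconnects the network and contains no smaller recorded cut. This is an illustrative exhaustive search, not the efficient search algorithm the record describes; the function names are hypothetical, and links are assumed to be unique node pairs (no parallel links).

```python
from itertools import combinations


def is_connected(nodes, links):
    """Return True if every node is reachable from every other node."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {v: set() for v in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def minimal_link_cut_sets(nodes, links):
    """Brute-force all minimal sets of links whose failure splits the network."""
    links = list(links)
    found = []
    for size in range(1, len(links) + 1):
        for removed in combinations(links, size):
            removed_set = frozenset(removed)
            # Skip supersets of a smaller cut: they cannot be minimal.
            if any(cut <= removed_set for cut in found):
                continue
            remaining = [l for l in links if l not in removed_set]
            if not is_connected(nodes, remaining):
                found.append(removed_set)
    return found


# A triangle survives any single link failure but not any pair of failures.
triangle_cuts = minimal_link_cut_sets(
    ["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")]
)
```

The exhaustive search grows exponentially with the number of links, which is the cost an efficient cut-set generator is designed to avoid.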
Makris, Susan L.; Raffaele, Kathleen; Allen, Sandra; Bowers, Wayne J.; Hass, Ulla; Alleva, Enrico; Calamandrei, Gemma; Sheets, Larry; Amcoff, Patric; Delrue, Nathalie; Crofton, Kevin M.
2009-01-01
Objective We conducted a review of the history and performance of developmental neurotoxicity (DNT) testing in support of the finalization and implementation of Organisation of Economic Co-operation and Development (OECD) DNT test guideline 426 (TG 426). Information sources and analysis In this review we summarize extensive scientific efforts that form the foundation for this testing paradigm, including basic neurotoxicology research, interlaboratory collaborative studies, expert workshops, and validation studies, and we address the relevance, applicability, and use of the DNT study in risk assessment. Conclusions The OECD DNT guideline represents the best available science for assessing the potential for DNT in human health risk assessment, and data generated with this protocol are relevant and reliable for the assessment of these end points. The test methods used have been subjected to an extensive history of international validation, peer review, and evaluation, which is contained in the public record. The reproducibility, reliability, and sensitivity of these methods have been demonstrated, using a wide variety of test substances, in accordance with OECD guidance on the validation and international acceptance of new or updated test methods for hazard characterization. Multiple independent, expert scientific peer reviews affirm these conclusions. PMID:19165382
Stewart, Regan W; Tuerk, Peter W; Metzger, Isha W; Davidson, Tatiana M; Young, John
2016-02-01
Structured diagnostic interviews are widely considered to be the optimal method of assessing symptoms of posttraumatic stress; however, few clinicians report using structured assessments to guide clinical practice. One commonly cited impediment to these assessment approaches is the amount of time required for test administration and interpretation. Empirically keyed methods to reduce the administration time of structured assessments may be a viable solution to increase the use of standardized and reliable diagnostic tools. Thus, the present research conducted an initial feasibility study using a sample of treatment-seeking military veterans (N = 1,517) to develop a truncated assessment protocol based on the Clinician-Administered Posttraumatic Stress Disorder (PTSD) Scale (CAPS). Decision-tree analysis was utilized to identify a subset of predictor variables among the CAPS items that were most predictive of a diagnosis of PTSD. The algorithm-driven, atheoretical sequence of questions reduced the number of items administered by more than 75% and classified the validation sample at 92% accuracy. These results demonstrated the feasibility of developing a protocol to assess PTSD in a way that imposes little assessment burden while still providing a reliable categorization. (c) 2016 APA, all rights reserved.
Frost, Rachael; Levati, Sara; McClurg, Doreen; Brady, Marian; Williams, Brian
2017-06-01
To systematically review methods for measuring adherence used in home-based rehabilitation trials and to evaluate their validity, reliability, and acceptability. In phase 1 we searched the CENTRAL database, NHS Economic Evaluation Database, and Health Technology Assessment Database (January 2000 to April 2013) to identify adherence measures used in randomized controlled trials of allied health professional home-based rehabilitation interventions. In phase 2 we searched the databases of MEDLINE, Embase, CINAHL, Allied and Complementary Medicine Database, PsycINFO, CENTRAL, ProQuest Nursing and Allied Health, and Web of Science (inception to April 2015) for measurement property assessments for each measure. Studies assessing the validity, reliability, or acceptability of adherence measures. Two reviewers independently extracted data on participant and measure characteristics, measurement properties evaluated, evaluation methods, and outcome statistics and assessed study quality using the COnsensus-based Standards for the selection of health Measurement INstruments checklist. In phase 1 we included 8 adherence measures (56 trials). In phase 2, from the 222 measurement property assessments identified in 109 studies, 22 high-quality measurement property assessments were narratively synthesized. Low-quality studies were used as supporting data. StepWatch Activity Monitor validly and acceptably measured short-term step count adherence. The Problematic Experiences of Therapy Scale validly and reliably assessed adherence to vestibular rehabilitation exercises. Adherence diaries had moderately high validity and acceptability across limited populations. The Borg 6 to 20 scale, Bassett and Prapavessis scale, and Yamax CW series had insufficient validity. Low-quality evidence supported use of the Joint Protection Behaviour Assessment. Polar A1 series heart monitors were considered acceptable by 1 study. Current rehabilitation adherence measures are limited. 
Some possess promising validity and acceptability for certain parameters of adherence, situations, and populations and should be used in these situations. Rigorous evaluation of adherence measures in a broader range of populations is needed. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Bean, Melanie K; Raynor, Hollie A; Thornton, Laura M; Sova, Alexandra; Dunne Stewart, Mary; Mazzeo, Suzanne E
2018-04-12
Scientifically sound methods for investigating dietary consumption patterns from self-serve salad bars are needed to inform school policies and programs. To examine the reliability and validity of digital imagery for determining starting portions and plate waste of self-serve salad bar vegetables (which have variable starting portions) compared with manual weights. In a laboratory setting, 30 mock salads with 73 vegetables were made, and consumption was simulated. Each component (initial and removed portion) was weighed; photographs of weighed reference portions and pre- and post-consumption mock salads were taken. Seven trained independent raters visually assessed images to estimate starting portions to the nearest ¼ cup and percentage consumed in 20% increments. These values were converted to grams for comparison with weighed values. Intraclass correlations between weighed and digital imagery-assessed portions and plate waste were used to assess interrater reliability and validity. Pearson's correlations between weights and digital imagery assessments were also examined. Paired samples t tests were used to evaluate mean differences (in grams) between digital imagery-assessed portions and measured weights. Interrater reliabilities were excellent for starting portions and plate waste with digital imagery. For accuracy, intraclass correlations were moderate, with lower accuracy for determining starting portions of leafy greens compared with other vegetables. However, accuracy of digital imagery-assessed plate waste was excellent. Digital imagery assessments were not significantly different from measured weights for estimating overall vegetable starting portions or waste; however, digital imagery assessments slightly underestimated starting portions (by 3.5 g) and waste (by 2.1 g) of leafy greens. This investigation provides preliminary support for use of digital imagery in estimating starting portions and plate waste from school salad bars. 
Results might inform methods used in empirical investigations of dietary intake in schools with self-serve salad bars. Copyright © 2018 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
Human Reliability Assessments: Using the Past (Shuttle) to Predict the Future (Orion)
NASA Technical Reports Server (NTRS)
DeMott, Diana L.; Bigler, Mark A.
2017-01-01
NASA (National Aeronautics and Space Administration) Johnson Space Center (JSC) Safety and Mission Assurance (S&MA) uses two human reliability analysis (HRA) methodologies. The first is a simplified method which is based on how much time is available to complete the action, with consideration included for environmental and personal factors that could influence the human's reliability. This method is expected to provide a conservative value or placeholder as a preliminary estimate. This preliminary estimate or screening value is used to determine which placeholder needs a more detailed assessment. The second methodology is used to develop a more detailed human reliability assessment on the performance of critical human actions. This assessment needs to consider more than the time available; it would include factors such as the importance of the action, the context, environmental factors, potential human stresses, previous experience, training, physical design interfaces, available procedures/checklists and internal human stresses. The more detailed assessment is expected to be more realistic than that based primarily on time available. When performing an HRA on a system or process that has an operational history, we have information specific to the task based on this history and experience. In the case of a Probabilistic Risk Assessment (PRA) that is based on a new design and has no operational history, providing a "reasonable" assessment of potential crew actions becomes more challenging. To determine what is expected of future operational parameters, the experience from individuals who had relevant experience and were familiar with the system and process previously implemented by NASA was used to provide the "best" available data. Personnel from Flight Operations, Flight Directors, Launch Test Directors, Control Room Console Operators, and Astronauts were all interviewed to provide a comprehensive picture of previous NASA operations. 
Verification of the assumptions and expectations expressed in the assessments will be needed when the procedures, flight rules, and operational requirements are developed and then finalized.
Human Reliability Assessments: Using the Past (Shuttle) to Predict the Future (Orion)
NASA Technical Reports Server (NTRS)
DeMott, Diana; Bigler, Mark
2016-01-01
NASA (National Aeronautics and Space Administration) Johnson Space Center (JSC) Safety and Mission Assurance (S&MA) uses two human reliability analysis (HRA) methodologies. The first is a simplified method which is based on how much time is available to complete the action, with consideration included for environmental and personal factors that could influence the human's reliability. This method is expected to provide a conservative value or placeholder as a preliminary estimate. This preliminary estimate or screening value is used to determine which placeholder needs a more detailed assessment. The second methodology is used to develop a more detailed human reliability assessment on the performance of critical human actions. This assessment needs to consider more than the time available; it would include factors such as the importance of the action, the context, environmental factors, potential human stresses, previous experience, training, physical design interfaces, available procedures/checklists and internal human stresses. The more detailed assessment is expected to be more realistic than that based primarily on time available. When performing an HRA on a system or process that has an operational history, we have information specific to the task based on this history and experience. In the case of a Probabilistic Risk Assessment (PRA) that is based on a new design and has no operational history, providing a "reasonable" assessment of potential crew actions becomes more challenging. In order to determine what is expected of future operational parameters, the experience from individuals who had relevant experience and were familiar with the system and process previously implemented by NASA was used to provide the "best" available data. Personnel from Flight Operations, Flight Directors, Launch Test Directors, Control Room Console Operators and Astronauts were all interviewed to provide a comprehensive picture of previous NASA operations. 
Verification of the assumptions and expectations expressed in the assessments will be needed when the procedures, flight rules and operational requirements are developed and then finalized.
Organizational readiness for implementing change: a psychometric assessment of a new measure.
Shea, Christopher M; Jacobs, Sara R; Esserman, Denise A; Bruce, Kerry; Weiner, Bryan J
2014-01-10
Organizational readiness for change in healthcare settings is an important factor in successful implementation of new policies, programs, and practices. However, research on the topic is hindered by the absence of a brief, reliable, and valid measure. Until such a measure is developed, we cannot advance scientific knowledge about readiness or provide evidence-based guidance to organizational leaders about how to increase readiness. This article presents results of a psychometric assessment of a new measure called Organizational Readiness for Implementing Change (ORIC), which we developed based on Weiner's theory of organizational readiness for change. We conducted four studies to assess the psychometric properties of ORIC. In study one, we assessed the content adequacy of the new measure using quantitative methods. In study two, we examined the measure's factor structure and reliability in a laboratory simulation. In study three, we assessed the reliability and validity of an organization-level measure of readiness based on aggregated individual-level data from study two. In study four, we conducted a small field study utilizing the same analytic methods as in study three. Content adequacy assessment indicated that the items developed to measure change commitment and change efficacy reflected the theoretical content of these two facets of organizational readiness and distinguished the facets from hypothesized determinants of readiness. Exploratory and confirmatory factor analysis in the lab and field studies revealed two correlated factors, as expected, with good model fit and high item loadings. Reliability analysis in the lab and field studies showed high inter-item consistency for the resulting individual-level scales for change commitment and change efficacy. Inter-rater reliability and inter-rater agreement statistics supported the aggregation of individual level readiness perceptions to the organizational level of analysis. 
This article provides evidence in support of the ORIC measure. We believe this measure will enable testing of theories about determinants and consequences of organizational readiness and, ultimately, assist healthcare leaders to reduce the number of health organization change efforts that do not achieve desired benefits. Although ORIC shows promise, further assessment is needed to test for convergent, discriminant, and predictive validity.
Monitoring Energy Balance in Breast Cancer Survivors Using a Mobile App: Reliability Study
Lozano-Lozano, Mario; Galiano-Castillo, Noelia; Martín-Martín, Lydia; Pace-Bedetti, Nicolás; Fernández-Lao, Carolina; Cantarero-Villanueva, Irene
2018-01-01
Background The majority of breast cancer survivors do not meet recommendations in terms of diet and physical activity. To address this problem, we developed a mobile health (mHealth) app for assessing and monitoring healthy lifestyles in breast cancer survivors, called the Energy Balance on Cancer (BENECA) mHealth system. The BENECA mHealth system is a novel and interactive mHealth app, which allows breast cancer survivors to engage themselves in their energy balance monitoring. BENECA was designed to facilitate adherence to healthy lifestyles in an easy and intuitive way. Objective The objective of the study was to assess the concurrent validity and test-retest reliability between the BENECA mHealth system and the gold standard assessment methods for diet and physical activity. Methods A reliability study was conducted with 20 breast cancer survivors. In the study, tri-axial accelerometers (ActiGraphGT3X+) were used as gold standard for 8 consecutive days, in addition to 2, 24-hour dietary recalls, 4 dietary records, and sociodemographic questionnaires. Two-way random effect intraclass correlation coefficients, a linear regression-analysis, and a Passing-Bablok regression were calculated. Results The reliability estimates were very high for all variables (alpha≥.90). The lowest reliability was found in fruit and vegetable intakes (alpha=.94). The reliability between the accelerometer and the dietary assessment instruments against the BENECA system was very high (intraclass correlation coefficient=.90). We found a mean match rate of 93.51% between instruments and a mean phantom rate of 3.35%. The Passing-Bablok regression analysis did not show considerable bias in fat percentage, portions of fruits and vegetables, or minutes of moderate to vigorous physical activity. Conclusions The BENECA mHealth app could be a new tool to measure energy balance in breast cancer survivors in a reliable and simple way. 
Our results support the use of this technology not only to encourage changes in breast cancer survivors' lifestyles, but also to remotely monitor energy balance. Trial Registration ClinicalTrials.gov NCT02817724; https://clinicaltrials.gov/ct2/show/NCT02817724 (Archived by WebCite at http://www.webcitation.org/6xVY1buCc) PMID:29588273
ERIC Educational Resources Information Center
Miciak, Jeremy; Fletcher, Jack M.; Stuebing, Karla K.; Vaughn, Sharon; Tolar, Tammy D.
2014-01-01
Few empirical investigations have evaluated learning disabilities (LD) identification methods based on a pattern of cognitive strengths and weaknesses (PSW). This study investigated the reliability and validity of two proposed PSW methods: the concordance/discordance method (C/DM) and cross battery assessment (XBA) method. Cognitive assessment…
Mahmoud, Asmaa; Abundo, Paolo; Basile, Luisanna; Albensi, Caterina; Marasco, Morena; Bellizzi, Letizia; Galasso, Franco; Foti, Calogero
2017-01-01
Summary Background Despite the social and financial impact of Leg Length Discrepancy (LLD), controversial and conflicting results still exist regarding a reliable assessment/correction method. For proper management it is essential to discriminate between anatomical and functional Leg Length Discrepancy (FLLD). With the newly invented NPoS (New Postural Solution), developed through a collaboration between the PRM Department, Tor Vergata University, and Baro Postural Instruments srl, positive results were observed in both measuring and compensating the hemi-pelvic antero-medial rotation in FLLD through personalized bilateral heel raise using two NPoS components: Foot Image System (FIS) and Postural Optimizer System (POS). This led us to test the validity of NPoS as a preliminary step before evaluating its implementations in postural disorders. Methods After clinical evaluation, 4 subjects with FLLD were assessed by NPoS. Over a period of 2 months, every subject was evaluated 12 times by two different operators (48 measurements in total), and the results were verified against BTS GaitLab results. Results Intra-operator and inter-operator variability analysis showed statistically insignificant differences, while inter-method variability between NPoS and BTS parameters expressed a linear correlation. Conclusion Results suggest a significant validity of NPoS in assessment and correction of FLLD, with a high degree of reproducibility and minimal operator dependency. This can be considered a base for promising clinical implications of NPoS as a reliable, cost-effective postural assessment/corrective tool. Level of evidence V. PMID:29264341
Lalanne, Christophe; Goujard, Cécile; Herrmann, Susan; Cheung-Lung, Christian; Brosseau, Jean-Paul; Schwartz, Yannick; Chassany, Olivier
2014-01-01
Background Electronic patient-reported outcomes (PRO) provide quick and usually reliable assessments of patients’ health-related quality of life (HRQL). Objective An electronic version of the Patient-Reported Outcomes Quality of Life-human immunodeficiency virus (PROQOL-HIV) questionnaire was developed, and its face validity and reliability were assessed using standard psychometric methods. Methods A sample of 80 French outpatients (66% male, 52/79; mean age 46.7 years, SD 10.9) were recruited. Paper-based and electronic questionnaires were completed in a randomized crossover design (2-7 day interval). Biomedical data were collected. Questionnaire version and order effects were tested on full-scale scores in a 2-way ANOVA with patients as random effects. Test-retest reliability was evaluated using Pearson and intraclass correlation coefficients (ICC, with 95% confidence interval) for each dimension. Usability testing was carried out from patients’ survey reports, specifically, general satisfaction, ease of completion, quality and clarity of user interface, and motivation to participate in follow-up PROQOL-HIV electronic assessments. Results Questionnaire version and administration order effects (N=59 complete cases) were not significant at the 5% level, and no interaction was found between these 2 factors (P=.94). Reliability indexes were acceptable, with Pearson correlations greater than .7 and ICCs ranging from .708 to .939; scores were not statistically different between the two versions. A total of 63 (79%) complete patients’ survey reports were available, and 55% of patients (30/55) reported being satisfied and interested in electronic assessment of their HRQL in clinical follow-up. Individual ratings of PROQOL-HIV user interface (85%-100% of positive responses) confirmed user interface clarity and usability. 
Conclusions The electronic PROQOL-HIV introduces minor modifications to the original paper-based version, following International Society for Pharmacoeconomics and Outcomes Research (ISPOR) ePRO Task Force guidelines, and shows good reliability and face validity. Patients can complete the computerized PROQOL-HIV questionnaire and the scores from the paper or electronic versions share comparable accuracy and interpretation. PMID:24769643
System Analysis by Mapping a Fault-tree into a Bayesian-network
NASA Astrophysics Data System (ADS)
Sheng, B.; Deng, C.; Wang, Y. H.; Tang, L. H.
2018-05-01
In view of the limitations of fault tree analysis in reliability assessment, Bayesian Network (BN) has been studied as an alternative technology. After a brief introduction to the method for mapping a Fault Tree (FT) into an equivalent BN, equations used to calculate the structure importance degree, the probability importance degree and the critical importance degree are presented. Furthermore, the correctness of these equations is proved mathematically. Using an aircraft landing gear’s FT, an equivalent BN is developed and analysed. The results show that richer and more accurate information is obtained through the BN method than through the FT, which demonstrates that the BN is a superior technique in both reliability assessment and fault diagnosis.
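As a rough illustration of the idea in the abstract above, a fault tree can be treated as a small Bayesian network in which basic events are independent root nodes and each gate is a deterministic node. The sketch below is not the authors' landing-gear model: the three events, their probabilities and the gate layout are hypothetical, and the importance formulas are the standard Birnbaum (probability) and critical importance definitions, assumed here to correspond to the paper's measures.

```python
from itertools import product

# Hypothetical basic-event failure probabilities (illustrative values only).
PRIORS = {"A": 0.01, "B": 0.02, "C": 0.05}

def top_event(state):
    """Deterministic gate logic of the toy tree: TOP = A OR (B AND C)."""
    return state["A"] or (state["B"] and state["C"])

def prob_top(priors, evidence=None):
    """P(TOP = 1 | evidence), found by enumerating the free basic events."""
    evidence = evidence or {}
    free = [name for name in priors if name not in evidence]
    total = 0.0
    for values in product([False, True], repeat=len(free)):
        state = dict(evidence)
        state.update(zip(free, values))
        weight = 1.0
        for name, value in zip(free, values):
            weight *= priors[name] if value else 1.0 - priors[name]
        if top_event(state):
            total += weight
    return total

def probability_importance(priors, name):
    """Birnbaum measure: P(TOP | event occurs) - P(TOP | event does not occur)."""
    return prob_top(priors, {name: True}) - prob_top(priors, {name: False})

def critical_importance(priors, name):
    """Birnbaum measure scaled by the event probability over P(TOP)."""
    return probability_importance(priors, name) * priors[name] / prob_top(priors)
```

Calling `prob_top(PRIORS, {"B": True})` gives the kind of conditional query that a plain fault tree evaluation does not directly provide; full enumeration is only practical for small trees.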
The impact of symptom stability on time frame and recall reliability in CFS.
Evans, Meredyth; Jason, Leonard A
This study is an investigation of the potential impact of perceived symptom stability on the recall reliability of symptom severity and frequency as reported by individuals with chronic fatigue syndrome (CFS). Symptoms were recalled using three different recall timeframes (the past week, the past month, and the past six months) and at two assessment points (with one week in between each assessment). Participants were 51 adults (45 women and 6 men), between the ages of 29 and 66 with a current diagnosis of CFS. Multilevel Model (MLM) Analyses were used to determine the optimal recall timeframe (in terms of test-retest reliability) for reporting symptoms perceived as variable and as stable over time. Headaches were recalled more reliably when they were reported as stable over time. Furthermore, the optimal timeframe in terms of test-retest reliability for stable symptoms was highly uniform, such that all Fukuda 1 CFS symptoms were more reliably recalled at the six month timeframe. In contrast, the optimal timeframe for CFS symptoms perceived as variable differed across symptoms. Symptom stability and recall timeframe are important to consider in order to improve the accuracy and reliability of the current methods for diagnosing this illness.
Carlson, Jim; Min, Elana; Bridges, Diane
2009-01-01
Methodology to train team behavior during simulation has received increased attention, but standard performance measures are lacking, especially at the undergraduate level. Our purposes were to develop a reliable team behavior measurement tool and explore the relationship between team behavior and the delivery of an appropriate standard of care specific to the simulated case. Authors developed a unique team measurement tool based on previous work. Trainees participated in a simulated event involving the presentation of acute dyspnea. Performance was rated by separate raters using the team behavior measurement tool. Interrater reliability was assessed. The relationship between team behavior and the standard of care delivered was explored. The instrument proved to be reliable for this case and group of raters. Team behaviors had a positive relationship with the standard of medical care delivered specific to the simulated case. The methods used provide a possible method for training and assessing team performance during simulation.
Wielenga, J M; De Vos, R; de Leeuw, R; De Haan, R J
2004-01-01
Assessment of clinimetric properties and diagnostic quality of a stress measurement scale (COMFORT scale). Sample of an open population. Neonatology department (Neonatal Intensive Care Unit), Academic Medical Centre/Emma Children's Hospital, Amsterdam, The Netherlands. One clinical expert and 9 observers observed ventilated premature born babies simultaneously. Criterion validity was assessed by correlating the COMFORT scale with the clinical judgment regarding the amount of stress. Interobserver reliability was assessed on the clinical judgment as well as on the COMFORT scale. Diagnostic qualities were evaluated with a ROC curve. On 19 ventilated prematurely born babies (mean gestational age 30 weeks, mean birth weight 1385 gm), one clinical expert and 9 observers made 30 paired observations. The criterion validity of the COMFORT scale was good (Pearson's r of 0.84). The interobserver reliability of the clinical judgment was very good (weighted Kappa 0.84). The interobserver reliability of each item varied from good to almost perfect (weighted Kappa of 0.64 for muscle tone to 1.00 on heart rate). The reliability of the total COMFORT scale score was satisfactory (intra-class correlation coefficient of 0.94). The diagnostic quality of the COMFORT scale was excellent: at a cut-off point of 20 the sensitivity was 100 percent, the specificity was 77 percent, and the area under the curve (AUC) was 0.95. In this first evaluation, the COMFORT scale appears to be a valid and reliable measurement tool to assess the stress of ventilated prematurely born babies.
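The sensitivity and specificity quoted at a cut-off of 20 can be illustrated with a minimal sketch. It assumes that a score at or above the cut-off is classified as "stressed" and that the clinical judgment is the reference; the abstract states neither the direction of the threshold nor the raw data, so the function name and the example values are hypothetical.

```python
def sensitivity_specificity(scores, stressed, cutoff):
    """Classify score >= cutoff as positive and compare with the reference labels.

    scores   -- scale totals, one per observation
    stressed -- reference judgment (True/False), same order as scores
    """
    pairs = list(zip(scores, stressed))
    tp = sum(s >= cutoff and y for s, y in pairs)       # true positives
    fn = sum(s < cutoff and y for s, y in pairs)        # false negatives
    tn = sum(s < cutoff and not y for s, y in pairs)    # true negatives
    fp = sum(s >= cutoff and not y for s, y in pairs)   # false positives
    return tp / (tp + fn), tn / (tn + fp)
```

Repeating the calculation over a range of cut-offs yields the points of a ROC curve, from which an area under the curve can be estimated.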
ERIC Educational Resources Information Center
Montgomery, Gregory P. J.; Crockford, David N.; Hecker, Kent
2010-01-01
Objective: The Coordinators of Psychiatric Education (COPE) Residency In-Training Exam is a formative exam for Canadian psychiatric residents that was reconstructed using assessment best practices. An assessment of psychometric properties was subsequently performed on the exam to ensure preliminary validity and reliability. Methods: An exam…
School Psychologists and the Assessment of Culturally and Linguistically Diverse Students
ERIC Educational Resources Information Center
Vega, Desireé; Lasser, Jon; Afifi, Amanda F. M.
2016-01-01
In recent years, school psychologists have increasingly recognized the importance of using valid and reliable methods to assess culturally and linguistically diverse (CLD) students for special education eligibility. However, little is known about their assessment practices or preparation in this area. To address these questions, a Web-based survey…
To assess the hazards and risks of possible endocrine active chemicals (EACs) there is a need for robust, validated test methods that detect perturbation of endocrine pathways of concern and provide reliable information to assess potential adverse effects on apical...
ERIC Educational Resources Information Center
Virués-Ortega, Javier; Pritchard, Kristen; Grant, Robin L.; North, Sebastian; Hurtado-Parrado, Camilo; Lee, May S. H.; Temple, Bev; Julio, Flavia; Yu, C. T.
2014-01-01
Individuals with intellectual or developmental disabilities are able to reliably express their likes and dislikes through direct preference assessment. Preferred items tend to function as rewards and can therefore be used to facilitate the acquisition of new skills and promote task engagement. A number of preference assessment methods are…
ERIC Educational Resources Information Center
Bogo, Marion; Regehr, Cheryl; Logie, Carmen; Katz, Ellen; Mylopoulos, Maria; Regehr, Glenn
2011-01-01
The development of standardized, valid, and reliable methods for assessment of students' practice competence continues to be a challenge for social work educators. In this study, the Objective Structured Clinical Examination (OSCE), originally used in medicine to assess performance through simulated interviews, was adapted for social work to…
Evaluating the Use of Criteria for Assessing Profession-Specific Communication Skills in Pharmacy
ERIC Educational Resources Information Center
Hyvarinen, Marja-Leena; Tanskanen, Paavo; Katajavuori, Nina; Isotalus, Pekka
2012-01-01
One central task in higher education is to provide students with interpersonal communication competence in their profession. To achieve this, specialised training, based on an understanding of disciplinary communication practices and appropriate assessment methods, is needed. However, there is a lack of reliable assessment instruments which are…
Reliability of real-time ultrasound for the assessment of transversus abdominis function.
Kidd, Adrian W; Magee, Scott; Richardson, Carolyn A
2002-07-01
Transversus abdominis (TrA) has now been established as a key muscle for the stabilization of the lumbar spine and sacroiliac joints. Significantly, dysfunction of this muscle has also been implicated in low back pain. Real-time ultrasound (US) is a non-invasive procedure that has the potential to evaluate objectively the function of TrA. To investigate M-mode US as a reliable method of assessing TrA function. M-mode US was used to measure the width of TrA as subjects drew in their lower abdominal wall at a controlled speed to a target depth. Eleven subjects were imaged. The measures of TrA width were reliable and ranged between 3.14mm relaxed and 6.35mm contracted. The standard error of measurement ranged between 0.18mm and 0.57mm. M-mode US provides a reliable non-invasive measure of a controlled contraction of TrA.
Cross-cultural Adaptation of the "Functional Activities Questionnaire - FAQ" for use in Brazil
Sanchez, Maria Angélica dos Santos; Correa, Pricila Cristina Ribeiro; Lourenço, Roberto Alves
2011-01-01
Objective The aim of this paper was to present the results of the first stage of cross-cultural adaptation of the Functional Activities Questionnaire (FAQ). Methods The tool was subjected to translation and re-translation, and the test-retest reliability of a proposed version for use in Brazil was analyzed. Results Of the 548 questionnaire respondents, a convenience sample of 68 informants was selected for retesting. Internal consistency was measured by Cronbach's alpha (0.95) while test-retest reliability was assessed using intra-class correlation (0.97). The findings have shown that FAQ is brief - averaging seven minutes to apply, easily understood and has good intra-rater test-retest reliability. Conclusion Our results suggest this adapted version of the FAQ is a reliable and stable tool which may be useful for assessing function in Brazilian elderly. Notwithstanding, the version should be subjected to further analysis with the aim of reaching functional equivalence. PMID:29213759
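For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of Cronbach's alpha using the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The data layout (one list of item scores per respondent) and the function name are assumptions made for illustration; the study's own data and software are not described in the abstract.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))                       # one tuple of scores per item
    item_var = sum(variance(col) for col in items)   # sum of sample variances of items
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)
```

With two items that rank three respondents identically, for example `cronbach_alpha([[1, 1], [2, 2], [3, 3]])`, the statistic reaches its maximum of 1.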
Oster, Natalia V; Carney, Patricia A; Allison, Kimberly H; Weaver, Donald L; Reisch, Lisa M; Longton, Gary; Onega, Tracy; Pepe, Margaret; Geller, Berta M; Nelson, Heidi D; Ross, Tyler R; Tosteson, Aanna N A; Elmore, Joann G
2013-02-05
Diagnostic test sets are a valuable research tool that contributes importantly to the validity and reliability of studies that assess agreement in breast pathology. In order to fully understand the strengths and weaknesses of any agreement and reliability study, however, the methods should be fully reported. In this paper we provide a step-by-step description of the methods used to create four complex test sets for a study of diagnostic agreement among pathologists interpreting breast biopsy specimens. We use the newly developed Guidelines for Reporting Reliability and Agreement Studies (GRRAS) as a basis to report these methods. Breast tissue biopsies were selected from the National Cancer Institute-funded Breast Cancer Surveillance Consortium sites. We used a random sampling stratified according to woman's age (40-49 vs. ≥50), parenchymal breast density (low vs. high) and interpretation of the original pathologist. A 3-member panel of expert breast pathologists first independently interpreted each case using five primary diagnostic categories (non-proliferative changes, proliferative changes without atypia, atypical ductal hyperplasia, ductal carcinoma in situ, and invasive carcinoma). When the experts did not unanimously agree on a case diagnosis a modified Delphi method was used to determine the reference standard consensus diagnosis. The final test cases were stratified and randomly assigned into one of four unique test sets. We found GRRAS recommendations to be very useful in reporting diagnostic test set development and recommend inclusion of two additional criteria: 1) characterizing the study population and 2) describing the methods for reference diagnosis, when applicable.
Pruitt, Sandi L; Jeffe, Donna B; Yan, Yan; Schootman, Mario
2012-04-01
Limited psychometric research has examined the reliability of self-reported measures of neighbourhood conditions, the effect of measurement error on associations between neighbourhood conditions and health, and potential differences in the reliabilities between neighbourhood strata (urban vs rural and low vs high poverty). We assessed overall and stratified reliability of self-reported perceived neighbourhood conditions using five scales (social and physical disorder, social control, social cohesion, fear) and four single items (multidimensional neighbouring). We also assessed measurement error-corrected associations of these conditions with self-rated health. Using random-digit dialling, 367 women without breast cancer (matched controls from a larger study) were interviewed twice, 2-3 weeks apart. Test-retest (intraclass correlation coefficients (ICC)/weighted κ) and internal consistency reliability (Cronbach's α) were assessed. Differences in reliability across neighbourhood strata were tested using bootstrap methods. Regression calibration corrected estimates for measurement error. All measures demonstrated satisfactory internal consistency (α ≥ 0.70) and either moderate (ICC/κ=0.41-0.60) or substantial (ICC/κ=0.61-0.80) test-retest reliability in the full sample. Internal consistency did not differ by neighbourhood strata. Test-retest reliability was significantly lower among rural (vs urban) residents for two scales (social control, physical disorder) and two multidimensional neighbouring items; test-retest reliability was higher for physical disorder and lower for one multidimensional neighbouring item among the high (vs low) poverty strata. After measurement error correction, the magnitude of associations between neighbourhood conditions and self-rated health were larger, particularly in the rural population. Research is needed to develop and test reliable measures of perceived neighbourhood conditions relevant to the health of rural populations.
Choosing a reliability inspection plan for interval censored data
Lu, Lu; Anderson-Cook, Christine Michaela
2017-04-19
Reliability test plans are important for producing precise and accurate assessment of reliability characteristics. This paper explores different strategies for choosing between possible inspection plans for interval censored data given a fixed testing timeframe and budget. A new general cost structure is proposed for guiding precise quantification of total cost in inspection test plan. Multiple summaries of reliability are considered and compared as the criteria for choosing the best plans using an easily adapted method. Different cost structures and representative true underlying reliability curves demonstrate how to assess different strategies given the logistical constraints and nature of the problem. Results show several general patterns exist across a wide variety of scenarios. Given the fixed total cost, plans that inspect more units with less frequency based on equally spaced time points are favored due to the ease of implementation and consistent good performance across a large number of case study scenarios. Plans with inspection times chosen based on equally spaced probabilities offer improved reliability estimates for the shape of the distribution, mean lifetime, and failure time for a small fraction of population only for applications with high infant mortality rates. The paper uses a Monte Carlo simulation based approach in addition to the common evaluation based on the asymptotic variance and offers comparison and recommendation for different applications with different objectives. Additionally, the paper outlines a variety of different reliability metrics to use as criteria for optimization, presents a general method for evaluating different alternatives, as well as provides case study results for different common scenarios.
Cruz, Jonas P; Baldacchino, Donia R; Alquwez, Nahed
2016-06-01
Patients often resort to religious and spiritual activities to cope with physical and mental challenges. The effect of spiritual coping on overall health, adaptation and health-related quality of life among patients undergoing haemodialysis (HD) is well documented. Thus, it is essential to establish a valid and reliable instrument that can assess both the religious and non-religious coping methods in patients undergoing HD. This study aimed to assess the validity and reliability of the Spiritual Coping Strategies Scale Arabic version (SCS-A) in Saudi patients undergoing HD. A convenience sample of 60 Saudi patients undergoing HD was recruited for this descriptive, cross-sectional study. Data were collected between May and June 2015. Forward-backward translation was used to formulate the SCS-A. The SCS-A, Muslim Religiosity Scale and the Quality of Life Index Dialysis Version III were used to procure the data. Internal consistency reliability, stability reliability, factor analysis and construct validity tests were performed. Analyses were set at the 0.05 level of significance. The SCS-A showed an acceptable internal consistency and strong stability reliability over time. The EFA produced two factors (non-religious and religious coping). Satisfactory construct validity was established by the convergent and divergent validity and known-groups method. The SCS-A is a reliable and valid tool that can be used to measure the religious and non-religious coping strategies of patients undergoing HD in Saudi Arabia and other Muslim and Arabic-speaking countries. © 2016 European Dialysis and Transplant Nurses Association/European Renal Care Association.
Choosing a reliability inspection plan for interval censored data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Lu; Anderson-Cook, Christine Michaela
Reliability test plans are important for producing precise and accurate assessment of reliability characteristics. This paper explores different strategies for choosing between possible inspection plans for interval censored data given a fixed testing timeframe and budget. A new general cost structure is proposed for guiding precise quantification of total cost in inspection test plan. Multiple summaries of reliability are considered and compared as the criteria for choosing the best plans using an easily adapted method. Different cost structures and representative true underlying reliability curves demonstrate how to assess different strategies given the logistical constraints and nature of the problem. Results show several general patterns exist across a wide variety of scenarios. Given the fixed total cost, plans that inspect more units with less frequency based on equally spaced time points are favored due to the ease of implementation and consistent good performance across a large number of case study scenarios. Plans with inspection times chosen based on equally spaced probabilities offer improved reliability estimates for the shape of the distribution, mean lifetime, and failure time for a small fraction of population only for applications with high infant mortality rates. The paper uses a Monte Carlo simulation based approach in addition to the common evaluation based on the asymptotic variance and offers comparison and recommendation for different applications with different objectives. Additionally, the paper outlines a variety of different reliability metrics to use as criteria for optimization, presents a general method for evaluating different alternatives, as well as provides case study results for different common scenarios.
Validation and reliability of the Turkish Utian Quality-of-Life Scale in postmenopausal women.
Abay, Halime; Kaplan, Sena
2016-04-01
There are a limited number of menopause-specific quality-of-life scales for the Turkish population. This study was conducted to evaluate the validity and reliability of the Turkish Utian Quality-of-Life Scale in postmenopausal women. The study group comprised 250 postmenopausal women who applied to a training and research hospital's menopause clinic in Turkey. A survey form and the Turkish Utian Quality-of-Life Scale were used to collect data, and the Turkish version of Short Form-36 was used to evaluate reliability with an equivalent form. Language-validity, content-validity, and construct-validity methods were used to assess the validity of the scale, and Cronbach's α coefficient calculation and the equivalent-form reliability methods were used to assess the reliability of the scale. The Turkish Utian Quality-of-Life Scale was determined to be a valid and reliable instrument for measuring the quality of life of postmenopausal women. Confirmatory factor analysis demonstrates that the instrument fits well with 23 items and a four-factor model. The Cronbach's α coefficients for the quality-of-life domains were as follows: 0.88 overall, 0.79 health, 0.78 emotional, 0.76 sexual, and 0.75 occupational. Reliability of the instrument was confirmed through significant correlations between scores on the Turkish version of the Utian Quality-of-Life Scale and the Turkish version of the Short Form-36 (r = 0.745, P < 0.001). This research emphasizes that the Turkish Utian Quality-of-Life Scale is reliable and valid in postmenopausal women; it is a useful instrument for measuring quality of life during menopause.
ERIC Educational Resources Information Center
Lung, For-Wey; Chiang, Tung-Liang; Lin, Shio-Jean; Feng, Jui-Ying; Chen, Po-Fei; Shu, Bih-Ching
2011-01-01
The parental report instrument is the most efficient developmental detection method and has shown high validity with professional assessment instruments. The reliability and validity of the Taiwan Birth Cohort Study (TBCS) 6-, 18- and 36-month scales have already been established. In this study, the reliability and validity of the 60-month scale…
ERIC Educational Resources Information Center
Albanese, Mark A.; Jacobs, Richard M.
Preliminary psychometric data assessing the reliability and validity of a method used to measure the diagnostic reasoning and problem-solving skills of predoctoral students in orthodontia are described. The measurement approach consisted of sets of patient demographic data and dental photos and x-rays, accompanied by a set of 33 multiple-choice…
Development of Creative Behavior Observation Form: A Study on Validity and Reliability
ERIC Educational Resources Information Center
Dere, Zeynep; Ömeroglu, Esra
2018-01-01
In this study, the Creative Behavior Observation Form was developed to assess the creativity of children. For the reliability and validity study of the Creative Behavior Observation Form, a total of 257 children aged 5-6 were sampled using the stratified sampling method. Content Validity Index (CVI) and…
Q-sort assessment vs visual analog scale in the evaluation of smile esthetics.
Schabel, Brian J; McNamara, James A; Franchi, Lorenzo; Baccetti, Tiziano
2009-04-01
This study was designed to compare the reliability of the Q-sort and visual analog scale (VAS) methods for the assessment of smile esthetics. Furthermore, agreement between orthodontists and parents of orthodontic patients, and between male and female raters, was assessed in terms of subjective evaluation of the smile. Clinical photographs and digital video captures of 48 orthodontically treated patients were rated by 2 panels: 25 experienced orthodontists (15 men, 10 women) and 20 parents of the patients (8 men, 12 women). Interrater reliability of the Q-sort and VAS methods was evaluated by using single-measure and average-measure intraclass correlation (ICC). Kappa agreement and the McNemar test were used to evaluate agreement between orthodontists and parents, and between men and women, for "attractive" and "unattractive" images of smiles captured with clinical photography. The single-measure ICC coefficients showed fair to good reliability of the Q-sort and poor reliability of the VAS for measuring esthetic preferences of an individual orthodontist or parent. Both rating groups agreed significantly (P >0.05) on the total percentage of "attractive" images of smiles captured with clinical photography. Men and women, however, significantly disagreed on the total percentages of "attractive" and "unattractive" smiles. Women rated higher percentages of both image groups as "attractive" than did their male counterparts. The Q-sort was more reliable than the VAS for measuring smile esthetics. Orthodontists and parents of orthodontic patients agreed with respect to "attractive" and "unattractive" smiles. Men and women agreed poorly with respect to "attractive" and "unattractive" smiles.
A method for assessing fidelity of delivery of telephone behavioral support for smoking cessation.
Lorencatto, Fabiana; West, Robert; Bruguera, Carla; Michie, Susan
2014-06-01
Behavioral support for smoking cessation is delivered through different modalities, often guided by treatment manuals. Recently developed methods for assessing fidelity of delivery have shown that face-to-face behavioral support is often not delivered as specified in the service treatment manual. This study aimed to extend this method to evaluate fidelity of telephone-delivered behavioral support. A treatment manual and transcripts of 75 audio-recorded behavioral support sessions were obtained from the United Kingdom's national Quitline service and coded into component behavior change techniques (BCTs) using a taxonomy of 45 smoking cessation BCTs. Interrater reliability was assessed using percentage agreement. Fidelity was assessed by comparing the number of BCTs identified in the manual with those delivered in telephone sessions by 4 counselors. Fidelity was assessed according to session type, duration, counselor, and BCT. Differences between self-reported and actual BCT use were examined. Average coding reliability was high (81%). On average, 41.8% of manual-specified BCTs were delivered per session (SD = 16.2), with fidelity varying by counselor from 32% to 49%. Fidelity was highest in pre-quit sessions (46%) and for BCT "give options for additional support" (95%). Fidelity was lowest for quit-day sessions (35%) and BCT "set graded tasks" (0%). Session duration was positively correlated with fidelity (r = .585; p < .01). Significantly fewer BCTs were used than were reported as being used, t(15) = -5.52, p < .001. The content of telephone-delivered behavioral support can be reliably coded in terms of BCTs. This can be used to assess fidelity to treatment manuals and to in turn identify training needs. The observed low fidelity underlines the need to establish routine procedures for monitoring delivery of behavioral support. PsycINFO Database Record (c) 2014 APA, all rights reserved.
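The two percentages reported above (coding agreement and fidelity) can be sketched as simple set comparisons. This is an assumed reading of the abstract rather than the authors' exact procedure: fidelity is taken as the share of manual-specified BCTs that were delivered in a session, and agreement as the share of taxonomy BCTs that two coders both coded as present or both coded as absent. All function and parameter names are hypothetical.

```python
def fidelity(manual_bcts, delivered_bcts):
    """Percentage of manual-specified BCTs that appear among those delivered."""
    manual = set(manual_bcts)
    return 100.0 * len(manual & set(delivered_bcts)) / len(manual)

def percent_agreement(coder_a, coder_b, taxonomy):
    """Percentage of taxonomy BCTs on which two coders give the same judgment.

    A BCT counts as agreement when both coders coded it present
    or both coded it absent for the same session.
    """
    a, b = set(coder_a), set(coder_b)
    agreed = sum((bct in a) == (bct in b) for bct in taxonomy)
    return 100.0 * agreed / len(taxonomy)
```

Percentage agreement does not correct for chance agreement, which is one reason other reliability studies in this list report kappa statistics instead.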
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-12
... asked to (1) identify the methods used to assess exposure or outcome, (2) discuss the validity and reliability of the methods used to classify exposure (such as job title, job exposure matrix, biomonitoring...
Three-dimensional implicit lambda methods
NASA Technical Reports Server (NTRS)
Napolitano, M.; Dadone, A.
1983-01-01
This paper derives the three dimensional lambda-formulation equations for a general orthogonal curvilinear coordinate system and provides various block-explicit and block-implicit methods for solving them, numerically. Three model problems, characterized by subsonic, supersonic and transonic flow conditions, are used to assess the reliability and compare the efficiency of the proposed methods.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-03-22
... on rigorous scientifically based research methods to assess the effectiveness of a particular... activities and programs; and (B) Includes research that-- (i) Employs systematic, empirical methods that draw... or observational methods that provide reliable and valid data across evaluators and observers, across...
Development of a reliable method to assess footwear comfort during running.
Mündermann, Anne; Nigg, Benno M; Stefanyshyn, Darren J; Humble, R Neil
2002-08-01
The purposes of this study were: (a) to determine whether subjects are able to distinguish between differences in footwear with respect to footwear comfort; and (b) to determine how reliably footwear comfort can be assessed using a visual analogue scale (VAS) and a protocol including a control condition during running. Intraclass correlation coefficients (ICCs) between comfort ratings for repeated conditions were high (ICC = 0.799). Differences in comfort ratings between the insert conditions were significant. A paired t-test revealed a significant difference in overall comfort ratings for the control insert when tested after the soft insert compared to when tested after the hard insert (P = 0.008). The results of this study showed that VASs provide a reliable measure to assess footwear comfort during running under the conditions that: (a) a control condition is included; and (b) the average comfort rating of sessions 4-6 is used. Copyright 2002 Elsevier Science B.V.
Reliability and Validity Evidence of Multiple Balance Assessments in Athletes With a Concussion
Murray, Nicholas; Salvatore, Anthony; Powell, Douglas; Reed-Jones, Rebecca
2014-01-01
Context: An estimated 300 000 sport-related concussion injuries occur in the United States annually. Approximately 30% of individuals with concussions experience balance disturbances. Common methods of balance assessment include the Clinical Test of Sensory Interaction and Balance (CTSIB), the Sensory Organization Test (SOT), the Balance Error Scoring System (BESS), and the Romberg test; however, the National Collegiate Athletic Association recommended the Wii Fit as an alternative measure of balance in athletes with a concussion. A central concern regarding the implementation of the Wii Fit is whether it is reliable and valid for measuring balance disturbance in athletes with concussion. Objective: To examine the reliability and validity evidence for the CTSIB, SOT, BESS, Romberg test, and Wii Fit for detecting balance disturbance in athletes with a concussion. Data Sources: Literature considered for review included publications with reliability and validity data for the assessments of balance (CTSIB, SOT, BESS, Romberg test, and Wii Fit) from PubMed, PsycINFO, and CINAHL. Data Extraction: We identified 63 relevant articles for consideration in the review. Of the 63 articles, 28 were considered appropriate for inclusion and 35 were excluded. Data Synthesis: No current reliability or validity information supports the use of the CTSIB, SOT, Romberg test, or Wii Fit for balance assessment in athletes with a concussion. The BESS demonstrated moderate to high reliability (interclass correlation coefficient = 0.87) and low to moderate validity (sensitivity = 34%, specificity = 87%). However, the Romberg test and Wii Fit have been shown to be reliable tools in the assessment of balance in Parkinson patients. Conclusions: The BESS can evaluate balance problems after a concussion. However, it lacks the ability to detect balance problems after the third day of recovery. Further investigation is needed to establish the use of the CTSIB, SOT, Romberg test, and Wii Fit for assessing balance in athletes with concussions. PMID:24933431
Reliability and validity of the Youth Leisure-time Sedentary Behavior Questionnaire (YLSBQ).
Cabanas-Sánchez, Verónica; Martínez-Gómez, David; Esteban-Cornejo, Irene; Castro-Piñero, José; Conde-Caveda, Julio; Veiga, Óscar L
2018-01-01
To develop a questionnaire able to assess time spent by youth in a wide range of leisure-time sedentary behaviors (SB) and evaluate its test-retest reliability and criterion validity. Cross-sectional observational. The reliability sample included 194 youth, aged 10-18 years, who completed the questionnaire twice, separated by a one-week interval. The validity study comprised 1207 participants aged 8-18 years. Participants wore an accelerometer for 7 consecutive days. The questionnaire was designed to assess the amount of time spent in twelve different SB during weekdays and weekends, separately. To avoid the usual phenomenon of time over-reporting, values were adjusted to the real available leisure-time (LT) for each participant. Reliability was assessed by using Intraclass Correlation Coefficients (ICC) and weighted (quadratic) kappa (k), and validity was assessed by using Pearson correlation and Bland-Altman plots. The questionnaire showed moderate-to-substantial agreement for most (91%) of the items (k=0.43-0.74; ICC=0.41-0.79), with three items (4%) reaching almost perfect agreement (ICC=0.82-0.83). Only 'sitting and talking' evidenced fair-to-moderate reliability (k=0.27-0.39; ICC=0.34-0.46). The relationship between average sedentary time assessed by the questionnaire and accelerometry was moderate (r=0.36; p<0.001). Systematic biases were not found between questionnaire and accelerometer sedentary time for average day (r=0.05; p=0.11), but Bland-Altman plots suggest moderate discrepancies between both methods of SB measurement (mean=19.86; limits of agreement=-280.04 to 319.76). The questionnaire showed moderate to good test-retest reliability and a moderate level of validity for assessing SB in youth, similar to or slightly better than those previously published for this population. Copyright © 2017 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
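The Bland-Altman figures quoted above can be read with a small calculation. The reported limits are symmetric about the mean difference (19.86 ± 299.90), which under the common 1.96 × SD convention would correspond to a standard deviation of the differences of roughly 153; the abstract does not state which convention was used, so that is an assumption. The sketch below shows the usual computation on hypothetical paired values (the function name and data are illustrative, not taken from the study).

```python
import statistics

def bland_altman_limits(method_a, method_b, z=1.96):
    """Return (bias, lower, upper) limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical paired sedentary-time values, NOT data from the study.
questionnaire = [510, 430, 620, 380, 550]
accelerometer = [480, 450, 560, 420, 500]
bias, lower, upper = bland_altman_limits(questionnaire, accelerometer)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Wide limits relative to the bias, as in the abstract, indicate that individual-level disagreement can be large even when the average difference is small.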
The Development and Validation of a Rapid Assessment Tool of Primary Care in China
Mei, Jie; Liang, Yuan; Shi, LeiYu; Zhao, JingGe; Wang, YuTan; Kuang, Li
2016-01-01
Introduction. With Chinese health care reform increasingly emphasizing the importance of primary care, the need for a tool to evaluate primary care performance and service delivery is clear. This study presents a methodology for a rapid assessment of primary care organizations and service delivery in China. Methods. The study translated and adapted the Primary Care Assessment Tool-Adult Edition (PCAT-AE) into a Chinese version to measure core dimensions of primary care, namely, first contact, continuity, comprehensiveness, and coordination. A cross-sectional survey was conducted to assess the validity and reliability of the Chinese Rapid Primary Care Assessment Tool (CR-PCAT). Eight community health centers in Guangdong province were selected to participate in the survey. Results. A total of 1465 effective samples were included for data analysis. Eight items were eliminated following principal component analysis and reliability testing. The principal component analysis extracted five multiple-item scales (first contact utilization, first contact accessibility, ongoing care, comprehensiveness, and coordination). The tests of scaling assumptions were basically met. Conclusion. The standard psychometric evaluation indicates that the scales have achieved relatively good reliability and validity. The CR-PCAT provides a rapid and reliable measure of four core dimensions of primary care, which could be applied in various scenarios. PMID:26885509
Optimized Vertex Method and Hybrid Reliability
NASA Technical Reports Server (NTRS)
Smith, Steven A.; Krishnamurthy, T.; Mason, B. H.
2002-01-01
A method of calculating the fuzzy response of a system is presented. This method, called the Optimized Vertex Method (OVM), is based upon the vertex method but requires considerably fewer function evaluations. The method is demonstrated by calculating the response membership function of strain-energy release rate for a bonded joint with a crack. The possibility of failure of the bonded joint was determined over a range of loads. After completing the possibilistic analysis, the possibilistic (fuzzy) membership functions were transformed to probability density functions and the probability of failure of the bonded joint was calculated. This approach is called a possibility-based hybrid reliability assessment. The possibility and probability of failure are presented and compared to a Monte Carlo Simulation (MCS) of the bonded joint.
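For orientation, the basic vertex method on which the OVM is said to be based can be sketched as follows. This is not the authors' optimized variant, whose details the abstract does not give. It assumes triangular fuzzy inputs and a response that has no interior extrema within each interval, so that the extremes occur at interval vertices; `toy_response` and all numeric values are hypothetical.

```python
from itertools import product

def alpha_cut(tri, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at level alpha."""
    low, mode, high = tri
    return (low + alpha * (mode - low), high - alpha * (high - mode))

def vertex_method(func, fuzzy_inputs, alphas):
    """Map each alpha level to the (min, max) of func over all interval vertices."""
    response = {}
    for alpha in alphas:
        intervals = [alpha_cut(t, alpha) for t in fuzzy_inputs]
        # 2**n evaluations per alpha level for n fuzzy inputs
        values = [func(*vertex) for vertex in product(*intervals)]
        response[alpha] = (min(values), max(values))
    return response

def toy_response(load, compliance):
    """Hypothetical monotonic response used only for illustration."""
    return load * compliance

cuts = vertex_method(toy_response, [(1.0, 2.0, 3.0), (2.0, 4.0, 6.0)], [0.0, 0.5, 1.0])
for alpha, bounds in cuts.items():
    print(alpha, bounds)
```

The cost of 2**n function evaluations per alpha level is what motivates a method that needs fewer evaluations.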
Chow, Clara K.; Corsi, Daniel J.; Lock, Karen; Madhavan, Manisha; Mackie, Pam; Li, Wei; Yi, Sun; Wang, Yang; Swaminathan, Sumathi; Lopez-Jaramillo, Patricio; Gomez-Arbelaez, Diego; Avezum, Álvaro; Lear, Scott A.; Dagenais, Gilles; Teo, Koon; McKee, Martin; Yusuf, Salim
2014-01-01
Background: Previous research has shown that environments with features that encourage walking are associated with increased physical activity. Existing methods to assess the built environment using geographical information systems (GIS) data, direct audit or large surveys of the residents face constraints, such as data availability and comparability, when used to study communities in countries in diverse parts of the world. The aim of this study was to develop a method to evaluate features of the built environment of communities using a standard set of photos. In this report we describe the method of photo collection, photo analysis instrument development and inter-rater reliability of the instrument. Methods/Principal Findings: A minimum of 5 photos were taken per community in 86 communities in 5 countries according to a standard set of instructions from a designated central point of each community by researchers at each site. A standard pro forma derived from reviewing existing instruments to assess the built environment was developed and used to score the characteristics of each community. Photo sets from each community were assessed independently by three observers in the central research office according to the pro forma and the inter-rater reliability was compared by intra-class correlation (ICC). Overall 87% (53 of 60) of items had an ICC of ≥0.70, 7% (4 of 60) had an ICC between 0.60 and 0.70 and 5% (3 of 60) had an ICC ≤0.50. Conclusions/Significance: Analysis of photos using a standardized protocol as described in this study offers a means to obtain reliable and reproducible information on the built environment in communities in very diverse locations around the world. The collection of the photographic data required minimal training and the analysis demonstrated high reliability for the majority of items of interest. PMID:25369366
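The abstract does not say which ICC form was used for the three observers. As one plausible illustration, the sketch below computes ICC(2,1) in the Shrout and Fleiss sense (two-way random effects, absolute agreement, single rater) from a subjects-by-raters table; the function name and the scores are hypothetical.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per rated subject, each holding k rater scores.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical item scores for five photo sets rated by three observers.
scores = [[3, 3, 4], [1, 1, 1], [4, 5, 4], [2, 2, 3], [5, 5, 5]]
print(round(icc2_1(scores), 3))
```

Because this form penalizes systematic differences between raters, it is usually more conservative than a consistency-type ICC.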
3D photography is as accurate as digital planimetry tracing in determining burn wound area.
Stockton, K A; McMillan, C M; Storey, K J; David, M C; Kimble, R M
2015-02-01
In the paediatric population, careful attention needs to be paid to the techniques utilised for wound assessment to minimise discomfort and stress to the child. To investigate whether 3D photography is a valid measure of burn wound area in children compared to the current clinical gold standard method of digital planimetry using Visitrak™. Twenty-five children presenting to the Stuart Pegg Paediatric Burn Centre for burn dressing change following acute burn injury were included in the study. Burn wound area measurement was undertaken using both digital planimetry (Visitrak™ system) and 3D camera analysis. Inter-rater reliability of the 3D camera software was determined by three investigators independently assessing the burn wound area. A comparison of wound area between the two methods using intraclass correlation coefficients (ICC) demonstrated excellent agreement: 0.994 (CI 0.986, 0.997). Inter-rater reliability was also excellent (ICC 0.989; 95% CI 0.979, 0.995). Time taken to map the wound was significantly quicker using the camera at bedside compared to Visitrak™: 14.68 (7.00) s versus 36.84 (23.51) s (p<0.001). In contrast, analysing wound area was significantly quicker using the Visitrak™ tablet compared to Dermapix® software for the 3D images: 31.36 (19.67) s versus 179.48 (56.86) s (p<0.001). This study demonstrates that images taken with the 3D LifeViz™ camera and assessed with Dermapix® software are a reliable method for wound area assessment in the acute paediatric burn setting. Copyright © 2014 Elsevier Ltd and ISBI. All rights reserved.
Do you see what I see? Mobile eye-tracker contextual analysis and inter-rater reliability.
Stuart, S; Hunt, D; Nell, J; Godfrey, A; Hausdorff, J M; Rochester, L; Alcock, L
2018-02-01
Mobile eye-trackers are currently used during real-world tasks (e.g. gait) to monitor visual and cognitive processes, particularly in ageing and Parkinson's disease (PD). However, contextual analysis involving fixation locations during such tasks is rarely performed due to its complexity. This study adapted a validated algorithm and developed a classification method to semi-automate contextual analysis of mobile eye-tracking data. We further assessed inter-rater reliability of the proposed classification method. A mobile eye-tracker recorded eye-movements during walking in five healthy older adult controls (HC) and five people with PD. Fixations were identified using a previously validated algorithm, which was adapted to provide still images of fixation locations (n = 116). The fixation location was manually identified by two raters (DH, JN), who classified the locations. Cohen's kappa correlation coefficients determined the inter-rater reliability. The algorithm successfully provided still images for each fixation, allowing manual contextual analysis to be performed. The inter-rater reliability for classifying the fixation location was high for both PD (kappa = 0.80, 95% agreement) and HC groups (kappa = 0.80, 91% agreement), which indicated a reliable classification method. This study developed a reliable semi-automated contextual analysis method for gait studies in HC and PD. Future studies could adapt this methodology for various gait-related eye-tracking studies.
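Cohen's kappa corrects raw agreement for the agreement expected by chance, which is why the two groups above can share the same kappa (0.80) while differing in percent agreement. A minimal sketch of the standard two-rater formula follows; the category labels are hypothetical and are not the study's classification scheme.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical fixation-location labels from two raters.
rater_1 = ["floor"] * 5 + ["wall"] * 5
rater_2 = ["floor"] * 4 + ["wall"] * 6
print(round(cohens_kappa(rater_1, rater_2), 2))
```

Here nine of ten labels match, and the marginal frequencies give a chance agreement of 0.5.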
ERIC Educational Resources Information Center
Liu, Xueman Lucy; de Villiers, Jill; Ning, Chunyan; Rolfhus, Eric; Hutchings, Teresa; Lee, Wendy; Jiang, Fan; Zhang, Yi Wen
2017-01-01
Purpose: With no existing gold standard for comparison, challenges arise for establishing the validity of a new standardized Mandarin language assessment normed in mainland China. Method: A new assessment, Diagnostic Receptive and Expressive Assessment of Mandarin (DREAM), was normed with a stratified sample of 969 children ages 2;6 (years;months)…
Fernandes, Marcelo José; Ruta, Danny Adolph; Ogden, Graham Richard; Pitts, Nigel Berry; Ogston, Simon Alexander
2006-02-01
To validate the Oral Health Impact Profile (OHIP)-14 in a sample of patients attending general dental practice. Patients with pathology-free impacted wisdom teeth were recruited from six general dental practices in Tayside, Scotland, and followed for a year to assess the development of problems related to impaction. The OHIP-14 was completed at baseline and at 1-year follow-up, and analysed using three different scoring methods: a summary score, a weighted and standardized score and the total number of problems reported. Instrument reliability was measured by assessing internal consistency and test-retest reliability. Construct validity was assessed using a number of variables. Linear regression was then used to model the relationship between OHIP-14 and all significantly correlated variables. Responsiveness was measured using the standardized response mean (SRM). Adjusted R² values and SRMs were calculated for each of the three scoring methods. Estimates for the differences between adjusted R² values and the differences between SRMs were obtained with 95% confidence intervals. A total of 278 and 169 patients completed the questionnaire at baseline and follow-up, respectively. Reliability - Cronbach's alpha coefficients ranged from 0.30 to 0.75. Alpha coefficients for all 14 items were 0.88 and 0.87 for baseline and follow-up, respectively. Test-retest coefficients ranged from 0.72 to 0.78. Validity - OHIP-14 scores were significantly correlated with number of teeth, education, main activity, the use of mouthwash, frequency of seeing a dentist, the reason for the last dental appointment, smoking, alcohol intake, pain and symptoms. Adjusted R² values ranged from 0.123 to 0.202 and there were no statistically significant differences between those for the three different scoring methods. Responsiveness - The SRMs ranged from 0.37 to 0.56 and there was a statistically significant difference between the summary scores method and the total number of problems method for symptomatic patients. The OHIP-14 is a valid and reliable measure of oral health-related quality of life in general dental practice and is responsive to third molar clinical change. The summary score method demonstrated performance as good as, or better than, the other methods studied.
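Two of the statistics named above have compact textbook definitions. Cronbach's alpha compares the sum of item variances with the variance of the total score, and the standardized response mean divides the mean change by the standard deviation of the change. The sketch below uses those standard formulas on hypothetical scores; it is not the authors' analysis code.

```python
import statistics

def cronbach_alpha(items):
    """items: list of per-item score lists, all over the same respondents."""
    k = len(items)
    item_variance_sum = sum(statistics.variance(scores) for scores in items)
    totals = [sum(respondent) for respondent in zip(*items)]
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))

def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the sample SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)

# Hypothetical three-item scale answered by five respondents.
items = [[0, 1, 2, 3, 4], [1, 1, 2, 4, 4], [0, 2, 2, 3, 3]]
print(round(cronbach_alpha(items), 3))

# Hypothetical summary scores at baseline and follow-up.
print(round(standardized_response_mean([10, 12, 14, 16], [12, 13, 17, 18]), 3))
```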
Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.
2012-01-01
The purpose of this article is to help researchers avoid common pitfalls associated with reliability including incorrectly assuming that (a) measurement error always attenuates observed score correlations, (b) different sources of measurement error originate from the same source, and (c) reliability is a function of instrumentation. To accomplish our purpose, we first describe what reliability is and why researchers should care about it with focus on its impact on effect sizes. Second, we review how reliability is assessed with comment on the consequences of cumulative measurement error. Third, we consider how researchers can use reliability generalization as a prescriptive method when designing their research studies to form hypotheses about whether or not reliability estimates will be acceptable given their sample and testing conditions. Finally, we discuss options that researchers may consider when faced with analyzing unreliable data. PMID:22518107
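Pitfall (a) refers to the classical attenuation relationship, in which an observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. That relationship assumes uncorrelated measurement errors, which is exactly the assumption the article cautions against taking for granted. A minimal sketch of the classical formula, with hypothetical input values:

```python
import math

def attenuate(r_true, rel_x, rel_y):
    """Expected observed correlation under uncorrelated measurement errors."""
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuate(r_observed, rel_x, rel_y):
    """Classical correction for attenuation (same assumption as above)."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical values: observed r = 0.30, reliabilities 0.80 and 0.90.
corrected = disattenuate(0.30, 0.80, 0.90)
print(round(corrected, 3))
```

When errors are correlated, the observed correlation can be inflated instead of attenuated, so the corrected value should be read as conditional on the stated assumption.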
An Empirical Framework for ePortfolio Assessment
ERIC Educational Resources Information Center
Kelly-Riley, Diane; Elliot, Norbert; Rudniy, Alex
2016-01-01
This research focuses on ePortfolio assessment strategies that yield important accountability and reporting information. Under foundational categories of reliability, validity, and fairness, we present methods of gathering evidence from ePortfolio scores and their relationship to demographic information (gender, race/ethnicity, and socio-economic…
The Assessment of Motivation within Maslow's Framework.
ERIC Educational Resources Information Center
Haymes, Michael; Green, Logan
1982-01-01
Reports progress in the development of the Needsort, a research tool, for the assessment of the three developmentally earliest, within Maslow's framework, conative needs (physiological, safety, belongingness). Discusses item analyses, item selection methods, reliability studies, and validation studies across a broad range of populations. (Author)
Howard, Steven J.; Melhuish, Edward
2016-01-01
Several methods of assessing executive function (EF), self-regulation, language development, and social development in young children have been developed over previous decades. Yet new technologies make available methods of assessment not previously considered. In resolving conceptual and pragmatic limitations of existing tools, the Early Years Toolbox (EYT) offers substantial advantages for early assessment of language, EF, self-regulation, and social development. In the current study, results of our large-scale administration of this toolbox to 1,764 preschool and early primary school students indicated very good reliability, convergent validity with existing measures, and developmental sensitivity. Results were also suggestive of better capture of children’s emerging abilities relative to comparison measures. Preliminary norms are presented, showing a clear developmental trajectory across half-year age groups. The accessibility of the EYT, as well as its advantages over existing measures, offers considerably enhanced opportunities for objective measurement of young children’s abilities to enable research and educational applications. PMID:28503022
NASA Astrophysics Data System (ADS)
Lauer, Eric A.; Corner, Brian D.; Li, Peng; Beecher, Robert M.; Deutsch, Curtis
2002-03-01
Traditionally, medical geneticists have employed visual inspection (anthroposcopy) to clinically evaluate dysmorphology. In the last 20 years, there has been an increasing trend towards quantitative assessment to render diagnosis of anomalies more objective and reliable. These methods have focused on direct anthropometry, using a combination of classical physical anthropology tools and new instruments tailor-made to describe craniofacial morphometry. These methods are painstaking and require that the patient remain still for extended periods of time. Most recently, semiautomated techniques (e.g., structured light scanning) have been developed to capture the geometry of the face in a matter of seconds. In this paper, we establish that direct anthropometry and structured light scanning yield reliable measurements, with remarkably high levels of inter-rater and intra-rater reliability, as well as validity (contrasting the two methods).
Chen, Qi; Chen, Quan; Luo, Xiaobing
2014-09-01
In recent years, due to the fast development of high-power light-emitting diodes (LEDs), their lifetime prediction and assessment have become a crucial issue. Although in situ measurement has been widely used for reliability testing in the laser diode community, it has not been commonly applied in the LED community. In this paper, an online testing method for LED life projection under accelerated reliability testing was proposed and a prototype was built. The optical parametric data were collected. The systematic error and the measuring uncertainty were calculated to be within 0.2% and within 2%, respectively. With this online testing method, experimental data can be acquired continuously and a sufficient amount of data can be gathered. Thus, the projection fitting accuracy can be improved (r² = 0.954) and testing duration can be shortened.
Robot-aided assessment of lower extremity functions: a review.
Maggioni, Serena; Melendez-Calderon, Alejandro; van Asseldonk, Edwin; Klamroth-Marganska, Verena; Lünenburger, Lars; Riener, Robert; van der Kooij, Herman
2016-08-02
The assessment of sensorimotor functions is extremely important to understand the health status of a patient and its change over time. Assessments are necessary to plan and adjust the therapy in order to maximize the chances of individual recovery. Nowadays, however, assessments are seldom used in clinical practice due to administrative constraints or to inadequate validity, reliability and responsiveness. In clinical trials, more sensitive and reliable measurement scales could unmask changes in physiological variables that would not be visible with existing clinical scores. In the last decades robotic devices have become available for neurorehabilitation training in clinical centers. Besides training, robotic devices can overcome some of the limitations in traditional clinical assessments by providing more objective, sensitive, reliable and time-efficient measurements. However, it is necessary to understand the clinical needs to be able to develop novel robot-aided assessment methods that can be integrated in clinical practice. This paper aims at providing researchers and developers in the field of robotic neurorehabilitation with a comprehensive review of assessment methods for the lower extremities. Among the ICF domains, we included those related to lower extremities sensorimotor functions and walking; for each chapter we present and discuss existing assessments used in routine clinical practice and contrast those to state-of-the-art instrumented and robot-aided technologies. Based on the shortcomings of current assessments, on the identified clinical needs and on the opportunities offered by robotic devices, we propose future directions for research in rehabilitation robotics. The review and recommendations provided in this paper aim to guide the design of the next generation of robot-aided functional assessments, their validation and their translation to clinical practice.
Reliability Assessment for Low-cost Unmanned Aerial Vehicles
NASA Astrophysics Data System (ADS)
Freeman, Paul Michael
Existing low-cost unmanned aerospace systems are unreliable, and engineers must blend reliability analysis with fault-tolerant control in novel ways. This dissertation introduces the University of Minnesota unmanned aerial vehicle flight research platform, a comprehensive simulation and flight test facility for reliability and fault-tolerance research. An industry-standard reliability assessment technique, the failure modes and effects analysis, is performed for an unmanned aircraft. Particular attention is afforded to the control surface and servo-actuation subsystem. Maintaining effector health is essential for safe flight; failures may lead to loss of control incidents. Failure likelihood, severity, and risk are qualitatively assessed for several effector failure modes. Design changes are recommended to improve aircraft reliability based on this analysis. Most notably, the control surfaces are split, providing independent actuation and dual-redundancy. The simulation models for control surface aerodynamic effects are updated to reflect the split surfaces using a first-principles geometric analysis. The failure modes and effects analysis is extended by using a high-fidelity nonlinear aircraft simulation. A trim state discovery is performed to identify the achievable steady, wings-level flight envelope of the healthy and damaged vehicle. Tolerance of elevator actuator failures is studied using familiar tools from linear systems analysis. This analysis reveals significant inherent performance limitations for candidate adaptive/reconfigurable control algorithms used for the vehicle. Moreover, it demonstrates how these tools can be applied in a design feedback loop to make safety-critical unmanned systems more reliable. Control surface impairments that do occur must be quickly and accurately detected. 
This dissertation also considers fault detection and identification for an unmanned aerial vehicle using model-based and model-free approaches and applies those algorithms to experimental faulted and unfaulted flight test data. Flight tests are conducted with actuator faults that affect the plant input and sensor faults that affect the vehicle state measurements. A model-based detection strategy is designed and uses robust linear filtering methods to reject exogenous disturbances, e.g. wind, while providing robustness to model variation. A data-driven algorithm is developed to operate exclusively on raw flight test data without physical model knowledge. The fault detection and identification performance of these complementary but different methods is compared. Together, enhanced reliability assessment and multi-pronged fault detection and identification techniques can help to bring about the next generation of reliable low-cost unmanned aircraft.
Pain, Liza A M; Baker, Ross; Sohail, Qazi Zain; Richardson, Denyse; Zabjek, Karl; Mogk, Jeremy P M; Agur, Anne M R
2018-03-23
Altered three-dimensional (3D) joint kinematics can contribute to shoulder pathology, including post-stroke shoulder pain. Reliable assessment methods enable comparative studies between asymptomatic shoulders of healthy subjects and painful shoulders of post-stroke subjects, and could inform treatment planning for post-stroke shoulder pain. The study purpose was to establish intra-rater test-retest reliability and within-subject repeatability of a palpation/digitization protocol, which assesses 3D clavicular/scapular/humeral rotations, in asymptomatic and painful post-stroke shoulders. Repeated measurements of 3D clavicular/scapular/humeral joint/segment rotations were obtained using palpation/digitization in 32 asymptomatic and six painful post-stroke shoulders during four reaching postures (rest/flexion/abduction/external rotation). Intra-class correlation coefficients (ICCs), standard error of the measurement and 95% confidence intervals were calculated. All ICC values indicated high to very high test-retest reliability (≥0.70), with lower reliability for scapular anterior/posterior tilt during external rotation in asymptomatic subjects, and scapular medial/lateral rotation, humeral horizontal abduction/adduction and axial rotation during abduction in post-stroke subjects. All standard error of measurement values demonstrated within-subject repeatability error ≤5° for all clavicular/scapular/humeral joint/segment rotations (asymptomatic ≤3.75°; post-stroke ≤5.0°), except for humeral axial rotation (asymptomatic ≤5°; post-stroke ≤15°). This noninvasive, clinically feasible palpation/digitization protocol was reliable and repeatable in asymptomatic shoulders, and in a smaller sample of painful post-stroke shoulders. Implications for Rehabilitation In the clinical setting, a reliable and repeatable noninvasive method for assessment of three-dimensional (3D) clavicular/scapular/humeral joint orientation and range of motion (ROM) is currently required. 
The established reliability and repeatability of this proposed palpation/digitization protocol will enable comparative 3D ROM studies between asymptomatic and post-stroke shoulders, which will further inform treatment planning. Intra-rater test-retest repeatability, which is measured by the standard error of the measure, indicates the range of error associated with a single test measure. Therefore, clinicians can use the standard error of the measure to determine the "true" differences between pre-treatment and post-treatment test scores.
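The closing remark about using the standard error of the measure to judge "true" differences is commonly put into practice with two textbook formulas: SEM = SD × sqrt(1 − ICC), and a minimal detectable change at 95% confidence of 1.96 × sqrt(2) × SEM. The abstract does not state which formulas the authors applied, so the sketch below is an assumption, and the input values are hypothetical.

```python
import math

def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient."""
    return sd * math.sqrt(1 - icc)

def minimal_detectable_change(sem, z=1.96):
    """Smallest pre/post difference exceeding measurement error (two tests)."""
    return z * math.sqrt(2) * sem

# Hypothetical rotation data: SD of 5 degrees, ICC of 0.84.
sem = standard_error_of_measurement(5.0, 0.84)
mdc95 = minimal_detectable_change(sem)
print(round(sem, 2), round(mdc95, 2))
```

A pre-treatment to post-treatment change smaller than the minimal detectable change cannot be distinguished from measurement error at that confidence level.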
NASA Astrophysics Data System (ADS)
Tamura, Yoshinobu; Yamada, Shigeru
OSS (open source software) systems, which serve as key components of critical infrastructures in our social life, are still expanding. In particular, embedded OSS systems such as Android, BusyBox, and TRON have been gaining a lot of attention in the embedded system area. However, poor handling of quality problems and customer support hinders the progress of embedded OSS. Also, it is difficult for developers to assess the reliability and portability of embedded OSS on a single-board computer. In this paper, we propose a method of software reliability assessment based on flexible hazard rates for embedded OSS. Also, we analyze actual data of software failure-occurrence time-intervals to show numerical examples of software reliability assessment for embedded OSS. Moreover, we compare the proposed hazard rate model for embedded OSS with typical conventional hazard rate models by using goodness-of-fit comparison criteria. Furthermore, we discuss the optimal software release problem for the porting phase based on the total expected software maintenance cost.
Lin, Yu-Hua; Wang, Liching Sung
2010-08-01
The purpose of this study was to assess the reliability and validity of a Chinese version of the revised nurses professional values scale (NPVS-R). Convenience sampling was applied to recruit senior undergraduate nursing students (n=110) and clinical nurses (n=223) from southern Taiwan. The NPVS-R was used in this study. Content validity, construct validity, internal consistency, and reliability were assessed. The final sample consisted of 286 subjects. Three factors were detected in the results, accounting for 60.12% of the explained variance. The first factor was titled professionalism, and included 13 items. The second factor was named caring, and consisted of seven items. Activism was the third factor, which included six items. The overall Cronbach's alpha coefficient was 0.90, and the values for the three factors were 0.88, 0.90, and 0.81, respectively. The Chinese version of the NPVS-R can be considered a reliable and valid scale for assigning values that can mark professionalism in Taiwanese nurses. Copyright 2009 Elsevier Ltd. All rights reserved.
A sensitive and reliable test instrument to assess swimming in rats with spinal cord injury.
Xu, Ning; Åkesson, Elisabet; Holmberg, Lena; Sundström, Erik
2015-09-15
For clinical translation of experimental spinal cord injury (SCI) research, evaluation of animal SCI models should include several sensorimotor functions. Validated and reliable assessment tools should be applicable to a wide range of injury severity. The BBB scale is the most widely used test instrument, but similar to most others it is used to assess open field ambulation. We have developed an assessment tool for swimming in rats with SCI, with high discriminative power and sensitivity to functional recovery after mild and severe injuries, without need for advanced test equipment. We studied various parameters of swimming in four groups of rats with thoracic SCI of different severity and a control group, for 8 weeks after surgery. Six parameters were combined in a multiple item scale, the Karolinska Institutet Swim Assessment Tool (KSAT). KSAT scores for all SCI groups showed consistent functional improvement after injury, and significant differences between the five experimental groups. The internal consistency, the inter-rater and the test-retest reliability were very high. The KSAT score was highly correlated to the cross-section area of white matter spared at the injury epicenter. Importantly, even after 8 weeks of recovery the KSAT score reliably discriminated normal animals from those inflicted by the mildest injury, and also displayed the recovery of the most severely injured rats. We conclude that this swim scale is an efficient and reliable tool to assess motor activity during swimming, and an important addition to the methods available for evaluating rat models of SCI. Copyright © 2015 Elsevier B.V. All rights reserved.
Economos, Christina D; Sacheck, Jennifer M; Kwan Ho Chui, Kenneth; Irizarry, Laura; Guillemont, Juliette; Collins, Jessica J; Hyatt, Raymond R
2008-04-01
Interventions aiming to modify the dietary and physical activity behaviors of young children require precise and accurate measurement tools. As part of a larger community-based project, three school-based questionnaires were developed to assess (a) fruit and vegetable intake, (b) physical activity and television (TV) viewing, and (c) perceived parental support for diet and physical activity. Test-retest reliability was performed on all questionnaires and validity was measured for fruit and vegetable intake, physical activity, and TV viewing. Eighty-four school children (8.3 ± 1.1 years) were studied. Test-retest reliability was performed by administering questionnaires twice, 1 to 2 hours apart. Validity of the fruit and vegetable questionnaire was measured by direct observation, while the physical activity and TV questionnaire was validated by a parent phone interview. All three questionnaires yielded excellent test-retest reliability (P<0.001). The majority of fruit and vegetable questions and the questions regarding specific physical activities and TV viewing were valid. Low validity scores were found for questions on watching TV during breakfast or dinner. These questionnaires are reliable and valid tools to assess fruit and vegetable intake, physical activity, and TV viewing behaviors in early elementary school-aged children. Methods for assessment of children's TV viewing during meals should be further investigated because of parent-child discrepancies.
Improving the Validity and Reliability of a Health Promotion Survey for Physical Therapists
Stephens, Jaca L.; Lowman, John D.; Graham, Cecilia L.; Morris, David M.; Kohler, Connie L.; Waugh, Jonathan B.
2013-01-01
Purpose Physical therapists (PTs) have a unique opportunity to intervene in the area of health promotion. However, no instrument has been validated to measure PTs’ views on health promotion in physical therapy practice. The purpose of this study was to evaluate the content validity and test-retest reliability of a health promotion survey designed for PTs. Methods An expert panel of PTs assessed the content validity of “The Role of Health Promotion in Physical Therapy Survey” and provided suggestions for revision. Item content validity was assessed using the content validity ratio (CVR) as well as the modified kappa statistic. Therapists then participated in the test-retest reliability assessment of the revised health promotion survey, which was assessed using a weighted kappa statistic. Results Based on feedback from the expert panelists, significant revisions were made to the original survey. The expert panel reached at least a majority consensus agreement for all items in the revised survey and the survey-CVR improved from 0.44 to 0.66. Only one item on the revised survey had substantial test-retest agreement, with 55% of the items having moderate agreement and 43% poor agreement. Conclusions All items on the revised health promotion survey demonstrated at least fair validity, but few items had reasonable test-retest reliability. Further modifications should be made to strengthen the validity and improve the reliability of this survey. PMID:23754935
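The abstract above reports a content validity ratio without giving its formula. A minimal sketch, assuming the standard Lawshe definition CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item "essential" and N is the panel size. The function names and the example panel counts are hypothetical and are not taken from the study.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR for one item: ranges from -1 (nobody) to +1 (everybody)."""
    if n_panelists <= 0:
        raise ValueError("panel must have at least one member")
    half = n_panelists / 2
    return (n_essential - half) / half


def survey_cvr(essential_counts: list[int], n_panelists: int) -> float:
    """Mean CVR across items (one common way to summarize a whole survey)."""
    ratios = [content_validity_ratio(n, n_panelists) for n in essential_counts]
    return sum(ratios) / len(ratios)


# Hypothetical panel of 10 experts rating three items.
print(content_validity_ratio(8, 10))   # 8 of 10 say "essential"
print(survey_cvr([10, 8, 5], 10))
```

A value of 0 means exactly half the panel rated the item essential, so a survey-level CVR moving from 0.44 to 0.66 indicates that a larger share of the panel endorsed the revised items.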
Gómez-Cabello, Alba; Vicente-Rodríguez, Germán; Albers, Ulrike; Mata, Esmeralda; Rodriguez-Marroyo, Jose A.; Olivares, Pedro R.; Gusi, Narcis; Villa, Gerardo; Aznar, Susana; Gonzalez-Gross, Marcela; Casajús, Jose A.; Ara, Ignacio
2012-01-01
Background The elderly EXERNET multi-centre study aims to collect normative anthropometric data for old functionally independent adults living in Spain. Purpose To describe the standardization process and reliability of the anthropometric measurements carried out in the pilot study and during the final workshop, examining both intra- and inter-rater errors for measurements. Materials and Methods A total of 98 elderly from five different regions participated in the intra-rater error assessment, and 10 different seniors living in the city of Toledo (Spain) participated in the inter-rater assessment. We examined both intra- and inter-rater errors for heights and circumferences. Results For height, intra-rater technical errors of measurement (TEMs) were smaller than 0.25 cm. For circumferences and knee height, TEMs were smaller than 1 cm, except for waist circumference in the city of Cáceres. Reliability for heights and circumferences was greater than 98% in all cases. Inter-rater TEMs were 0.61 cm for height, 0.75 cm for knee-height and ranged between 2.70 and 3.09 cm for the circumferences measured. Inter-rater reliabilities for anthropometric measurements were always higher than 90%. Conclusion The harmonization process, including the workshop and pilot study, guarantees the quality of the anthropometric measurements in the elderly EXERNET multi-centre study. High reliability and low TEM may be expected when assessing anthropometry in the elderly population. PMID:22860013
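The technical error of measurement (TEM) and the reliability percentage quoted above can be illustrated with a short sketch. It assumes the commonly used two-trial formulas, TEM = sqrt(sum(d^2) / 2n) and R = 1 - TEM^2 / SD^2, which the abstract does not spell out. The function names and height values are hypothetical.

```python
import math
import statistics


def technical_error(first: list[float], second: list[float]) -> float:
    """TEM for two repeated measurements per subject, in the original units."""
    diffs = [a - b for a, b in zip(first, second)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


def reliability_coefficient(first: list[float], second: list[float]) -> float:
    """Share of total variance not due to measurement error (0 to 1)."""
    tem = technical_error(first, second)
    total_variance = statistics.variance(first + second)
    return 1 - tem ** 2 / total_variance


# Hypothetical heights (cm) measured twice by the same rater.
trial_1 = [160.0, 170.0, 180.0]
trial_2 = [160.2, 169.8, 180.0]
print(round(technical_error(trial_1, trial_2), 3))
print(round(reliability_coefficient(trial_1, trial_2), 4))
```

Because TEM is expressed in the measurement's own units, it is directly comparable with the 0.25 cm and 1 cm thresholds mentioned in the abstract, while the reliability coefficient corresponds to the percentages reported.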
Griew, Pippa; Hillsdon, Melvyn; Foster, Charlie; Coombes, Emma; Jones, Andy; Wilkinson, Paul
2013-08-23
Walking for physical activity is associated with substantial health benefits for adults. Increasingly research has focused on associations between walking behaviours and neighbourhood environments including street characteristics such as pavement availability and aesthetics. Nevertheless, objective assessment of street-level data is challenging. This research investigates the reliability of a new street characteristic audit tool designed for use with Google Street View, and assesses levels of agreement between computer-based and on-site auditing. The Forty Area STudy street VIEW (FASTVIEW) tool, a Google Street View based audit tool, was developed incorporating nine categories of street characteristics. Using the tool, desk-based audits were conducted by trained researchers across one large UK town during 2011. Both inter and intra-rater reliability were assessed. On-site street audits were also completed to test the criterion validity of the method. All reliability scores were assessed by percentage agreement and the kappa statistic. Within-rater agreement was high for each category of street characteristic (range: 66.7%-90.0%) and good to high between raters (range: 51.3%-89.1%). A high level of agreement was found between the Google Street View audits and those conducted in-person across the nine categories examined (range: 75.0%-96.7%). The audit tool was found to provide a reliable and valid measure of street characteristics. The use of Google Street View to capture street characteristic data is recommended as an efficient method that could substantially increase potential for large-scale objective data collection.
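Percentage agreement and the kappa statistic, the two measures used in the abstract above, can be sketched as follows for two raters and categorical codes. This is an illustrative implementation of Cohen's kappa (the unweighted form is an assumption); the audit codes are invented and the helper names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Fraction of items on which both raters gave the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1,
    i.e. both raters used a single identical category throughout.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical "pavement present?" codes for four street segments.
desk_audit = ["yes", "yes", "no", "no"]
site_audit = ["yes", "no", "no", "no"]
print(percent_agreement(desk_audit, site_audit))
print(cohens_kappa(desk_audit, site_audit))
```

The example shows why both figures are usually reported together: raw agreement can look high even when much of it would be expected by chance.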
de Vasconcelos, Rodrigo Antunes; Bevilaqua-Grossi, Débora; Shimano, Antonio Carlos; Paccola, Cleber Jansen; Salvini, Tânia Fátima; Prado, Christiane Lanatovits; Junior, Wilson A. Mello
2015-01-01
Objectives: The aim of this study was to evaluate the reliability and validity of a modified isometric dynamometer (MID) in assessing performance deficits of the knee extensor and flexor muscles in normal individuals and in those with ACL reconstructions. Methods: Sixty male subjects were invited to participate in the study and were divided into three groups of 20 subjects each: control group (GC), group of individuals with ACL reconstruction with patellar tendon graft (GTP), and group of individuals with ACL reconstruction with hamstrings graft (GTF). All individuals performed isometric tests on the MID; the muscular strength deficits collected were subsequently compared to the tests performed on the Biodex System 3 operating in the isometric and isokinetic modes at speeds of 60°/s and 180°/s. Intraclass correlation coefficient (ICC) calculations were done to assess MID reliability; specificity, sensitivity and Kappa's consistency coefficient calculations were done to assess the MID's validity in detecting muscular deficits; and intra- and intergroup comparisons across the four strength tests were performed using the ANOVA method. Results: The modified isometric dynamometer (MID) showed excellent reliability and good validity in the assessment of the performance of the knee extensor and flexor muscle groups. In the comparison between groups, the GTP showed significantly greater deficits as compared to the GTF and GC groups. Conclusion: Isometric dynamometers connected to mechanotherapy equipment could be an alternative option to collect data concerning performance deficits of the extensor and flexor muscle groups of the knee in subjects with ACL reconstruction. PMID:27004175
A Comprehensive Critique and Review of Published Measures of Acne Severity
Furber, Gareth; Leach, Matthew; Segal, Leonie
2016-01-01
Objective: Acne vulgaris is a dynamic, complex condition that is notoriously difficult to evaluate. The authors set out to critically evaluate currently available measures of acne severity, particularly in terms of suitability for use in clinical trials. Design: A systematic review was conducted to identify methods used to measure acne severity, using MEDLINE, CINAHL, Scopus, and Wiley Online. Each method was critically reviewed and given a score out of 13 based on eight quality criteria under two broad groupings of psychometric testing and suitability for research and evaluation. Results: Twenty-four methods for assessing acne severity were identified. Four scales received a quality score of zero, and 11 scored ≤3. The highest rated scales achieved a total score of 6. Six scales reported strong inter-rater reliability (ICC>0.75), and four reported strong intra-rater reliability (ICC>0.75). The poor overall performance of most scales, largely characterized by the absence of reliability testing or evidence for independent assessment and validation indicates that generally, their application in clinical trials is not supported. Conclusion: This review and appraisal of instruments for measuring acne severity supports previously identified concerns regarding the quality of published measures. It highlights the need for a valid and reliable acne severity scale, especially for use in research and evaluation. The ideal scale would demonstrate adequate validation and reliability and be easily implemented for third-party analysis. The development of such a scale is critical to interpreting results of trials and facilitating the pooling of results for systematic reviews and meta-analyses. PMID:27672410
Bergamin, Marco; Gobbo, Stefano; Bullo, Valentina; Vendramin, Barbara; Duregon, Federica; Frizziero, Antonio; Di Blasio, Andrea; Cugusi, Lucia; Zaccaria, Marco; Ermolao, Andrea
2017-01-01
Summary Background Lower extremity muscle mass, strength, power, and physical performance are critical determinants of independent functioning in later life. Isokinetic dynamometers are becoming very common in assessing different features of muscle strength, in both research and clinical practice; however, reliability studies are still needed to support the extended use of those devices. Objective The purpose of this study is to assess the test-retest reliability of knee and ankle isokinetic and isometric strength testing protocols in a sample of older healthy subjects, using a new and untested isokinetic multi-joint evaluation system. Methods Sixteen male and fourteen female older adults (mean age 65.2 ± 4.6 years) were assessed in two testing sessions. Each participant performed a randomized testing procedure that included different isometric and isokinetic tests for knee and ankle joints. Results All participants completed the trial safely and no subject reported any discomfort throughout the overall assessment. Coefficients of correlation between measures were calculated, showing moderate to strong effects among all test-retest assessments, and a paired-sample t test showed only one significant difference (p<0.05), in the maximal isokinetic bilateral knee flexion torque. Conclusions The multi-joint evaluation system for the assessment of knee and ankle isokinetic and isometric strength provided reliable test-retest measures in healthy older adults. Level of evidence Ib. PMID:29264344
Jayaprakash, Paul T
2015-01-01
Establishing identification during skull-photo superimposition relies on correlating the salient morphological features of an unidentified skull with those of a face-image of a suspected dead individual using image overlay processes. Technical progression in the process of overlay has included the incorporation of video cameras, image-mixing devices and software that enables real-time vision-mixing. Conceptual transitions occur in the superimposition methods that involve 'life-size' images, that achieve orientation of the skull to the posture of the face in the photograph and that assess the extent of match. A recent report on the reliability of identification using the superimposition method adopted the currently prevalent methods and suggested an increased rate of failures when skulls were compared with related and unrelated face images. The reported reduction in the reliability of the superimposition method prompted a review of the transition in the concepts that are involved in skull-photo superimposition. The prevalent popular methods for visualizing the superimposed images at less than 'life-size', overlaying skull-face images by relying on the cranial and facial landmarks in the frontal plane when orienting the skull for matching and evaluating the match on a morphological basis by relying on mix-mode alone are the major departures in the methodology that may have reduced the identification reliability. The need to reassess the reliability of the method that incorporates the concepts which have been considered appropriate by the practitioners is stressed. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
A Reliable Method to Measure Lip Height Using Photogrammetry in Unilateral Cleft Lip Patients.
van der Zeeuw, Frederique; Murabit, Amera; Volcano, Johnny; Torensma, Bart; Patel, Brijesh; Hay, Norman; Thorburn, Guy; Morris, Paul; Sommerlad, Brian; Gnarra, Maria; van der Horst, Chantal; Kangesu, Loshan
2015-09-01
There is still no reliable tool to determine the outcome of the repaired unilateral cleft lip (UCL). The aim of this study was therefore to develop an accurate, reliable tool to measure vertical lip height from photographs. The authors measured the vertical height of the cutaneous and vermilion parts of the lip in 72 anterior-posterior view photographs of 17 patients with repairs to a UCL. Points on the lip's white roll and vermillion were marked on both the cleft and the noncleft sides on each image. Two new concepts were tested. First, photographs were standardized using the horizontal (medial to lateral) eye fissure width (EFW) for calibration. Second, the authors tested the interpupillary line (IPL) and the alar base line (ABL) for their reliability as horizontal lines of reference. Measurements were taken by 2 independent researchers, at 2 different time points each. Overall 2304 data points were obtained and analyzed. Results showed that the method was very effective in measuring the height of the lip on the cleft side with the noncleft side. When using the IPL, inter- and intra-rater reliability was 0.99 to 1.0, with the ABL it varied from 0.91 to 0.99 with one exception at 0.84. The IPL was easier to define because in some subjects the overhanging nasal tip obscured the alar base and gave more consistent measurements possibly because the reconstructed alar base was sometimes indistinct. However, measurements from the IPL can only give the percentage difference between the left and right sides of the lip, whereas those from the ABL can also give exact measurements. Patient examples were given that show how the measurements correlate with clinical assessment. The authors propose this method of photogrammetry with the innovative use of the IPL as a reliable horizontal plane and use of the EFW for calibration as a useful and reliable tool to assess the outcome of UCL repair.
WEIGHT OF EVIDENCE IN ECOLOGICAL ASSESSMENT
This document provides guidance on methods for weighing ecological evidence using a standard framework consisting of three steps: assemble evidence, weight evidence and weigh the body of evidence. Use of the methods will improve the consistency and reliability of WoE-based asse...
Dudley, Lisa A.; Smith, Craig A.; Olson, Brandon K.; Chimera, Nicole J.
2013-01-01
Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. 40 participants were video recorded performing the TJA using published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. Each score of the 10 technique flaws was summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for first and second session, resp.) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence intervals (CI) 0.33–0.62). Interrater reliability between 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. Published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learned effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation. PMID:26464881
Kandasamy, Ram; Lee, Andrea T; Morgan, Michael M
2017-12-01
The development of new anti-migraine treatments is limited by the difficulty in assessing migraine pain in laboratory animals. Depression of activity is one of the few diagnostic criteria for migraine that can be mimicked in rats. The goal of the present study was to test the hypothesis that depression of home cage wheel running is a reliable and clinically relevant method to assess migraine pain in rats. Adult female rats were implanted with a cannula to inject allyl isothiocyanate (AITC) onto the dura to induce migraine pain, as has been shown before. Rats recovered from implantation surgery for 8 days in cages containing a running wheel. Home cage wheel running was recorded 23 h a day. AITC and the migraine medication sumatriptan were administered in the hour prior to onset of the dark phase. Administration of AITC caused a concentration-dependent decrease in wheel running that lasted 3 h. The duration and magnitude of AITC-induced depression of wheel running was consistent following three repeated injections spaced 48 h apart. Administration of sumatriptan attenuated AITC-induced depression of wheel running when a large dose (1 mg/kg) was administered immediately following AITC administration. Wheel running patterns did not change when sumatriptan was given to naïve rats. These data indicate that home cage wheel running is a sensitive, reliable, and clinically relevant method to assess migraine pain in the rat.
Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally
2015-06-01
Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable and can further inform large-scale experimental designs. In this research note, a statistical analysis for case-study data is outlined that employs a modification to the Reliable Change Index (Jacobson & Truax, 1991). The relationship between reliable change and clinical significance is discussed. Example data are used to guide the reader through the use and application of this analysis. A method of analysis is detailed that is suitable for assessing change in measures with binary categorical outcomes. The analysis is illustrated using data from one individual, measured before and after treatment for stuttering. The application of this approach to assess change in categorical, binary data has potential application in speech-language pathology. It enables clinicians and researchers to analyze results from case studies for their statistical and clinical significance. This new method addresses a gap in the research design literature, that is, the lack of analysis methods for noncontinuous data (such as counts, rates, proportions of events) that may be used in case-study designs.
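For orientation, the classic Reliable Change Index of Jacobson and Truax (1991), on which the research note builds, can be sketched as below. This is the original continuous-score form, RCI = (post - pre) / S_diff with S_diff = sqrt(2) * SD * sqrt(1 - r); it is not the binary-data modification the note describes, whose details the abstract does not give. All numbers and names are hypothetical.

```python
import math


def reliable_change_index(pre: float, post: float,
                          sd: float, reliability: float) -> float:
    """Classic Jacobson-Truax RCI for a single individual.

    sd          -- standard deviation of the measure in a reference sample
    reliability -- test-retest reliability of the measure (0 to 1)
    """
    standard_error = sd * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2) * standard_error  # SE of a difference score
    return (post - pre) / s_diff


def is_reliable_change(rci: float, threshold: float = 1.96) -> bool:
    """Change beyond what measurement error alone would explain (p < .05)."""
    return abs(rci) > threshold


# Hypothetical pre- and post-treatment scores for one client.
rci = reliable_change_index(pre=40.0, post=30.0, sd=7.5, reliability=0.80)
print(round(rci, 3), is_reliable_change(rci))
```

An RCI beyond the threshold says only that the change exceeds measurement error; whether it is clinically significant is a separate question, which is the distinction the abstract draws.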
Kenyon, Lisa K.; Elliott, James M; Cheng, M. Samuel
2016-01-01
Purpose/Background Despite the availability of various field-tests for many competitive sports, a reliable and valid test specifically developed for use in men's gymnastics has not yet been developed. The Men's Gymnastics Functional Measurement Tool (MGFMT) was designed to assess sport-specific physical abilities in male competitive gymnasts. The purpose of this study was to develop the MGFMT by establishing a scoring system for individual test items and to initiate the process of establishing test-retest reliability and construct validity. Methods A total of 83 competitive male gymnasts ages 7-18 underwent testing using the MGFMT. Thirty of these subjects underwent re-testing one week later in order to assess test-retest reliability. Construct validity was assessed using a simple regression analysis between total MGFMT scores and the gymnasts’ USA-Gymnastics competitive level to calculate the coefficient of determination (r2). Test-retest reliability was analyzed using Model 1 Intraclass correlation coefficients (ICC). Statistical significance was set at the p<0.05 level. Results The relationship between total MGFMT scores and subjects’ current USA-Gymnastics competitive level was found to be good (r2 = 0.63). Reliability testing of the MGFMT composite test score showed excellent test-retest reliability over a one-week period (ICC = 0.97). Test-retest reliability of the individual component tests ranged from good to excellent (ICC = 0.75-0.97). Conclusions The results of this study provide initial support for the construct validity and test-retest reliability of the MGFMT. Level of Evidence Level 3 PMID:27999723
Intersession reliability of fMRI activation for heat pain and motor tasks
Quiton, Raimi L.; Keaser, Michael L.; Zhuo, Jiachen; Gullapalli, Rao P.; Greenspan, Joel D.
2014-01-01
As the practice of conducting longitudinal fMRI studies to assess mechanisms of pain-reducing interventions becomes more common, there is a great need to assess the test–retest reliability of the pain-related BOLD fMRI signal across repeated sessions. This study quantitatively evaluated the reliability of heat pain-related BOLD fMRI brain responses in healthy volunteers across 3 sessions conducted on separate days using two measures: (1) intraclass correlation coefficients (ICC) calculated based on signal amplitude and (2) spatial overlap. The ICC analysis of pain-related BOLD fMRI responses showed fair-to-moderate intersession reliability in brain areas regarded as part of the cortical pain network. Areas with the highest intersession reliability based on the ICC analysis included the anterior midcingulate cortex, anterior insula, and second somatosensory cortex. Areas with the lowest intersession reliability based on the ICC analysis also showed low spatial reliability; these regions included pregenual anterior cingulate cortex, primary somatosensory cortex, and posterior insula. Thus, this study found regional differences in pain-related BOLD fMRI response reliability, which may provide useful information to guide longitudinal pain studies. A simple motor task (finger-thumb opposition) was performed by the same subjects in the same sessions as the painful heat stimuli were delivered. Intersession reliability of fMRI activation in cortical motor areas was comparable to previously published findings for both spatial overlap and ICC measures, providing support for the validity of the analytical approach used to assess intersession reliability of pain-related fMRI activation. 
A secondary finding of this study is that the use of standard ICC alone as a measure of reliability may not be sufficient, as the underlying variance structure of an fMRI dataset can result in inappropriately high ICC values; a method to eliminate these false positive results was used in this study and is recommended for future studies of test–retest reliability. PMID:25161897
Climie, Rachel E D; Schultz, Martin G; Nikolic, Sonja B; Ahuja, Kiran D K; Fell, James W; Sharman, James E
2012-04-01
Noninvasive central blood pressure (BP) independently predicts mortality, but current methods are operator-dependent, requiring skill to obtain quality recordings. The aims of this study were first, to determine the validity of an automatic, upper arm oscillometric cuff method for estimating central BP (O(CBP)) by comparison with the noninvasive reference standard of radial tonometry (T(CBP)). Second, we determined the intratest and intertest reliability of O(CBP). To assess validity, central BP was estimated by O(CBP) (Pulsecor R6.5B monitor) and compared with T(CBP) (SphygmoCor) in 47 participants free from cardiovascular disease (aged 57 ± 9 years) in supine, seated, and standing positions. Brachial mean arterial pressure (MAP) and diastolic BP (DBP) from the O(CBP) device were used to calibrate in both devices. Duplicate measures were recorded in each position on the same day to assess intratest reliability, and participants returned within 10 ± 7 days for repeat measurements to assess intertest reliability. There was a strong intraclass correlation (ICC = 0.987, P < 0.001) and small mean difference (1.2 ± 2.2 mm Hg) for central systolic BP (SBP) determined by O(CBP) compared with T(CBP). Ninety-six percent of all comparisons (n = 495 acceptable recordings) were within 5 mm Hg. With respect to reliability, there were strong correlations but higher limits of agreement for the intratest (ICC = 0.975, P < 0.001, mean difference 0.6 ± 4.5 mm Hg) and intertest (ICC = 0.895, P < 0.001, mean difference 4.3 ± 8.0 mm Hg) comparisons. Estimation of central SBP using cuff oscillometry is comparable to radial tonometry and has good reproducibility. As a noninvasive, relatively operator-independent method, O(CBP) may be as useful as T(CBP) for estimating central BP in clinical practice.
Methods of Measurement in Epidemiology: Sedentary Behaviour
Atkin, Andrew J; Gorely, Trish; Clemes, Stacy A; Yates, Thomas; Edwardson, Charlotte; Brage, Soren; Salmon, Jo; Marshall, Simon J; Biddle, Stuart JH
2012-01-01
Background Research examining sedentary behaviour as a potentially independent risk factor for chronic disease morbidity and mortality has expanded rapidly in recent years. Methods We present a narrative overview of the sedentary behaviour measurement literature. Subjective and objective methods of measuring sedentary behaviour suitable for use in population-based research with children and adults are examined. The validity and reliability of each method is considered, gaps in the literature specific to each method identified and potential future directions discussed. Results To date, subjective approaches to sedentary behaviour measurement, e.g. questionnaires, have focused predominantly on TV viewing or other screen-based behaviours. Typically, such measures demonstrate moderate reliability but slight to moderate validity. Accelerometry is increasingly being used for sedentary behaviour assessments; this approach overcomes some of the limitations of subjective methods, but detection of specific postures and postural changes by this method is somewhat limited. Instruments developed specifically for the assessment of body posture have demonstrated good reliability and validity in the limited research conducted to date. Miniaturization of monitoring devices, interoperability between measurement and communication technologies and advanced analytical approaches are potential avenues for future developments in this field. Conclusions High-quality measurement is essential in all elements of sedentary behaviour epidemiology, from determining associations with health outcomes to the development and evaluation of behaviour change interventions. Sedentary behaviour measurement remains relatively under-developed, although new instruments, both objective and subjective, show considerable promise and warrant further testing. PMID:23045206
ASSOCIATIONS BETWEEN THREE CLINICAL ASSESSMENT TOOLS FOR POSTURAL STABILITY
Saxion, Casie E.; Cameron, Kenneth L.; Gerber, J. Parry
2010-01-01
Study Design: Clinical Measurement, Correlation, Reliability Objectives: To assess the relationship between the Single Leg Balance (SLB), modified Balance Error Scoring System (mBESS), and modified Star Excursion Balance (mSEBT) tests and secondarily to assess inter-rater and test-retest reliability of these tests. Background: Ankle sprains often result in chronic instability and dysfunction. Several clinical tests assess postural deficits as a potential cause of this dysfunction; however, limited information exists pertaining to the relationship that these tests have with one another. Methods: Two independent examiners measured the performance of 34 healthy participants completing the SLB Test, mBESS test, and mSEBT at two different time periods. The relationship between tests was assessed using the Pearson Correlation and Fisher's Exact Tests. Inter-rater and test-retest reliability were assessed using the intraclass correlation coefficient (ICC) and Kappa statistics. Results: A significant correlation (r = -0.35) was observed between the mSEBT and the mBESS. Fisher's Exact Test showed a significant association between the SLB Test and mBESS (P = .048), but no association between the SLB and mSEBT (P = 1.000). Inter-rater reliability was excellent for the mSEBT and fair for the mBESS (ICCs of .91 and .61 respectively). Excellent agreement was observed between raters for the SLB test (k = 1.00). Test-retest reliability was excellent for the mSEBT (ICC = 0.98) and fair for the mBESS (ICC = 0.74). There was poor test-retest agreement for the SLB test (k = .211). Conclusion: There was a significant relationship observed between the SLB Test, mBESS test, and mSEBT; however, strength of association measures showed limited overlap between these tests. This suggests that these tests are interrelated but may not assess equal components of postural stability. PMID:21589668
Ida, Mitsuru; Naito, Yusuke; Tanaka, Yuu; Matsunari, Yasunori; Inoue, Satoki; Kawaguchi, Masahiko
2017-08-01
The avoidance of postoperative functional disability is one of the most important concerns of patients facing surgery, but methods to evaluate disability have not been definitively established. The aim of our study was to evaluate the feasibility, reliability, and validity of the Japanese version of the 12-item World Health Organization Disability Assessment Schedule-2 (WHODAS 2.0-J) in preoperative patients. Individuals aged ≥55 years who were scheduled to undergo surgery in a tertiary-care hospital in Japan between April 2016 and September 2016 were eligible for enrolment in the study. All patients were assessed preoperatively using the WHODAS 2.0-J, the 8-Item Short Form (SF-8) questionnaire, and the Tokyo Metropolitan Institute of Gerontology Index (TMIG Index). The feasibility, reliability, and validity of WHODAS2.0-J were evaluated using response rate, Cronbach's alpha (a measure of reliability), and the correlation between the WHODAS 2.0-J and the SF-8 questionnaire and TMIG Index, respectively. A total of 934 patients were enrolled in the study during the study period, of whom 930 completed the WHODAS 2.0-J (response rate 99.5%) preoperatively. Reliability and validity were assessed in the 898 patients who completed all three assessment tools (WHODAS 2.0-J, SF-8 questionnaire, and TMIG Index) and for whom all demographic data were available. Cronbach's alpha was 0.92. The total score of the WHODAS 2.0-J showed a mild or moderate correlation with the SF-8 questionnaire and TMIG Index (r = -0.63 to -0.34). The WHODAS 2.0-J is a feasible, reliable, and valid instrument for evaluating preoperative functional disability in surgical patients.
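Cronbach's alpha, used above as the internal-consistency measure, is computed from item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum(item variances) / variance(total)). The sketch below illustrates that standard formula; the response matrix is invented and the function name is hypothetical.

```python
import statistics


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    responses[i][j] is respondent i's score on item j.
    Requires at least two items and two respondents.
    """
    n_items = len(responses[0])
    items = list(zip(*responses))                      # transpose to items
    item_variance_sum = sum(statistics.variance(item) for item in items)
    total_scores = [sum(row) for row in responses]
    total_variance = statistics.variance(total_scores)
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of four respondents to a two-item scale.
answers = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(answers), 3))
```

Alpha approaches 1 when items move together across respondents, which is the sense in which a 12-item instrument with alpha = 0.92 is described as reliable.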
Hughes, Christopher; Campbell, Jacob; Mukhopadhyay, Swagoto; McCormack, Susan; Silverman, Richard; Lalikos, Janice; Babigian, Alan; Castiglione, Charles
2017-09-01
Reconstructive surgical care can play a vital role in the resource-poor settings of low- and middle-income countries. Telemedicine platforms can improve the efficiency and effectiveness of surgical care. The purpose of this study is to determine whether remote digital video evaluations are reliable in the context of a short-term plastic surgical intervention. The setting for this study was a district hospital located in Latacunga, Ecuador. Participants were 27 consecutive patients who presented for operative repair of cleft lip and palate. We calculated kappa coefficients for reliability between in-person and remote digital video assessments for the classification of cleft lip and palate between two separate craniofacial surgeons. We hypothesized that the technology would be a reliable method of preoperative assessment for cleft disease. Of the 27 participants, 22 (81.4%) received operative treatment for their cleft disorder. Mean age was 11.1 ± 8.3 years. Patients presented with a spectrum of disorders, including cleft lip (24 of 27, 88.9%), cleft palate (19 of 27, 70.4%), and alveolar cleft (19 of 27, 70.4%). We found a 95.7% agreement between observers for cleft lip with substantial reliability (κ = .78, P < .01). There was an 82.6% agreement between observers for cleft palate, with a moderate interrater reliability (κ = .55, P = .01). We found only a 47.8% agreement between observers for alveolar cleft with a nonsignificant, weak kappa agreement (κ = .06, P = .74). Remote digital assessments are a reliable way to preoperatively diagnose cleft lip and palate in the context of short-term plastic surgical interventions in low- and middle-income countries. Future work will evaluate the potential for real-time, telemedicine assessments to reduce cost and improve clinical effectiveness in global plastic surgery.
Ahmed, Ashraf; Qayed, Khalil Ibrahim; Abdulrahman, Mahera; Tavares, Walter; Rosenfeld, Jack
2014-08-01
Numerous studies have shown that the multiple mini-interview (MMI) provides a standard, fair, and more reliable method for assessing applicants. This article presents the first MMI experience for selection of medical residents in a Middle Eastern culture and an Arab country. In 2012, we started using the MMI in interviewing applicants to the residency program of Dubai Health Authority. This interview process consisted of eight eight-minute structured interview scenarios. Applicants rotated through the stations, each with its own interviewer and scenario. They read the scenario and were requested to discuss the issues with the interviewers. Sociodemographic and station assessment data provided for each applicant were analyzed to determine whether the MMI was a reliable assessment of the non-clinical attributes in the present setting of an Arab country. One hundred and eighty-seven candidates from 27 different countries were interviewed for the Dubai Residency Training Program using MMI. They were graduates of 5 medical universities within the United Arab Emirates (UAE) and 60 different universities outside the UAE. With this applicant pool, an MMI with eight stations produced absolute and relative reliability of 0.8 and 0.81, respectively. The person × station interaction contributed 63% of the variance components, the person contributed 34% of the variance components, and the station contributed 2% of the variance components. The MMI has been used in numerous universities in English-speaking countries. The MMI evaluates non-clinical attributes and this study provides further evidence for its reliability in a different country and culture. The MMI offers a fair and more reliable assessment of applicants to medical residency programs. The present data show that this assessment technique applied in a non-western country and Arab culture still produced reliable results.
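The reported reliabilities can be related to the variance components with the standard generalizability-theory formulas for a one-facet crossed person × station design. The sketch below assumes that design and treats the rounded percentages from the abstract as the variance components; the function name is hypothetical and this is not the authors' analysis code.

```python
def g_coefficients(var_person, var_station, var_interaction, n_stations):
    """Return (relative, absolute) coefficients for a crossed person x station design."""
    if n_stations <= 0:
        raise ValueError("n_stations must be positive")
    # Relative error ignores the station main effect (rank ordering only).
    relative_error = var_interaction / n_stations
    # Absolute error also counts station difficulty (absolute score level).
    absolute_error = (var_station + var_interaction) / n_stations
    relative = var_person / (var_person + relative_error)
    absolute = var_person / (var_person + absolute_error)
    return relative, absolute


# Rounded variance shares from the abstract: person 34%, station 2%,
# person x station 63%; eight stations.
relative, absolute = g_coefficients(0.34, 0.02, 0.63, 8)
print(f"relative G = {relative:.3f}, absolute Phi = {absolute:.3f}")
```

Under these assumptions both coefficients come out near 0.81, in line with the reported 0.81 (relative) and 0.8 (absolute). The small gap is plausibly due to the percentages being rounded.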
ERIC Educational Resources Information Center
Ginsburg, Herbert P.; Lee, Young-Sun; Pappas, Sandra
2016-01-01
Formative assessment involves the gathering of information that can guide the teaching of individual children or groups of children. This approach requires a sound understanding of children's thinking and learning, as well as an effective method for gaining the information. We propose that formative assessment should employ a version of clinical…
Using Self-Assessments to Detect Workshop Success: Do They Work?
ERIC Educational Resources Information Center
D'Eon, Marcel; Sadownik, Leslie; Harrison, Alexandra; Nation, Jill
2008-01-01
An accepted gold standard for measuring change in participant behavior is third-party observation. This method is highly resource intensive, and many small-scale evaluations may not be in a position to use this approach. This study was designed to assess the validity and reliability of aggregated group self-assessments as one way to measure workshop…
The Infant Motor Profile: A Standardized and Qualitative Method to Assess Motor Behaviour in Infancy
ERIC Educational Resources Information Center
Heineman, Kirsten R.; Bos, Arend F.; Hadders-Algra, Mijna
2008-01-01
A reliable and valid instrument to assess neuromotor condition in infancy is a prerequisite for early detection of developmental motor disorders. We developed a video-based assessment of motor behaviour, the Infant Motor Profile (IMP), to evaluate motor abilities, movement variability, ability to select motor strategies, movement symmetry, and…
Development and validation of the Myasthenia Gravis Impairment Index
Bril, Vera; Kapral, Moira; Kulkarni, Abhaya; Davis, Aileen M.
2016-01-01
Objective: We aimed to develop a measure of myasthenia gravis impairment using a previously developed framework and to evaluate reliability and validity, specifically face, content, and construct validity. Methods: The first draft of the Myasthenia Gravis Impairment Index (MGII) included examination items from available measures enriched with newly developed, patient-reported items, modified after patient input. International neuromuscular specialists evaluated face and content validity via an e-mail survey. Test–retest reliability was assessed in stable patients at a 3-week interval and interrater reliability was evaluated on the same day. Construct validity was assessed through correlations between the MGII and other measures and by comparing scores in different patient groups. Results: The first draft was assessed by 18 patients, and 72 specialists answered the survey. The second draft had 7 examination and 22 patient-reported items. Field testing included 200 patients, with 54 patients completing the reliability studies. Test–retest reliability of the total score was good (intraclass correlation coefficient 0.92; 95% confidence interval 0.79–0.94), as was interrater reliability of the examination component (intraclass correlation coefficient 0.81; 95% confidence interval 0.79–0.94). The MGII correlated well with comparison measures, with higher correlations with the MG–activities of daily living (r = 0.91) and MG-specific quality of life 15-item scale (r = 0.78). When assessing different patient groups, the scores followed expected patterns. Conclusions: The MGII was developed using a patient-centered framework of myasthenia-related impairments and incorporating patient input throughout the development process. It is reliable in an outpatient setting and has demonstrated construct validity. Responsiveness studies are under way. PMID:27402891
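The abstract reports intraclass correlation coefficients but does not say which ICC form was used. As an illustration only, the sketch below computes one common choice, the two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)), from a subjects × raters score matrix. The function name and the toy scores are hypothetical and unrelated to the MGII data.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of rows (subjects), each a list of rater scores."""
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Rater variance stays in the denominator, so systematic rater
    # differences lower the coefficient (absolute agreement).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical toy matrix: 6 subjects rated by 4 raters.
toy_scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(toy_scores):.2f}")
```

In the toy matrix the raters differ systematically in level, which pulls this coefficient well below what a consistency-only ICC would give for the same scores.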